Debunking myths about clustered indexes - part 3 (example script)
So far in this series of posts, I have discussed how clustered indexes (CIXs) can be less efficient for both bookmark lookups & range scans than non-clustered indexes (NCIXs). In this post, I'm providing a repro script that demonstrates these two points.
The script creates a table named people & populates it with randomly generated data to represent somewhat realistic data, in that rows contain variable width values for the lname & fname columns. Random algorithms used are based on those provided by Vyas at http://vyaskn.tripod.com
The script has been tested against SQL 2005, but should also run on SQL 2000. On my notebook, this script takes ~ 20 seconds to complete. Progress messages are printed during construction of the test data, so you can fiddle with the parameters if you'd like to test the effects of using larger or smaller amounts of data.
After running the script, compare the number of reads reported by the set statistics io report & you should see that the reads performed by the queries are greater when CIXs are present on the table.
Note also that this script installs the CIX on (lname, fname), representing a common scenario where the CIX is placed on a combination of columns which the database designer thinks a physical ordering might benefit overall performance of the database. Another reason this type of scenario is so common is that, by default, SQL Server creates a CIX on the primary key of each table (unless the PK is declaratively created as non-clustered). When PKs are created on multiple columns (as often happens with natural keys) using the default clustered index, it is easy to have a similar scenario - where CIXs on multiple columns are created, affecting the performance of other NCIXs.
set
nocount on
go
use
tempdb
go
/*
* BEGIN SETUP people table
*/
if
object_id('people') is not null drop table people
create
table people (
personid
int not null identity (1, 1) primary key nonclustered
, lname varchar (25) not null
, fname varchar (25) not null
, age tinyint not null check (age between 0 and 100)
, sex character (1) not null check (sex in ('m', 'f'))
, notes varchar (512) not null)
declare
@l int, @fn varchar(25), @ln varchar(25), @bm char(6)
declare
@i int, @report_every int, @rows_to_insert int, @p varchar (50)
select
@i = 1, @rows_to_insert = 100000, @report_every = 10000
while
@i < @rows_to_insert
begin
if @i % @report_every = 0
begin
select @p = convert(varchar(25), @i) + ' from '
+ convert(varchar(25), @rows_to_insert) + ' inserted'
print @p
end
/* generate a fname of random length (between 1 to 25 chrs), of random chars */
select @l = round(1 + (rand() * (25-1)),0)
select @fn='', @bm = 'uaeioy'
while @l > 0
begin
if (@l%2) = 0
set @fn = @fn + substring(@bm,convert(int,round(1 + (rand() * (5)),0)),1)
else
set @fn = @fn + char(round(97 + (rand() * (25)),0))
set @l = @l - 1
end
/* generate a lname of random length (between 1 to 25 chrs), of random chars */
select @l = round(1 + (rand() * (25-1)),0)
select @ln='', @bm = 'goeuai'
while @l > 0
begin
if (@l%2) = 0
set @ln = @ln + substring(@bm,convert(int,round(1 + (rand() * (5)),0)),1)
else
set @ln = @ln + char(round(97 + (rand() * (25)),0))
set @l = @l - 1
end
--insert row
insert into people (lname, fname, age, sex, notes)
values (
@ln
, @fn
, round(1 + (rand() * (100-1)),0)
, case when round(1 + (rand() * (2-1)),0) = 1 then 'm' else 'f' end
, replicate ('*', round(1 + (rand() * (512-1)),0)))
set @i = @i + 1
end
/*
* END SETUP people table
*/
print
' '
print
'TEST 1: Demonstrates how CIX can affect NCIX range scan performance (with covering index)'
create
clustered index cix_people on people (lname, fname)
create
index ncix_people on people (age, sex)
print
'QUERY: select age, sex into catchresults from people where age = 25 and sex = ''f'''
print
'with CIX'
set
statistics io on
select
age, sex into catchresults from people where age = 25 and sex = 'f'
set
statistics io off
if
object_id('catchresults') is not null drop table catchresults
drop
index people.cix_people
print
'without CIX'
set
statistics io on
select
age, sex into catchresults from people where age = 25 and sex = 'f'
set
statistics io off
if
object_id('catchresults') is not null drop table catchresults
create
clustered index cix_people on people (lname, fname)
print
' '
print
'TEST 2: Demonstrates how CIX can affect bookmark performance (without covering index)'
print
'QUERY: select age, sex, personid into catchresults from people where age = 25 and sex = ''f'''
print
'with CIX'
set
statistics io on
select
age, sex, personid into catchresults from people where age = 25 and sex = 'f'
set
statistics io off
if
object_id('catchresults') is not null drop table catchresults
drop
index people.cix_people
print
'without CIX'
set
statistics io on
select
age, sex, personid into catchresults from people where age = 25 and sex = 'f'
set
statistics io off
if
object_id('catchresults') is not null drop table catchresults
if
object_id('people') is not null drop table people