Welcome to Aussie SQL Server Bloggers Sign in | Join | Help

Debunking myths about clustered indexes - part 5 - a real world example

This post discusses a problem experienced last year on one of the highest profile websites in Australia which highlights how blindly following generic "best practises" can cause serious damage to real-world systems.

Previously, in this series:
Part 1 (CIXs worsen bookmark lookups)

Part 2 (CIXs are NOT better for range scans)

Part 3 (example script)

Part 4 (CIXs, TPC-C & Oracle clusters)

The effects of this simple indexing problem severely impacted the huge number of online users (millions) who use this website. The MS engineers who worked on the problem failed to produce a solution, even though a simple solution exists because they blindly followed published "best practises" rather than thinking the problem through
logically.

This is a great case study & will benefit many who take the time to read & attempt to solve this problem.

Some background info:
a) This query provides "top 5" article functionality for the website's front page (most recent, relevant etc)
b) The application caches most requests but cache refreshes, timeouts or system re-starts cause the query to be run at very high frequency (many times per second)
c) High execution frequency, coupled with high cost of the query bring the site down regularly.
d) Hardware is over-spec'd - 8 x CPU, significantly more RAM than data (database is totally in memory) and storage is high end EMC SAN.

Senior MS support engineers were engaged to solve this problem but gave up on this after weeks of trying but there is a simple solution - can you see it?

Here are a few tips:
a) Although not all DDL is posted, all that is relevant is here
b) You cannot "cover" the query with a traditional NCIX because there are more than 16 columns referenced by the query & some filter columns exceed the 900 byte limit imposed by SQL 2000 (this was a SQL 2000 system)
c) Don't get distracted by the sub-queries, they're actually not important to the solution (due to the AND condition)
d) Forget the current execution plans (which aren't provided). These would only tell you what's currently going wrong even if you had them. They'd give you no information on how to actually solve the problem. They were not actually used in production of the solution other than to prove the solution.

To solve this problem, you really need to think beyond regular "best practises" and consider how you can use SQL Server's indexing features in a way that provides the simplest, efficient path that SQL Server can use to identify the top 5 "articles" based on the business rules (filters). Thinking logically about this problem is the key to finding a solution.

Why am I posting this challenge?
a) To demonstrate how wrong published CIX index design "best practises" (eg on ID columns) truely are and how badly they often affect real world systems (in this case, a huge user population).
b) To encourage developers and DBAs to think about index design in more sophisticated ways than blindly following generic "best practises".

I will post the solution that was actually implemented soon but for now, can you find a solution?

 

use tempdb

go

if object_id('article') > 0

drop table article

go

create table article (

   id int

      identity (1, 1) not null

      primary key clustered

 , clid int null

 , affids varchar (1000) null

 , grptype varchar (500) null

 , [type] varchar (500) null

 , chksmcurr varchar (50) null

 , chksmver varchar (50) null

 , crdt datetime null

 , creator varchar (100) null

 , dtend datetime null

 , dtstart datetime null

 , vseqno int null

 , protcont int null

 , date datetime null

 , [time] datetime null

 , headline varchar (1000) null

 , shheadline varchar (1000) null

 , pullquote varchar (8000) null

 , prodprior int null

 , grpprior int null

 , author varchar (255) null

 , source varchar (255) null

 , crdt2 datetime null

 , params varchar (5000) null

)

go

create index article2 on article(crdt2)

go

create index article3 on article(clid)

go

create index article4 on article(prodprior)

go

create index article5 on article(creator)

go

create index article6 on article(crdt, creator, vseqno)

go

--recommended by MS support engineers

--but worsened the problem (much higher CPU)

create index ndx_article2

on article(clid, grptype, chksmcurr,

chksmver, affids, crdt, creator,

dtend, dtstart, vseqno, [date] desc,

 [time] desc , crdt2)

go

--also recommended by MS support engineers

--but worsened the problem (much higher CPU)

create index ndx_article3 on article(clid, grptype,

 chksmcurr, chksmver, affids,

 crdt, creator, dtend, dtstart,

 vseqno, prodprior, grpprior, date desc,

 [time] desc , crdt2)

goselect top 5

      a.id

    , a.params

    , a.affids

    , a.protcont

    , a.headline

    , a.shheadline

    , a.date

    , a.time

    , a.author

    , a.source

    , a.pullquote

    , '' as atparam

    , gt.params as gtparam

from  article a

join  grptype gt on a.grptype = gt.grptype and a.clid = gt.clid

join  articletype at on a.type = at.type

where a.clid = 2

  and ('' + a.creator = ''

       or a.crdt is null

       or a.vseqno is null

       or (a.vseqno = (select top 1 vseqno

                             from article v

                             where a.crdt = v.crdt

                               and a.creator = v.creator

                             order by vseqno desc)

                )

            )

  and a.grptype = 'news'

  and ((a.dtstart <= getdate() or (isdate(a.dtstart) = 0))

       and (a.dtend > getdate() or (isdate(a.dtend) = 0)))

  and ((',' + a.affids + ',' like '%,713,%') and ',' + a.affids + ',' not like '%,706,%')

 and a.chksmcurr = a.chksmver

order by a.prodprior asc

       , a.grpprior asc

       , a.date desc

       , a.time desc

go

Published Friday, 8 June 2007 12:20 AM by Greg_Linwood

Comments

# Greg Linwood - Debunking myths about clustered indexes - part 5 - a real world example

# Greg Linwood - Debunking myths about clustered indexes - part 5 - a real world example

# re: Debunking myths about clustered indexes - part 5 - a real world example

The "Transaction Blog" series of articles basically says:

"Use NCL indexes, because they are 20-30% percent faster then CLX".

This is the same as to say:

"The subway in large cities is being bilt to save on boot repair, because with introduction of subways boots are worn out each day less by 30%".

While the figure above may be absolutely  correct, this is NOT what the subway is being used.

Yes, the clustered indexes are losing by 30% in speed retrieval, and in update speed (or 20 or 50% for that matter), but this has NO practical meaning for 95% of the businesess in the world (because they are far from maxing out!

ANY table in the robust design must have a clustered index.

There is a fundamental consideration:
a clustered index is ALWAYS USED, regardless of selectivity of the index.

(An automatic SSQL Server decision on) usage of NCL/vs.full table scan is a BIG GRAY AREA, with hundreds of factors are involved.
(and automatic it is in 99% of the TSQL code in the world).

Yes, if you have consulting practice deals with physical DB design, continue avoiding CLX, this is a good chance for getting DB fine-tuning work with existing clients.

If you are a diligent modeler/designer/developer -
ALWAYS use Clustered indexes + identity against each table (even for look-up table with 2 rows).

There are two main reasons:

1) ETL process into/from large tables is NOT robust in principle without CLX.
The CL index and identity is the foundation of bookmarking (data selection for ETL).  (as we know datetime data type with default of GetDate can not be used for ETL, due to same values produced for different rows, the only other practical alternative is the timestamp datatype).

The ETL, based on usage on CLX index, will NEVER FAIL.

The ETL, based on NCL index will ALWAYS be predicated on spread/selectivity of the index, number of rows, retrieve for the batch etc.
At any moment in future it may (and will) REVERT to FULL TABLE SCAN.

As we know, in SQL Server 2000 we can NOT FIX quiry plan. Maximum that we can do is use hints.

2) Any selection of a number of rows (even if there is a UNQ constraint, supporting the index) may revert unexpectedly into FULL TABLE SCAN. This may happen after addition of a (theoretically) 1 single row- because statistics might change against a table, and SQL Server will automatically DECIDE that full table scan is preferable for the query.

Any query (i.e. except a single row extraction, supported by UNQ constraint), theoretically may revert into full table scan ANY TIME.

If you have a table with 500 mln records, and 100 of records are retrieved in half a second, it does NOT guarantee anything.

A mid-tier developer  with embedded SQL in application (or report) will write SELECT TOP 500 (instead of SELECT top 100) and all of a sudden you are  having 20 minute query (he has never heard about indexes, selectivity etc - that is none of his business).

As we know, by default, the underlying data in SQL Server is LOCKED, and NOT accessible for competing processes.

As a result, the table is dead for 20 minutes, the DB is dead, the application hangs, and usually, as any professional knows, it happens at 4AM.

Once again, I MUST use CLX to be protected against possibility of full table scan (i.e. 10 minute long retrieval).

The fact, that usage of CLX (vs. NCL) index made  my query 30ms long (instead of 20 mseconds) have no practical meaning for me.

In my database, a financial transaction happens once every 5 seconds. So difference between 20 and 30 ms for transaction is absolutely NOT relevant.

IF/WHEN our company will get  50 transactions per second (and this  difference  will become practical) I still will not consider switching over to NCL index. This is because the company will have  100 times more revenue, and will easily buy server that is 20 times more scalable.

Yes, everyting that is said in this serious of  articles is correct, but it has NO bearing to real life.

I will always use a CLX and an identity against EACH table in my database, althought it will slow down inserts and updates by 30%.

Alexei

P.S. I am SQL Solutions Engineer with online Payment Processing company.

I can sleep soundly at 4 at night, knowing that the DB design pracices are robust and correct.
Saturday, 15 March 2008 8:31 AM by alakimov
Anonymous comments are disabled