Welcome to Aussie SQL Server Bloggers Sign in | Join | Help

Duplicate indexes

SQL Server curiously allows duplicate indexes to be created on a table with precisely identical definitions. There is absolutely zero benefit from this practise yet significant harm can result - including:

  • Performance degradation from maintenance overhead during updates, inserts & deletes
  • Increased transaction log activity (filling tlogs, disks & breaking Log Shipping)
  • Increased backup sizes (indexes make backups bigger, requiring more disk space, longer maintenance windows & consuming more energy during backups)

Other major RDBMS platforms don't allow this nonsense yet MS apparently don't consider fixing this terribly important, given their response in this Connect item

I began a discussion about this topic in the SQL MVP dicussion forum last week & I think there was a fair consensus that something should be done. Fellow MVPs who work in performance tuning commented that they encounter duplicate indexes commonly in their work (I see it regularly). If you agree MS should do something to fix this, I encourage you to vote on this Connect item to try & influence change:

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=313227

If you are wondering whether you have any duplicate indexes on your system, try this procedure out. Any duplicates listed should almost certainly be removed (unless they're explicitly referred to in query index hints, which would break if you remove the named indexes)

To use this proc, create it in the master database then call it using this syntax from within your db:

 

exec sp_MSForEachTable 'usp_duplicateindexes''?'''

 

 

Here's the proc script:

 

USE master

GO

create procedure [dbo].[usp_duplicateindexes]

 @tablename varchar(255)

as

set nocount on

print @tablename

 

--dump sp_helpindex into temp table

if object_id('tempdb..#helpindex') > 0 drop table #helpindex

create table #helpindex (

  index_name varchar (900) not null primary key

, index_description varchar (1000) null

, index_keys varchar (1000) null)

insert #helpindex exec sp_helpindex @tablename

 

--add [inccols] to temp table & cursor over output, adding included col defs

alter table #helpindex add inccols varchar(1000) null

declare cr cursor for

select si.name, sc.name

from sysobjects so

join sysindexes si on so.id = si.id

join sys.index_columns ic on si.id = ic.object_id and si.indid = ic.index_id

join sys.columns sc on ic.object_id = sc.object_id and ic.column_id = sc.column_id

where so.xtype = 'U'

and so.name = @tablename

and ic.is_included_column = 1

order by si.name, ic.index_column_id

 

declare @siname varchar(1000), @scname varchar(1000)

open cr

fetch next from cr into @siname, @scname

 while @@fetch_status = 0

  begin

   update #helpindex set inccols = isnull(inccols , '') + @scname + ', ' where index_name = @siname

   fetch next from cr into @siname, @scname

  end

update #helpindex set inccols = left(inccols, datalength(inccols) - 2) where right(inccols, 2) = ', '

close cr

deallocate cr

 

--dump duplicates into second temp table & pump results if there are any

if object_id('tempdb..#helpindex2') > 0 drop table #helpindex2

create table #helpindex2 (

  index_name varchar (900) not null primary key

, index_description varchar (1000) null

, index_keys varchar (1000) null

, inccols varchar(1000) null

)

insert into #helpindex2

select hi.index_name, hi.index_description, hi.index_keys, hi.inccols

from #helpindex hi

join #helpindex h2 on hi.index_keys=h2.index_keys

                    and hi.index_description=h2.index_description

                    and hi.index_name<>h2.index_name

 

if @@rowcount > 0

select @tablename as "tablename", * from #helpindex2 order by index_name, index_keys, inccols

 

--cleanup temp objects

if object_id('tempdb..#helpindex2') > 0 drop table #helpindex2

if object_id('tempdb..#helpindex') > 0 drop table #helpindex

posted by Greg_Linwood | 1 Comments

Which query is scanning that index?

SQL 2005's db_dm_index_usage_stats DMV allows DBAs to analyse how often indexes are being used (or not used), including individual counts for seeks, scans & lookups on each index in your database. I was thrilled when I first saw this DMV as there was previously no way to determine whether indexes weren't being used & could be safely removed from a table without risk of queries suddenly full scanning the table & bringing a system's performance to its knees.

Another useful metric reported by db_dm_index_usage_stats is its user_scans column. This reports how many times each index has been fully scanned without a filter predicate (not range scanned). Full scans are usually bad from a performance perspective so it is useful to know whether you have any indexes that are being fully scanned.

Even more useful would be to know which QUERIES are scanning those indexes so you could do something about tuning them. Unfortunately, db_dm_index_usage_stats doesn't quite go this far but I recently decided to try & develop a script to work this out. After scratching around the various performance DMVs & not getting very far, I decided to post a question to the SQL MVP discussion forum.

Thankfully, an ex SQL MVP who now works for Microsoft on the SQL dev team - Umachandar Jayachandran - came up with this innovative approach:

 

with xmlnamespaces('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as s)

select top 50 st.text, qp.query_plan, qs.ec as exec_count, qs.tlr as total_reads

from (

  select s.sql_handle, s.plan_handle, max(s.execution_count) as ec, max(s.total_logical_reads) as tlr

  from sys.dm_exec_query_stats as s

  where s.max_logical_reads > 100

  group by s.sql_handle, s.plan_handle) as qs

  cross apply sys.dm_exec_query_plan(qs.plan_handle) as qp

  cross apply sys.dm_exec_sql_text(qs.sql_handle) as st

  cross apply (select distinct relop.value('@Index','nvarchar(130)') as IndexRef

  from qp.query_plan.nodes(

     N'//s:Batch/s:Statements/s:StmtSimple/s:QueryPlan[1]//

     s:RelOp[@PhysicalOp = ("Index Scan")]/*[local-name() = ("IndexScan")]/

     s:Object[@Database = ("[DBNameHere]")

     and @Table = ("[TableNameHere]")

     and @Index = ("[IndexNameHere]")]'

     ) as ro(relop)

  ) as r

 

Notice that this query leverages three performance DMVs:

* dm_exec_query_stats - identifies base query performance stats

* dm_exec_query_plan - cross joined to aggregated view of dm_query_stats to pick up the XML query plan for each query.

* dm_exec_sql_text - also cross joined to pick up each sql statement.

The innovative part of this query is the way it uses XQuery syntax to interrogate the XML typed query_plan column from dm_exec_query_plan, traversing the XML Showplan structure of the query_plan column's nodes collection. It starts at the Batch element, navigating through Statements/StmtSimple/QueryPlan before searching through RelOp elements for one with PhysicalOp attribute = "Index Scan" and an IndexScan element with a child Object element containing the required DBname, TableName & IndexName. This sounds like quite a mouthful, but if you compare the above sentence with some XML output from a query plan which scans an index you should be able to follow the XML traversal fairly easily.

Behind the scenes, this query is actually trawling the procedure cache, performing the above parse operation on every plan in the cache. On systems with large caches (10s of Gb) this might take minutes, but this shouldn't result in a significant performance degradation as the scan is non-blocking & although it works a single CPU hard, it shouldn't consume multi CPUs (it hasn't in my testing so far at least). The filter on max_logical_reads attempts to limit the number of plans parsed. Whilst I have set the filter to 100 reads here, you might experiment with this based on the size of index scans you're evaluating. There's no point parsing query plans that don't have a maximum logical reads count at least the same size as the index scan. Setting this filter to a size close to the number of pages your scanned index is using for storage will usually eliminate the majority of smaller plans within the cache from being XML parsed. 

Keep in mind that this query only works with execution plans currently cached in the procedure cache. This is a consequence of using these DMVs which are limited to accessing contents of the current cache. Because the procedure cache is fairly transient, results may vary if the system is under memory pressure or if you restart SQL Server regularly, or manually clear procedure cache (eg by using dbcc freeproccache).

Another limitation of this query is that it only shows stored procedures, without showing the specific statements within stored procedures that are scanning the index. This shouldn't be too hard to work out by looking at the stored proc code, but other DMVs can help zero in on the actual statements. I'll try & improve this aspect of the code when time permits.

Thanks so much UC for your help with this valuable script! (c:

update to usp_helpindexusagestats

Here's an update to the utility SP [usp_helpindexusagestats] I posted a few weeks back, in response to a comment from Mitch Wheat that he was receiving empty result sets, which could have occurred if the original version was run against a system with an empty cache for the db_dm_index_usage_stats() DMV. This version outer joins output from the DMV against output from MS shipped [sp_helpindex] vs the original version's inner join approach, which most likely explains the experience Mitch had the original version.

 

create procedure [dbo].[usp_helpindexusagestats]

@tablename varchar(1000)

as

set nocount on

 

--collect output from system supplied [sp_helpindex]

if object_id('tempdb..#helpindex') > 0 drop table #helpindex

create table #helpindex (

index_name varchar (1000) not null primary key

, index_description varchar (1000) null

, index_keys varchar (1000) null)

insert #helpindex exec sp_helpindex @tablename

alter table #helpindex add inccols varchar(1000) null

 

--cursor over output, adding included columns

declare cr cursor for

select si.name, sc.name

from sysobjects so

join sysindexes si on so.id = si.id

join sys.index_columns ic on si.id = ic.object_id and si.indid = ic.index_id

join sys.columns sc on ic.object_id = sc.object_id and ic.column_id = sc.column_id

where so.xtype = 'U'

and so.name = @tablename

and ic.is_included_column = 1

order by si.name, ic.index_column_id

declare @siname varchar(1000), @scname varchar(1000)

open cr

fetch next from cr into @siname, @scname

while @@fetch_status = 0

begin

update #helpindex set inccols = isnull(inccols , '') + @scname + ', ' where index_name = @siname

fetch next from cr into @siname, @scname

end

update #helpindex set inccols = left(inccols, datalength(inccols) - 2)

where right(inccols, 2) = ', '

close cr

deallocate cr

 

--return output, joining index usage stats DMV output

select hi.index_name, hi.index_description, hi.index_keys

, hi.inccols as included_columns, ius.index_id

, user_seeks, user_scans, user_lookups, user_updates

, last_user_seek, last_user_scan, last_user_lookup

from #helpindex hi

join sysindexes si on si.name = hi.index_name collate database_default

left join sysobjects so on si.id = so.id

left join sys.dm_db_index_usage_stats ius on ius.object_id = si.id

and ius.index_id = si.indid and ius.database_id = db_id()

 

drop table #helpindex

How SQL Server could help save the environment..

Millions of database backups are performed every day & night around the world, consuming mountains of disk space & generating massive levels of power consumption as enormous numbers of bytes are copied to disk, then to tape or other removable media.

Assuming something like ~25% > 50% of the data contained within an average database relates to indexes (the ratio is often mugh higher), an option to exclude indexes from database backups would make it possible to reduce the amount of data required to perform backups, in turn reducing backup time & energy consumed by backup operations. I don't know how to go about estimating how much time & power could be saved from such a feature, but surely it would have to be huge.

Because indexes are only re-sorted copies of table columns, no loss in data integrity results from eliminating index data from backups, so I decided to register a feedback item on Microsoft's "Connect" website here:

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=331220

The benefits from Microsoft including an option to eliminate index data from full database backups include:

(a) Significantly faster backups

(b) Reduced disk space

(c) Reduced energy consumption (less data copied to millions of daily backups around the world translates to less disk head movement & therefore less energy spent on backups)

Of course, any such feature should be optional as some will prefer to continue backing up index data.

If you agree with this feature request & would like to see Microsoft add it to a future version of SQL Server, please record a vote on the connect item link above!

posted by Greg_Linwood | 9 Comments

usp_helpindexusagestats: a helpfull index analysis script

Here's an index analysis script I put together recently which combines output from sp_helpindex (also with included columns, which the regular sp_helpindex lacks) and the dm_db_index_usage_stats() DMV.

I use this script primarily to identify which indexes are sub-sets of each other & can therefore be consolidated into a single index with the added information about usage patterns so I also have an understanding of how often the indexes are being accessed by queries.

create procedure usp_helpindexusagestats
  @tablename varchar(1000)
as

set nocount on

if object_id('tempdb..#helpindex') > 0 drop table #helpindex

create table #helpindex (
   index_name varchar (1000) not null primary key
 , index_description varchar (1000) null
 , index_keys varchar (1000) null
)

insert #helpindex
exec sp_helpindex @tablename

alter table #helpindex add inccols varchar(1000) null

declare cr cursor for
select si.name, sc.name
from sysobjects so
join sysindexes si on so.id = si.id
join sys.index_columns ic on si.id = ic.object_id and si.indid = ic.index_id
join sys.columns sc on ic.object_id = sc.object_id and ic.column_id = sc.column_id
where so.xtype = 'U'
  and so.name = @tablename
  and ic.is_included_column = 1
order by si.name, ic.index_column_id

declare @siname varchar(1000), @scname varchar(1000)

open cr

fetch next from cr into @siname, @scname

while @@fetch_status = 0
 begin

  update #helpindex set inccols = isnull(inccols , '') + @scname + ', ' where index_name = @siname

  fetch next from cr into @siname, @scname
 end

update #helpindex set inccols = left(inccols, datalength(inccols) - 2)
where right(inccols, 2) = ', '

close cr
deallocate cr

select hi.index_name
     , hi.index_description
     , hi.index_keys
     , hi.inccols as included_columns
     , ius.index_id
     , user_seeks
     , user_scans
     , user_lookups
     , user_updates
     , last_user_seek
     , last_user_scan
     , last_user_lookup
from  #helpindex hi
join  sysindexes si on si.name = hi.index_name collate database_default
join sysobjects so on si.id = so.id
left join sys.dm_db_index_usage_stats ius on ius.object_id = si.id and ius.index_id = si.indid and ius.database_id = db_id()
where so.name = @tablename

drop table #helpindex
go

This usp_ should first be created within the user database & can then be called simply as follows:

exec usp_helpindexusagestats 'MyTableName'

Output includes some rich information about the indexes on the table including column names (keys & included columns) and also usage stats information about how often seeks, scans & lookups are being performed on each index.

posted by Greg_Linwood | 2 Comments

A great idea for solving annoying collation issues

Rob Farley recently blogged about an excellent idea for solving the annoying collation issues so many struggle with during updates, consolitations or server rebuilds.

Currently SQL Server throws errors if tables are joined with columns that have incompatible collations (eg, tempdb has SQL_Latin_General_CI_AS but user dbs have Latin_General1).

Rob's idea is to provide a database level option which configures SQL Server to automatically use the collation of the left or right hand column in a join (or instance collation) instead of just failing. This would save users from many common collation problem & significantly easier than re-installing SQL Server or rebuilding databases with changed collations (presently a tedious task)

Rob has also posted a Feedback item on Microsoft's Connect site here:

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=324910

If you like this idea, it would help if you could add your vote to the connect item to help give it some sway with Microsoft. It's too late for this to make it into SQL 2008, but you could help influence its release into SQL 11, 2 to 3 years from now..

posted by Greg_Linwood | 0 Comments

Sad news - Ken Henderson passed away yesterday

Unfortunately for the SQL Server community, Ken Henderson passed away yesterday.

Many will remember Ken for his "Gurus Guide" SQL Server books which are amongst the most highly regarded technical resources available. I had the privilege of working with Ken as a reviewer of the final edition of this series & learned a great deal during the process from a true master of SQL Server technology.

In addition to Ken's professional duties at Microsoft, he also participated in many community events as a speaker, wrote in-depth technical articles and maintained an excellent technical blog.

I learned significantly from Ken's various works, particularly in relation to the finer points of disk IO, memory management and SQL Server's UMS / SQLOS process scheduling internals.

I wish I had the chance to have met him in person to thank him for his willingness to draw on his experiences and share very useful technical information with everyone who was interested.

RIP Ken and my condolences to your family.

posted by Greg_Linwood | 0 Comments

Debunking myths about clustered indexes - part 5b - a real world solution

In my last post, I discussed a real world clustered index design problem which had far-reaching consequences on a high volume & profile website. In this post I'll discuss the problem in further detail & post the solution which solved the problem

In this scenario, a Clustered Index (CIX) was placed on an identity column but the main query which accesses the table at very high frequency did not use the identity column for filtering or joining in any way. The only reason the CIX was defined on this column is that SQL Server places CIXs on Primary Keys by default (afaik, no more thought went into the decision than this).

The fact that the identity column was not used by the query for filtering (in either WHERE or JOIN) means that it's b-tree lookup structure was totally wasted on this query (& this query is by far the most performance sensitive query that accesses the table). This is a common problem when placing a CIX on IDENTITY columns.

This might not be so bad if another index could have been created to "cover" the query, but the query accesses more columns (& bytes) than are allowed in an index (this system was using SQL Server 2000) so another index cannot be designed to totally suit the needs of the query.

The MS support engineers who worked on this case tried to design Non-Clustered Indexes (NCIXs) with as many columns as possible but ultimately, the other columns not included in the index also have to be retrieved & forcing SQL Server to perform extremely expensive Clustered Index Bookmark Lookups. Even though these occur in memory, they are still enormously expensive & totally brought this high profile system down week after week last year during peak workloads.

The solution:

Because CIXs become the storage container for the tables they are built on (the table's rows are stored in the leaf pages of a CIX), they naturally contain all columns. A limited number of columns can be used for the index keys, but all columns exist on the leaf pages. Hence, CIXs always cover queries, although not necessarily in their index keys.

The trick to solving this particular problem was to recognise that placing the CIX keys on a few of the filtered columns would allow SQL Server to perform a partial range scan over however many rows those columns return & then evaluate the wildcard search in the affids column within the leaf pages as it scans over that range. This is far more efficient then evaluating those extra colums over an extremely expensive Bookmark Lookup, whether in memory or not, as no additional IOs need to be performed (where at least three extra reads are required PER ROW with the CIX bookmark lookup, massively increasing total reads performed by the query).

It appears that the engineers who worked on this problem either didn't consider altering the existing CIX or maybe they simply believed blindly in the "best practise" of placing CIXs on identity columns. Either way, they didn't try changing the CIX design & consequently failed to solve the problem.

So finally, the solution was to replace the identity based CIX with the following definition:

create clustered index cix_article on article (clid, grptype, chksmcurr, chksmver)

With this CIX, the query can simply traverse its b-tree, locating the range where clid = 2 & grptype = 'news', then scan that range whilst testing the other predicates - chksmcurr = chksmver, the wildcard search in affids etc. As it identifies each row, it can extract all other column values, sort & produce the top 5 rows. Crucially, no Bookmark Lookups are required.

Immediately after installing this index, the site dropped from extremtly high CPU utilisation (> 90% consistently at peak load) to less than 10% even at peak load. 

The reason why CPU load was reduced by such a substantial margin is that Bookmark Lookups are extremely expensive, even though the CIX b-tree pages are typically in memory. They often happen as such high frequency that they can easily overload even the most powerful systems. Note that in this case, the DB size was only ~2Gb, the server had 16Gb RAM, 8 high end CPUs & high end SAN storage. All that hardware couldn't handle the massive logical query inefficiency.

It's also crucial to note that the identity column was not involved in the query's filter predicates (although it was returned in the SELECT list). This is a significant problem with placing CIXs on identity columns if performance sensitive queries do not actually use that column for filtering - it not only wastes a useful b-tree structure (one that covers any query) it also introduces a costly layer of indirection for Bookmark Lookups if the query accesses many columns and cannot be easily covered with a NCIX. Many people place CIXs on identity columns to achieve fragmentation objectives without considering the opportunity cost with query processing (which is nearly always far more important than avoiding fragmentation, which is another very poorly understood topic in general)

Note also that I am demonstrating the usefulness of a CIX in this particular discussion (yes, they're often useful - just not always) but I am also showing why using identity columns blindly is a bad idea.

Ultimately, you need to think through how a query accesses its underlying data structures to get the best out of index design & resist the temptation to simply roll out "best practises" generically..

posted by Greg_Linwood | 3 Comments

Debunking myths about clustered indexes - part 5 - a real world example

This post discusses a problem experienced last year on one of the highest profile websites in Australia which highlights how blindly following generic "best practises" can cause serious damage to real-world systems.

Previously, in this series:
Part 1 (CIXs worsen bookmark lookups)

Part 2 (CIXs are NOT better for range scans)

Part 3 (example script)

Part 4 (CIXs, TPC-C & Oracle clusters)

The effects of this simple indexing problem severely impacted the huge number of online users (millions) who use this website. The MS engineers who worked on the problem failed to produce a solution, even though a simple solution exists because they blindly followed published "best practises" rather than thinking the problem through
logically.

This is a great case study & will benefit many who take the time to read & attempt to solve this problem.

Some background info:
a) This query provides "top 5" article functionality for the website's front page (most recent, relevant etc)
b) The application caches most requests but cache refreshes, timeouts or system re-starts cause the query to be run at very high frequency (many times per second)
c) High execution frequency, coupled with high cost of the query bring the site down regularly.
d) Hardware is over-spec'd - 8 x CPU, significantly more RAM than data (database is totally in memory) and storage is high end EMC SAN.

Senior MS support engineers were engaged to solve this problem but gave up on this after weeks of trying but there is a simple solution - can you see it?

Here are a few tips:
a) Although not all DDL is posted, all that is relevant is here
b) You cannot "cover" the query with a traditional NCIX because there are more than 16 columns referenced by the query & some filter columns exceed the 900 byte limit imposed by SQL 2000 (this was a SQL 2000 system)
c) Don't get distracted by the sub-queries, they're actually not important to the solution (due to the AND condition)
d) Forget the current execution plans (which aren't provided). These would only tell you what's currently going wrong even if you had them. They'd give you no information on how to actually solve the problem. They were not actually used in production of the solution other than to prove the solution.

To solve this problem, you really need to think beyond regular "best practises" and consider how you can use SQL Server's indexing features in a way that provides the simplest, efficient path that SQL Server can use to identify the top 5 "articles" based on the business rules (filters). Thinking logically about this problem is the key to finding a solution.

Why am I posting this challenge?
a) To demonstrate how wrong published CIX index design "best practises" (eg on ID columns) truely are and how badly they often affect real world systems (in this case, a huge user population).
b) To encourage developers and DBAs to think about index design in more sophisticated ways than blindly following generic "best practises".

I will post the solution that was actually implemented soon but for now, can you find a solution?

 

use tempdb

go

if object_id('article') > 0

drop table article

go

create table article (

   id int

      identity (1, 1) not null

      primary key clustered

 , clid int null

 , affids varchar (1000) null

 , grptype varchar (500) null

 , [type] varchar (500) null

 , chksmcurr varchar (50) null

 , chksmver varchar (50) null

 , crdt datetime null

 , creator varchar (100) null

 , dtend datetime null

 , dtstart datetime null

 , vseqno int null

 , protcont int null

 , date datetime null

 , [time] datetime null

 , headline varchar (1000) null

 , shheadline varchar (1000) null

 , pullquote varchar (8000) null

 , prodprior int null

 , grpprior int null

 , author varchar (255) null

 , source varchar (255) null

 , crdt2 datetime null

 , params varchar (5000) null

)

go

create index article2 on article(crdt2)

go

create index article3 on article(clid)

go

create index article4 on article(prodprior)

go

create index article5 on article(creator)

go

create index article6 on article(crdt, creator, vseqno)

go

--recommended by MS support engineers

--but worsened the problem (much higher CPU)

create index ndx_article2

on article(clid, grptype, chksmcurr,

chksmver, affids, crdt, creator,

dtend, dtstart, vseqno, [date] desc,

 [time] desc , crdt2)

go

--also recommended by MS support engineers

--but worsened the problem (much higher CPU)

create index ndx_article3 on article(clid, grptype,

 chksmcurr, chksmver, affids,

 crdt, creator, dtend, dtstart,

 vseqno, prodprior, grpprior, date desc,

 [time] desc , crdt2)

goselect top 5

      a.id

    , a.params

    , a.affids

    , a.protcont

    , a.headline

    , a.shheadline

    , a.date

    , a.time

    , a.author

    , a.source

    , a.pullquote

    , '' as atparam

    , gt.params as gtparam

from  article a

join  grptype gt on a.grptype = gt.grptype and a.clid = gt.clid

join  articletype at on a.type = at.type

where a.clid = 2

  and ('' + a.creator = ''

       or a.crdt is null

       or a.vseqno is null

       or (a.vseqno = (select top 1 vseqno

                             from article v

                             where a.crdt = v.crdt

                               and a.creator = v.creator

                             order by vseqno desc)

                )

            )

  and a.grptype = 'news'

  and ((a.dtstart <= getdate() or (isdate(a.dtstart) = 0))

       and (a.dtend > getdate() or (isdate(a.dtend) = 0)))

  and ((',' + a.affids + ',' like '%,713,%') and ',' + a.affids + ',' not like '%,706,%')

 and a.chksmcurr = a.chksmver

order by a.prodprior asc

       , a.grpprior asc

       , a.date desc

       , a.time desc

go

posted by Greg_Linwood | 3 Comments

Appalled at the news media's handling of the Virginia College massacre

I am disgusted that the Age, the Australian the Herald Sun (my local news rags) are all showing pictures of the Virginia College gunman pointing guns right at our faces. Unfortunately, many other news sources around the world are also probably showing the same menacing pictures purely for their sensationalistic commercial value

Sure - they should report on the story as it is certainly important news which the community needs to know about, but why publish menacing pictures of this *** who has just blown away 32 young lives, now pointing a gun directly at our faces?

Can you imagine what it would be like for the parents of victims, already deeply in shock to now see this @sshole pointing a gun at their faces on the TV?

I'm disappointed in our media's sense of judgement about this.

Maybe they should spend more time discussing the heroic acts performed by people such as Liviu Librescu than the cowardice of the shooter (who's name isn't worth mentioning)

posted by Greg_Linwood | 0 Comments

SQL Server SP2 GDR

The info below was posted to the SQL Server MVP discussion forum a couple of days ago by James Howey (from the MS SQL team) & has been approved for re-distribution. It covers some details on problems associated with SQL 2005 SP2 & details about the GDR / QFE process which might interest DBAs.

Bob Ward (also MS SQL team) has blogged some more useful details on this topic here:

http://blogs.msdn.com/psssql/archive/2007/04/06/post-sql-server-2005-service-pack-2-sp2-fixes-explained.aspx

* * * * * * * * * * * *

We recently posted updates to address an issue in two SP2 Maintenance Plan tasks. The KB article describes the symptoms as follows:

FIX: The Check Database Integrity task and the Execute T-SQL Statement task in a maintenance plan may lose database context in certain circumstances in SQL Server 2005

In Microsoft SQL Server 2005 Service Pack 2 (SP2), the following tasks in a maintenance plan may lose database context in certain circumstances:
. Check Database Integrity
. Execute T-SQL Statement

For example, if a Check Database Integrity task is configured to check integrity on multiple databases, the task repeatedly checks the default database.

For the Execute T-SQL Statement task, this problem occurs if the following conditions are true:
. The Execute T-SQL Statement task contains multiple batches.
. The batches are separated by the GO statement.
. No batches use the USE statement to specify the database.
. The statements in each batch do not contain the database name.

In this scenario, those statements in the second batch and in succeeding batches will be executed against the default database.

This fix is shipped in two different packages:
. The GDR release is available via http://support.microsoft.com/kb/934458
. The QFE release is available via http://support.microsoft.com/kb/934459

What is the difference between a GDR and QFE release? GDRs are a new means of providing support. GDR fixes are reserved for those issues identified by SQL Server support as important enough to install on every instance of SQL Server. QFEs are used for the majority of fixes where the effects of the problem are not widespread or severe enough to warrant a GDR.

GDRs and QFEs are associated with certain build numbers. Build numbers 3043 through 3149 are reserved for GDR releases. Build numbers 3150 and up are reserved for QFEs. By reserving lower build numbers for GDR releases, users can continue to take advantage of GDR fixes, without having to install QFEs, which aggregate all the fixes shipped since the most recent service pack. Once a QFE is installed, GDRs will no longer update your system, because the build number is too low, and you have to install QFEs to pick up new fixes.

Why is this fix a GDR? This fix was promoted to GDR because users that relied on the Check Database Integrity task to validate their databases were not informed of database integrity issues that they expected to be detected. The fix also addresses issues in the Execute T-SQL task, but, since these issues typically lead to script failure, duly noted by SQL Server, the T-SQL issues alone were not enough to justify the GDR.

Those users who now, or in the future, rely on the Check Database Integrity Task or the Execute T-SQL task in a maintenance plan should install the GDR. Others may skip the GDR.

posted by Greg_Linwood | 0 Comments
Filed Under:

Chiune Sugihara

I just watched the PBS documentary "Sugihara: Conspiracy of Kindness" for the second time in a few months on TV tonight.

Chiune Sugihara saved thousands of Jewish lives during WWII whilst serving as Japanese consul in Lithuania by issuing transit visas against instructions from his home office in Japan. After returning to Japan at the completion of the war & losing his diplomatic post, he lived in obscurity in Russia for many years & refused all assistance from those he had helped. Many years later, one of those he saved waited 4 years to meet him to ask why he took such tremendous risks to help those he didn't know. He said:

"In life, do what's right because it's right, then leave it alone".

An amazing guy..

posted by Greg_Linwood | 0 Comments
Filed Under:

Congratulations Peter Ward!

Peter, I hear you've been given Microsoft's MVP award for your work with SQL Server MVP - congratulations! I believe this now makes up 6 Aussie SQL MVPs.. (c:

Cheers,
Greg

Design better indexes rather than re-write your queries

I seem to be having the same conversation over & over again recently with developers who believe that re-writing queries is the best way to improve performance. Maybe I'm just obsessing about this topic a little at the moment, but I think the problem really does run much deeper.

Whilst it's true that re-writing some queries can help performance, it's also true to say that it shouldn't. Any two properly expressed & equivalent queries should theoretically be "optimised" by the query optimiser to the same execution plan. Changing the way a query is expressed really shouldn't result in any change of execution plan (all other factors remaining equal).

The reason why re-writing some queries does actually improve performance is that the query optimiser isn't perfect & you can take advantage of knowledge of optimiser limitations when re-writing a query. For example, take the case of moving logic from a User Defined Function (UDF) into a set based expression - a very common performance tuning query re-write technique. This improves performance because the query optimiser lacks the smarts to do this automatically for itself (it runs each invocation of a UDF - for every row - the same way it runs a stored procedure). It's certainly possible for modularised procedure logic to be folded into the calling set based statement (as evidenced by the fact that you can do this yourself) - the TSQL optimiser's just not quite that smart.

Whilst it might seem that I'm contradicting my argument at this point (I'm showing that re-writing queries can help even though the topic of this post is that this isn't the best approach) the main point I'd like to make is that there are limited cases where re-writing queries can help performance & ultimately you need to ensure that you have the right indexes in place before spending any time re-writing queries. You'll generally get far better results by focussing on designing the best indexes possible before looking at re-writing queries, although ultimately you do need to turn to re-writing queries at times to deal with optimiser limitations..

Consider the case where you might have a large table with no indexes. No matter how you express a SQL SELECT statement, the only option available to the query optimiser is to scan that table (as there are no indexes available for more efficient access paths). Sure, this might seem obvious to some, but given the number of times I've had this discussion lately, I suspect it's not so obvious to everyone. Somehow, some developers seem to focus immediately on re-writing the query before checking the best indexes are available first.

SQL Server can only optimise access paths to data stored in tables using indexes which already exist. Your first concern with tuning queries should always be to ensure that your tables have indexes which at least support the filter criteria (WHERE clause, INNER joins) of your queries & then possibly also covering selected if many rows are involved. Without good indexes, it doesn't matter how you express your query, SQL Server will have no choice but to fully scan your underlying tables.

I believe there are reasons why developers tend to think about re-writing queries before thinking about indexes & that this lies somewhere in the fact that SQL contains no processing rules. Develoeprs tend to have a hard enough time getting their queries right correct let alone thinking about indexes, but this is a topic for another day.

Why the Relational Model is Insufficient (part 1)

Paul Nielsen has pointed out a couple of interesting deficiencies in the relational model on his blog:

http://sqlblog.com/blogs/paul_nielsen/archive/2006/12/18/430.aspx

More Posts Next page »