Query Plan Guides - part 1 - tame those wild vendor apps!
One of SQL Server 2005's most exciting new performance tuning features is Query Plan Guides - a technology that allows DBAs to control the execution plans of queries which are submitted "dynamically" as SQL statements by black box vendor applications, even where the DBA has no ability to add query hints or otherwise influence the SQL code being submitted.
Thie feature solves an age-old problem for DBAs who manage 3rd party vendor applications or even internal applications where code is tightly managed, to the point where query hints can not be easily added to stored procedure code. Previously, DBAs might have had to negotiate with vendors or application developers simply to add a query hint to a poorly performing query. All too often, those application developers are either too busy or perhaps even unwilling to make such small changes, even to achieve important performance objectives.
Query Plan Guides set the DBA free to alter the execution plans chosen by the database engine, without altering any application code at all.
I have recently been researching Query Plan Guides, both for the purposes of preparing a recent User Group presentation and also for use in some real-life scenarios. I'll be posting thoughts, lessons & experiences in a series of blog posts over coming weeks and am very interested in feedback from others on the effectiveness of Query Plan Guides "in the wild"
Before I get too far into how Query Plan Guides work though, I'd first like to cover some background query processing fundamentals which are pre-requesites to understanding & appreciating the power available within the new Query Plan Guide feature..
The first area of enlightenment required to understand Query Plan Guides is to realise that SQL Server has to parse, compile & optimise every SQL statment submitted by an application, unless that statement is simply the execution of a pre-compiled stored procedure or pre-parameterized statement via sp_execute_sql. All too often, developers design their applications by simply constructing SQL "strings" in their applications without considering the important fact that, to SQL Server, these strings are meaningless until they're parsed & compiled into a structure that SQL Server can "bind" to it's internal meta-data on table / view objects etc.
The cost of this parse & compile process varies substantially, based on how complex the statements being submitted are & how repetitively the statements are being submitted to SQL Server. Imagine the scenario where large & complex SQL statements are submitted to SQL Server many times per second - SQL Server needs to parse, compile & optimise every instance of these statements, which can be quite a CPU intensive process.
SQL Server 2000 (& earliier versions) had some smarts to try & limit the impact of this overhead. For example, SQL Server would cache a copy of the parsed & compiled query Execution Plan in a memory pool known as the Procedure Cache. If the same query was re-submitted by the application later, SQL Server would simply do a lookup in this cache prior to parsing. If an IDENTICAL statement was found in the Procedure Cache, SQL Server would simply use that execution plan. The downside to this approach was that SQL statements were stored as their string literal values, fully qualified with embedded parameters. For example, where SQL Server received the following statement:
select * from customers where cust_id = 1
SQL Server would cache the full string literal representation of the SQL statement, including the embedded parameter "1". If precisely the same statement was re-submitted to SQL Server by the application (including "cust_id = 1" again), the pre-existing Execution Plan would be used, saving SQL Server the effort of parsing, compiling & optimising the query. But any slight change in the SQL statement would require an entirely new plan, eg:
select * from customers where cust_id = 2
In this case, SQL Server would look up its Procedure Cache, fail to find a match, due to the SQL statement string literal values not being "quite" the same. SQL Server would then go through the whole process of parsing, compiling & optimising the new query.
There are several detrimental effects from this - firstly, SQL Server considers the cost of parsing, compiling & optimising queries to be so expensive that it tries its best to store as many Execution Plans in its Procedure Cache as possible in an attempt to minimise possible future work. The more memory SQL Server allocates to this purpose, the less it has for data caching (keeping table rows & columns in memory to reduce disk access). This behaviour can lead to severe performance degradation if the Procedure Cache "bloats", stealing memory from the more important data caching role. When this happens, SQL Server has less memory available to cache data & consequently uses its disks more heavily - a high price to pay for lazy development practises.
How can Query Plan Guides help?
Query Plan Guides offer DBAs the opportunity to alter SQL Server's behaviour, such that it no longer caches string literal representations of SQL statements in its procedure cache, but caches "parameterized" version of them instead. This can lead to significant reduction in Procedure Cache memory utilisation, returning memory to more important data caching. It can also lead to some ill-effects, which I'll explore over up-coming blog posts & hopefully engage in some discussion with you or others who are exploring this technology along with me.
I should also point out at this stage that SQL Server 2005 also has a database level "parameterization" configuration, which can be set to "forced" to coerce SQL Server into foreably parameterizing every single SQL statement submitted to the database, but this might sometimes be overkill & Query Plan Guides offer an alternative, to simply target a specific problem query, without altering the behaviour of all other queries in the database (which would create other uncertainties). I previously discussed FORCED PARAMETERIZATION here:
http://blogs.sqlserver.org.au/blogs/greg_linwood/archive/2006/03/22/61.aspx
I should also point out that SQL 2000 had an "auto-parameterization" feature, but I won't elaborate on this feature as it was overly simplistic, "auto-parameterizing" only the simplest of queries. Whenever multiple tables were joined, subqueries, aggregates or virtually any other semi-advanced SQL feature used, auto-parameterization was simply ignored, severly limiting the effectiveness of that feature in SQL 2000. The example I provided earlier (select * from customer where cust_id = 1) would certainly have been auto-parameterized by SQL Server 2000 - I was just trying to keep the example as simple as possible there..
In my next blog post, I'll discuss the various types of Query Plan Guides & introduce a first example, leading to further discussion on the cost / benefit trade off from a performance perspective.