One of my services logs every request to a private log table. That table mainly a diagnostic tool to provide some crude performance benchmarks. It’s not designed for historical trending, so I have code to purge older records. The service would periodically (twice a day) issue a DELETE statement to the database server to delete records older than 30 days. Given the following schema (sample, not the actual schema):
<span>CREATE TABLE </span><span>[MyLog]</span><span>(<br /> </span><span>[RecordID] [int] </span><span>IDENTITY</span><span>(</span><span>1</span><span>,</span><span>1</span><span>) NOT NULL,<br /> </span><span>[LogTimeStamp] [datetime] </span><span>NOT NULL,<br /> </span><span>[Duration] [decimal]</span><span>(</span><span>12</span><span>, </span><span>4</span><span>) NOT NULL</span><span>,<br /> </span><span>[SessionID] [varchar]</span><span>(</span><span>40</span><span>) NOT NULL,<br /> </span><span>[IP] [varchar]</span><span>(</span><span>24</span><span>) NOT NULL,<br /> </span><span>[Request] [varchar]</span><span>(</span><span>80</span><span>) NULL,<br /> </span><span>[Response] [varchar]</span><span>(</span><span>80</span><span>) NULL,<br /> </span><span>[Error] [varchar]</span><span>(</span><span>255</span><span>) NULL,<br /> </span><span>[Description] [varchar]</span><span>(</span><span>80</span><span>) NULL,<br /> </span><span>CONSTRAINT </span><span>[PK_MyLog] </span><span>PRIMARY KEY CLUSTERED <br /></span><span>(<br /> </span><span>[RecordID] </span><span>ASC<br /></span><span>) </span><span>ON </span><span>[PRIMARY]<br /></span><span>) </span><span>ON </span><span>[PRIMARY]<br />GO<br /></span><span>CREATE NONCLUSTERED INDEX </span><span>[SK_MyLog_LogTimeStamp] </span><span>ON </span><span>MyLog<br /></span><span>(<br /> </span><span>[LogTimeStamp] </span><span>ASC<br /></span><span>) </span><span>ON </span><span>[PRIMARY]</span>
I would execute the following SQL statement;
<span><br />DELETE </span><span>MyLog </span><span>WHERE </span><span>DATEDIFF</span><span>(</span><span>DAY</span><span>, </span><span>LogTimeStamp</span><span>, </span><span>GETDATE</span><span>()) > </span><span>30</span>
It’s pretty simple, use the DateDiff() function to compare the timestamp field with the current date and if it’s older than 30 days, delete that record. I implemented that code in the first go around of the code, about two years ago. This week, I was in that area code for some maintenance and I took another look at that statement. That WHERE clause jumped right out at me. For every row in that table, both the DateDiff() and GetDate() functions are going to be called. SQL Server will need to compare every value of LogTimeStamp to see if it is older than 30 days ago. In this case, MyLog has an index on LogTimeStamp, but it will has to read the entire index. GetDate() is a nondeterministic function, it’s going to get re-evaluated for each row in the database. Since the actual date comparison is against a constant value, I decided to evaluate the comparision date first and change the WHERE clause to a simpler expression.
<span>DECLARE </span><span>@PurgeDate </span><span>smalldatetime<br /></span><span>SELECT </span><span>@PurgeDate </span><span>= </span><span>DATEADD</span><span>(</span><span>DAY</span><span>, -</span><span>30</span><span>, </span><span>GETDATE</span><span>())<br /></span><span>DELETE </span><span>MyLog </span><span>WHERE </span><span>LogTimeStamp </span><span>< </span><span>@PurgeDate</span>
I added a smalldatetime variable and assigned to it date of 30 days ago with the DateAdd() and GetDate() functions. Now SQL Server can use the value of @PurgeDate to jump into the index and jump out when the date condition no longer matches the criteria. By I implemented this on SQL Server 2005 and when I evaluated the estimated execution plans for each delete statement, I was surprised to see identical plans. Both sets of statements spent the same percentage of time doing scanning and deleting.
When I did the same evaluation on SQL Server 2000, I saw different results. The first delete statement spent 73% of the time scanning the index and 27% actually deleting rows from the table. The second delete statement spent 19% of the time scanning and 81% of the time deleting rows. On table that could have a large number of rows, it turned out to be big performance saving on SQL Server 2000 installations.
It’s pretty cool that the SQL Server 2005 parser is smart enough to optimize code and recognize a constant expression when it sees it. My code would have seen a nice little performance boost by moving from SQL Server 2000 to SQL Server 2005. It’s still a better thing to pull constant expressions out of the WHERE clause when you can do that.
SQL formatting courtesy of The Simple-Talk SQL Prettifier for SQL Server.