Deleting lots of data in batches

The fun part is in the “where…” bit, knowing how to set your query to only get a portion of the data is heavy lifting here.
I know where I’m going to be using this in some soon to be written code. For the code, the data will timestamped, I can safely iterate by day and nuke all of the records for each day.

So we’ve all come across the need to delete 10 million records. however we all no that this won’t be quick and will result in a large log file and as we get nearer deleting the 10 millionth row the process is going very slowly.

Well the standard way around this is to run the command in batches, this way our transaction is never very big. So you can write a while loop and check an iterator, but first you need to get into the loop so you need to store the iteration in a variable and have something like this

set rowcount 10000
declare @rc int
set @rc =0
while @rc < 1000 
  begin 
  –Do my update/delete etc    
  delete from mytable where ….
  set @rc = @rc+1
  end

Well in SQL 2005 in SQLCMD mode and the new TOP clause in an update/delete you can do the following

–your update statement
delete top (10000) from mytable where ….
:go 1000

which of these looks easier to you. I vote for number 2.

[via WebLogs @ SqlJunkies.com]

Regular Expression Tools

I’m still at the point where I can use stuff like ^\d{3}[-| ]?\d{2}[-| ]?\d{4}$ and not fully understand the pieces. So i’m always looking for decent RegEx tools. Eric Gunnerson had a good post about suing RegEx to validate a SSN

But first, a word about tools. It’s a lot easier to use a tool to do this sort of thing than it is to write code to do it. So, I suggest one of the following

RE: Another Blast of Cold Water in the face – "The Build Master" by Vincent Maraia

Bill lists a couple of good references for doing the build the right way. I want these two books.

Eric Garulay, one of the uber cool folks over at Addison-Wesley shot me a copy of The Build Master Microsoft’s Software Configuration Management Best Practices by Vincent Maraia ( Build Master Web Site ) . Well, looks like Kim is going to appropriate the book from me so I need to hurry up and get it read, fortunately that’s not a problem because it’s pretty amazing.  Let me step back a second.  A while ago, my buddies at http://www.apress.com sent me a book titled Expert .NET Delivery using NAnt and CruiseControl .NET  and it was a real eye opener.  Until then, my build strategy was about as sophisticated as “Get Latest Version” from Visual Source Safe, flipping the Release bit and compiling. 

For rest of the posting, go to [Bill’s House O Insomnia]

Internet Health Report

If you think that the Internets are running slow, click this link. What you will see is a chart of the Internet backbone providers and the relative speed that they are talking to each other.

RE: SqlDependency changes for RTM [Sushil Chordia]

I want to play with this feature. I have implemented something with similiar functionality using UDP in an extended stored procedure, but this looks much simpler.

As mentioned in my previous blog, SqlDependency is a new feature in .Net framework 2.0, which provide a mechanism to notify an app when a cache is invalidated. We got enough feedback from customers in Beta 2 with regards ease of deployment (some issues here) and security that we decided to make some changes for the final release. These new changes are now available as part of the September CTP. Following is a quick example on how to get Notification working on the September CTP bits. (Things new in September CTP are marked in RED)

using System;
using System.Data;
using System.Data.SqlClient;
class QuikExe
{
  public static string connectionstring = “Get Connection String From The Config File”;
  public void DoDependency()
  {
    using (SqlConnection conn = new SqlConnection(connectionstring))
    {
      conn.Open();
      Console.WriteLine(“Connection Opened…”);

      SqlCommand cmd = conn.CreateCommand();
      cmd.CommandText = “Select i from dbo.test”;

      //Notification specific code
      SqlDependency dep = new SqlDependency(cmd);
      dep.OnChange += delegate(Object o, SqlNotificationEventArgs args)
      {
        Console.WriteLine(“Event Recd”);
        Console.WriteLine(“Info:” + args.Info);
        Console.WriteLine(“Source:” + args.Source);
        Console.WriteLine(“Type:” + args.Type);
      };

      SqlDataReader r = cmd.ExecuteReader();
      //Read the data here and close the reader
      r.Close();
      Console.WriteLine(“DataReader Read…”);
    }
  }

  public static void Main()
  {
    try
    {
      //Start the listener infrastructure on the client
      SqlDependency.Start(connectionstring);
      QuikExe q = new QuikExe();
      q.DoDependency();
      Console.WriteLine(“Wait for Notification Event…”);
      Console.Read();
    }
    finally
    {
      //Optional step to clean up dependency else it will fallback to automatic cleanup
      SqlDependency.Stop(connectionstring);
    }
  }
}

read the rest here…

Just say no to CLR UDTs

Alex Papadimoulis is pretty adamant about not using CLR UDTs in SQL Server 2005.

No one has asked me that question just yet [“When Should I Use SQL-Server CLR User Definied Types (UDT)?”], but with the release of SQL Server 2005 just around the corner, I’m sure a handful of people will. Unlike regular User Defined Types, CLR UDTs are a new feature of SQL Server 2005 that allows one to create a .NET class and use it as a column datatype. As long as a few requirements are followed, one can create any class with any number of properties and methods and use that class as a CLR UDT.

Generally, when a new feature is introduced with a product, it can be a bit of a challenge to know when and how to use that feature. Fortunately, with SQL Server’s CLR UDTs, knowing when to use them is pretty clear:

Never.

Let me repeat that. Never. You should never use SQL Server CLR User Defined Types. I’m pretty sure that this answer will just lead to more questions, so allow me to answer a few follow-up questions I’d anticipate.

The full article can be read here. Another reason to take a pass on CLR UDTs is that it ties your database to SQL Server 2005. If you are doing an app that can run on SQL Server 2000, you just shot yourself in the foot.

Another reason to avoid CLR UDTs? Eliminating error messages like “File or assembly name udtname, Version=1.2.0.0, Culture=neutral, PublicKeyToken=389619d4c1235f8a, or one of its dependencies, was not found.”

Spam weasels

I finally got a spam comment. In fact, it was from another Blogger user. That’s in violation of the Blogger TOS, so forwarded his infomation to the Blogger people (Google). I gave Google about a week to take whatever action they wanted to take against that user and then I removed that comment. It was odd that way the spam appeared. I posted a new entry and within 2 minutes, it got a comment. This is by no means a high traffic site, it’s mainly to keep track of things.

Nine reasons not to use serialization

There’s a good article on The Code Project that explains why you shouldn’t use serialization to store data. The root problem is that the information that gets serialized out is strongly typed. In other words, whatever wrote that data out, better be the same thing that read it back in again. Should your code change it’s data structures, trying to read in serialized data from a previous version will break the code. That kinda defeats the purpose of using XML to store data. And not in a good way.

RE:Oldest noodles unearthed in China

Paleolithic dorm food discovered:

The remains of the world’s oldest noodles have been unearthed in China. The 50cm-long, yellow strands were found in a pot that had probably been buried during a catastrophic flood. Radiocarbon dating of the material taken from the Lajia archaeological site on the Yellow River indicates the food was about 4,000 years old. That date is about 1000 years older than what had been considered the oldest known instance of noodles.

[Via The BBC]