Wednesday, December 29, 2010

Use (UPDLOCK, READPAST) to read a SQL Server data queue

The problem
A useful business scenario involves reading data from a "queue", which would be a table consisting of rows generated by one process, but read by a separate process that runs independently.  The second process (queue reader) may be running with multiple threads, or having multiple instances (such as a multi-user desktop or web application).  This can often lead to unexpected results, such as duplicate data retrieval, deadlocks, or other concurrency issues.  The SQL Server stored procedure called by such an application may involve a set of steps similar to:
  1. SELECT the ID of a specific row (meeting some criteria) into a temporary variable
  2. UPDATE a flag on that row to mark it as "processed"
  3. SELECT the data from that row along with rows from any related tables
Enclosing the above logic inside a BEGIN TRANSACTION - COMMIT TRANSACTION segment does not prevent any parallel processes from reading the same ID in step-1 above. This is because while the transaction prevents both processes from trying to update the same row, but the SELECT in step-1 of one transaction does not block the other transaction from also doing the same SELECT.

The solution
An efficient approach to solving this is to use the WITH (UPDLOCK, READPAST) table hint in the SELECT statement. UPDLOCK specifies that update locks are to be taken and held until the transaction completes, and READPAST specifies that SQL Server not read rows (i.e. skip them) that are locked by other transactions.

The combined effect of these table hints is that the first transaction to get to the SELECT statement establishes a lock on the row (or page), and the second transaction simply skips the locked rows and reads the next available row.  Without the READPAST, the second transation would get blocked by the first one until the locks are released.

Here's a sample code segment to use these table hints in a stored procedure:


SELECT TOP 1 @rowid = a.RowID
ORDER BY a.ProcessedOnUtc, a.RowID

UPDATE TestTable SET ProcessedOnUtc = GETUTCDATE()
WHERE RowID = @rowid

-- (rest of the data retrieval and processing statements)

A very important Gotcha!
To enable this to work correctly, there must be an index on the column(s) being used to identify the selected row(s), in this case the RowID column.  Without that index, SQL Server will need to look through the table to find the correct row for the second transaction, which will then get blocked since it won't be able to get past the row that's locked by the first transaction.  The result in such a case will be that duplicates will be avoided but deadlocks will still occur - the same as using UPDLOCK without READPAST.

Monday, December 27, 2010

Setting up a software RAID-5 on a Dell T310 server

Got the opportunity to wet my feet on setting up a software-based RAID-5 on a Dell PowerEdge T310 server, within Windows Server 2008. Quite an experience!

The hardware
The system was originally equipped with a 250GB drive. It contains the OS and we left it alone. Installed 3 additional, identical, drives: 1TB Western Digital Caviar Black. These are 7200 RPM drives with 32 MB buffers. The question was whether to go for the RE4 drives (also from WD), but those are about twice as costly, and this is a medium load dev & test server, so it was decided to stick with the Blacks and save some Green.

Drive installation
This was somewhat unexpectedly tricky. The physical installation was a breeze of course - just pull out the drive trays, set the drives into them, push them back in and snap in the connectors. But, the system refused to boot after this - LED indicators 1 and 4 lit up and that was it; the system did not even get to the BIOS setup screen.

After some initial irritation ("a drive must be DOA!"), we tried out the drives in a different machine (desktop, not server); all worked perfectly. Good. Then, a bit of research into the owner's manual for the T310 showed that there's a jumper on the motherboard that clears the stored configuration when the system is next restarted. Set the jumper accordingly, plugged-in and restarted the system, and... no luck! What! Restored the jumper, tried again, same results. Again toyed with the jumper; nope. I think the fourth time around I saw what initially looked like an unbelievable dream - various LED lights flashing back and forth, the disk light flashing, and generally speaking a sight very different from the steady 1 and 4 indicators! Scrambled to press F2 to get into the setup screen. The drives showed up fine.

There must be something that I missed here. Drive installation would not be such a hit-or-miss approach. Besides, this would tend to disprove the theory that trying the same thing repeatedly and expecting success is madness.

Setting up the RAID-5 in Windows Server 2008
Once Windows booted up, went into Start > Computer Management, and clicked on Disk Management (under the Storage subsection). The 3 additional drives showed up fine, although unformatted and unavailable for use at this point. In Windows Explorer, they are not showing up yet.

Right-clicked on one of the new drives in the Disk Management UI screen, and was immediately presented with an option to set up a RAID-5. Selected that option led to further screens where the remaining 2 drives were added into the RAID, and that was about it. Once the new drive name and letter were specified, the system started to sync the 3 drives together, to form a 2TB volume using the 3x1TB drives.

Note that the system can set up the drives as RAID-0, RAID-1 or RAID-5; available options will depend on how many drives are available (with just 2 drives, the RAID-5 option will be grayed-out).

Another important note when setting up a RAID-5: you cannot have data on a drive, and then add 2 more drives and try to get them all into a RAID-5 configuration (though you can mirror the original drive). All the 3 drives must be empty when you start.

Once we gave the go-ahead to start the sync, the system takes a long, long time! This is the progress after about 7 hours:

Interestingly, while the sync is going on, I can work with the drives - they show up as a single 2TB volume in Windows Explorer (E: in this case). I tried creating a simple Notepad document and it seems to be fine. However, I'm going to wait for the entire sync to complete before loading any real data on the volume.

Once done, this will give me 2TB of space, and a little peace of mind against disk crashes.