Fundamentals of Storage Systems, IO Latency and SQL Server

A Thousand Men Marching Still Only March As Fast As One Man.

la·ten·cy – Computers . the time required to locate the first bit or character in a storage location, expressed as access timeminus word time.

Often when talking to people about performance they get rapped around the MB/Sec number and ignore a critical factor, latency. SQL Server is one of those few applications that is very sensitive to disk and network latency. Latency is what the end user sees. If your SQL Server is waiting around for disk or network they will start to complain. In an OLTP environment SQL Server accesses data on disk in a nonlinear fashion, aka random IO’s. The size of these IO request can be pretty small. In a good application you really try and limit the amount of data returned to keep things speedy. The down side of small Random IO’s is the system will never be faster than a single seek operation on your disk. So, if you have a 15k SAS drive that is around 2.5ms. Caching and buffering schemes aside for now, 2.5ms is your floor. You will never be faster than that. Depending on the size of the IO request you spend more time waiting for the seek operation than you do actually transferring the data from the disk. We group disks together in larger arrays to give us more aggregate throughout and higher operations per second, but you are ever only as fast as your slowest disk. You can always get more aggregate throughput up to several gigabytes a second but you still have to wait on that first bit to travel to you.

To get around these short comings, SQL Server buffers as much data as it can in memory. We also add large amounts of cache to our SAN’s and disk controllers. This can mask some of the problem, but at some point it still needs to come from your disk drives.

On the network side things are actually better. With latency under a millisecond on a LAN you usually waiting on disk to deliver the data. There are other factors like the speed of the network equipment and number of hops across interfaces you have to make can be more significant than the actual transmittion rate. TCP/IP can be a factor as well. Out of the box SQL Server is configured at 4KB(4096 byte) packet. This is a good general setting for most workloads. If you are working on a highly tuned OLTP system you may want to set that to something smaller or align it with the TCP packet size for your network, usually 1500 bytes. If it is a OLAP system with lots of streaming throughput latency will make up a very small part of the overall transmission time and going with a larger packet size possibly aligned to the 8KB page size with increase throughput and decrease the time to transmit overall. If you do set a large packet size you should consider enabling jumbo frames on your network card. Make sure that your network equipment can support the jumbo frame from end to end.

Another place where we are starting to see more latency issues is with database mirroring. In synchronous mode, the default setting, you are now adding network latency plus the disk latency of the other server to the overall transaction time.

Mirroring isn’t the only game in town. We have had SAN level replication for quite a while as well. In most of the scenarios where we were using SAN level replication it was site to site across several miles. To give you an idea of how much latency can be added in this situation go ping or, I’ll wait….. Ok, from my location ether of them is 45ms~75ms, or 18 times slower than your spinning disks. All the sudden, your network is the major player in delaying transactions. I’ve used fibre optics to connect to sites and the latency can still be a killer for OLTP systems. The general rule of thumb is 7.5 microseconds for every  1 1/2 miles. If our target SAN is 125 miles away we just added 2ms of latency to the 4ms of latency the two sets of disks are providing. In reality, it is worse than that when you again figure in network equipment. I personally haven’t seen synchronous setups more than 50 miles apart.

Just something to keep in mind as you plan your SQL Server infrastructure. Latency in its myriad forms is king.

Series To Date:
  1. Introduction
  2. The Basics of Spinning Disks
  3. The System Bus
  4. Disk Controllers, Host Bus Adapters and Interfaces
  5. RAID, An Introduction
  6. RAID and Hard Disk Reliability, Under The Covers
  7. Stripe Size, Block Size, and IO Patterns
  8. Capturing IO Patterns
  9. Testing IO Systems
  10. Latency – You are here!

Posted on February 23, 2011, in Fundamentals, IO, SQLServerPedia Syndication, Storage Systems, Syndicated and tagged . Bookmark the permalink. 5 Comments.

  1. So true, been banging this drum since i was a early softie and still not many folks appreciate this. Nice blog.

  2. It can be a hard concept to get across to people sometimes. Glad you like it!

  3. Just got around to reading your article. I bookmarked it when you published it on twitter. I’m also glad to see that this was a piece in a series. I’ll have to read those too.

    It’s been my experience that as a DBA you almost have to know about all aspects of your infrastructure. I had a storage admin tell me that it can’t be the SAN because it been setup correctly. The problem was that it was set up correctly for reliability and not necessarily performance. Once I showed him the stats he shut up pretty quickly.

  1. Pingback: Log Buffer #214, A Carnival of the Vanities for DBAs | The Pythian Blog

  2. Pingback: The Fundamentals of Storage Systems – Introduction | SQL Server Input/Output

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: