Fundamentals of Storage Systems, IO Latency and SQL Server
Posted by Wes Brown
A Thousand Men Marching Still Only March As Fast As One Man.
la·ten·cy – Computers . the time required to locate the first bit or character in a storage location, expressed as access timeminus word time.
Often when talking to people about performance they get rapped around the MB/Sec number and ignore a critical factor, latency. SQL Server is one of those few applications that is very sensitive to disk and network latency. Latency is what the end user sees. If your SQL Server is waiting around for disk or network they will start to complain. In an OLTP environment SQL Server accesses data on disk in a nonlinear fashion, aka random IO’s. The size of these IO request can be pretty small. In a good application you really try and limit the amount of data returned to keep things speedy. The down side of small Random IO’s is the system will never be faster than a single seek operation on your disk. So, if you have a 15k SAS drive that is around 2.5ms. Caching and buffering schemes aside for now, 2.5ms is your floor. You will never be faster than that. Depending on the size of the IO request you spend more time waiting for the seek operation than you do actually transferring the data from the disk. We group disks together in larger arrays to give us more aggregate throughout and higher operations per second, but you are ever only as fast as your slowest disk. You can always get more aggregate throughput up to several gigabytes a second but you still have to wait on that first bit to travel to you.
To get around these short comings, SQL Server buffers as much data as it can in memory. We also add large amounts of cache to our SAN’s and disk controllers. This can mask some of the problem, but at some point it still needs to come from your disk drives.
On the network side things are actually better. With latency under a millisecond on a LAN you usually waiting on disk to deliver the data. There are other factors like the speed of the network equipment and number of hops across interfaces you have to make can be more significant than the actual transmittion rate. TCP/IP can be a factor as well. Out of the box SQL Server is configured at 4KB(4096 byte) packet. This is a good general setting for most workloads. If you are working on a highly tuned OLTP system you may want to set that to something smaller or align it with the TCP packet size for your network, usually 1500 bytes. If it is a OLAP system with lots of streaming throughput latency will make up a very small part of the overall transmission time and going with a larger packet size possibly aligned to the 8KB page size with increase throughput and decrease the time to transmit overall. If you do set a large packet size you should consider enabling jumbo frames on your network card. Make sure that your network equipment can support the jumbo frame from end to end.
Another place where we are starting to see more latency issues is with database mirroring. In synchronous mode, the default setting, you are now adding network latency plus the disk latency of the other server to the overall transaction time.
Mirroring isn’t the only game in town. We have had SAN level replication for quite a while as well. In most of the scenarios where we were using SAN level replication it was site to site across several miles. To give you an idea of how much latency can be added in this situation go ping yahoo.com or google.com, I’ll wait….. Ok, from my location ether of them is 45ms~75ms, or 18 times slower than your spinning disks. All the sudden, your network is the major player in delaying transactions. I’ve used fibre optics to connect to sites and the latency can still be a killer for OLTP systems. The general rule of thumb is 7.5 microseconds for every 1 1/2 miles. If our target SAN is 125 miles away we just added 2ms of latency to the 4ms of latency the two sets of disks are providing. In reality, it is worse than that when you again figure in network equipment. I personally haven’t seen synchronous setups more than 50 miles apart.
Just something to keep in mind as you plan your SQL Server infrastructure. Latency in its myriad forms is king.