Monthly Archives: February 2011
"It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so." - Mark Twain
I try and keep this quote in my mind whenever I’m teaching about new technologies. You often hear the same things parroted over and over again long after they quit being true. This problem is compounded by fast moving technologies like NAND Flash.
If you have read my previous posts about Flash memory you are already aware of NAND flash endurance and reliability. Just like CPU’s manufacturing processes flash receive boost in capacity as you decrease the size of the transistors/gates used on the device. In CPU’s you get increases in speed, on flash you get increases in size. The current generation of flash manufactured on a 32nm process. This nets four gigabytes per die. Die size isn’t the same as chip, or package size. Flash dies are actually stacked in the actual chip package giving us sixteen gigabytes per package. With the new die shrink to 25nm we double the size to eight gigabytes and thirty two gigabytes respectively. That sounds great, but there is a dark side to the ever shrinking die. As the size of the gate gets smaller it becomes more unreliable and has less endurance than the previous generation. MLC flash suffers the brunt of this but SLC isn’t completely immune.
Cycles And Errors
One of the things that always comes up when talking about flash is the fact it wears out over time. The numbers that always get bantered about are SLC is good for 100,000 writes to a single cell and MLC dies at 10,000 cycles. This is one of those things that just ain’t so any more. Right now the current MLC main stream flash based on the 32nm process write cycles are down to 5000 or so. 25nm cuts that even further to 3000 with higher error rates to boot.
Several manufactures has announced the transition to 25nm on their desktop drives. Intel and OCZ being two of the biggest. Intel is a partner with Micron. They are directly responsible for developing and manufacturing quite a bit of the NAND flash on the market. OCZ is a very large consumer of that product. So, what do you do to offset the issues with 25nm? Well, the same thing you did to offset that problem with 32nm, more spare area and more ECC. At 32nm it wasn’t unusual to see 24 bits of ECC per 512 bytes. Now, I’ve seen numbers as high as 55 bits per 512 bytes to give 25nm the same protection.
To give you an example here is OCZ’s lineup with raw and usable space listed.
|Drive Model||Production Process||Raw Capacity (in GB)||Affected Capacity (in GB)|
As you can clearly see the usable space is significantly decreased. There is a second problem specific to the OCZ drives as well. Since they are now using higher density modules they are only using half as many of them. Since most SSD’s get their performance from multiple read/write channels cutting that in half isn’t a good thing.
SLC is less susceptible to this issue but it is happening. At 32nm SLC was still in the 80,000 to 100,000 range for write cycles but the error rate was getting higher. At 25nm that trend continues and we are starting to see some of the same techniques used in MLC coming to SLC as ECC creeps up from 1 bit per 512 bytes to 8 bits or more per 512 bytes. Of course the down side to SLC is it is half the capacity of MLC. As die shrinks get smaller SLC may be the only viable option in the enterprise space.
It’s Non-Volatile… Mostly
Another side effect of shrinking the floating gate size is the loss of charge due to voltage bleed off over time. When I say “over time” I’m talking weeks or months and not years or decades anymore. The data on these smaller and smaller chips will have to be refreshed every few weeks. We aren’t seeing this severe an issue at the 25nm level but it will be coming unless they figure out a way to change the floating gate to prevent it.
Smaller Faster Cheaper
If you look at trends in memory and CPU you see that every generation the die gets smaller, capacity or speed increases and they become cheaper as you can fit double the chips on a single wafer. There are always technical issues to overcome with every technology. But NAND flash is the only one that gets so inherently so unreliable at smaller and smaller die sizes. So, does this mean the end of flash? In the short term I don’t think so. The fact is we will have to come up with new ways to reduce writes and add new kinds of protection and more advanced ECC. On the pricing front we are still in a position where demand is outstripping supply. That may change somewhat as 25nm manufacturing ramps up and more factories come online but as of today, I wouldn’t expect a huge drop in price for flash in the near future. If it was just a case of SSD’s consuming the supply of flash it would be a different matter. The fact is your cell phone, tablet and every other small portable device uses the exact same flash chips. Guess who is shipping more, SSDs or iPhones?
So, What Do I Do?
The easiest thing you can do is read the label. Check what manufacturing process the SSD is using. In some cases like OCZ that wasn’t a straight forward proposition. In most cases though the manufacturer prints raw and formatted capacities on the label. Check the life cycle/warranty of the drive. Is it rated for 50 gigabytes of writes or 5 terabytes of writes a day? Does it have a year warranty or 5 years? These are indicators of how long the manufacturer expects the drive to last. Check the error rate! Usually the error rate will be expressed in unrecoverable write or read errors per bit. Modern hard drives are in the 10^15 ~ 10^17 range. Some enterprise SSDs are in the 10^30 range. This tells me they are doing more ECC than the flash manufacturer “recommends” to keep your data as safe as possible.
It’s that time again, T-SQL Tuesday is here! This time Pat Wright (blog|twitter) is hosting and has put forth automating tasks using ether T-SQL or Powershell. I LOVE automating stuff. As a production DBA in some very large shops you can’t do your job unless you make your servers work for you. I’ve been using T-SQL and *GASP* xp_cmdshell, OSQL and file shares to gather stats and push configurations to servers for decades. Log before fancy things like C# and Powershell existed. These days I use a variety of home grown tools but doing things with just T-SQL can be just as powerful. I’m going to leverage this post to start a series on a pure T-SQL implementation of configuration management, data gathering and utility procedure deployment.
Where Is Your Management Database?
Every DBA should have two things, a utility, or management database on every server and a central repository where all the locally collected data in the management database is pulled back to. What surprises people when I talk about this methodology is I don’t always advocate using a licensed copy of SQL Server. By that I mean that SQL Express works just great as a central repository server. I usually put this on a virtual machine, Hyper-V or what ever flavor of virtual environment your company supports. This allows you to do things like enable CLR and xp_cmdshell on a server that technically is non-production and keep your security risk much lower. Every server that is deployed in my shop gets a local management database. From the central repository I can push out everything I need to manage a server and keep all my servers up to date as I make improvements or bug fixes to the management code. That’s all I’m really going to say about that in this post though I just wanted to give you an idea of just how deep the rabbit hole can go.
DMV’s give you the information, you have to use it.
Since SQL Sever 2005 Microsoft let of of the black box mentality and started providing crazy useful information via Dynamic Management Views. Virtual file statistics though have been around for quite a while. They got a touch up in the DMV but the basic information was available in SQL Server 2000 via function call. The DMV I’m after is sys.dm_io_virtual_file_stats. It has a ton of information in it. It’s main problem though is it is an aggregation over time and doesn’t really tell you what is different from yesterday to today. To get around that we have to build our own sampling routine.
The VirtualFileStats Table
We create a table in our local management database to collect the daily numbers. I try to keep things simple.
CREATE TABLE dbo.VirtualFileStats ( RecordID INT IDENTITY(1,1) NOT NULL, ServerName VARCHAR(255) NOT NULL, DBID INT NOT NULL, FileID INT NOT NULL, Reads BIGINT NULL, ReadsFromStart BIGINT NULL, Writes BIGINT NULL, WritesFromStart BIGINT NULL, BytesRead BIGINT NULL, BytesReadFromStart BIGINT NULL, BytesWritten BIGINT NULL, BytesWrittenFromStart BIGINT NULL, IostallInMilliseconds BIGINT NULL, IostallInMillisecondsFromStart BIGINT NULL, IostallReadsInMilliseconds BIGINT NULL, IostallReadsInMillisecondsFromStart BIGINT NULL, IostallWritesInMilliseconds BIGINT NULL, IostallWritesInMillisecondsFromStart BIGINT NULL, RecordedDateTime DATETIME NULL, IntervalInMilliseconds BIGINT NULL, FirstMeasureFromStart BIT NULL ) GO CREATE TABLE dbo.VirtualFileStatsHistory ( RecordID INT NOT NULL, ServerName VARCHAR(255) NOT NULL, DBID INT NOT NULL, FileID INT NOT NULL, Reads BIGINT NULL, ReadsFromStart BIGINT NULL, Writes BIGINT NULL, WritesFromStart BIGINT NULL, BytesRead BIGINT NULL, BytesReadFromStart BIGINT NULL, BytesWritten BIGINT NULL, BytesWrittenFromStart BIGINT NULL, IostallInMilliseconds BIGINT NULL, IostallInMillisecondsFromStart BIGINT NULL, IostallReadsInMilliseconds BIGINT NULL, IostallReadsInMillisecondsFromStart BIGINT NULL, IostallWritesInMilliseconds BIGINT NULL, IostallWritesInMillisecondsFromStart BIGINT NULL, RecordedDateTime DATETIME NULL, IntervalInMilliseconds BIGINT NULL, FirstMeasureFromStart BIT NULL )
This is what we need to gather, and later analyze the data. Since we are managing our samples we have to know when the sampling started and what the first sample is. FirstMeasureFromStart lets us know that it is the first base measurements the rest of the samples will delta off of.
GatherVirtualFileStats Stored Procedure
Next we need a stored procedure to do the sampling. One thing you will notice is the procedure executes continuously with a WAIT FOR DELAY built into it so you can get finer grained than the 1 minute limitation of the SQL Agent. Sometimes, I will do one off sampling for a short period, say 30 minutes at a 30 second interval but most often I just let it run and set the sample rate at 1 minute or larger depending on how busy the system is.
IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = Object_id(N'[dbo].[GatherVirtualFileStats]') AND Objectproperty(id, N'IsProcedure') = 1) DROP PROCEDURE [dbo].[GatherVirtualFileStats] GO -------------------------------------------------------------------------------------- -- GatherVirtualFileStats -- by: Wesley D. Brown -- date: 02/08/2011 -- mod: 04/14/2011 -- description: -- This stored procedure is used to sample sys.dm_io_virtual_file_stats to track -- performance at a database file level. This is useful for finding -- hotspots on SAN's or under performing IO systems. -- parameters: -- @Duration = '01:00:00' How long to run before exiting -- @IntervalInSeconds = 120 Number of seconds between samples --@DB = -1 DB_ID to monitor, -1 for all --@DBFile = -1 File_ID of file to monitor, -1 for all -- usage: -- DECLARE @RC INT, -- @StartTime DATETIME, -- @databaseID INT -- SELECT @StartTime = Getdate(), -- @databaseID = Db_id() -- EXEC @RC = Gathervirtualfilestats -- '00:45:30', -- 30, -- 10, -- -1 -- SELECT * -- FROM dbo.VirtualFileStats -- WHERE DBID = 10 -- ORDER BY RecordID -- platforms: -- SQL Server 2005 -- SQL Server 2008 -- SQL Server 2008 R2 -- tested: -- SQL Server 2005 SP2 -- SQL Server 2008 R2 --------------------------------------------------------------------------------------- -- *** change log *** -- Added history table and perge on start up if there is data in the main table -- *** end change log *** ------------------------------------------------------------------------------------- CREATE PROC dbo.Gathervirtualfilestats @Duration DATETIME = '01:00:00', @IntervalInSeconds INT = 120, @DB INT = -1, @DBFile INT = -1 AS SET nocount ON DECLARE @StopTime DATETIME, @LastRecordedDateTime DATETIME, @CurrentDateTime DATETIME, @ErrorNumber INT, @NumberOfRows INT, @ErrorMessageText NVARCHAR(4000), @CurrentServerName VARCHAR(255), @DifferenceInMilliSeconds BIGINT IF EXISTS (SELECT 1 FROM dbo.VirtualFileStats) BEGIN IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = Object_id(N'[dbo].[VirtualFileStats]') AND Objectproperty(id, N'IsTable') = 1) BEGIN INSERT INTO dbo.VirtualFileStatsHistory SELECT * FROM VirtualFileStats; TRUNCATE TABLE dbo.VirtualFileStats; END END SELECT @CurrentServerName = Cast(Serverproperty('servername') AS VARCHAR(255)) SET @DifferenceInMilliSeconds = Datediff(ms, CONVERT(DATETIME, '00:00:00', 8), @Duration) SELECT @StopTime = Dateadd(ms, @DifferenceInMilliSeconds, Getdate()) WHILE Getdate() <= @StopTime BEGIN SELECT @LastRecordedDateTime = @CurrentDateTime SELECT @CurrentDateTime = Getdate() INSERT INTO dbo.VirtualFileStats (ServerName, DBID, FileID, Reads, ReadsFromStart, Writes, WritesFromStart, BytesRead, BytesReadFromStart, BytesWritten, BytesWrittenFromStart, IostallInMilliseconds, IostallInMillisecondsFromStart, IostallReadsInMilliseconds, IostallReadsInMillisecondsFromStart, IostallWritesInMilliseconds, IostallWritesInMillisecondsFromStart, RecordedDateTime, IntervalinMilliseconds, FirstMeasureFromStart) SELECT @CurrentServerName, vfs.database_id, vfs.[file_id], vfs.num_of_reads - dbaf.ReadsFromStart AS Reads, vfs.num_of_reads AS ReadsFromStart, vfs.num_of_writes - dbaf.WritesFromStart AS Writes, vfs.num_of_writes AS WritesFromStart, vfs.num_of_bytes_read - dbaf.BytesReadFromStart AS BytesRead, vfs.num_of_bytes_read AS BytesReadFromStart, vfs.num_of_bytes_written - dbaf.BytesWrittenFromStart AS BytesWritten, vfs.num_of_bytes_written AS BytesWrittenFromStart, vfs.io_stall - dbaf.IostallInMillisecondsFromStart AS IostallInMilliseconds, vfs.io_stall AS IostallInMillisecondsFromStart, vfs.io_stall_read_ms - dbaf.IostallReadsInMillisecondsFromStart AS IostallReadsInMilliseconds, vfs.io_stall_read_ms AS IostallReadsInMillisecondsFromStart, vfs.io_stall_write_ms - dbaf.IostallWritesInMillisecondsFromStart AS IostallWritesInMilliseconds, vfs.io_stall_write_ms AS IostallWritesInMillisecondsFromStart, @CurrentDateTime, CASE WHEN @LastRecordedDateTime IS NULL THEN NULL ELSE Datediff(ms, dbaf.RecordedDateTime, @CurrentDateTime) END AS IntervalInMilliseconds, CASE WHEN @LastRecordedDateTime IS NULL THEN 1 ELSE 0 END AS FirstMeasureFromStart FROM sys.Dm_io_virtual_file_stats(@DB, @DBFile) vfs LEFT OUTER JOIN VirtualFileStats dbaf ON vfs.database_id = dbaf.dbid AND vfs.[file_id] = dbaf.fileid WHERE ( @LastRecordedDateTime IS NULL OR dbaf.RecordedDateTime = @LastRecordedDateTime ) SELECT @ErrorNumber = @@ERROR, @NumberOfRows = @@ROWCOUNT IF @ErrorNumber != 0 BEGIN SET @ErrorMessageText = 'Error ' + CONVERT(VARCHAR(10), @ErrorNumber) + ' failed to insert file stats data!' RAISERROR (@ErrorMessageText, 16, 1) WITH LOG RETURN @ErrorNumber END WAITFOR DELAY @IntervalInSeconds END
I Have Data, Now What?
This is where the fun begins! If you just query the table the data doesn’t make much sense.
We need to do some simple math and get the data into a metric that is meaningful.
SELECT TOP 10 Db_name(dbid) AS 'databasename', File_name(fileid) AS 'filename', Reads / ( IntervalInMilliSeconds / 1000 ) AS 'readspersecond', Writes / ( IntervalInMilliSeconds / 1000 ) AS 'writespersecond', ( Reads + Writes ) / ( IntervalInMilliSeconds / 1000 ) AS 'iopersecond', CASE WHEN ( Reads / ( IntervalInMilliSeconds / 1000 ) ) > 0 AND IostallReadsInMilliseconds > 0 THEN IostallReadsInMilliseconds / Reads ELSE 0 END AS 'iolatencyreads', CASE WHEN ( Reads / ( IntervalInMilliSeconds / 1000 ) ) > 0 AND IostallWritesInMilliseconds > 0 THEN IostallWritesInMilliseconds / Writes ELSE 0 END AS 'iolatencywrites', CASE WHEN ( ( Reads + Writes ) / ( IntervalInMilliSeconds / 1000 ) > 0 AND IostallInMilliseconds > 0 ) THEN IostallInMilliseconds / ( Reads + Writes ) ELSE 0 END AS 'iolatency', RecordedDateTime FROM management.dbo.VirtualFileStats WHERE DBID = 10 AND FirstMeasureFromStart = 0 ORDER BY RecordID
This gives us reads, writes and io latency per second results.
Now we are cooking! We can now see that on this database we are seeing some spikes in latency, the number of milliseconds it takes to complete a single IO request, and may warrant investigation. As a general rule of thumb if I see IO latency above 20 milliseconds consistently I start looking deeper into the IO system to see what is wrong. A single modern hard disk is capable of 130 random IO’s a second. Another thing to consider is how many databases are on the same disk. I will pull the database files together with the volume they are on to get a true aggregate number of IO’s and latencies. You may find that a single database is dominating the disks and causing other databases to slow down even when the number of IO’s for those databases is small. On a SAN being able to get to the file level can help you locate the physical disks for that LUN and help your SAN administrators look at a very specific spot instead of the overall health of the SAN and actually fix your performance issues.
Lastly, I run a nightly job that performs these aggregations for me and moves them into a table that I keep long term so I can see performance over time. This is a great way to see if you are getting near your IO capacity and if the steps you have done correct the issues.
So, That’s it?
Not by a long shot. Measuring IO latencies is just one piece of a much larger puzzle for troubleshooting performance problems. It is a valuable one though, and keeping this information over time is just priceless.
I added a history table an a check at the beginning to move all data to history before starting up. Mike Kaplan reported below that multiple runs caused issues if there was data in the table from a previous run.
I attended and spoke at SQLSaturday #57, and it was an awesome event! Here are my notes and observations on the trip as a whole.
As always, I try to be in town on Friday night to do the speaker dinner. It’s always worth it. Even if you hate the food, resturant or the part of town the PEOPLE make it so worth wild. I always meet someone new and get to cultivate relationships that normally would only get some TLC at The Summit. To me SQLSaturday is a cheap way to keep my speaking skills sharp, educate some folks and get to spend quality time with a great group of people.
Friday night I got to do one of the things on my “bucket list”, guest on DBAs@Midnight with my friends Sean, Jen and Patrick. I’ve worked in broadcasting but it’s never easy to do. Sean and Jen put quite a bit of work into these weekly shows and being on the other side of the camera reminded me of that!
I was surprised at how nice the facilities were. It wasn’t what I had in mind when I saw it was at a church. It is always cool to see people looking to other communities besides the traditional venues for this kind of event. The food was awesome. Being a meat eater having hot brisket was a big plus for me! There were two things that will be corrected the next time around. Signage and room numbering. It wasn’t a huge deal just a rough spot in an otherwise flawless event.
You can download my slide deck here.
Again, thanks for putting on such a great event and allowing me to come speak!