SQLDIY: Gather Virtual File Statistics Using T-SQL #TSQL2sDay 15
It’s that time again, T-SQL Tuesday is here! This time Pat Wright (blog|twitter) is hosting and has put forth automating tasks using ether T-SQL or Powershell. I LOVE automating stuff. As a production DBA in some very large shops you can’t do your job unless you make your servers work for you. I’ve been using T-SQL and *GASP* xp_cmdshell, OSQL and file shares to gather stats and push configurations to servers for decades. Log before fancy things like C# and Powershell existed. These days I use a variety of home grown tools but doing things with just T-SQL can be just as powerful. I’m going to leverage this post to start a series on a pure T-SQL implementation of configuration management, data gathering and utility procedure deployment.
Where Is Your Management Database?
Every DBA should have two things, a utility, or management database on every server and a central repository where all the locally collected data in the management database is pulled back to. What surprises people when I talk about this methodology is I don’t always advocate using a licensed copy of SQL Server. By that I mean that SQL Express works just great as a central repository server. I usually put this on a virtual machine, Hyper-V or what ever flavor of virtual environment your company supports. This allows you to do things like enable CLR and xp_cmdshell on a server that technically is non-production and keep your security risk much lower. Every server that is deployed in my shop gets a local management database. From the central repository I can push out everything I need to manage a server and keep all my servers up to date as I make improvements or bug fixes to the management code. That’s all I’m really going to say about that in this post though I just wanted to give you an idea of just how deep the rabbit hole can go.
DMV’s give you the information, you have to use it.
Since SQL Sever 2005 Microsoft let of of the black box mentality and started providing crazy useful information via Dynamic Management Views. Virtual file statistics though have been around for quite a while. They got a touch up in the DMV but the basic information was available in SQL Server 2000 via function call. The DMV I’m after is sys.dm_io_virtual_file_stats. It has a ton of information in it. It’s main problem though is it is an aggregation over time and doesn’t really tell you what is different from yesterday to today. To get around that we have to build our own sampling routine.
The VirtualFileStats Table
We create a table in our local management database to collect the daily numbers. I try to keep things simple.
CREATE TABLE dbo.VirtualFileStats ( RecordID INT IDENTITY(1,1) NOT NULL, ServerName VARCHAR(255) NOT NULL, DBID INT NOT NULL, FileID INT NOT NULL, Reads BIGINT NULL, ReadsFromStart BIGINT NULL, Writes BIGINT NULL, WritesFromStart BIGINT NULL, BytesRead BIGINT NULL, BytesReadFromStart BIGINT NULL, BytesWritten BIGINT NULL, BytesWrittenFromStart BIGINT NULL, IostallInMilliseconds BIGINT NULL, IostallInMillisecondsFromStart BIGINT NULL, IostallReadsInMilliseconds BIGINT NULL, IostallReadsInMillisecondsFromStart BIGINT NULL, IostallWritesInMilliseconds BIGINT NULL, IostallWritesInMillisecondsFromStart BIGINT NULL, RecordedDateTime DATETIME NULL, IntervalInMilliseconds BIGINT NULL, FirstMeasureFromStart BIT NULL ) GO CREATE TABLE dbo.VirtualFileStatsHistory ( RecordID INT NOT NULL, ServerName VARCHAR(255) NOT NULL, DBID INT NOT NULL, FileID INT NOT NULL, Reads BIGINT NULL, ReadsFromStart BIGINT NULL, Writes BIGINT NULL, WritesFromStart BIGINT NULL, BytesRead BIGINT NULL, BytesReadFromStart BIGINT NULL, BytesWritten BIGINT NULL, BytesWrittenFromStart BIGINT NULL, IostallInMilliseconds BIGINT NULL, IostallInMillisecondsFromStart BIGINT NULL, IostallReadsInMilliseconds BIGINT NULL, IostallReadsInMillisecondsFromStart BIGINT NULL, IostallWritesInMilliseconds BIGINT NULL, IostallWritesInMillisecondsFromStart BIGINT NULL, RecordedDateTime DATETIME NULL, IntervalInMilliseconds BIGINT NULL, FirstMeasureFromStart BIT NULL )
This is what we need to gather, and later analyze the data. Since we are managing our samples we have to know when the sampling started and what the first sample is. FirstMeasureFromStart lets us know that it is the first base measurements the rest of the samples will delta off of.
GatherVirtualFileStats Stored Procedure
Next we need a stored procedure to do the sampling. One thing you will notice is the procedure executes continuously with a WAIT FOR DELAY built into it so you can get finer grained than the 1 minute limitation of the SQL Agent. Sometimes, I will do one off sampling for a short period, say 30 minutes at a 30 second interval but most often I just let it run and set the sample rate at 1 minute or larger depending on how busy the system is.
IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = Object_id(N'[dbo].[GatherVirtualFileStats]') AND Objectproperty(id, N'IsProcedure') = 1) DROP PROCEDURE [dbo].[GatherVirtualFileStats] GO -------------------------------------------------------------------------------------- -- GatherVirtualFileStats -- by: Wesley D. Brown -- date: 02/08/2011 -- mod: 04/14/2011 -- description: -- This stored procedure is used to sample sys.dm_io_virtual_file_stats to track -- performance at a database file level. This is useful for finding -- hotspots on SAN's or under performing IO systems. -- parameters: -- @Duration = '01:00:00' How long to run before exiting -- @IntervalInSeconds = 120 Number of seconds between samples --@DB = -1 DB_ID to monitor, -1 for all --@DBFile = -1 File_ID of file to monitor, -1 for all -- usage: -- DECLARE @RC INT, -- @StartTime DATETIME, -- @databaseID INT -- SELECT @StartTime = Getdate(), -- @databaseID = Db_id() -- EXEC @RC = Gathervirtualfilestats -- '00:45:30', -- 30, -- 10, -- -1 -- SELECT * -- FROM dbo.VirtualFileStats -- WHERE DBID = 10 -- ORDER BY RecordID -- platforms: -- SQL Server 2005 -- SQL Server 2008 -- SQL Server 2008 R2 -- tested: -- SQL Server 2005 SP2 -- SQL Server 2008 R2 --------------------------------------------------------------------------------------- -- *** change log *** -- Added history table and perge on start up if there is data in the main table -- *** end change log *** ------------------------------------------------------------------------------------- CREATE PROC dbo.Gathervirtualfilestats @Duration DATETIME = '01:00:00', @IntervalInSeconds INT = 120, @DB INT = -1, @DBFile INT = -1 AS SET nocount ON DECLARE @StopTime DATETIME, @LastRecordedDateTime DATETIME, @CurrentDateTime DATETIME, @ErrorNumber INT, @NumberOfRows INT, @ErrorMessageText NVARCHAR(4000), @CurrentServerName VARCHAR(255), @DifferenceInMilliSeconds BIGINT IF EXISTS (SELECT 1 FROM dbo.VirtualFileStats) BEGIN IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = Object_id(N'[dbo].[VirtualFileStats]') AND Objectproperty(id, N'IsTable') = 1) BEGIN INSERT INTO dbo.VirtualFileStatsHistory SELECT * FROM VirtualFileStats; TRUNCATE TABLE dbo.VirtualFileStats; END END SELECT @CurrentServerName = Cast(Serverproperty('servername') AS VARCHAR(255)) SET @DifferenceInMilliSeconds = Datediff(ms, CONVERT(DATETIME, '00:00:00', 8), @Duration) SELECT @StopTime = Dateadd(ms, @DifferenceInMilliSeconds, Getdate()) WHILE Getdate() <= @StopTime BEGIN SELECT @LastRecordedDateTime = @CurrentDateTime SELECT @CurrentDateTime = Getdate() INSERT INTO dbo.VirtualFileStats (ServerName, DBID, FileID, Reads, ReadsFromStart, Writes, WritesFromStart, BytesRead, BytesReadFromStart, BytesWritten, BytesWrittenFromStart, IostallInMilliseconds, IostallInMillisecondsFromStart, IostallReadsInMilliseconds, IostallReadsInMillisecondsFromStart, IostallWritesInMilliseconds, IostallWritesInMillisecondsFromStart, RecordedDateTime, IntervalinMilliseconds, FirstMeasureFromStart) SELECT @CurrentServerName, vfs.database_id, vfs.[file_id], vfs.num_of_reads - dbaf.ReadsFromStart AS Reads, vfs.num_of_reads AS ReadsFromStart, vfs.num_of_writes - dbaf.WritesFromStart AS Writes, vfs.num_of_writes AS WritesFromStart, vfs.num_of_bytes_read - dbaf.BytesReadFromStart AS BytesRead, vfs.num_of_bytes_read AS BytesReadFromStart, vfs.num_of_bytes_written - dbaf.BytesWrittenFromStart AS BytesWritten, vfs.num_of_bytes_written AS BytesWrittenFromStart, vfs.io_stall - dbaf.IostallInMillisecondsFromStart AS IostallInMilliseconds, vfs.io_stall AS IostallInMillisecondsFromStart, vfs.io_stall_read_ms - dbaf.IostallReadsInMillisecondsFromStart AS IostallReadsInMilliseconds, vfs.io_stall_read_ms AS IostallReadsInMillisecondsFromStart, vfs.io_stall_write_ms - dbaf.IostallWritesInMillisecondsFromStart AS IostallWritesInMilliseconds, vfs.io_stall_write_ms AS IostallWritesInMillisecondsFromStart, @CurrentDateTime, CASE WHEN @LastRecordedDateTime IS NULL THEN NULL ELSE Datediff(ms, dbaf.RecordedDateTime, @CurrentDateTime) END AS IntervalInMilliseconds, CASE WHEN @LastRecordedDateTime IS NULL THEN 1 ELSE 0 END AS FirstMeasureFromStart FROM sys.Dm_io_virtual_file_stats(@DB, @DBFile) vfs LEFT OUTER JOIN VirtualFileStats dbaf ON vfs.database_id = dbaf.dbid AND vfs.[file_id] = dbaf.fileid WHERE ( @LastRecordedDateTime IS NULL OR dbaf.RecordedDateTime = @LastRecordedDateTime ) SELECT @ErrorNumber = @@ERROR, @NumberOfRows = @@ROWCOUNT IF @ErrorNumber != 0 BEGIN SET @ErrorMessageText = 'Error ' + CONVERT(VARCHAR(10), @ErrorNumber) + ' failed to insert file stats data!' RAISERROR (@ErrorMessageText, 16, 1) WITH LOG RETURN @ErrorNumber END WAITFOR DELAY @IntervalInSeconds END
I Have Data, Now What?
This is where the fun begins! If you just query the table the data doesn’t make much sense.
We need to do some simple math and get the data into a metric that is meaningful.
SELECT TOP 10 Db_name(dbid) AS 'databasename', File_name(fileid) AS 'filename', Reads / ( IntervalInMilliSeconds / 1000 ) AS 'readspersecond', Writes / ( IntervalInMilliSeconds / 1000 ) AS 'writespersecond', ( Reads + Writes ) / ( IntervalInMilliSeconds / 1000 ) AS 'iopersecond', CASE WHEN ( Reads / ( IntervalInMilliSeconds / 1000 ) ) > 0 AND IostallReadsInMilliseconds > 0 THEN IostallReadsInMilliseconds / Reads ELSE 0 END AS 'iolatencyreads', CASE WHEN ( Reads / ( IntervalInMilliSeconds / 1000 ) ) > 0 AND IostallWritesInMilliseconds > 0 THEN IostallWritesInMilliseconds / Writes ELSE 0 END AS 'iolatencywrites', CASE WHEN ( ( Reads + Writes ) / ( IntervalInMilliSeconds / 1000 ) > 0 AND IostallInMilliseconds > 0 ) THEN IostallInMilliseconds / ( Reads + Writes ) ELSE 0 END AS 'iolatency', RecordedDateTime FROM management.dbo.VirtualFileStats WHERE DBID = 10 AND FirstMeasureFromStart = 0 ORDER BY RecordID
This gives us reads, writes and io latency per second results.
Now we are cooking! We can now see that on this database we are seeing some spikes in latency, the number of milliseconds it takes to complete a single IO request, and may warrant investigation. As a general rule of thumb if I see IO latency above 20 milliseconds consistently I start looking deeper into the IO system to see what is wrong. A single modern hard disk is capable of 130 random IO’s a second. Another thing to consider is how many databases are on the same disk. I will pull the database files together with the volume they are on to get a true aggregate number of IO’s and latencies. You may find that a single database is dominating the disks and causing other databases to slow down even when the number of IO’s for those databases is small. On a SAN being able to get to the file level can help you locate the physical disks for that LUN and help your SAN administrators look at a very specific spot instead of the overall health of the SAN and actually fix your performance issues.
Lastly, I run a nightly job that performs these aggregations for me and moves them into a table that I keep long term so I can see performance over time. This is a great way to see if you are getting near your IO capacity and if the steps you have done correct the issues.
So, That’s it?
Not by a long shot. Measuring IO latencies is just one piece of a much larger puzzle for troubleshooting performance problems. It is a valuable one though, and keeping this information over time is just priceless.
I added a history table an a check at the beginning to move all data to history before starting up. Mike Kaplan reported below that multiple runs caused issues if there was data in the table from a previous run.