Category Archives: SQLServerPedia Syndication

Finding SQL Server Installs Using Powershell

Old Dog, New Tricks

I’ve been writing tools for SQL Server for a lot of years. Some of these tools were never completely released. Some of them were just for me. Some of them overlapped tools already on the market and available for free. Recently, I started updating my bag of tricks and tools, and it seemed like a great time to get back into PowerShell. I decided to pull out a bit of C# code I cobbled together nine years ago as part of a tool to find SQL Server instances on a network. I never really got around to making it a “production” ready tool since there was already a most excellent one on the scene in the form of SQLPing from Chip Andrews. SQLPing is a fantastic scan tool and can scan for many more things than the method covered here.

Hello Operator?

When Microsoft implemented named instances with SQL Server 2000 they needed a way to direct traffic arriving at a single IP address to the correct TCP port for each instance. So, they built in a traffic cop. When SQL Server 2005 came around it was pulled out of the core network listener and put into its own service, the browser service. There was little documentation on how the browser worked. Luckily for poor sods like me, with a network packet sniffer it was pretty easy to figure out how to talk to the browser. Once I figured out how to get the browser service to tell me what instances it knew about, it was trivial to implement. These days Microsoft is being much more open about these kinds of things and has actually released documentation on how the browser service and the SQL Server Resolution Protocol work.

The Basic Mechanics.

As most of you know, SQL Server’s default instance listens on TCP port 1433 and the browser service listens on UDP port 1434. Our goal is to send a UDP packet to port 1434. According to the docs we only need to send a single byte containing the number two. This prompts the listener to give us a list of instances and what port each is bound to. When I wrote my original implementation it really was that simple. I dug around and figured out how to get PowerShell to send a UDP packet. I tested it and lo’, it worked, on my machine….

It’s Never So Easy.

When I tested it on my lab VM cluster with multiple nodes and multiple instances it would fail! I just didn’t get it. My C# code from the stone age worked just fine. My PowerShell code was hit or miss. I started troubleshooting the problem just as I had in the beginning. I fired up my network sniffer and watched the traffic flow back and forth. Again, I saw exactly what I expected, or more accurately what I wanted to see: a single packet with the hex number two in the first byte position. I ran the test several times over the next hour or so. Eventually, I just had to walk away from it. The next day I started over again. I read the documentation; it still said the same thing. I ran the test; still a two in the first byte position. Then I spotted it. The packet wasn’t one byte long. I went back and read the document again. It gives an upper boundary of 32 bytes but no lower limit. The packet I saw come through was always more than one byte long. Armed with that, I started big and worked my way down until I got errors. Now I know that a packet of three bytes always triggers a response: a two followed by two zeros.

Meet QuerySQLListener.

Here is the function I put together. It takes a server name and returns an array of strings with the fun bits in it.

function QuerySQLListener {
    [CmdletBinding(
        DefaultParameterSetName = '',
        ConfirmImpact = "Low"
    )]
    Param(
        [Parameter(
            Mandatory = $True,
            Position = 0,
            ParameterSetName = '',
            ValueFromPipeline = $True)]
        [string]$Computer
    )
    Begin {
        $ErrorActionPreference = "SilentlyContinue"
        $Port = 1434
        $ConnectionTimeout = 1000
        $Responses = @()
    }
    Process {
        $UDPClient = New-Object System.Net.Sockets.UdpClient
        $UDPClient.Client.ReceiveTimeout = $ConnectionTimeout
        $IPAddress = [System.Net.Dns]::GetHostEntry($Computer).AddressList[0].IPAddressToString
        $UDPClient.Connect($IPAddress,$Port)
        $ToASCII = New-Object System.Text.ASCIIEncoding
        # A two followed by two zeros reliably triggers a response from the browser service.
        $UDPPacket = 0x02,0x00,0x00
        Try {
            $UDPEndpoint = New-Object System.Net.IPEndPoint([System.Net.IPAddress]::Any,0)
            $UDPClient.Client.Blocking = $True
            [void]$UDPClient.Send($UDPPacket,$UDPPacket.Length)
            $BytesReceived = $UDPClient.Receive([ref]$UDPEndpoint)
            [string]$Response = $ToASCII.GetString($BytesReceived)
            If ($Response) {
                # Strip the three-byte header, then split the instance list on the double-semicolon delimiter.
                $Response = $Response.Substring(3,$Response.Length-3).Replace(";;","~")
                $Response.Split("~") | ForEach-Object {
                    $Responses += $_
                }
            }
            $UDPClient.Close()
        }
        Catch {
            $Error[0].ToString()
            $UDPClient.Close()
        }
    }
    End {
        return ,$Responses
    }
}
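
For reference, here is a quick usage sketch. The server name and file name are placeholders, not anything from my environment:

# Scan a single server by name.
QuerySQLListener -Computer "SQLBOX01"

# Or feed it a list of machine names, one per line, and scan them all.
Get-Content .\servers.txt | QuerySQLListener

Each string that comes back is one instance’s record from the browser service, a run of semicolon-separated name/value pairs, so you still get to parse out the instance name, version and TCP port yourself.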


It Isn’t Perfect But It Works.

I’m sure there is a cleaner way to implement it, but I’m really just getting back into PowerShell after doing little more than tinkering with it the last time around. If you have any suggestions or improvements I’ll gladly take them!

New Toy: The Brydge iPad Keyboard – Updated

Finally a REAL keyboard for my iPad!

I can’t tell you how long I’ve wanted a real keyboard that my iPad could dock with easily. I have always hated lugging around a laptop everywhere to do any real typing. When I got the first iPad I really thought it was going to let me cut way down on my need for a laptop, but without a solid portable keyboard and with the smaller 1024×768 screen it basically became a gaming and reading machine. I found myself using my Galaxy phone at the time, with its sliding keyboard, to do almost all my real typing for emails.

Redoubling my effort.

When the new iPad came out (not officially the 3rd generation, but the third release, and already old news now) with an incredible screen resolution, I vowed to try again to cut down my laptop usage. I could use VPN and RDP into any box and see the whole screen, but typing out T-SQL commands was still very painful on the touchscreen. So, I started testing every Bluetooth keyboard or keyboard case out there. I found a few, like the Logitech and the Apple keyboard, more than adequate for the job, but they were bulky and I still needed a stand and a case of some sort. Oh, and they aren’t cheap either. I eventually found a keyboard/case combo where the wafer-thin keyboard detached from the crappy case, so I carried that when I HAD to have a keyboard and left my laptop behind. At the end of April I found out about the Brydge on Kickstarter. It looked like exactly what I’ve always wanted in a keyboard dock for my iPad. I decided to take a risk and support the project. It was funded with an expected ship date in October, just in time for the PASS Summit! WOO HOO! Well, the project was really successful and they had to delay a bit as they tooled up for a much larger production run. That meant I didn’t get it until the 28th of November.

So, Was It Worth It?

Well, I think so. Let me break down what I like and what I’m not so fond of.

Likes

The keyboard is generally OK. I love the fact it has solid tactile feedback. I LOVE buckling spring keyboards, so having a keyboard with some feedback that takes a bit of force to actually type a letter suits me just fine. The keyboard is also recessed into the frame so you don’t end up with smudges on your beautiful retina display. I really hated the fact that my 50 dollar official iPad magnetic cover always left lines and helped spread my oily fingerprints evenly over the screen. It also has a pretty full complement of keys and they are generally easy to reach without too many typos. I also love the full alt, shift and arrow keys so I don’t have to touch my iPad screen to select text to cut and paste.

I’m also good with the hinging system. They altered it a bit from the initial prototype to make it easier to future-proof. Out of the gate it works very well with the iPad 2, 3 and 4 body styles. The hinges are also stiff enough to position the screen at any angle without worrying about it tilting on its own. It will also lay almost completely flat.

It has some weight to it. At first I was mildly put off by the extra heft but quickly realized it helps stabilize the whole setup nicely. Unlike a laptop, where the screen is the lightest part of the device, here the iPad has some heft to it, and a couple of other keyboard/case combos would flip over onto their backs if the angle was bad.

It is about the size of a MacBook Air, which is also nice. It doesn’t taper to that razor’s edge, but I can live with that.

Dislikes

The keyboard is a little cramped and if you have big hands it may be difficult to type accurately and fast at the same time. The space bar is also a bit touchy. I use just one thumb on my right hand to trigger it and sometimes it doesn’t respond. I think the problem is they have two springs holding up the space bar but only one switch under it. It also seems to be worse under the left thumb than the right.

There are also two rubber pads for the iPad to rest on when it is fully closed that kind of rub on my wrists if I’m laying my hands flat.

It doesn’t seem to have the magnetic sleep/wake trigger. When I open it up I have to push the home button on the iPad or the home key on the keyboard to wake my iPad. Not a huge thing, but I really like that feature on the newer iPads. Update: it does work when you set it up right. I had aligned the home button on the iPad with the home button on the keyboard, which looked right to me as a lefty; rotate the iPad the other way around in the hinges and the magnetic wake works.


If you look, the back-facing camera is now unrestricted and usable.

Landscape only. I know it is mimicking a laptop experience, but it would have been cool to rotate the iPad between portrait and landscape. It isn’t hard to pull the iPad from the keyboard; this is just a little wishful thinking on my part.

The hinge pads. Since you can use the iPad 2, 3 or 4, the hinges use rubber pads to grip your iPad. They have little sticky pads on the back, and the Brydge ships with the iPad 2 shims already attached. Removing them completely ruined the sticky pads, and cleaning up the hinges was a bit of a pain.


The hinge stickers. That’s right, stickers… The hinges themselves are bare silver aluminium, so if you have a black iPad they are stark against the frame; the black stickers help correct that. I don’t have the best skills when applying stickers so mine are just a smidge off. They do ship two sets and I will probably redo mine or get someone with steadier hands to do it.

The hinge partially covers the rear-facing camera. Again, not a huge deal, just an observation: if you plan on using the camera you will have to remove the iPad from the Brydge. Or not, if you rotate it; see above.

It is made out of aluminium but doesn’t have the smooth finish of the rest of the Apple products, which was a disappointment aesthetically.
That said, I have received several comments that it matches the iPad well, and several people asked if it was a new MacBook Air.

It can be difficult to open since the hinges are stiff, but they did put a groove in the frame to help with that.


It also has a small wobble. When the iPad isn’t attached it sits flat on a level surface. When I put the iPad in, though, the right front is raised just a little so it will rock a bit. I shaved the left-hand rubber foot down about the width of two business cards and it fixed the issue.


You can see there are four rubber feet on the bottom.

It isn’t cheap. I got in on the early bird at 150.00 dollars, but it will be selling for more than that at retail. That is about a 50.00 dollar premium over any other keyboard I’ve bought so far.

The Oddities

Since it feels like a netbook or a little laptop I keep going down under the keyboard to touch the scroll pad to move the mouse around. It will take a bit of time but I’m sure I’ll get past that.

As with any new keyboard, it takes time to get used to the layout and feel of it. Since it feels like a netbook I also want to use my keyboard shortcuts to do things like cut and paste. There may be a way to set this up; I’ll dig into it a bit more and report back.

Initial Verdict: WIN!

Even though it looks like my dislikes outnumber the likes, I really do dig it so far. I even wrote this blog post using Blogsy for iPad. I’ve NEVER written this much text on my iPad in one sitting. It is pricey, but if I use it as much as I think I will I’m not sweating it. The Brydge was made for people like me and I’m glad I backed the project.

Go check them out at http://www.thebrydge.com. If you buy one, drop me a line and let me know what you think about it.

Quick Tip: SSIS and SAP BW Round 2

Again, frustration.

After almost breaking my arm patting myself on the back for getting past my last SAP BW issue, I found that the SAP BW connector and SQL Server 2012 were punishing me again. I was building a second package against the same SAP instance on the same SQL Server 2012 instance when I hit something more than a “flip this field from 3 to 1” kind of problem.

NO SOUP FOR YOU!

“Index was out of range. Must be non-negative and less than the size of the collection.
Parameter name: Index”

Oh, mister SSIS, this means war! Looking at the complete output log I saw that SAP BW had actually delivered the data but the SAP BW source component had thrown a hard error. It was a general error and searching Bing yielded next to nothing. I did find posts like this one that pointed to some bugs inside and outside the connector. This, of course, builds a ton of confidence that it’s going to work in the future.

Desperation Leads To A Fix.

After searching dozens of KB articles around the SAP BW 1.1 and 1.0 connectors, I finally just pulled the trigger and installed Service Pack 1 for SQL Server 2012 and the SAP BW 1.1 connector from the 2012 SP1 feature pack. I hate when the standard tech support advice works…. The scary thing is there are still outstanding bugs fixed in CU3 and CU4 that aren’t included in SP1. The worst thing is I never could find an exact KB article that matched my error. I don’t know why it was broken and I don’t know if there was a specific patch or CU that addressed my bug. This isn’t the first time something like this has happened either. Sometimes bug fixes don’t get listed when a CU or service pack is made available.

Quick Tip: SSIS and SAP BW

Frustration.

If you have ever had to work with Integration Services and data sources other than the Microsoft variety you know how frustrating it can be. I recently started a project using SQL Server 2012 and SAP BW 7. The customer had already decided that the other commercial options weren’t viable and settled on the Microsoft Connector 1.1 for SAP BW. A test package was built and worked.

I had already read through Using SQL Server 2008 Integration Services with SAP BI 7.0 and felt comfortable with the instructions. Even though we were using SSIS 2012, I knew there were very few differences between the 1.0 connector and the 1.1 connector. I built out our new package and all seemed well. The SAP BW connection in the connection manager hooked up and tested out. The SAP BW source also tested out, pulling the metadata and even delivering preview data just fine. Next, I attempted to run the package. Just as I expected, it ran and I got the data I expected. The next three attempts all resulted in SAP BW connector timeouts. I kept poking and prodding it. The SAP admin didn’t see any issues either.

So, I did what I always do. I stopped tussling with the package and walked through all the steps again. I went back and read the doc again. I made a few notes and shot an email off to the SAP administrator. I settled on one thing pointed out in the doc:

“We want to keep the number of parallel processes to a reasonable value for the overall DTP process type DTP_LOAD, but this parallelism can lead to a timeout error during the Open Hub DTP extraction through Microsoft Connector for SAP BI”

Sure enough, it was set to something other than 1.

VICTORY!

Now, we get all our records and don’t have the timeout issues. The downside seems to be a bit slower performance. If the choice is a faster data pump that only runs once every five or so attempts or one that runs slower but runs every time I’ll slow up a bit. The fact that everything verified and I even got preview data on every attempt was hugely frustrating and time consuming. Knowing when to stop and reevaluate the problem and having a good relationship with the SAP administrator probably saved me hours if not days of cursing and shaking my fist at both Microsoft and SAP.

Solid State Storage: Enterprise State Of Affairs

Here In A Flash!

It’s been a crazy few years in the flash storage space. Things really started taking off around 2006 when NAND flash and Moore’s Law got together. By 2010 it was clear that flash storage was going to be a major part of your storage makeup in the future. It may not be NAND flash specifically, though. It will be some kind of solid state memory and not spinning disks.

Breaking The Cost Barrier.

For the last few years, I’ve always told people to price out the cost of IO, not the cost of storage. Flash storage was mainly a niche product solving a niche problem, like speeding up random-IO-heavy workloads. With the cost of flash storage at or below standard disk-based SAN storage, with all the same connectivity features and the same software features, I think it’s time to put flash storage on the same playing field as our old stalwart SAN solutions.

Right now at the end of 2012, you can get a large amount of flash storage. There is still this perception that it is too expensive and too risky to build out all flash storage arrays. I am here to prove at least cost isn’t as limiting a factor as you may believe. Traditional SAN storage can run you from 5 dollars a Gigabyte to 30 dollars a Gigabyte for spinning disks. You can easily get into an all flash array in that same range.

Here’s Looking At You Flash.

This is a short list of flash vendors currently on the market. I’ve thrown in a couple of non-SAN types and a couple of traditional SANs that have integrated flash storage in them. Please don’t email me complaining that X vendor didn’t make this list or that Y vendor has different pricing. All the pricing numbers were gathered from published sources on the internet. These sources include the vendors’ own websites, published costs from TPC executive summaries and official third-party price listings. If you are a vendor and don’t like the prices listed here, then publicly publish your price list.

There are two cost metrics I always look at: dollars per Gigabyte of raw capacity and dollars per Gigabyte of usable capacity. The first number is pretty straightforward. The second metric can get tricky in a hurry. On a disk-based SAN it pretty much comes down to what RAID or protection scheme you use. Flash storage almost always introduces deduplication and compression, which can muddy the waters a bit.
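
To make the difference between the two metrics concrete, here is a quick back-of-the-napkin calculation in PowerShell. The array size, price and protection overhead are made-up numbers, not any vendor’s quote:

# Hypothetical 10TB array quoted at 102,400 dollars.
$rawGB = 10 * 1024
$quote = 102400
$quote / $rawGB          # 10 dollars per Gigabyte raw
$quote / ($rawGB * 0.5)  # 20 dollars per Gigabyte usable if mirroring cuts capacity in half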

Fibre Channel/iSCSI vendor list

Nimbus Data

Appearing on the scene in 2006, they have two products currently on the market: the S-Class storage array and the E-Class storage array.

The S-Class seems to be their lower-end entry but does come with an impressive software suite, and it provides 10GbE and Fibre Channel connectivity. Looking around at the cost for the S-Class I found a 2.5TB model for 25,000 dollars. That comes out to 9.7 dollars per Gigabyte of raw space. The E-Class is their super scalable and totally redundant unit. I found a couple of quotes that put it in at 10.00 dollars a Gigabyte of raw storage. Already we have a contender!

Pure Storage

In 2009 Pure Storage started selling their flash-only storage solutions. They include deduplication and compression in all their arrays and factor that into the cost per Gigabyte. I personally find this a bit fishy since I always like to test with incompressible data as a worst case for any array, which would also drive up their cost. They claim between 5.00 and 10.00 dollars per usable Gigabyte, and I haven’t found any solid source of public pricing on their array yet to dispute or confirm that number. They also have a generic “compare us” page on their website that is at best misleading and at worst plain lies. Since they don’t call out any specific vendor on their comparison page it’s hard to pin them for falsehoods, but you can read between the lines.

Violin Memory

Violin Memory started in earnest around 2005 selling not just flash-based but memory-based arrays. Very quickly they transitioned to all-flash arrays. They have two solutions on the market today. The 3000 series allows some basic SAN-style setups but also supports direct attachment via external PCIe channels. It comes in at 10.50 dollars a Gigabyte raw and 12 dollars a Gigabyte usable. The 6000 series is their flagship product and the pricing reflects it. At 18.00 dollars per Gigabyte raw it is getting up there on the price scale. Again, not the cheapest, but they are well established and their arrays are used and resold by HP.

Texas Memory Systems/IBM

If you haven’t heard, TMS was recently purchased by IBM. They are based in Houston, TX, so I’ve always had a soft spot for them. They were also the first non-disk-based storage solution I ever used. The first time I put a RamSan in and got 200,000 IOPS out of the little box I was sold. Of course, it was only 64 Gigabytes of space and cost a small fortune. Today they have a solid flash-based, Fibre Channel-attached and iSCSI-attached lineup. I couldn’t find any pricing on the current flagship RamSan 820, but the 620 has been used in TPC benchmarks and is still in circulation. It is a heavyweight at 33.30 dollars a Gigabyte of raw storage.

Skyera

A new entrant into this space, they are boasting some serious cost savings. They claim 3.00 dollars per usable Gigabyte on their currently shipping product. The unit also includes options for deduplication and compression, which can drive the cost down even further. It is also a half-depth 1U solution with a built-in 10GbE switch. They are working on a fault-tolerant unit, due out the second half of next year, that will up the price a bit but add Fibre Channel connectivity. They have a solid pedigree, as they are made up of the guys who brought the SandForce controllers to market. They aren’t a proven company yet, and I haven’t seen a unit or been granted access to one either. Still, I’d keep an eye on them. At those price points and with that crazy small footprint it may be worth taking a risk on them.

IBM

I’m putting the DS3524 in as a separate entry to give you some contrast. This is a traditional SAN frame that has been populated with all SSD drives. With 112 of the 200GB drives and a total cost of 702,908.00 dollars it comes in at 31.00 dollars a Gigabyte of raw storage. On the higher end, but still in the price range I generally look to stay in.

SUN/Oracle

I couldn’t resist putting a Sun F5100 in the mix. At 3,099,000.00 dollars it is the most expensive array I found listed. It has 38.4 Terabytes of raw capacity, giving us an 80.00 dollars per Gigabyte price tag. Yikes!

Dell EqualLogic

When the 3Par deal fell apart Dell quickly gobbled up EqualLogic, a SAN manufacturer that focused on iSCSI solutions. This isn’t a flash array; I wanted to add it as contrast to the rest of the list. I found a 5.4 Terabyte array with a 7.00 dollar per Gigabyte raw storage price tag. Not horrible, but still more expensive than some of our all-flash solutions.

Fusion-io

What list would be complete without including the current king of the PCIe flash hill, Fusion-io. I found a retail price listing for their 640 Gigabyte Duo card at 19,000 dollars, giving us 29.00 dollars per usable Gigabyte. Looking at the next card down, the 320 Gigabyte Duo at 7,495.00 dollars comes in at 32.20 dollars per usable Gigabyte. They are wicked fast though :)

So Now What?

Armed with a bit of knowledge you can go forth and convince your boss and storage team that a SAN array fully based on flash is totally doable from a cost perspective. It may mean taking a bit of a risk but the rewards can be huge.


SSIS and Oracle: All Your Non-Options

Why do some things have to be so hard?

I have been asking myself that question for the better part of two weeks as I wrestle with SSIS 2008 R2 and getting data out of Oracle and into SQL Server.

It’s like the shell game, only with drivers.

Like, go native man!

Like anyone else working in SSIS and dealing with Oracle, I started with the default drivers that ship with SSIS. Technically, they work. There are a couple of glaring caveats. They only work in 32 bit mode, a huge non-starter on our 64 bit system. They are slow. I know that’s like saying the sky is blue, especially if you don’t have any context. Well, I do have some context. Migrating packages from DTS to SSIS I’ve got historical run times, and I also did some test runs before actually converting the packages. The native 32 bit drivers were slower than or just equal to the equivalent on the SQL Server 2000 box running the Oracle 8 drivers. They also don’t return the proper metadata column data types. Everything comes back as a wide string, a.k.a. DT_WSTR, or varchar for you table-creating types. Decimal(18,2)? That’s a varchar(50) for you. varchar(10)? You guessed it, varchar(50) should do it! This, beyond anything else, was probably the biggest problem.

No, only use what Oracle provides!

I decided to install the Oracle drivers. Let the pain begin!
First, you have to create an account on the Oracle Developer Network site. Really? I just need some drivers. I guess it could have been worse, like a sharp stick to the eye. You need to download the 32 bit and the 64 bit driver packs, and each one weighs in at 700MB compressed. They do include a ton of tools that I have no clue how to use; that’s a bonus. Here is your sharp stick to the eye: you get to use one of the worst installers in the history of installers. After about a dozen tries I finally found the magic combination to get only the drivers, cutting out about 1.3GB from being piled onto my server. Oh, and you get to do it twice. Next, as you look at where it put the drivers, you realize that each install is named client, client_1 and so on, with no clue at all which one holds the 32 bit or 64 bit bits. Once you get that figured out, you can go fix your borked path. The installer will gladly stick its path right at the beginning, giving you hours of fun trying to figure out exactly what is broken. That is, if your path isn’t already too long and it just skips this bit for you. And finally, you get to manually add a system variable pointing to your tnsnames.ora file, usually stored in /app/<nt login>/11.2/client/network/tnsnames.ora
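
If memory serves, the variable in question is Oracle’s TNS_ADMIN, and it points at the directory holding tnsnames.ora rather than at the file itself. A quick sketch from an elevated PowerShell prompt; the path is only an example, so point it at wherever your install actually landed things:

# Set TNS_ADMIN machine-wide so the Oracle drivers can find tnsnames.ora.
# The path below is a placeholder for your actual client install directory.
[Environment]::SetEnvironmentVariable('TNS_ADMIN', 'C:\app\oracle\product\11.2.0\client_1\network\admin', 'Machine')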

After all your hard work you are rewarded with OLE DB, ADO and ADO.NET drivers. HUZZAH!
The first thing I found out is the OLE DB drivers work about as well as the native drivers as far as metadata is concerned; performance was better, though. The ADO and ADO.NET drivers do take it up a notch. You get some additional metadata goodness from them. I did get decimal and float types back, but pretty much every string came back as DT_WSTR again. It would at least size them correctly, which was nice. It did mean I had to add conversions from wide string to string, and on a table with 86 columns and 64 million records that was a bad combination. You do get to run in both 32 bit and 64 bit.

Use what random people on twitter recommend!

Well, not quite that bad. I posted a 140-character version of this post to the mighty #sqlhelp hashtag and lo’, my friend Merrill Aldrich (twitter|blog) simply said “Can you use the Attunity connector? #sqlhelp” Huh? When I did a search for just Attunity it brought me to their website http://www.attunity.com/ and I didn’t see exactly what Merrill was talking about. Doing a search for Attunity Connector brought me to the gold I’d been looking for: http://www.attunity.com/products/attunity-connect/ssis-connectors-for-oracle-and-teradata. Apparently, Attunity makes connectors for Oracle and Teradata and releases them for F R E E. There are two versions currently: 1.2 for SSIS 2008 R2 and 2.0 for SSIS 2012. The installer is rough, but not Oracle rough. One of the nice things is installing the 64 bit drivers also installs the 32 bit drivers. Unless you have a problem with the installer and the 64 bit installer only installs the 32 bit drivers. After a few searches and a few more failed install attempts I found that you need to install the Visual C++ 2008 SP1 redistributable package. I only needed to do this on my server since I had Visual Studio 2010 already installed on my laptop. Once that was done the installer worked just fine. Except it really didn’t “install” everything. You still have to manually add the components to your toolbox sidebar for data flows before you will see the source and destination connectors for Oracle. It’s totally worth it.

First off, with a bang, all the metadata returned was 100% spot on. No more fussing with conversion steps or guessing what the data type should be. They are faster, and not by a small margin either. On ~50 packages they were around 25% or more faster than the Oracle-provided drivers. You still need to have a TNS names file; you can’t use the machine name, port and service name directly (that I know of).

And there was much joy to be had.

I do think it is sad that neither Microsoft nor Oracle has a good solution to this issue. I’m glad Microsoft is supporting Attunity; I wish they would ship the connectors by default. As for Oracle, now I know why Oracle developers and DBAs get paid so much. If getting drivers installed was this hard, I can’t imagine getting the whole database set up and going, ugh.

The Fundamentals of Storage Systems – Shared Consolidated Storage Systems

Shared Consolidated Storage Systems – A Brief History

Hey, “Shared Consolidated Storage Systems” did you just make that up? Why yes, yes I did.

For as long as we have had computers there has been a need to store and retrieve data. We have covered the basics of hard disks, RAID and solid state storage. We have looked at all of this through the aspect of being directly attached to a single server. It’s time we expand to attaching storage pools to servers via some kind of network. The reason I chose to say shared and consolidated storage instead of just SAN or Storage Area Network was to help define, broaden and give focus to what we really mean when we say SAN, NAS, Fibre Channel or even iSCSI. To understand where we are today we need to take a look back at how we got here.

Once, There Were Mainframes…

Yep, I know you have heard of these behemoths. They still roam the IT Earth today. Most of us live in an x86 world though. We owe much to Mainframes. One of these debts is networked storage. Way back when, I’m talking like the 1980’s now, Mainframes would attach to their storage via a system bus. This storage wasn’t internal the way we think of direct attached storage though. They had massive cables running from the Mainframe to the storage pods. The good folks at IBM and other big iron builders wanted to simplify the standard for connecting storage and other peripherals.


Who doesn’t love working with these cables?

You could never lose this terminator!

Out With The 1960’s And In with the 1990’s!

Initially, IBM introduced its own standard in the late 80’s to replace the well-aged bus & tag and other similar topologies with something that was more robust and could communicate over optical fiber. ESCON was born. The rest of the industry backed Fibre Channel, a protocol that works over optical fiber or copper-based networks; more importantly, it would be driven by a standards body and not a single vendor. Eventually, Fibre Channel won out. In 1994 Fibre Channel was ratified and became the de facto standard, and even IBM got on board. Again, we are still talking about connecting storage to a single Mainframe; longer connections were possible and the cabling got a lot cleaner, though. To put this in perspective, SQL Server 4.2 was shipping at that point with 6.0 right around the corner.

High Performance Computing and Editing Video.

One of the other drivers for Fibre Channel was the emerging field of High Performance Computing (HPC) and the need to connect multiple mainframes or other compute nodes to backend storage. Now we are really starting to see storage attached via a dedicated network that is shared among many computers. High end video editing and rendering farms also drove Fibre Channel adoption. Suddenly, those low end pc-based servers had the ability to connect to large amounts of storage just like the mainframers’.

Commodity Servers, Enterprise Storage.

Things got interesting when Moore’s Law kicked into high gear. Suddenly you could buy a server from HP, Dell or even Gateway. With the flood of cheaper yet powerful servers containing either an Intel, MIPS, PPC or Alpha chip, you didn’t need to rely on the mainframe so heavily. Couple that with Fibre Channel and suddenly you had the makings of a modern system. One of the biggest challenges in this emerging commodity server space was storage management. Can you deal with having hundreds of servers and thousands of disks without any real management tools? What if you needed to move some unused storage from server A to server B? People realized quickly that maintaining all these islands of storage was costly and also dangerous. Even if they had RAID systems, if someone didn’t notice the warnings you could lose whole systems, and the only people who knew something was up were the end users.

Simplify, Consolidate, Virtualize and Highly Available

Sound familiar? With the new age of networked storage we needed new tools and methodologies. We also gained some nifty new features. Network attached storage became much more than a huge hard drive. To me, if you are calling your storage solution a SAN it must have a few specific features.

Simplify

Your SAN solution must use standard interconnects. That means if it takes a special cable that only your vendor sells, it doesn’t qualify. In this day and age, if a vendor is trying to lock you into specific interface cards and cables, they are going to go the way of the dodo very quickly. Right now the two main flavors are optical fiber and copper twisted pair, a.k.a. Ethernet. It must also reduce your management overhead; this usually means a robust software suite above and beyond your normal RAID card interface.

Consolidate

It must be able to bring all your storage needs together under one management system, and I’m not just talking disks. Tape drives and other storage technologies like deduplication appliances are in that category. The other benefit to consolidation is generally much better utilization of these resources. Again, this comes back to the robustness of the software stack your SAN or NAS ships with.

Virtualize

It must be able to abstract low level storage objects away from the attached servers allowing things like storage pools. This plays heavily into the ability to manage the storage that is available to a server and maintain consistency and up time. How easily can I add a new volume? Is it possible to expand a volume at the SAN level without having to take the volume off-line? Can other resources share the same volumes enabling fun things like clustering?

Highly Available

If you are moving all your eggs into one HUGE basket, it had better be one heck of a basket. Look for things like redundant controllers, where one controller head can fail but the SAN stays online without any interruption to the attached servers, and multiple paths into and out of the SAN so you can build redundant network routes to the storage. Other features, like SAN-to-SAN replication to move your data to a completely different storage network in the same room or across the country, may be available for a small phenomenal add-on fee.

If your SAN or NAS hardware doesn’t support these pillars, then you may be dealing with something as simple as a box of disks in a server with a network card. Realize that most SANs and NASes are just that: specialized computers with lots of ways to connect to them and some really kick-ass software to manage it all.

Until Next Time…

Now that we have a bit of history and a framework we will start digging deep into specific SAN and NAS implementations. Where they are strong and where they fall flat.

Demystifying SQL Server Differential Database Backups

Odd Man Out

SQL Server has three backup types. Two you have heard of and used. One, while useful, isn’t very well understood.

Let’s start with a technical recap of the three backup types for SQL Server.

1. Full Database Backup

When you request a full backup, SQL Server dumps all the data pages from your database, metadata about how your database is stored on disk and finally enough of the transaction log to bring the database back into a consistent state. There are a few things you need to know about full backup semantics. When you take a full backup it makes a few changes. Those changes are tracked in two places in the database and one in MSDB. The changes tracked in the database allow us to then use transaction log backups and differential database backups. The data logged to MSDB isn’t critical for restoring your backups, but it does make it much easier to do so. Full backups are considered our “base” backup type. Every other backup type can use a full database backup as its base. Even though a full backup does capture some of the transaction log, it doesn’t clear the log. If you are in simple recovery mode, the normal checkpoint process will clear the log. If you are in bulk-logged or full recovery mode, you will need to take a transaction log backup to clear the log.

2. Transaction Log Backup

Transaction log backups are a critical part of any recovery plan. Without them you can’t restore up to the minute. If your database is in anything other than simple recovery mode, your only supported option to clear the log is a transaction log backup. Transaction log backups are serial by nature. Each log restore depends on either a full or differential backup for its base, plus any log backups taken before the current log you wish to apply.

3. Differential Database Backup

Like a full database backup, the differential backs up data pages and enough of the transaction log to bring the database back into a consistent state. Unlike full or transaction log backups, the differential backup captures all changes since the last full backup occurred. The information on changed data pages is stored internally in the database and doesn’t require any information from MSDB. The map of changed data pages only gets reset on the next full backup. Transaction log backups and other differential database backups will not reset the changed data map. You can think of transaction log backups as incremental backups. People coming from a systems administration background can get tripped up and treat differential backups like incremental backups. This can cause a significant waste of time when restoring your database, since you only need to apply the full backup and the most current differential, or the differential you are interested in, to get your database back into a recovered state.
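
If it helps to see that restore order spelled out, here is a rough sketch. The database name, file paths and the PowerShell wrapper (Invoke-Sqlcmd from the sqlps module) are just placeholders; the same T-SQL runs fine from Management Studio:

# Restore the last full backup, then ONLY the most recent differential, then any log backups taken after it.
Invoke-Sqlcmd -ServerInstance "MYSERVER" -Query @"
RESTORE DATABASE MyDB FROM DISK = 'D:\Backup\MyDB_Full.bak' WITH NORECOVERY;
RESTORE DATABASE MyDB FROM DISK = 'D:\Backup\MyDB_Diff.bak' WITH NORECOVERY;
RESTORE LOG MyDB FROM DISK = 'D:\Backup\MyDB_Log_0800.trn' WITH NORECOVERY;
RESTORE LOG MyDB FROM DISK = 'D:\Backup\MyDB_Log_0815.trn' WITH RECOVERY;
"@

Notice there is no stack of intermediate differentials in there; the most recent one already contains everything since the full.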

Understanding Differential Database Backups

Most people are put off by the nature of differential backups mainly due to the amount of space they can grow to and the extra complexity they can add to your recovery plans. If you don’t manage them, you can quickly run into a differential that is larger than the full it is based on. Also, any data page alterations are tracked. For example, if you take a full backup then perform full index reorganization on a heavily fragmented index you can end up with very large differential backups. File shrinks with full reorganizations also have the same effect. Even though the actual data hasn’t changed, you end up with differential database backups that are unwieldy and difficult to manage. If you miss a full backup in your schedule, your differentials again may grow larger than your full backup.

There are several cases where differential database backups are a pivotal key to recovering your database quickly and with as little data loss as possible. Let’s take a look at a few scenarios.

Shortening Recovery Time

This alone should be reason enough for you to investigate differential backups. Every restore operation has a cost in time associated with it. Remember, even if a transaction log backup is virtually empty, there is a cost in time to spool up and tear down the restore session for each log backup you apply, not to mention replaying the transactions in the logs. In many cases, it can be much faster to apply a differential backup than to apply multiple transaction log backups. By skipping all the data manipulation and just replacing the altered pages, you reduce the amount of IO required to restore.

Database in Simple Recovery Mode

There may be situations where you aren’t concerned with up-to-the-minute recovery but still need something better than weekly full backups to meet your recovery goals. Differentials fit in well here. By leveraging differential backups, you can take a single full once a week and daily differentials to cut down on the space needed to store your backups. Also, since differential backups contain all the changes since the last full, to recover you only need the full backup and the differential backup of the time interval you want to restore to. I recommend keeping your differentials just like you would your transaction log backups so if you need to recover your database into another environment or if you suffer corruption in one of your differentials, you still have as much data as possible to restore.

Large Database with Minimal Data Change

With today’s large disk capacities, it isn’t unusual to see multi-terabyte databases with years of data in them. Moving our full backup schedule out to every two weeks or every month and supplementing with differentials is an excellent way to conserve backup space and shorten time to recovery. Again, we only need the last full, the last differential and any transaction logs after the differential was taken to get us back up to the minute.

Increasing Recoverability

If you only take a full database backup once a week and transaction log backups every 15 minutes, you could end up applying over 670 logs to get your database back online if you have a failure at the end of the week. If you have any errors in one of the transaction log backups, everything after that is pretty much useless to you. If it dies at backup 599, it may not be the end of your business. If it is log 38, it could mean a week’s worth of data gone. Since differential backups don’t break the LSN chain and transaction log backups don’t reset the changed data map, you can use either backup type even when one or the other may have had an error. Differentials allow us to bridge gaps in our transaction log chain, since we can apply any transaction logs taken after the differential backup. This is one of the real strengths of differential backups. So, if you are doing weekly full backups, daily differential backups and transaction log backups every 15 minutes, you are covered in two ways. Normally, you would restore the full, then the latest differential backup, followed by any additional transaction logs. If you had a differential backup corrupted but your transaction logs were fine, you could still restore fully.

Repairing Log Shipping

Another great use of differentials is to repair your log shipped databases. If anything happens to the LSN chain, in most cases the only way to repair your log ship target is to start over again from a full backup and then apply all the logs to get it back up to current. If this is a large database, or if there are a lot of transaction log backups to recover, this could leave you exposed for quite a while. Instead, you can take a differential backup, apply that to the log ship target and then restart your log shipping from that point. I have used this technique successfully over the years when network outages caused our log ship targets to fall way behind, cutting catch-up time from hours to minutes.
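
A rough sketch of that repair, with placeholder server names, database name and paths. Restore the differential on the target WITH NORECOVERY (or STANDBY if you need it readable) so the log shipping restore job can pick back up afterward:

# On the primary: take a differential based on the last full backup.
Invoke-Sqlcmd -ServerInstance "PRIMARY" -Query "BACKUP DATABASE MyDB TO DISK = '\\backups\MyDB_Repair.dif' WITH DIFFERENTIAL;"

# On the log ship target: apply it without recovering so subsequent log restores still work.
Invoke-Sqlcmd -ServerInstance "SECONDARY" -Query "RESTORE DATABASE MyDB FROM DISK = '\\backups\MyDB_Repair.dif' WITH NORECOVERY;"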

Final Thoughts

Incorporating differential backups will add complexity to your backup strategy but the benefits can be staggering. Between the storage savings and reduction in recovery time it’s clear that differential backups should be in your tool belt. I would also encourage you to practice restoring using your differential backups. Try out different failure scenarios like failed transaction logs or differential backups. Make sure you understand how to restore up to the minute and stop at a specific time now that you have differentials in the mix.

This is a re-blog from an article I wrote for SWWUG on April 19th 2012

Free Tools for Testing SQL Server

You Need To Test Your SQL Server!

It’s like screaming “TEST THE BRAKES ON YOUR CAR!” If you don’t have the tools or the know-how, it is just good advice you can’t follow. I’ve done a TON of benchmarks over the years using every tool imaginable, including ones I’ve written myself. There are several barriers to generating a valid, repeatable benchmark in your own environment. Let’s take a look at the general criteria I use when testing OLTP systems and the benchmark tools available.

I’ve Been Spoiled

I’m going to come clean. I really like Quest Software’s Benchmark Factory. They have put years of work into the tool set. It is also one of the few database benchmark tools that runs against most major RDBMS platforms. This alone made it my go-to tool for a very long time. Also, since it implements most of the TPC Council benchmark specifications, when someone asks how I tested I can hand them reams of information on the benchmark in question. TPC also has a well-defined metric for determining not just raw performance but a dollar amount to get that performance. The catch is that Benchmark Factory isn’t an apples-to-apples comparison with the published TPC results. To assume so would be a horrible mistake on your part. If you have ever read through a full disclosure (I read them to my four year old to put him to bed) you know that it would be very difficult for your organization to implement the exact same test on the exact same software without spending some major bucks. I have worked a few projects that were willing to spend the money and bring in a team to implement the testing protocols. If you are talking about a project that is going to run, say, ten million dollars, your boss may be willing to spend 100,000 of that to completely test the system before it goes into production. In most cases though, that kind of budget just isn’t there. Even spending the money on Benchmark Factory may be outside of your budget. So, we turn to free or open source solutions, and sometimes we just write our own.

Other Options

There are several other options for testing database servers. I’ve used a few over the years. First, I’m limiting this list to those that run on Windows; they may run on other platforms but that isn’t important to me. Secondly, the test setup has to be repeatable. Lastly, there has to be a way to measure performance. In some cases this may mean using perfmon (Windows Performance Monitor), or the tool may capture some meaningful metrics itself.
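
On that last point, if the tool doesn’t capture anything itself, a little Get-Counter run during the test goes a long way. A minimal sketch; the counter list is just a starting point and assumes a default SQL Server instance (named instances show up under MSSQL$InstanceName):

# Sample a few key counters every 15 seconds for an hour and write them to a CSV for later analysis.
$counters = '\Processor(_Total)\% Processor Time',
            '\PhysicalDisk(_Total)\Avg. Disk sec/Read',
            '\PhysicalDisk(_Total)\Avg. Disk sec/Write',
            '\SQLServer:SQL Statistics\Batch Requests/sec'
Get-Counter -Counter $counters -SampleInterval 15 -MaxSamples 240 |
    Export-Counter -Path C:\PerfLogs\benchmark_run.csv -FileFormat CSV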

Database Hammer SQL Server Resource Kit
http://www.sqlmag.com/content1/topic/put-the-hammer-down/catpath/configuration

Not exactly free or cutting edge, but it was good at putting a load on SQL Server. Even though this is VERY old, due to its simplicity it works on pretty much every version of SQL Server from 2000 on. Again, one of the downsides is capturing metrics. The other downside is finding the dang thing. If you have an MSDN subscription you can still download it, or you can head over to your local used book store and hope the CD-ROM that came with the book isn’t toast.

Eye On SQL – Load Test
http://eyeonsqlloadtest.codeplex.com/

A basic tool to put a database under load. There are no predefined tests and it doesn’t gather more than the most basic metrics, but it is simple to use and can be a replacement for something like Database Hammer.

TPCCBench
https://github.com/SQLServerIO/TPCCBench

This is a tool I wrote a while back that implements TPC-C. It is in C#, so anyone familiar with the language can work on the code and peek inside to see how I implemented my version. The downside is it only implements one test, TPC-C, and TPC-C has several limiting factors. The one I never liked was that all the data generated is random, all of it. If it was a name, it was a random string. This leads to a database that effectively has 100% cardinality, and is also a very poor choice for testing database compression. My goal was to implement TPC-E, which overcomes all of these issues but is fairly complex and hard to get right. I fell back to Benchmark Factory and never moved forward with this tool.

Hammerora
http://hammerora.sourceforge.net/

This is a pretty simple to use tool and has an easy to understand GUI. It supports two benchmarks: a TPC-C style OLTP benchmark and a TPC-H style decision support benchmark. That is also one of the downsides; it only supports two benchmarks and both are a bit long in the tooth. It also supports various flavors of SQL Server, Oracle, MySQL and Postgres. This is a nice little bonus if you are testing like hardware and OS and want to vary the RDBMS. The last bonus for me personally is that it is open source. This means I can look at the code and see how each test is implemented.

DVD Store Database Benchmark
http://en.community.dell.com/techcenter/extras/w/wiki/dvd-store.aspx

This is a tool I’m currently working on to bridge the gap between TPC-C and TPC-E. It was written in house by Dell and is now developed and maintained by Dell and VMware. It is more complex than TPC-C but not as complex as TPC-E. It is open source, which was a huge plus for me and allowed me to fork it over at https://github.com/SQLServerIO/DVD-Store-Database-Benchmark and tweak it to meet my needs. I’ll have a full blog post on this soon explaining the benchmark in detail and what changes I have made to it. It is a staple for testing databases on ESX. It has a standard way of gathering metrics, and the load generator is actually in C#. You can also do a full stack test since it will work against a website that is also included in the benchmark. The downside, again, is it only implements one benchmark.

Description of the Replay Markup Language (RML) Utilities for SQL Server
http://support.microsoft.com/kb/944837

This is a tool set that was initially developed in house at Microsoft to test customer scenarios. It is different from the other tools since it is a replay tool, but you can capture metrics and vary your workloads. The big downside is it isn’t easy to use. Also, if you are testing new hardware and don’t have a trace to replay from production, you are back to square one.

Apache JMeter
http://jmeter.apache.org/

Not a database stress tool in the strictest sense of the word, but you can use it that way. The upside for JMeter is it will test your whole stack: web servers, middle tiers and database servers. You can use it to drive pretty much anything. Since it is written in Java it runs on Windows, and since there is a JDBC driver for SQL Server you can use it directly to test your database as well. The downsides are also big. It is written in Java for starters. It also uses the JDBC stack, and that can be a limiting factor. It is a significant undertaking to set up as well. I’ve included some links to people who have set it up to test databases and an ASP.NET web site.

http://ilkinbalkanay.blogspot.com/2010/03/load-testing-relational-databases-with.html
http://blog.technicallyworks.com/2009/06/load-testing-aspnet-sites-with-jmeter.html
http://blog.technicallyworks.com/2009/06/load-testing-aspnet-with-jmeter.html

Now What?

Well, that is easy, start reading, building and testing. I personally use something like Eye On SQL or Hammerora to do an initial smoke test or break in. Then move over to the more complicated benchmark tools. I have pretty much settled on the DVD Store benchmark for now as my OLTP testing tool. Now I need to find something more up to date than the TPC-H test everyone else has implemented so I can round out with a solid OLAP benchmarking tool. Maybe I’ll write something again!

Speaking at PASS Summit 2012

It’s Not A Repeat

Speaking at the PASS Summit last year was one of the highlights of my career. I had a single regular session initially and picked up an additional session due to a drop in the schedule. Both talks were fun and I got some solid feedback.

The Boy Did Good

I won’t say great; there were some awesome sessions last year. I did do well enough to get an invite to submit for all the “invite only” sessions. I was stunned. I don’t have any material put together for a half-day or full-day session yet, and the window to submit sessions was a lot smaller this year. But I do have three new sessions and all of them could easily be extended from 75 minutes to 90 minutes. So, I submitted for both regular sessions and spotlight sessions and got one of each! WOO HOO!

The Lineup

I’ll be covering two topics near and dear to my heart.

How I Learned to Stop Worrying and Love My SAN [DBA-213-S]
Session Category: Spotlight Session (90 minutes)
Session Track: Enterprise Database Administration & Deployment

SANs and NASs have their challenges, but they also open up a whole new set of tools for disaster recovery and high availability. In this session, we’ll cover several different technologies that can make up a Storage Area Network. From Fibre Channel to iSCSI, there are similar technologies that every vendor implements. We’ll talk about the basics that apply to most SANs and strategies for setting up your storage. We’ll also cover SAN pitfalls as well as SQL Server-specific configuration optimizations that you can discuss with your storage teams. Don’t miss your chance to ask specific questions about your SAN problems.

I’ve built a career working with SAN and System Administrators. The goal of this session is to get you and your SAN Administrator speaking the same language, and to give you tools that BOTH of you can use to measure the health and performance of your IO system.


Integrating Solid State Storage with SQL Server [DBA-209]
Session Category: Regular Session (75 minutes)
Session Track: Enterprise Database Administration & Deployment

As solid state becomes more mainstream, there is a huge potential for performance gains in your environment. In this session, we will cover the basics of solid state storage, then look at specific designs and implementations of solid state storage from various vendors. Finally, we will look at different strategies for integrating solid state drives (SSDs) in your environment, both in new deployments and upgrades of existing systems. We will even talk about when you might want to skip SSDs and stay with traditional disk drives.

I’ve spoken quite a bit on solid state storage fundamentals; this time around I’ll be tackling how people like myself and vendors are starting to mix SSDs into the storage environment, where it makes sense and where it can be a huge and costly mistake.

Finally

I hope to see you at the Summit again this year! Always feel free to come say hi and chat a bit. Networking is as important as the sessions and you will build friendships that last a lifetime.