Monthly Archives: May 2010
Having been married to Microsoft for most of my professional career doesn’t mean I drink the Kool-Aid.
I have had the distinct privilege to grow up in interesting times. I loved DOS. As a BBS operator, DOS was the de facto OS for most BBSes that ran on x86 hardware. Combined with QEMM/DESQview I was a multitasking fool, running many nodes on a single 386 with a ton of RAM, 8 megabytes to be exact.
Other OSes came along and I tried them as well, even running OS/2 for a while. It was the DOS compatibility and multi-instance support I was after; though you could run Windows 3.x apps in it, why bother?
I just didn’t see where Windows was anything near as powerful as my good old DOS prompt. I had used GUIs before and knew that some day they would be the way things went. To put it bluntly though, I hated Windows. In my eyes at the time, it did nothing well. It made my sleek, powerful machine run like a pig. It required me to learn how to do things the Windows way, which slowed me down. I even went so far as to actively refuse to own or install a mouse, I so loathed Windows.
In many aspects my opinion hasn’t changed much. To be honest I blame the “over promise, under deliver” method of development that Microsoft seems to employ with Windows OS development.
Windows 3.11 for Workgroups was modestly noteworthy in my view because it helped bring the internet into homes for the first time. I knew the internet was awesome and powerful, but I didn’t grasp the whole World Wide Web thing immediately either. Not being a graphical guy, I didn’t see what it bought me over any other tools that ran on the internet, until I really saw it running on Windows.
It still wasn’t enough to completely win me over. I was already working with GNU/Linux, going back to DOS/Windows mostly to play games or to develop on as a platform, since that paid the bills. I had been using NT for quite a while as a systems/network/database administrator but still ran Linux at home and for other projects when I could. That changed a bit with the release of Windows 2000. To me it was Windows all grown up: 32 bits, a nice GUI, fairly stable; it had a lot going for it. Plus, Windows Me was such a miserable experience that it was an easy choice to go with 2000 Workstation and pretend that particular nastiness just didn’t exist. Though it wasn’t until XP hit that I switched to Windows full time for my day-to-day desktop, going from dual booting to just XP all the way.
That lasted for quite a while… until Vista. Vista didn’t have a marketing problem, and it didn’t suffer from bad press. It was just fundamentally broken. I don’t care how many Mojave commercials you run; live with it for a while and you will be just as unhappy with it even if it is named Santa Claus.
I do thank MS for releasing Vista though; it turned me back on to Linux, and specifically Ubuntu, which I have been using now for the last couple of years. If anyone can close the gap with Windows I think Ubuntu and Mark Shuttleworth have the best chance, unless Apple loses its mind and releases OS X for white boxes.
In the interest of full disclosure, I will always give Windows its due when it comes to ease of configuration and common usage. If it wasn’t for Windows, my mom would still be putting stamps on her mail to me.
With that little history lesson, and my obvious bias against Windows, I still always try the latest and greatest from MS. It is in my best interest to do so. I don’t ever want to be too far behind the curve, or lack something to complain about.
So, with all my gripes and soap-boxing, here I sit typing away on an x86 machine with Windows 7 loaded on it, and I’m happy with it. So happy that I’m not dual booting into anything at the moment, and my laptop has it loaded as well.
Why, you may ask, am I back on the bandwagon? Here is the short list.
That’s right, as bad as Vista was, 7 doesn’t show any signs of the past sins. My first big ugh moment with Vista was trying to copy files over the network. It just wouldn’t start, or if it did it took forever to finish. I know it was addressed in a patch and later by SP1, but that was a band-aid on a sucking chest wound. Rarely would I come close to gigabit speeds, even though I’m on a managed switch and both ends can easily handle the load. XP came much closer, and if I wasn’t using Samba, Ubuntu just flew over the wire. Windows 7 brought that back in line. When I got near wire speed on my first test run I just assumed it was wrong. It still doesn’t handle lots of small files as well as my Ubuntu setup, but it’s not enough to quibble about.
Vista had them, but at the cost of making your state-of-the-art machine run like last year’s eMachine you bought for your mom for 300 bucks. On the other hand, Ubuntu with Compiz was just stunning and ran on my older Pentium M laptop with a Radeon X200 mobile GPU in it. Again, 7 addresses this: it keeps the visuals from Vista and improves on them. I’ve got to say the rotating wallpapers are my current favorite feature. It is still a generation behind Compiz as far as raw, visually stunning effects go.
I’ll never forget when a friend of mine was going on about Aero Glass and transparencies in Vista; all I did was break out my laptop and tab through the running apps. Once he picked his jaw up, he asked how I had gotten Vista to do that… After he got over the second shock, that it was Linux, I had him trying Ubuntu for himself.
I’ve also attempted to use Stardock to get as close to the same effects on Windows and just had to give up. There were enough annoying crashes and blips to make it not worth my time.
I wasn’t sure I was going to like the new fat bar, but it has quickly grown on me. I hate having a million icons on my desktop, but I want the things I use day in and day out to be accessible. With Windows 7 replacing the quick launch with the ability to pin an application to the bottom bar, or in the start menu, you get the best of both worlds: your taskbar shows you what is running, acts as your quick launch, and is remarkably uncluttered.
I am also a fan of the mouse-over preview that shows you how many things you have open per group and what is in them, e.g. having multiple browsers open or multiple Management Studio sessions. With the quick preview I can just peek and pick the one I need to work with, without having to alt-tab through everything.
The focus-and-fade effect you get when you mouse over and then up onto the preview, showing you only that window on the screen, is also a nice touch. I used to always minimize all windows using the shortcut on the quick launch bar, then alt-tab through the list of running programs to find the one I was after; it sucked, but it was fast enough.
Out of the box I had very few driver issues with 7. It even installed without my help on my Nvidia RAID array. There are a couple of drivers missing for my laptop but no real show stoppers. Since Vista took the brunt of that attack, I’ll chalk it up as a win in that column for Vista.
Don’t laugh, I mean it. 7, even as a beta and now as the RC, has a better, more polished security model. Not the open range XP was, and not the heavy-handed style of Vista.
Just to make other Ubuntu/Linux junkies upset: I don’t think it is any more disruptive than having to execute under sudo to install components or do administrative actions.
I do wish there was a bigger push to move stuff out of the kernel space and into user land for security and stability but I think time will fix these issues as well.
I still hear you snickering from the above topic, but I must push on. Other than the 1.5 BILLION reboots to install software or update drivers, I haven’t had any real issues with crashing.
The compatibility “run as” model actually worked for me on a couple of apps that didn’t play well under 7 but did just fine on Vista. Also, the fact that you can install an application in this mode made life easier for the legacy stuff I have to have.
Another thing that will make the OS X guys upset is that I haven’t rebooted my laptop since the install was completed. Hibernate actually works, and that is the mode I leave it in. On my new laptop, with 4GB of RAM and a decent SSD, it comes back from hibernate in a flash (no pun intended; oh hell, who am I kidding, of course it was intended). With Vista I was pretty much guaranteed that if I put it into hibernate, there was about a 1 in 3 chance that I’d have to ditch the saved image and reboot clean.
This all adds up to a better user experience and enhanced productivity without a steep learning curve. I don’t feel like this was rushed out the door and then crammed down our collective throats as the pinnacle of operating systems.
If you haven’t tried it, do so. I think you will be pleasantly surprised.
We got something good in the mail last week!
Some quick observations:
The build quality is outstanding. Nothing cheap at all about this card. The engineering that has gone into this shows in every way.
It is made up of modules that are screwed down; I can see they really thought this through so each rev of the card doesn’t require all-new PCBs to be manufactured.
It does require an external source of power via a 4-pin Molex or SATA power connector, period. Make sure your server has one available; even though these are sold by HP, not all HP servers have the required connectors.
PCIe expander bays are few and far between. The issue is that most of these are used to expand desktops or laptops, or are used in non-critical applications, mostly AV or render farms.
This is a nice chassis, but they are currently being retooled and won’t be available for a month or so. It is the only 1U option, and it has redundant power.
It exposes two drives to the OS per card. We will initially configure them two per machine in a RAID 10 array for redundancy.
More to come!
I do more than just SQL Server. I enjoy programming. In my former life I worked with C/C++ and Assembler. As I spent more and more time with SQL Server, my programming took a back seat career-wise. Having that background, though, really helps me day in and day out in understanding why SQL Server does some of the things it does at the system level.
Fast forward several years and I’ve moved away from C/C++ and spent the last few years learning C#.
Now that I work mostly in C#, I do look up solutions for my C# dilemmas on sites like http://www.codeplex.com and http://www.codeproject.com. I love the internet for this very reason: hit a roadblock, do a search, and let the collective knowledge of others speed you on your way. But it can be a trap if you don’t do your own homework.
I write mostly command-line or service-based tools these days, not having any real talent for GUIs to speak of. Being a person obsessed with performance, I build these things to be multi-threaded; with today’s computers having multiple cores and hyper-threading, it just makes sense to take advantage of the processing power. This is all fine and dandy until you want to have multiple threads access a single file and all your threads hang out waiting for access.
So, I did what I always do: ask my best friend Google what the heck is going on. As usual, he gave me several quality links, and everything pointed to the underlying file not being opened in asynchronous mode. Having done a lot of C++, I knew about asynchronous IO, buffered and un-buffered. I could have made unmanaged code calls to open or create the file and pass the safe handle back, but just like it sounds, that is kind of a pain to set up, and if you are going down that path you might as well code it all up in C++ anyway.
Doing a little reading on MSDN, I found all the little bits I needed to set everything to rights. I set up everything to do asynchronous IO and started my test run again. It ran just like it had before: slow and painful. Again I had Mr. Google go out and look for a solution for me (sometimes being lazy is a bad thing), and he came back with several hits where people had similar issues. I knew I wasn’t the only one! The general solution? Something I consider very, very .Net: use a background thread and a delegate to keep the file access from halting your main thread, so your app “feels” responsive. It is still doing synchronous IO. Your main thread goes along, but all file access is still bottlenecked on a single reader/writer thread. Sure, it solves the issue of the program “freezing” up on file access, but it doesn’t solve the problem of slow file access that I was actually trying to fix.
I know that SQL Server uses asynchronous, un-buffered IO to get performance from the file system. I did some refresher reading on the MSDN site and struck gold. Writes to the file system may or may not be asynchronous, depending on several factors. One of them is this: if the file must be extended, everything goes back to synchronous IO while it extends the file. Well, since I was working with a FileStream and a newly created file every time, I was pretty much guaranteeing that I would be synchronous no matter what. At this point I dropped back to C++. I started to code it up when I realized I was doing things differently in my C++ version.
I was manually creating the file and doing an initial allocation, growing it out to the size of the file buffer, and fixing up the file length on close if need be.
I started up my C++ version of the code and watched all the IO calls using Sysinternals’ Process Monitor. I watched my C++ version, and lo, it was doing asynchronous IO in the very beginning, then switching to synchronous IO as the file started growing. I fired up my instance of SQL Server and watched as the asynchronous IO trucked right along… until a file growth happened and everything went synchronous for the duration of the growth.
So, taking that little extra knowledge, I manually created my file in C#, set an initial default size, and wouldn’t you know it, asynchronous IO kicked right in until it had to grow the file. I had to do a little extra coding to watch how much free space was left in the file; when I get close, I now pause any IO, manually grow the file by some amount, and then start up the writes again, keeping things from dropping into synchronous mode without me knowing.
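My C# tool isn’t something I can share, but the pre-allocation pattern itself is simple. Here is a rough sketch of the idea in Python; the file name, chunk size, and helper names are all mine, purely for illustration:

```python
import os

CHUNK = 64 * 1024  # grow the file in 64 KB steps (arbitrary for this sketch)

def preallocate(f, new_size):
    """Extend the file to new_size up front so later writes land inside
    already-allocated space instead of forcing a file extension."""
    f.truncate(new_size)

def write_record(f, data, state):
    # state is (bytes_used, bytes_allocated); grow BEFORE running out of
    # room, mirroring the "pause IO, grow, resume" approach in the post
    used, allocated = state
    if used + len(data) > allocated:
        allocated = used + len(data) + CHUNK
        preallocate(f, allocated)
    f.seek(used)
    f.write(data)
    return (used + len(data), allocated)

with open("demo.bin", "wb") as f:
    state = (0, 0)
    for _ in range(10):
        state = write_record(f, b"x" * 1000, state)

# the file on disk is the pre-allocated size, larger than the bytes written
print(os.path.getsize("demo.bin"))
```

The point of the sketch is only the bookkeeping: writes always land inside space that was allocated ahead of time, so the extension happens on your schedule rather than in the middle of an IO you wanted to be asynchronous.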
So, there you go my little adventure and how my old skills combined with knowing how SQL Server works helped me solve this problem. Never assume that your new skills and old skills won’t overlap.
It doesn’t happen often but every once in a while you may be the lucky person to find a previously unknown bug in SQL Server.
It was a normal morning for me, checking the status of our servers going over any failure messages waiting for the day to ramp up. That’s when one of our lead developers came around the corner and told me he had an error when he had tried to create an index on a table he was working on. The more he tried to explain the error the more I started to worry. I had him send me the code and the error statement.
Expression: bufferLen > currOffset + ACCESSSOR_OVERHEAD
Process ID: 5016
Msg 3624, Level 20, State 1, Line 2
A system assertion check has failed. Check the SQL Server error log for details. Typically, an assertion failure is caused by a software bug or data corruption. To check for database corruption, consider running DBCC CHECKDB. If you agreed to send dumps to Microsoft during setup, a mini dump will be sent to Microsoft. An update might be available from Microsoft in the latest Service Pack or in a QFE from Technical Support.
Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
I had what we like to call in the high availability space a “pucker moment”. This wasn’t your normal “I typed something wrong and got an error” kind of problem. This was a real SEV 20 with an assert; the core engine had just puked on something it shouldn’t have.
Like all good DBAs, the first thing I did was run DBCC CHECKDB on the database this error was generated from.
While that was going on, I asked my very good friend Google if he had seen this particular assert before. For the first time in a very long time, Google failed me! In the last few years, if I hit this kind of hard error, someone else has too, and it is either being fixed in a hotfix or addressed in the next version of SQL Server, but not this time.
So, we have this same schema on another server and the developer tried the exact same code there and had the exact same error.
I had him document the steps he took to get to this point and to his credit the steps were clear, concise and easily reproducible.
The DBCC finished with zero problems detected, which let me calm down a bit, coupled with the fact that it looked like I had a repeatable test case. When the second database had cleared DBCC, I set about my task of reproducing the error and trying to find a workaround. Luckily for us it was a simple matter of column organization in the index, and we were able to apply the fix successfully and carry on with life.
I bundled up the documentation I had accumulated, ran the test case, confirmed the bug, and sent it off to the powers that be at Microsoft. Since we had a workaround and it wasn’t a show stopper, I didn’t raise it as a critical server-down issue, but Microsoft still worked it in a timely fashion.
So, what was the problem, you say? It was an interesting edge condition.
We have a table that contains a composite primary key and the rest is made up of bit fields, a flag table.
We had to add a new column, another bit flag, to the table.
The non-clustered covering index was dropped, and the column was added to the end of the table.
The index was updated with the new column at the end of the column list, and then *POOF*, it blew up.
I think it has to do with two specific things.
First, bit fields are stored in a compact manner where multiple bits share a byte, so they aren’t truly separate from every other bit field. It would be a huge waste of space to store each bit in its own byte, but it would make things like this index issue less likely to happen.
Secondly, we did a column add but didn’t drop and recreate the table, repopulating it in the process, so things at the page level weren’t nice and neat. The underlying clustered index wasn’t affected, but when we tried to add an index back with the new field, it couldn’t do it. The fix was simple: change the column order in the non-clustered index, moving the new column up one. We verified the data was correct both with and without the index.
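On the first point, the packing is easy to picture: SQL Server stores up to eight bit columns in a single byte, so adding one more flag can change how the existing bits lay out. A quick sketch of the space arithmetic (my own illustration, not SQL Server internals):

```python
def bytes_for_bit_columns(n):
    # SQL Server packs up to 8 bit columns into each byte of the row
    return (n + 7) // 8

# a flag table with 9 bit columns stores them in 2 packed bytes,
# versus 9 bytes if every bit column got a byte of its own
print(bytes_for_bit_columns(9))   # 2
```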
I haven’t tried it yet, but I am betting included columns won’t suffer the assert either, since those items don’t have to be sorted in the index.
So there you go! Having been on the software side of things a couple of times, I always find it interesting when I find bugs in other people’s products and work the issue to conclusion.
What is your takeaway from all of this? Never be afraid to submit a bug report to Microsoft. I have heard people say, in effect, that someone else has hit the bug or will hit it and will submit it. DON’T RELY ON THE ACTIONS OF OTHERS! Reporting bugs helps the community as a whole and makes the product better. When you assume someone else has worked it, you are putting YOUR production servers in the hands of strangers. If someone has submitted it and it didn’t turn up in a search, they will let you know, and be very kind about it to boot. You will get peace of mind that it is a bug and is being worked on, or you may keep someone else from stumbling onto it without the knowledge to fix it or work around it.
I’ve often described SQL Server to people new to databases as a data pump.
Just like a water pump, you have limited capacity to move water in or out of a system usually measured in gallons per hour.
If you want to upgrade your pumping system it can be a twofold process: the physical pump and the size of the pipes.
Our database servers also have several pumps and pipes, and in general you are only as fast as your slowest pump or narrowest pipe: hard drives.
To feed other parts of the system we have resorted to adding lots and lots of hard drives to get the desired IO read/writes and MB/sec throughput that a single server can consume.
Everyone is familiar with Moore’s law (often quoted, rarely understood), which loosely applied says CPU transistor counts double roughly every 24 months. Hard disks haven’t come close to keeping up with that pace, performance-wise.
Up until recently, hard drive capacity had been growing at almost the same rate, doubling in size around every 18 months (Kryder’s law). The problem isn’t size, it’s speed.
Let’s compare the technology from what may have been some folks’ first computer to the cutting edge of today.
IO/sec:     11.4 IO/sec                   ->  303 IO/sec                   (26x)
CPU speed:  8088 @ 4.77 MHz (0.33 MIPS)   ->  Core i7 965 (18,322 MIPS)    (5,521x)
*These are theoretical maximums; in the real world your mileage may vary.
I think you can see where this is going. I won’t go any further down memory lane; let’s just say that some things haven’t advanced as fast as others. As capacity has increased, speed has been constrained by the fact that hard disks are just that: spinning disks.
So, what does this little chart have to do with SSDs? I wanted you to get a feel for where the real problem lies. It isn’t the capacity of hard drives, it’s the ability to get to the data quickly. Seeks are the key. SSDs have finally crossed a boundary where they are cheap enough and fast enough to make it into the enterprise space at all levels.
An SSD compared to today’s best 15k.2 HDD from above:

IO/sec:     303 IO/sec (HDD)  ->  35,000 IO/sec (SSD)  (115x)
So, in the last few years SSD has caught up to and passed HDD on the performance front by a large margin. And this is comparing a 2.5” HDD to a 2.5” SSD; the gap is even wider if you look at the new generation of SSDs that plug directly into the PCIe bus and bypass the drive cage and RAID controller altogether. HOT DOG! Now we are on track. SSD has allowed us to scale much closer to the CPU than anything storage-wise we have seen in a very long time.
Since this is a fairly new, emerging technology, I often see a lot of confused faces when talking about SSD. What is in the technology, and why has it now become cost effective to deploy it instead of large RAID arrays?
Once you take out the spinning disks, the memory and IO controller march much more to the tune of Moore’s law than Kryder’s, meaning cost goes down while capacity AND speed go up. Eventually there will be an intersection where some kind of solid state memory, maybe NAND maybe not, will reach parity with spinning hard drives.
But, like hard drives, not all SSDs are on the same playing field; just because it has SSD printed on it doesn’t make it a slam dunk to buy.
Let’s take a look at two implementations of SSD based on MLC NAND. I know some of you will be asking: why not SLC? I’m doing this to get a better apples-to-apples comparison and to keep this, budget-wise, squarely in the realm of possibility.
The Intel X25-M, priced at $750.00 for 160GB in a 2.5” SATA 3.0 form factor, and the Fusion-io IoDrive Duo 640GB model, priced at $9,849.99 on a single PCIe 8x card.
(The comparison table listed, for each drive: capacity in GB, write bandwidth, read bandwidth, reads/sec, writes/sec, access latency (seek time), wear leveling, cost per unit, cost per GB, and cost per IO for reads and writes.)
* This is an estimate based on this article: http://techreport.com/articles.x/15433. Intel has stated the drive should be good for at least 1 petabyte of write operations, or 10,000 cycles.
Both of these drives use similar approaches to achieve their speed and IO numbers. They break up the NAND into multiple channels, like a very small RAID array. This is an oversimplification, but it gives you an idea of how things are changing. It is almost like having a bunch of small drives crammed into a single physical drive shell with its own controller, a mini-array if you will.
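The mini-array behavior can be pictured as round-robin striping of logical blocks across NAND channels. This sketch is purely illustrative; the channel count and layout here are made up, not either vendor’s actual design:

```python
def stripe(blocks, channels):
    """Deal logical blocks across channels round-robin,
    like a tiny RAID 0 array living inside the drive."""
    lanes = [[] for _ in range(channels)]
    for i, block in enumerate(blocks):
        lanes[i % channels].append(block)
    return lanes

# 25 logical blocks spread over 4 channels; each channel can be
# read or written in parallel, which is where the big IO numbers come from
lanes = stripe(list(range(25)), channels=4)
print([len(lane) for lane in lanes])  # [7, 6, 6, 6]
```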
So, not all drives are created equal. In Intel’s defense, they don’t position the X25-M as an enterprise drive; they would push you to their X25-E, an SLC-based NAND device which is more robust in every way. But keeping things equal is what I am after today.
To get the X25-M to the same performance levels it could take as few as 4 drives and as many as 55 depending on the IO numbers you are trying to match on the IoDrive Duo.
Wear leveling is my biggest concern on NAND-based SSDs. We are charting new waters and really won’t know what the reliability numbers are until the market has aged another 24 to 36 months. You can measure how much writing your current system actually does to disk and get a rough estimate of the longevity of the SSD. Almost all of them are geared for 3 to 5 years of usability until they croak.
At a minimum it would take 10 X25-M drives to equal the stated longevity of a single IoDrive Duo.
Things also start to level out once you factor in RAID controllers and external enclosures if you are going to overflow the internal bays on the server. That can easily add another $3,000.00 to $5,000.00 to the price. All of a sudden the IoDrive Duo starts looking more appealing by the minute.
What does all this mean?
Not all SSDs are created equal. Being constrained to the SATA/SAS bus and drive form factors can also be a real limiting factor. If you break that mold, the benefits are dramatic.
Even with Fusion-io’s cost per unit, it is still pretty cost effective over other solutions out there in some situations, like write-heavy OLTP systems.
I didn’t even bother to touch on something like Texas Memory Systems’ RamSan devices; at $275,000.00 for 512GB of usable space in a 4U rack-mount device, the cost per GB or IO is just through the roof and hard to justify for 99% of SQL Server users.
You need to look closely at the numbers, do in-house testing, and make sure you understand your current IO needs before you jump off and buy something like this. It may also be worth leveraging SSD in conjunction with your current storage, moving only the data that requires this level of performance, to keep cost down.
If this article has shown you anything, it’s that technology marches on. In the next 6 to 12 months there will be a few more choices on the market for large SSDs in the 512GB to 2TB range from different manufacturers at a range of prices, making the choice to move to SSD even easier.
Recently, in early April, Microsoft Research published a paper where they examined SSDs and enterprise workloads. They don’t cover SQL Server explicitly, but they do talk about Exchange. The conclusion is pretty much that SSD is too expensive to bother with right now. To agree and disagree with them: that was true several months ago; today, not so much.
The landscape has changed significantly since it was published and will continue to do so; I think we are on the verge of “why not use SSD?” instead of “do we really need it?”.
With that said, please do your homework before settling on a vendor or SSD solution; it will pay dividends in not having to explain to your boss that the money invested was wasted.
A little light reading for you:
Often I tell clients it is better to have too much memory than too little. This can be applied to essentially any database engine. If your data set is growing over time, you will end up using any memory that is not consumed today.
I’m here to tell you I no longer think SQL Server is the biggest consumer of memory in the Microsoft software catalog. There is a new champion! Introducing, in this corner, SQL Server 2008 x64, and his opponent, Exchange 2007 SP1!
It isn’t even a contest, Exchange by a knock out.
The Exchange team has made some major changes in the core engine that blow the lid off how much memory it will use.
One of the things I remind people of from time to time is that Exchange is also a database of sorts and shares some of the common performance bottlenecks that SQL Server has. As a matter of fact, I saw the magic of sector alignment for file systems flogged on the Exchange side of the house long before it became more common knowledge on SQL Server. Now they have leapfrogged us again.
I live and breathe SQL Server, but I also dabble in other Microsoft stuff from time to time, mostly out of need or curiosity. In this particular instance, I was setting up Exchange for my private business.
Basically, what they have done is make Exchange a 64-bit-only application and lifted any memory restrictions on it whatsoever. So, if there is 8GB on the server, it will start up, gobble up that memory like it was cotton candy, and happily beg for more. This was kind of a shocker to me, a SQL Server administrator who is used to having boxes with 32GB of RAM or more for SQL Server and not seeing it utilized until the database engine has determined that it needs a particular piece of data cached in memory.
Where Exchange differs is that it assumes, from what I can glean, that it will need EVERY piece of data (i.e. people’s mail) at some point, so it might as well load as much as it can now.
Now I can hear you saying, “So what? Exchange should run on its own server anyway,” and you would be right, kind of. I host my enterprise stack on a single machine running Hyper-V Server. Realize this machine only has to support about 10 users, so I sized it with 2 quad-core Xeons and 16GB of RAM initially, knowing I could grow the memory as needed or even move virtual machines off onto a new server with very little pain. Microsoft and others are touting this as the way of the future; there’s no better way to learn about it than by doing.
I assumed that since I was only running a few mailboxes, Exchange wouldn’t need all that extra RAM and I could give it to my SQL Server instance. Boy, was I wrong. So, I gave it the minimum 2GB it states in the requirements. I soon discovered that was like running Vista on 512MB of RAM: sure, you could do it, if you didn’t mind going and getting a sandwich every time you clicked the mouse button.
I thought to myself, “WOW, this thing is a real pig!” I looked at the size of all my data stores, a paltry 758MB, and wondered why 2GB wasn’t enough. As I type this I’m still not sure, but I found a workaround. Where in SQL Server you can easily tell it not to dominate memory by setting the min and max available to it, Exchange has no easy way (in the GUI) to do this. I eventually found a registry setting for how much memory the store (database engine) would take and set it to 1GB. And wouldn’t you know it, Exchange runs just fine that way. It doesn’t page to disk. It isn’t slow to send or receive mail. It didn’t cause the end of Western civilization as we know it.
Finally, something I’ve been telling people for years: get to be good friends with your Exchange admin. They have more in common with you than you realize, and they may be able to help you work with your SAN teams or your server admin teams on getting what you need for I/O or other hardware for your poor, abused SQL Servers.
I can now tell the Exchange folks that they need to look at how SQL Server handles memory and data caching: you are running a mostly read-only database with forward-inserting rows (mail messages), and maybe, just maybe, you shouldn’t cache every little thing right up front.
I’m not sure why they went with such an aggressive memory scheme, but I am glad there was something I could do about it.
Who knew you needed a 64-way HP Superdome and 256GB of ram to really see Exchange shine?
One of the fun facts about SQL Server and the relational model is the whole concept of three-valued logic. Now, I’m not breaking any new ground here; I am just touching on something that trips people up when they expect a result and don’t get it, due to the magic of NULLs. To be honest, I’m no exception to falling into the unknown from time to time.
Codd laid out 12 fundamental rules of what a relational database system should conform to if it is to be considered truly relational.
Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.
This rule, above all others, has probably caused me the most heartburn over the years.
I’ve spent more time than I’d like to admit reading papers and books from pioneers of the relational model like E.F. Codd and C.J. Date. One of the great debates that carried on until Codd’s passing was over NULL and three-valued logic in general. To give you an idea of how big and problematic NULLs are, it has been put forth that maybe we need to get rid of them altogether, or move deeper into the rabbit hole and adopt four-valued logic, breaking NULL into different kinds of placeholders for the unknown.
Understand that every piece of data falls into a domain of some sort: 0 is in the integer domain, ‘’ is an empty string. NULL isn’t in any domain or value type. It doesn’t represent a value at all; it is a placeholder, period.
I’ve heard three-valued logic described as yes/no/maybe, but that isn’t accurate; it is true/false/UNKNOWN.
So the only thing any logical operation on UNKNOWN can produce is UNKNOWN. What’s even worse is that UNKNOWN technically can’t even be equal to UNKNOWN, so NULL = NULL isn’t true.
How do you know they are equal if they are both unknown?
select 1 where NULL = NULL
returns nothing. Since NULL can’t be equal to anything, including NULL, we don’t get a NULL back, or the 1 we tried to select.
select 6227 * 453 / 238 + NULL
returns NULL, which makes sense on the surface to almost everyone I work with.
select NULL / 0
To some folks this is confusing: in a traditional programming sense, anything divided by zero gives us a cannot-divide-by-zero error.
Since NULL is the placeholder for UNKNOWN, there is no way to evaluate the statement other than as UNKNOWN, or NULL!
This carries through to string manipulation as well.
select 'here' + 'is ' + NULL
Again, it returns NULL; it is the old how-can-you-concatenate-something-to-the-unknown problem.
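You can watch all of these behaviors from any SQL engine that implements three-valued logic. As a sketch, here is the same set of statements run through SQLite from Python (an assumption on my part as a stand-in for SQL Server; note SQLite concatenates with || rather than T-SQL’s +, but the NULL semantics match):

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# NULL = NULL evaluates to UNKNOWN, so the WHERE clause filters the row out.
print(cur.execute("SELECT 1 WHERE NULL = NULL").fetchall())   # [] -- no rows at all

# Any arithmetic involving NULL yields NULL (shown as None in Python).
print(cur.execute("SELECT 6227 * 453 / 238 + NULL").fetchone()[0])  # None

# NULL / 0 is still NULL -- the division is never evaluated as a number.
print(cur.execute("SELECT NULL / 0").fetchone()[0])           # None

# Concatenating a string to NULL is also NULL.
print(cur.execute("SELECT 'here' || 'is ' || NULL").fetchone()[0])  # None
```

The first query is the telling one: we don’t even get a NULL row back, we get no row, because the predicate never evaluated to true.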
Now, with all this in our busy little heads, we finally think we understand the problem in its fullness, but we don’t (or at least I always don’t).
Where things can get really sticky is in any kind of aggregate situation, such as SUM() or AVG(). Generally, all aggregations have a NULL elimination step built into them.
So let’s say we have a table that looks like this:
(this isn’t an article on table design so don’t sweat the lack of a key or the duplicate data in both columns)
create table myNumbers (Col001 int, Col002 int)
insert into myNumbers (Col001,Col002) VALUES (100,100)
insert into myNumbers (Col001,Col002) VALUES (200,200)
insert into myNumbers (Col001,Col002) VALUES (300,300)
insert into myNumbers (Col001,Col002) VALUES (NULL,300)
select avg(Col001) from myNumbers
select avg(Col002) from myNumbers
200
Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
225
(1 row(s) affected)
When the evaluation of the aggregate occurs, there is no way to evaluate the NULL, so that row is dropped, and all of a sudden your numbers look horribly wrong.
If I hadn’t set this to output to text, we might have merrily trucked along and accepted that 200 was what we were expecting, and not the 225 we really wanted, due to the treatment of the NULL.
Since it is a warning and not an error, our program won’t complain either; it will be more than happy to accept the result, since it is a valid one.
The only exception to this, in general, is COUNT(*). You can count a placeholder just like you would a value, since it isn’t evaluating the data held in the row, just that a row exists, whether we know what is in it or not. (COUNT(column), on the other hand, still eliminates NULLs.)
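The whole table example above can be replayed the same way. This sketch again uses SQLite from Python as a stand-in (my assumption; SQLite eliminates NULLs from aggregates just like SQL Server, though it raises no warning at all, which makes the silent skew even easier to miss):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE myNumbers (Col001 INTEGER, Col002 INTEGER)")
cur.executemany(
    "INSERT INTO myNumbers (Col001, Col002) VALUES (?, ?)",
    [(100, 100), (200, 200), (300, 300), (None, 300)],  # None = NULL
)

# AVG silently drops the NULL row: 600 / 3 = 200, not 900 / 4 = 225.
print(cur.execute("SELECT AVG(Col001) FROM myNumbers").fetchone()[0])  # 200.0
print(cur.execute("SELECT AVG(Col002) FROM myNumbers").fetchone()[0])  # 225.0

# COUNT(*) counts every row, placeholder or not; COUNT(column) drops NULLs.
print(cur.execute("SELECT COUNT(*), COUNT(Col001) FROM myNumbers").fetchone())  # (4, 3)
```

The COUNT(*) vs COUNT(Col001) pair is a handy sanity check: if the two numbers differ, an aggregate over that column has quietly thrown rows away.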
I’ve only just scratched the surface of the unknown with this little post. I haven’t covered grouping or JOINs, and I may in a later post.
My goal is simply to remind myself that dealing with NULL is never a trivial matter.
"Any technology, no matter how primitive, is magic to those who do not understand it."
Arthur C. Clarke penned three laws of prediction:
- When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
- The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
- Any sufficiently advanced technology is indistinguishable from magic.
The last is the most well known. I love the change that the comic strip Freefall made to it, and felt it was most appropriate to my day-to-day dealings with people and technology in general.
On a regular basis I hear people describe SQL Server as a black box or a magic box, and working with it in any real depth as an art or wizardry. That simply isn’t so. It is 100% science. Not to take away from the designers and developers of SQL Server, or the people who push the boundaries of what it is capable of, but it is all fundamentally based on engineering principles, math (relational algebra in particular), and the underlying technology of computers.
Once you demystify it and break it down into small enough parts, you can quickly master the parts of it that affect your life in a reasonable amount of time.
One of the areas I focus on is I/O performance and SQL Server, so I’ll be doing a multi-part post covering the entire I/O stack, from how a hard disk works through SANs, and eventually how this all affects SQL Server.
Hopefully you will find it useful.
Here is to my blogging endeavor!