Another day in the Maelstrom of Badly Designed Software, this time involving not just an operating system function but also boot firmware. Geez, if you can’t trust your boot firmware, what can you trust?
This all began when the server which runs our DVR software (Sage; a great open source product — check it out at the community forum) began acting rather oddly. Truth to tell, this had been going on for months, but since we’ve switched about 90% of our TV watching to streaming services it was low on the priority list of things to fix. But since I had the great idea of saving my back and knees by relocating the server out of the hall closet to a little nook upstairs, I thought I might as well figure out what was wrong.
That ended up being a huge time sink and hair-puller. As my dad always said, it’s the five minute jobs that take two hours. Unfortunately, with computers the ratio is often 100 or more to 1, not 24.
I thought the problem was that one of the hard drives comprising a Windows Storage Space array was failing. No problem; I’d had to replace a failing drive once before (I’m looking at you, Seagate — can you please up your quality control??) and while it’s time consuming, it’s pretty straightforward.
The basic concept behind Storage Spaces is that you assign a bunch of individual drives to a pool and Windows virtualizes them into One Big Honking Drive. With built-in redundancy and error-checking. Expandable at will. Replaceable at will. Or so I thought.
Turns out things don’t go so well when (a) more than one drive fails at the same time (really, thanx again, Seagate!!) and (b) you’ve used up all the SATA ports on your motherboard. Kinda hard to “just add another drive” when there’s nothing to hook it to, and if you can’t add another drive, Storage Spaces won’t let you gracefully degrade the pool (i.e., shift whatever’s on the failing drives to the good drives). So even though my pool had enough space available to hold all the data on its good drives, I was stuck. Gotta be able to add in order to remove. Bizarre.
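To make the "enough space on the good drives" claim concrete, here is a minimal sketch of the arithmetic I had in mind. It is not how Storage Spaces decides anything; the `Drive` class, the `can_evacuate` function, and the sizes are all hypothetical, and the check deliberately ignores redundancy overhead.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """Hypothetical description of one drive in a pool (sizes in GB)."""
    name: str
    capacity_gb: int
    used_gb: int
    failing: bool = False


def can_evacuate(drives):
    """Return True if the healthy drives have enough free space to
    absorb everything currently stored on the failing drives.

    Simplifying assumption: no mirroring or parity overhead is counted.
    """
    needed = sum(d.used_gb for d in drives if d.failing)
    free = sum(d.capacity_gb - d.used_gb for d in drives if not d.failing)
    return free >= needed


# Made-up numbers, purely for illustration.
pool = [
    Drive("disk0", capacity_gb=2000, used_gb=800),
    Drive("disk1", capacity_gb=2000, used_gb=900),
    Drive("disk2", capacity_gb=2000, used_gb=700, failing=True),
    Drive("disk3", capacity_gb=2000, used_gb=600, failing=True),
]
print(can_evacuate(pool))
```

In this made-up pool the failing drives hold 1300 GB and the healthy ones have 2300 GB free, so on paper the data fits without adding a drive.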
The resulting confusion and hair-pulling ultimately led to my copying what files I could out of the pool onto a new (Western Digital) drive hooked up to an add-in SATA card I installed. The net result was the total loss of my desktop system’s file history (the server also plays that role), various backups of other systems, and about 75% of our recorded TV shows. Fortunately that was Really Bad rather than Unbelievably Disastrous since, as I mentioned, we don’t use our DVR much anymore.
Because no Really Bad Computer Day is complete with just one set of problems, I also had to fight with the Gigabyte P55 USB3 motherboard powering the server. It turns out that if the boot process can “see” a hard drive, but can’t identify it, it just stalls. Without any message or beep code or alert of any kind. And either one of the built-in SATA ports is flaky or they have to be “consumed” in a particular order (e.g., master before slave on a given channel), so… It’s disturbing to plug drives in and have them work, only to plug the same drives into different ports and have the system freeze. With no hint as to what’s wrong.
Now, space is admittedly at a premium for firmware, so it’s not like it can contain a robust error reporting system. OTOH, modern firmware does contain a lot of stuff, including a number of messages. Would it really have been so hard to include “Uh, drive seen but not recognized on SATA port X”? Such a message would be really helpful, and leaving it out violates what I consider to be one of the most important rules of well-designed software: don’t leave the user hanging. Log something, somewhere. Screen, log file, carrier pigeon: the location doesn’t matter (so long as it’s known).
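For what it's worth, here is roughly what "don't leave the user hanging" looks like in code. This is a Python sketch, not firmware, and everything in it is made up for illustration: `probe_ports`, `DriveNotRecognized`, and the `fake_identify` stand-in are hypothetical names, not anything from a real BIOS. The point is only that every port gets a message, including the one that can't be identified.

```python
import logging

logger = logging.getLogger("boot_probe")


class DriveNotRecognized(Exception):
    """Hypothetical error: a device answered but could not be identified."""


def probe_ports(ports, identify):
    """Probe each port and report on every one of them.

    `identify` is a caller-supplied function (an assumption of this
    sketch) that returns a model string, returns None for an empty
    port, or raises DriveNotRecognized.
    """
    messages = []
    for port in ports:
        try:
            model = identify(port)
        except DriveNotRecognized:
            # The case that matters: say so instead of stalling silently.
            msg = f"Drive seen but not recognized on SATA port {port}"
            logger.warning(msg)
        else:
            if model is None:
                msg = f"SATA port {port}: empty"
            else:
                msg = f"SATA port {port}: {model}"
            logger.info(msg)
        messages.append(msg)
    return messages


def fake_identify(port):
    """Stand-in for real hardware: port 2 holds an unidentifiable drive."""
    if port == 2:
        raise DriveNotRecognized()
    return {0: "drive-A", 1: None}.get(port)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    for line in probe_ports([0, 1, 2], fake_identify):
        print(line)
```

Note that the loop keeps going after the bad port. Even if the right call were to halt the boot, halting with a message beats halting without one.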
There’s nothing worse than trying to figure out a problem with no information as to what it is. It forces you to go into trial and error mode, also known as Keep Moving Everything Around Until It Mysteriously Starts Working Again. Not a pleasant experience, and not one that anyone should have to go through… so long as the software is well-designed.
The morals of the story? A few:
- If you use Windows Storage Spaces, always leave some unused hard drive ports available in your system.
- Better yet, think really hard about using Windows Storage Spaces without a full-time IT staff (I’ve abandoned it based on this experience).
- If your motherboard appears to freeze during the early stages of the boot process, consider that it may be having problems recognizing hard drives but is too ashamed to let you know that.