Wednesday, April 17, 2013

Microsoft and Apple - Still Not Quite Playing Nice

Apparently, it's a known issue that in Mac OS X Server 10.6+ environments (such as my new 10.8 setup), clients that attempt to send mail with MS Outlook may encounter some... difficulties. Specifically, an undeliverable message will appear like the following:
That is, "Helo command rejected: need fully-qualified hostname".

Unlike a number of the various trials that this upgrade has sent my way over the last 48 hours or so, this one is at least fairly simple to fix. According to Apple's Support website (which is surprisingly helpful here), on a 10.8 Mountain Lion Server, the following procedure is appropriate:

  1. In /Library/Server/Mail/Config/postfix/main.cf, locate the smtpd_helo_restrictions setting.
  2. Remove "reject_non_fqdn_helo_hostname" from the list of settings.
  3. Restart the Mail service.

And then for 10.6-10.7 Server:

  1. In /etc/postfix/main.cf, locate the smtpd_helo_restrictions setting.
  2. Remove "reject_non_fqdn_helo_hostname" from the list of settings.
  3. Restart the Mail service.
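
The edit in step 2 can also be scripted. What follows is only a rough sketch, not Apple's procedure: strip_helo_fqdn_check is a made-up helper name, and it assumes the restriction name sits on a non-comment line of whichever main.cf you hand it. It keeps a .bak copy next to the original so the change is easy to back out.

```shell
#!/bin/sh
# Sketch only: remove reject_non_fqdn_helo_hostname from a Postfix main.cf.
# strip_helo_fqdn_check is a hypothetical helper, not an Apple or Postfix tool.
strip_helo_fqdn_check() {
    cf="$1"
    [ -f "$cf" ] || { echo "no such file: $cf" >&2; return 1; }

    # Keep a copy of the original so the change is easy to back out.
    cp "$cf" "$cf.bak" || return 1

    # On every non-comment line, delete the restriction name together with
    # an optional trailing comma and any whitespace that follows it.
    sed -e '/^[[:space:]]*#/!s/reject_non_fqdn_helo_hostname,\{0,1\}[[:space:]]*//g' \
        "$cf.bak" > "$cf"
}
```

On a real server you would run it with sudo against the main.cf path for your OS version, look over the result, and then restart the Mail service as in step 3.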

I post this here because it provides an important clue for those who may find themselves scratching their heads, unable to find good information on where Apple has hidden the various configuration files veteran server admins are used to hand editing. Between 10.7 and 10.8 the postfix configuration folder moved from /etc to /Library/Server/Mail/Config. A cursory inspection reveals that mail-related services including amavisd, clamav, dovecot, postfix and spamassassin all store their configurations within /Library/Server/Mail/Config. Backing out to /Library/Server reveals the rest of the services provided by Mac OS X Server, like Web, FTP and the built-in PostgreSQL database engine (but not MySQL, which does not come preloaded with Mac OS X Server).

Now, I'm not entirely sure what I think about moving configuration files that have long been stored in other places. While it is nice that this brings some degree of standardization, experienced server admins may find the transition difficult. Beyond that, it is somewhat nice to be able to easily browse to all the configuration folders without having to issue the "Go to Folder" command within Finder. Those who prefer to administer from the command line likely won't see that as an advantage, however.

Monday, April 15, 2013

Upgrading PHP5 on Mac OS X Server 10.5.8 - Error establishing database connection

I was greeted by the above error message this evening after the long task of installing dependencies and upgrading the installation of PHP5 on my Mac server to get some needed extra features (like the GD library). After the successful installation of the latest PHP5 for Mac OS X 10.5.8 (Leopard) I could no longer access any of the WordPress sites installed on my server. A closer look revealed the specific error when attempting to load a site was "Can't connect to local MySQL server through socket '/tmp/mysql.sock'".

Evidently the new installation is looking in a different location for mysql.sock (in my case it was in /private/var/mysql/mysql.sock). The fix was relatively simple: create a symlink from where the file exists to where it is expected to be.

First, find it:

locate mysql | grep sock

Mine:

/private/var/mysql/mysql.sock

Symlink it:

ln -s /the/path/to/mysql.sock /tmp/mysql.sock
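
If you'd rather not clobber something by accident, the same step can be wrapped in a couple of checks. This is just a sketch: link_mysql_socket is a made-up helper name, and the /tmp/mysql.sock default is simply the path from the error message above.

```shell
#!/bin/sh
# Sketch only: link an existing MySQL socket to the path the client expects.
# link_mysql_socket is a hypothetical helper name.
link_mysql_socket() {
    src="$1"                      # where mysql.sock actually lives
    dest="${2:-/tmp/mysql.sock}"  # where PHP is looking for it

    # The real socket has to exist, otherwise the link would dangle.
    [ -e "$src" ] || { echo "not found: $src" >&2; return 1; }

    # Refuse to replace anything already sitting at the destination.
    if [ -e "$dest" ] || [ -L "$dest" ]; then
        echo "already exists: $dest" >&2
        return 1
    fi

    ln -s "$src" "$dest"
}
```

In my case the call would have been: link_mysql_socket /private/var/mysql/mysql.sock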

Voilà, done. What could have been a multi-hour-tear-my-hair-out-and-probably-end-up-rebuilding-the-whole-server issue was resolved quickly and elegantly, without the need to modify any global parameters in config files that may break the next time the software is updated, etc.

Now the image processing bits of a WordPress plugin my wife was wanting for her blog are functional and life is once again good.

Monday, March 21, 2011

Simple Ideas: Modify a SATA enclosure to make data backups easier.

If you're like me, you do a lot of system reloads for people who have OS issues due to malware or other factors. As a part of nearly all reload procedures a data backup of the owner's documents, pictures, music, etc. is performed. I always give a client the option and try to steer them toward it since the chances of recovering important information inexpensively after a system reload become very slim. While I expect most people would eventually come up with this on their own, I decided I would streamline this step by modifying a SATA drive enclosure so that it can be used with the system's drive without removing the drive from the PC.

First, I ditched the wimpy power adapter that came with the enclosure. This is the weak point on almost all drive enclosures. I've probably had ten of these fail for every other problem I've encountered with SATA drive enclosures so I opted to replace it with a unit that outputs the same voltage but has a higher current capacity.


Nice and heavy. 4.16A heavy.

I chose an enclosure that has everything required on one board rather than lots of leads going off to switches and lights that are sometimes difficult to remove from the chassis, etc. I also found that having a power switch on this particular enclosure's board was handy and there's even eSATA for added speed (though I'm just using USB right now).



USB, eSata, power switch and connector.

SATA and SATA power connectors.

"Modify the enclosure" might not be the exact phrase to describe what I did - rather, I removed the board from the enclosure. That is all. This really isn't a terribly complex operation, you just undo some screws, bend some tabs, etc. and free the board from the enclosure. Again, this is easier to do and the finished product is easier to handle if everything is integrated and you don't have to bring along extra stuff from the enclosure to get it to work.

Now, this is exceptionally simple to put into use: all you have to do is disconnect the SATA and power cables going to the hard drive in the computer being repaired and dock this little guy to the back of the drive. A word of warning, however: the SATA connections on the backs of hard drives can be fragile. I would advise against putting any undue strain on the connector with this board. Fortunately, the board itself is light enough that it puts less strain on the connector than stiff SATA cables can, and thanks to the built-in power switch I can keep the power off (green light is dark) until it is completely seated on the drive and then apply power with the flip of a switch, rather than trying to plug a coaxial DC connector into the back of the board while keeping it securely docked to the drive. Now plug the USB (or eSATA) into your host machine and you're in business.

Serious business.

Be mindful of the cords while they are hooked up and don't make any tripping hazards or you might still damage the hard drive being worked on. A data recovery process can sometimes take hours so it's important to keep everything clean and simple so you're not accidentally jerking cords around. My data recovery station is one table that I keep an iMac on for performing the actual recovery and I do not string any cords across gaps.

In the end this can save you several minutes on each job and help prevent the possibility of accidentally dropping the customer's hard drive or losing it either on your bench (it's perfectly clean, isn't it?) or worse, in another customer's computer. I normally label each drive with the customer's name in permanent marker before it leaves my hand after removing it from a PC, but this saves all those steps and is especially helpful on systems where you have to jump through hoops, like removing front panel and faceplate components, to get the drive out.


Who, me?

In addition to merely removing the board and using it, one could also insulate it somewhat to protect it from damage and protect other things that it might come into contact with while left plugged into its power brick. One could simply wrap the board with tape (I have done this before), but it will leave a sticky mess if you ever choose to change your design. You could also make or repurpose a small enclosure that holds just the board and perhaps integrates some form of mechanical attachment to support the device on the back of a drive. Keep in mind, however, that if you make it much bigger or add tight brackets to the front side you can limit the device's usefulness, so I prefer to keep things simple.

Wednesday, March 9, 2011

Post Electrical Storm Troubleshooting Residential Wireless Systems


I've noticed something of a recurring theme over the last few years. After each thunderstorm, households which subscribe to wireless internet services tend to have problems that on the surface appear to be gremlinic in nature. The provider will normally replace the roof-mounted "antenna" (which is actually an integrated wireless transceiver and antenna array enclosed in a plastic clamshell) and perhaps the PoE injector located inside the dwelling near the point of entry or the customer's computer.

I'm in ur cables, carefully opening a single pair.

In a percentage of these situations the problem is not resolved and the customer is informed that his or her computer is to blame. Often, upon separate inspection by an experienced service technician, the user's computer doesn't exhibit signs of failure with regard to the ethernet port and network functionality may be verified by a battery of tests that check for packet loss during intensive network activity. The user often will now find themselves back at square one, not knowing exactly where their problem lies.
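
One simple piece of such a battery is a sustained ping while the connection is busy, reading the loss figure off the summary line. A minimal sketch, assuming the usual "N% packet loss" summary that ping prints on Mac OS X and Linux; ping_loss is a made-up helper name, and the address in the usage line is only a placeholder.

```shell
#!/bin/sh
# Sketch only: pull the packet-loss percentage out of ping's summary line.
# ping_loss is a hypothetical helper; it reads ping output on stdin and
# prints nothing if no summary line is found.
ping_loss() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}
```

Something like "ping -c 100 192.168.1.1 | ping_loss" would then print just the percentage, which makes it easy to compare the customer's PC against a known good machine on the same cable.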


The service provider's misdiagnosis is often compounded if a representative of the ISP uses a laptop to "test" the connection, by plugging it into the customer's wireless transceiver and browsing the internet. A novice may feel that this "rules out" the installed equipment but it doesn't address some common scenarios.

Look, I know they sent me to fix your internet and all, just please don't tell my boss I got my hand caught in this thing.

In at least one situation I have observed service technicians dispatched by the ISP verify a connection with a laptop that subsequently was found to have static assignments for IP address and DNS servers. The customer's complaint was a failure to browse the internet which was related to an inability to acquire an IP address from the ISP. In that situation, the ISP-owned on-premise equipment's built-in DHCP server had failed and was no longer assigning appropriate data to connected clients. Replacement of that equipment resolved the issue.

Another scenario occurs when there is no link between the client PC and the wireless transceiver. A failure in the transceiver, intermediate cabling, PoE injector or network card/port on the client PC can cause this condition. A failure of ISP equipment in this scenario, particularly the cabling and PoE injector, can manifest itself differently on different systems depending on the ethernet capabilities of that system. In multiple similar situations, a client PC with 10/100 ethernet was observed to be unable to establish a link to the wireless transceiver, but the service technician's laptop, equipped with 10/100/1000 (gigabit) ethernet, was able to connect and browse with reasonable reliability. This is due to certain characteristics of gigabit ethernet that contribute to its robustness. Users in this scenario may erroneously believe that their original network port is bad after installing a replacement network interface card (NIC) that is designed for gigabit functionality. The link at this point may work and appear to be stable; however, this is a band-aid repair and a more in-depth inspection of premise equipment should be conducted by a qualified technician.


A failure of almost any one of these pairs upstream of the PoE adapter, and up to two failures downstream, can be tolerated by a gigabit ethernet controller at the host end.

A customer encountering issues with their residential wireless internet service is advised to be aware of these special situations. There are more scenarios in which a service tech's troubleshooting laptop can function on a faulty internet connection than scenarios in which a customer's failed ethernet port can function on a different network.

Server Woes


The question I asked myself when I set up this external blog here at Blogger was "why not just make one on the server I host at home?". After all, I run WordPress there, and I am much happier with the way WordPress works as opposed to Blogger, I like the flexibility it brings. I like the fact that there are thousands of useful widgets for WordPress as opposed to, oh, about eight, useful widgets for Blogger, the rest being "see pictures of XYZ" where XYZ is a random celebrity, etc. Perhaps today was an indicator as to why I didn't just do that.

I woke up this morning early and after spending a couple hours on work projects my wife called me from the other room "I don't think my E-mail is working.". I checked, mine wasn't either, my personal E-mail anyway. I checked our various websites, concurrently with my wife and we both came to the same result - websites were down too. That means one thing really, server's down.

The server goes down more than it should, I believe. It's not helped by the fact that it's not really connected to clean mains power, I don't even have a UPS supporting it presently. It's also not helped by the fact that it's ancient, and not really server-class hardware. Here's a stock photo:
Holy crap, one of THOSE THINGS?

Yep. One of those things. It's a Mirrored Drive Door G4 Power Macintosh Tower. Affectionately referred to as an "MDD G4" by some, or "problematic concoction of steel and acrylic" by others.

I'm falling into the group of "others".

This morning, I was pleasantly surprised to find that the machine was actually running. I powered the monitor that I had left connected to it since I set it up a few weeks ago (yes, just a few weeks ago, this iteration is young compared to others I have put in place). I logged in and at the onset, functionality was apparent.

Functionality was anything but actual, however.

I soon noticed some oddities. The Server Admin app, that launches each time the account is logged on complained "There is no server at the address specified.". "Localhost" I thought to myself, "there is a server at localhost".

I attempted to browse the web. "You are not connected to the internet" was Safari's unhelpful reply. Perhaps it wasn't too unhelpful. I examined the ethernet cord, it was firmly plugged in. I gave it a tug for good measure, and then gave it a full re-seat for better measure and tried reloading the page. No go.

I launched the system preferences app to see if I could get at the network configuration from there. System preferences was responsive, until I clicked Networking. Then, SBBOD.

I should mention that this was at about 9 in the morning. I was still in my underwear, I was cold, I was standing on concrete with bare feet and troubleshooting was NOT on my agenda for the day. I had deduced that the night prior, when my dishwasher tripped the same breaker that accommodates the circuit that this server is powered through, more than just a temporary power cut had occurred. Something terrible and hardware related and probably costly to fix.

Remember, I'm not all together by this time in the morning really.

I checked the router - no link light. It's looking more like hardware. I switched the cable to another port on the router, then to the switch I have next to the router, no light, no matter where it's plugged in. "Could it be I fried the lan port in the server?" I asked myself. I decided that this was probably the case and I went to my plan A.

Swap the tower!

I keep a spare unit, identical to this one, but it's a slightly different revision, for just this kind of situation. It has no ram or hard drives, so I swap in my installed server drives and the 2GB of DDR ram that this unit likes and any hardware failure, related to the bulk of the hardware - the power supply, board, cpu, etc. is instantly ruled out.

Problem is, it still didn't work. I swapped the shell, the psu, the board with the ethernet chipset along for the ride, and I got the same issue. Well, now it's certainly a software problem (I'm starting to get it together by now). I'm not interested, having not showered or dressed, to start banging away at a terminal trying to find out why the core processes that support networking weren't running because I had a plan B.

Time machine go!

I had recalled that I had configured Time Machine to backup the server installation to a partition on the first boot drive (don't worry, that's only a small part of my overall backup strategy, you'll see) so I popped in the Server Install disc and went to the Utilities menu...

Wait, did I say I "popped in the Server Install disc"? I'm sorry, it's just not that simple, at least on this MDD unit. This unit, and the spare that I keep, both have flaky optical drives. They're not the easiest thing to get at so I haven't replaced them yet, but if I did, I am not sure what I'd replace them with anyway. See, I have about a half dozen IDE optical drives laying around from various computers I've owned in the past. I have been through each of those drives and while they read discs on PCs just fine, probably read on Macs too, and some of them burn with relative success, none of the spare drives that I have on hand will actually boot a Mac from an install disc. Most will boot PCs, but for reasons I'm not sure of none will boot a Mac, or at least not these MDD G4s (which might have something to do with it). I do, however, have one drive, but I don't call it a spare because it's not. It's the drive from my desktop G5. It's IDE and it, out of all the drives in the house, actually will boot these two MDD units from install media. So, when I say "popped" I really mean: I dug the G5 out from under my desk, pulled the side panel off, fished the drive out of its latch system, unplugged it, dug out an IDE cable, laid open the MDD unit and connected the G5 optical drive in a sidecar configuration. I THEN popped the server install disc in and booted the machine.

Once inside setup I went to the Utilities menu - I wasn't ready to conduct a reload, not yet anyway and it occurred to me, briefly, to check the drive and repair permissions, so I made a brief stop at Disk Utility. Massive permissions repairs, not unusual it didn't seem. I don't run these exercises on any sort of schedule and these systems are always screwing up their permissions, not a big deal. I did a reboot to see if that fixed anything - not this problem anyway, so I was back to the installer...

By the way, it's about 11:30 now, all this digging and booting optical media takes time.

This time, Time Machine was my destination. This for sure would rescue me! I chose the most recent backup and configured it to restore to the Server HD volume. An hour later and...

Still isn't working.

And back to Time Machine, this time, the OLDEST backup available! Damn the recent configuration changes, I can make them again! Another hour and...

It's STILL not working. This is why I say, "Always have a Plan C:"...

Attack it with a clone!

Remember when I said this build was only a couple weeks old? That's because prior to this build I had a fairly long running installation, I'd say about two years old, that died because of a power failure and/or a bad hard drive. I suspect the two were related, perhaps not. Who knows. When I replaced the hard drive in this server, I bought two drives and cloned my initial, mostly configured state, to the second drive and put it away. I thought with Time Machine and the other backup strategies I was employing I would probably be able to ride out about anything. That "probably" is where I found the reason to get a second drive.

Now, rapidly approaching 2:40 PM, I swapped in my spare and I was off and running. I had a few configuration changes to make, and I did so. I had, since cloning this drive, placed the web server root folder and the mail queue and database on a separate higher capacity hard drive that wasn't affected in all of this. I changed the configuration to point to those locations, executed a very handy little mailbox repair tool I had found before when I was having some problems and I would say I was back up and running within 15 minutes of calling plan C into action.

Always have a Plan C.

Saturday, March 5, 2011

Transfer Windows 7 Hard Disk To A New PC

Or, replace a motherboard in a Windows 7 system without reloading. Either way, when you do this, you're effectively going to a "new PC" - at least as far as the software is concerned. Even if you're using the same ram, cpu, drives and chassis, a different motherboard is just about as good as a whole different system.

Now, there are two complaints about this process that I can hear now:

  1. You shouldn't do that - the OEM license forbids it!
  2. You shouldn't do that - reloading is BETTER!

Well, to #1, I'd answer "I don't care, you do what you gotta do to get up and running". Microsoft actually acknowledges this and doesn't censor its help forums for users seeking ways to do this. They understand that there are times when you just don't have the time or ability to procure a new license and do a fresh build on a system, or that this process may be a needed intermediate step for a full proper repair. They get it, you should too. To #2, well, I know. Again, time and other considerations can make this procedure more feasible, at least in the short term, than doing a full reinstallation.

This particular situation put me in territory deeming it worth my time and effort to preserve the original installation, rather than reloading it. My wife uses this PC for work and her employers provide some extremely finicky, delicate and difficult to install and configure software. Not only that, but this software, which uses Microsoft Word, has never worked well with versions of Office newer than 2003, and, unless they've done some major work on it recently, does not work at all on 64bit installations of Windows.

Rather than dig out a 32bit Windows 7 install media, then hunt around and dig out the Office 2003 media, and go through the multiple-hours long process of installing the proprietary applications, and manually installing their prerequisites because the developers couldn't be bothered to include the redistributable packages for a half-dozen required technologies, and then deal with migrating my wife's Outlook files, and over 250 gigs of other datas, I took the "easy" way out and decided I'd try my luck with swapping the motherboard into the system as-is.

Now, first, I was smart, as outlined in another post I effectively duplicated her hard drive and got that prepared for the actual procedure, leaving her original untouched just in case we had to go back to the system on the flakey but still marginally functional motherboard for some emergency work or whatnot.

After dealing with some other issues, I was able to get the system to the completely expected state of stop coding on boot. This is an extremely common happenstance when performing this kind of migration. Windows systems that are NT based, probably going back to 3.51, but certainly including Windows 2000, XP, Vista and Windows 7, are heavily dependent on the proper hardware abstraction layer (HAL) and mass storage controller drivers being installed and presently selected by an installed OS. Should some factor that these depend on change, say, you go from being a standard uniprocessor PC to an ACPI Multiprocessor PC, that's a different HAL, and you're looking at a stop code. If you go from one storage controller to another, that's another stop code (0x0000007b). That's the code I was receiving, and that's what we'll concern ourselves with for this brief how-to.

There are a few factors that we've got to be able to verify before we can proceed. I was replacing a motherboard in a system that had a somewhat bad motherboard, but was otherwise still bootable. You MUST be able to boot the system, at least into safe mode, in order to perform these steps with the ORIGINAL hardware in place. I had her old motherboard and a power supply splayed out on the kitchen table because I did not want to go back through the physical build process. I booted the copy of her hard drive and placed the following lines in a new text document and saved it as mergeide.reg:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\primary_ide_channel]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="atapi"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\secondary_ide_channel]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="atapi"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\*pnp0600]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="atapi"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\*azt0502]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="atapi"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\gendisk]
"ClassGUID"="{4D36E967-E325-11CE-BFC1-08002BE10318}"
"Service"="disk"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#cc_0101]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_0e11&dev_ae33]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1039&dev_0601]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1039&dev_5513]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1042&dev_1000]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_105a&dev_4d33]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1095&dev_0640]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1095&dev_0646]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1095&dev_0646&REV_05]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1095&dev_0646&REV_07]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1095&dev_0648]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1095&dev_0649]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1097&dev_0038]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_10ad&dev_0001]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_10ad&dev_0150]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_10b9&dev_5215]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_10b9&dev_5219]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_10b9&dev_5229]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="pciide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1106&dev_0571]
"Service"="pciide"
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_8086&dev_1222]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="intelide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_8086&dev_1230]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="intelide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_8086&dev_2411]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="intelide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_8086&dev_2421]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="intelide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_8086&dev_7010]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="intelide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_8086&dev_7111]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="intelide"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_8086&dev_7199]
"ClassGUID"="{4D36E96A-E325-11CE-BFC1-08002BE10318}"
"Service"="intelide"

;Add driver for Atapi (requires Atapi.sys in Drivers directory)

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\atapi]
"ErrorControl"=dword:00000001
"Group"="SCSI miniport"
"Start"=dword:00000000
"Tag"=dword:00000019
"Type"=dword:00000001
"DisplayName"="Standard IDE/ESDI Hard Disk Controller"
"ImagePath"=hex(2):53,00,79,00,73,00,74,00,65,00,6d,00,33,00,32,00,5c,00,44,00,\
52,00,49,00,56,00,45,00,52,00,53,00,5c,00,61,00,74,00,61,00,70,00,69,00,2e,\
00,73,00,79,00,73,00,00,00

;Add driver for intelide (requires intelide.sys in drivers directory)

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\IntelIde]
"ErrorControl"=dword:00000001
"Group"="System Bus Extender"
"Start"=dword:00000000
"Tag"=dword:00000004
"Type"=dword:00000001
"ImagePath"=hex(2):53,00,79,00,73,00,74,00,65,00,6d,00,33,00,32,00,5c,00,44,00,\
52,00,49,00,56,00,45,00,52,00,53,00,5c,00,69,00,6e,00,74,00,65,00,6c,00,69,\
00,64,00,65,00,2e,00,73,00,79,00,73,00,00,00

;Add driver for Pciide (requires Pciide.sys and Pciidex.sys in Drivers directory)

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\PCIIde]
"ErrorControl"=dword:00000001
"Group"="System Bus Extender"
"Start"=dword:00000000
"Tag"=dword:00000003
"Type"=dword:00000001
"ImagePath"=hex(2):53,00,79,00,73,00,74,00,65,00,6d,00,33,00,32,00,5c,00,44,00,\
52,00,49,00,56,00,45,00,52,00,53,00,5c,00,70,00,63,00,69,00,69,00,64,00,65,\
00,2e,00,73,00,79,00,73,00,00,00

You should just copy and paste that right into a Notepad window (you may need to fix any weird word-wrapping issues from pasting from the internet), click File, Save As and be sure to select *.* All Files in the type pulldown so you can specify the .reg extension.

Now, on that system, still functional and booting, double-click that reg file. You will be presented with some warnings about importing registry data from an unknown source and whatnot, you'll want to confirm this. Ok anything you see (UAC prompts, etc.) until you receive a message stating that the information has successfully been imported into the registry.

You may now shut down and perform your hardware swap, whichever way you may, be it a new motherboard into this system, or this hard disk into a new system, either way.

Upon first boot it is completely normal for a lot of things to be very slow. Your mouse and keyboard may not be responsive at first and you may find yourself staring at a login screen wondering how you'll do it without any input devices. Give it some time. I had to wait several minutes before my mouse lit up and the numlock on the keyboard toggled. Several more minutes beyond that and the mouse became responsive and I was able to select the user account and log in. That's not all there is to it, btw: you now have to install drivers for everything. I'd recommend loading the LAN driver from the installation CD that came with your new motherboard, then going online to the board manufacturer's website to get all new drivers (including the one for the LAN) and loading them. Restart the system, run any Windows updates, re-activate Windows (and Office, if it's installed and complaining about its status), and you should be golden!

Thursday, March 3, 2011

Windows 7 Startup Repair - Not Quite Complete

Startup Repair is nice. Let me just get that out of the way first. I've used it many times to fix many problems, mostly related to the Boot Configuration Data, bootloader, etc. on both Windows Vista and Windows 7 systems. There seem to be, however, issues that Startup Repair can detect but cannot fix. The fixes for these problems aren't always inherently "impossible" from the standpoint of the environment in which Startup Repair runs. No doubt, there can be, and are, many situations that prevent a Windows system from booting that Startup Repair cannot fix, nor can it be expected to fix. I encountered a situation today that I believe should be well within its capability to repair.

System recovery options, including Startup Repair.

This morning I resumed a fix I had been working on yesterday for Jill's PC. I was working with a copy of her hard drive made by using Windows Home Server to restore the most recent backup copy of her hard disk from the day before. I had my reasons for doing it this way; in a nutshell, they include using a larger hard drive and not wanting to screw up her original. When restoring to a new hard drive you must prepare the drive, and I happened to miss a step during this process. Essentially, when booting from the WHS client restore disc, you load a preinstallation environment of Windows that starts the recovery wizard, installs LAN drivers if needed, finds your home server and lets you pick which backed-up PC and which backup of that PC to restore to which disk. You must, during this process, partition your new hard drive and set that partition as active. That last bit, setting the partition as active, is what I forgot to do. If you fail to do that, the WHS restore application will not complain about there being something wrong with your restore destination and will happily spend the next four or five hours restoring 280GB of data over the network to this drive.

When attempting to boot the drive for the first time I encountered an error I didn't immediately expect. BOOTMGR IS MISSING. I fully expected, say, a Stop Error (I was replacing the motherboard at the time, the subject of another post). This message, again, was completely unexpected.

Unexpected.

I began to troubleshoot this issue, and part of that process was running the Startup Repair routine. Fortunately, Startup Repair logs its output, though it could be more thorough (I'd love to know not only the result, but the commands it runs - such is findable, I'm certain, but not right there). Among the root causes determined was a line that read "" Fantastic, I thought. Something's screwy with the partition. I decided to try to read the partition. For those of you who know me, you're aware that I love exploiting things like Notepad's open dialog when booting from purposefully restricted live installations of Windows in order to do all sorts of things (perhaps the subject of yet another post). Without going into the details (yet), I was able to browse around and found what appeared to be a relatively complete installation on a very visible and browseable volume on the hard drive. So what was wrong?

I decided I should have a look at the drive with the disk management utility and figured it wouldn't hurt to just use the same copy that I had used to create the partition, so I booted from the WHS Client Restore disc once again and went through the wizard until I was allowed to launch the disk management MMC snap-in. I discovered that I was able to right-click the volume and set it as active from there. This was what led to the eventual repair of the startup function. I restarted the machine and again got the "BOOTMGR is missing" error, which by now was expected, but I was then able to use the Windows 7 Startup Repair (two passes) to successfully restore the boot sector and Boot Manager that were so desperately needed to get on with this repair.

Now, you're thinking "well, Startup Repair probably couldn't have repaired that problem; after all, you had to use disk management, and that's not available from where you were!"

Actually, it could have. And so could I, using diskpart from the command prompt that is available in the Windows 7 Recovery Options window. What was accomplished with Disk Management would have also been doable with diskpart (I went through the motions; it's capable and available). Here's what it should do (and what you can do, should you run into this yourself and not want to do all the ridiculous booting back and forth between various other media):

Boot the system using the Windows Vista or Windows 7 install media. I don't believe this procedure even requires that you use media of the proper version or bitness, since you're just working on the hard disk here, but you'll need the right one to carry on with other startup repairs, so make sure you have that.

Click "Repair your computer" on the second page of the wizard that starts from this disc (after selecting language and keyboard layout).

Cancel any windows offering to restart the machine or telling you they cannot find a Windows installation. You want to maneuver to the System Recovery Options window (pictured at the top of this post).

Click Command Prompt at the bottom of the list.

Now you'll use diskpart to adjust your partition's attributes. Type the following, minus the parenthetical remarks, inserting appropriate values where I have indicated:

diskpart
list disk (you will see a list of fixed disk drives on your system)
select disk X (where X is the number of your hard disk, usually 0)
list partition (you will see all the partitions listed that the selected hard disk contains)
select partition Y (where Y is the number of the partition where Windows lives)
active
exit
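
For the curious, here is roughly what that `active` command changes on disk. This is only a sketch, assuming the drive uses a classic MBR partition table (typical for BIOS-booted Vista and Windows 7 systems); the function name active_slots and the sample sector are invented for illustration. In an MBR, the four partition entries start at byte 446 of the first sector, each is 16 bytes long, and the first byte of an entry is 0x80 when that partition is marked active. Note that the slot numbers here are positions in the table and won't necessarily match diskpart's partition numbers.

```python
MBR_SIZE = 512       # size of the first sector
TABLE_OFFSET = 446   # where the four partition entries begin
ENTRY_SIZE = 16      # bytes per partition entry
ACTIVE_FLAG = 0x80   # status byte value for an active (bootable) partition


def active_slots(sector: bytes) -> list:
    """Return the 1-based table slots whose partition is flagged active."""
    if len(sector) < MBR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector (missing 55 AA signature)")
    slots = []
    for index in range(4):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        status = sector[start]          # byte 0 of the entry: boot flag
        part_type = sector[start + 4]   # byte 4 of the entry: 0 means unused
        if part_type != 0 and status == ACTIVE_FLAG:
            slots.append(index + 1)
    return slots


# A made-up sector with one NTFS-type partition in slot 1, not yet active.
sample = bytearray(MBR_SIZE)
sample[510:512] = b"\x55\xaa"
sample[TABLE_OFFSET + 4] = 0x07
print(active_slots(bytes(sample)))  # []

# Setting the flag is the on-disk effect of diskpart's "active".
sample[TABLE_OFFSET] = ACTIVE_FLAG
print(active_slots(bytes(sample)))  # [1]
```

An empty result on a disk that plainly holds a Windows partition is consistent with the situation described in this post: the data is all there, but no partition is flagged as the one to boot from.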

Click the X at the top of the Recovery Options window and restart the PC, again from the appropriate media, and re-run Startup Repair. If you observe the logs, different root causes and attempted repairs should be indicated. You may have to run through this last step multiple times (mine took two), but in the end you should be able to nudge Startup Repair through the required steps to make your system bootable again. And then, if you're like me, you can move on to the more pressing repair of the day...