Wednesday, March 9, 2011

Server Woes


The question I asked myself when I set up this external blog here at Blogger was, "Why not just make one on the server I host at home?" After all, I run WordPress there, and I am much happier with the way WordPress works than with Blogger; I like the flexibility it brings. I like the fact that there are thousands of useful widgets for WordPress as opposed to, oh, about eight useful widgets for Blogger, the rest being "see pictures of XYZ," where XYZ is a random celebrity, etc. Perhaps today was an indicator of why I didn't just do that.

I woke up early this morning, and after I had spent a couple of hours on work projects, my wife called to me from the other room: "I don't think my E-mail is working." I checked; mine wasn't either, my personal E-mail anyway. My wife and I checked our various websites at the same time and came to the same result - the websites were down too. That really means only one thing: the server's down.

The server goes down more than it should, I believe. It's not helped by the fact that it isn't really connected to clean mains power; I don't even have a UPS supporting it at present. It's also not helped by the fact that it's ancient and not really server-class hardware. Here's a stock photo:
Holy crap, one of THOSE THINGS?

Yep. One of those things. It's a Mirrored Drive Door G4 Power Macintosh Tower. Affectionately referred to as an "MDD G4" by some, or "problematic concoction of steel and acrylic" by others.

I fall into the group of "others".

This morning, I was pleasantly surprised to find that the machine was actually running. I powered on the monitor that I had left connected to it since I set it up a few weeks ago (yes, just a few weeks ago; this iteration is young compared to others I have put in place). I logged in, and at first glance everything appeared to be working.

Actual functionality was another matter, however.

I soon noticed some oddities. The Server Admin app, which launches each time the account logs in, complained: "There is no server at the address specified." "Localhost," I thought to myself, "there is a server at localhost."

I attempted to browse the web. "You are not connected to the Internet" was Safari's unhelpful reply. Perhaps it wasn't too unhelpful. I examined the Ethernet cable; it was firmly plugged in. I gave it a tug for good measure, then gave it a full re-seat for better measure and tried reloading the page. No go.

I launched System Preferences to see if I could get at the network configuration from there. System Preferences was responsive until I clicked Network. Then, SBBOD (the spinning beach ball of death).

I should mention that this was at about 9 in the morning. I was still in my underwear, I was cold, I was standing on concrete in bare feet, and troubleshooting was NOT on my agenda for the day. I had deduced that the night before, when my dishwasher tripped the breaker that also feeds the circuit this server is powered through, more than just a temporary power cut had occurred. Something terrible, hardware related, and probably costly to fix.

Remember, I don't really have it all together this early in the morning.

I checked the router - no link light. It was looking more like hardware. I switched the cable to another port on the router, then to the switch I have next to the router: no light, no matter where it was plugged in. "Could it be that I fried the LAN port in the server?" I asked myself. I decided that this was probably the case, and I went to my Plan A.

Swap the tower!

I keep a spare unit, nearly identical to this one (it's a slightly different revision), for just this kind of situation. It has no RAM or hard drives, so I swap in my installed server drives and the 2GB of DDR RAM that this unit likes, and any hardware failure related to the bulk of the hardware (the power supply, board, CPU, etc.) is instantly ruled out.

Problem is, it still didn't work. I had swapped the shell, the PSU, and the board, with the Ethernet chipset along for the ride, and I got the same issue. Well, now it was certainly a software problem (I was starting to get it together by now). Not having showered or dressed, I wasn't interested in banging away at a terminal trying to find out why the core processes that support networking weren't running, because I had a Plan B.

Time Machine, go!

I recalled that I had configured Time Machine to back up the server installation to a partition on the first boot drive (don't worry, that's only a small part of my overall backup strategy, as you'll see), so I popped in the Server Install disc and went to the Utilities menu...

Wait, did I say I "popped in the Server Install disc"? I'm sorry, it's just not that simple, at least on this MDD unit. This unit, and the spare that I keep, both have flaky optical drives. They're not the easiest things to get at, so I haven't replaced them yet, and even if I did, I'm not sure what I'd replace them with. See, I have about a half dozen IDE optical drives lying around from various computers I've owned in the past. I have been through each of those drives, and while they read discs on PCs just fine (they probably read on Macs too) and some of them burn with relative success, none of them will actually boot a Mac from an install disc. Most will boot PCs, and I'm not sure why, but none will boot these MDD G4s at least (which might have something to do with it). I do, however, have one other drive, though I don't call it a spare because it isn't one. It's the drive from my desktop G5. It's IDE, and out of all the drives in the house, it's the one that will actually boot these two MDD units from install media. So, when I say "popped", I really mean that I dug the G5 out from under my desk, pulled the side panel off, fished the drive out of its latch system, unplugged it, dug out an IDE cable, laid open the MDD unit, and connected the G5 optical drive in a sidecar configuration. THEN I popped the Server Install disc in and booted the machine.

Once the installer had loaded, I went to the Utilities menu. I wasn't ready to conduct a reload, not yet anyway, and it occurred to me, briefly, to check the drive and repair permissions, so I made a brief stop at Disk Utility. Massive permissions repairs, which didn't seem unusual. I don't run these exercises on any sort of schedule, and these systems are always screwing up their permissions, so it was not a big deal. I did a reboot to see if that fixed anything - it didn't fix this problem, anyway, so I was back to the installer...

By the way, it's about 11:30 now, all this digging and booting optical media takes time.

This time, Time Machine was my destination. This for sure would rescue me! I chose the most recent backup and configured it to restore to the Server HD volume. An hour later and...

Still isn't working.

Back to Time Machine, and this time, the OLDEST backup available! Damn the recent configuration changes, I can make them again! Another hour and...

It's STILL not working. This is why I say, "Always have a Plan C":

Attack it with a clone!

Remember when I said this build was only a couple of weeks old? That's because prior to this build I had a fairly long-running installation, I'd say about two years old, that died because of a power failure and/or a bad hard drive. I suspect the two were related, perhaps not. Who knows. When I replaced the hard drive in this server, I bought two drives, cloned my initial, mostly configured state to the second drive, and put it away. I thought that with Time Machine and the other backup strategies I was employing, I would probably be able to ride out just about anything. That "probably" was my reason for getting a second drive.

It was now rapidly approaching 2:40 PM. I swapped in my spare drive and I was off and running. I had a few configuration changes to make, so I made them. Since cloning this drive, I had placed the web server root folder and the mail queue and database on a separate, higher-capacity hard drive that wasn't affected by any of this. I changed the configuration to point to those locations and ran a very handy little mailbox repair tool I had found earlier, when I was having some problems. I would say I was back up and running within 15 minutes of calling Plan C into action.

Always have a Plan C.
