Saturday, May 12, 2007

The upgrade that didn't quite go according to plan

Today (well, yesterday now), I replaced the machine that had been hosting dip.sun.ac.za with a much faster and shinier bit of hardware. To keep the upgrade simple, I had planned to move the old hard disks across and go from there. Since it's very new and shiny hardware, I spent some time ensuring dip had a new enough kernel and all that, so everything should have been fairly smooth.

I wish.

One of the hard drives failed when moved across. No problem, says I, I have a spare, and it's a RAID array, so plug in the spare, let the array rebuild and all will be OK. The spare was the right size for the array, but the failed drive had hosted RAID space plus some extra reserved for swap and a couple of dirs, like /tmp, that didn't need to be on the RAID array. Aha, says I, I have another extra drive handy, so plug that in, make swap space and the file-systems, and everything is hunky dory. That done, I sat and twiddled my thumbs for a couple of hours while the RAID array rebuilt (I did go and have supper while doing this). Once the array had rebuilt, and after I'd fixed udev's naming of the network devices, I rebooted - and the system did not come up, with very little info as to why. Eventually, I booted up using a rescue disk, to discover that the drive I'd shoved in to provide the swap space had some old LVM metadata left on it, which was stopping the LVM partitions from being found correctly. Once this got sufficiently nuked, the machine was happy. I was less so, since by this stage the upgrade had eaten up a lot more time than I wanted.

Still, the new shiny machine is running, and seems to be running very fast, so all looks to be well at the moment. I also took the opportunity to upgrade dip to etch, since I was breaking stuff anyway. Other than a brief hiccup with OpenLDAP, it went off without a hitch.
