An ESXi server failed this morning. There are a couple of critical services on this piece of hardware, the power in the new data center isn’t up to where we want it yet, and the radio said it was snowing near the office, so I drove in expecting to find some unspeakable power situation. The power was fine, but the ESXi server was sitting at a panic screen. I power cycled the machine. It came up, but none of the VMs started. The vSphere client wouldn’t connect. The server Web page was blank.
Fortunately, tech support mode works. Hit alt-F1, type unsupported, and enter the root password when asked. Whenever I tried to connect to the server with vSphere, my “tail -f /var/log/messages” said something like:
Nov 4 23:35:09 Hostd: [2010-11-04 23:35:09.117 25233B90 warning 'Proxysvc Req00011'] Error reading from client while waiting for header: N7Vmacore15SystemExceptionE(Connection reset by peer)
This is not good. No, not good at all. I wanted to spend the day converting a machine from OpenSolaris to FreeBSD and installing my router for my new bandwidth. Instead Fate has decreed today Wedgie Day.
Mailing list archives and forum posts showed that many people have had this problem. Lots of the forum threads end with “did anyone ever solve this?” A few people reinstalled ESXi to solve the problem. A couple of folks claimed it was a DNS issue.
Our DNS setup hadn’t changed, but I followed the advice and made the following changes.
- In /etc/hosts, remove the real address for the machine and replace it with 127.0.0.1
- Remove all DNS servers from /etc/resolv.conf
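For anyone who would rather script those two edits than make them by hand in vi, here is a rough sketch. It assumes the tech support shell has awk and sed available, which I have not verified on every ESXi build. The function names, the hostname esx1.example.com and the /tmp paths are made up for illustration. It works on whatever files you point it at, so you can try it on copies first.

```shell
# Hypothetical helper: rewrite the address of any hosts entry that
# names the given host so it points at 127.0.0.1.
# $1 = hosts-style file, $2 = the server's hostname (placeholder)
loopback_host() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { print; next }      # leave comment lines alone
        {
            for (i = 2; i <= NF; i++)
                if ($i == name) { $1 = "127.0.0.1"; break }
            print
        }' "$1" > "$1.new" && mv "$1.new" "$1"
}

# Hypothetical helper: drop every nameserver line from a
# resolv.conf-style file, keeping anything else (search, domain).
# $1 = resolv.conf-style file
strip_nameservers() {
    sed '/^[[:space:]]*nameserver/d' "$1" > "$1.new" && mv "$1.new" "$1"
}

# Example use, on copies rather than the live files:
#   cp /etc/hosts /tmp/hosts.test
#   cp /etc/resolv.conf /tmp/resolv.test
#   loopback_host /tmp/hosts.test esx1.example.com
#   strip_nameservers /tmp/resolv.test
```

Look over the results before copying anything back over the real files, and keep the originals somewhere safe in case you want to undo the change.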
I rebooted. The machine came up, and the VMs started. Everything seems fine, but we’ll have to see what happens later.
I have no idea why this worked. Three cheers for “occult IT”! Sigh.