Another “write it down so I don’t forget what I did” post.
Some of the systems I’m responsible for are file storage machines, running rsync 3.0 or 3.1 as a daemon. Every hour, an ancient Solaris machine sends files to it using rsync 2.3.1. The billing team uses these files to create bills.
Thursday, I rebooted the machine. And the rsync stopped working with:
rsyncd: rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsyncd: rsync error: error in rsync protocol data stream (code 12) at io.c(226) [Receiver=3.1.0]
The rsyncd server hadn’t changed. No security patches, no package updates, no nothing.
We cannot change the software on the Solaris machine. It’s attached to a multimillion-dollar telco switch, and editing the software on it would invalidate the warranty. The whole point of buying a multimillion-dollar telco switch is so you get the warranty. If something goes wrong, a team of vendor experts descends on your facility with enough spare parts to rebuild the switch from the ground up. (Telephony is nothing like IT. Really.) I cannot use SSH to transfer the files. I do not administer this machine–actually, I don’t want to administer this machine. I’m so unfamiliar with warranties on operating systems that I would probably void it by copying my SSH public key to it or something.
The Solaris box is running rsync 2.3.1, which runs rsync protocol version 20. My systems use newer rsync, running protocol version 30 or 31.
Rsyncd isn’t easily debuggable. Packet analysis showed messages about protocol errors. The rsync FAQ has a whole bunch of troubleshooting suggestions. None of them worked. I ran rsync under truss and strace and painstakingly read system calls. I eventually sacrificed a small helpless creature in accordance with ancient forbidden rites under last weekend’s full moon.
After a few days of running through a backup system (an old but not quite ancient OpenSolaris box), I absolutely had to get this working. So: protocol errors? Let’s try an older rsync.
Rsync 2.9? Same problem. I saw myself progressively working my way through building older versions, solving weird problems one by one, and eventually finding something old enough to work. This is not how I wanted to spend my week. Given how well running FreeBSD 4 in a FreeBSD 10 jail works, I tried something similar.
The host ftp-archive.freebsd.org host releases of every FreeBSD version, including packages. FreeBSD 10 includes compatibility with FreeBSD back to version 4. I installed the compatibility libraries from /usr/ports/misc/compat4.
The oldest FreeBSD 4 rsync package I could find was 2.4.6, from FreeBSD 4.1.1. Original FreeBSD packages were just zipped tar files. I extracted the files and checked that the binary could find all its libraries.
# ldd rsync
libc.so.4 => /usr/local/lib32/compat/libc.so.4 (0x2808a000)
If this was more complicated software, with more libraries, I’d have to track down the missing ones. Rsync is very straightforward, however.
I shut down the old rsync daemon and fired up the old one.
I still want to know how a reboot broke this. I’m assuming that something changed and that I lack the sysadmin chops to identify it. It’s not the rsync binary, or libc; both have date stamps several months old.
I don’t recommend this, as older rsync has all kinds of security problems. These particular hosts are behind several layers of firewalls. If an intruder gets this far, I’m basically doomed anyway.
So: if you’re very very stuck, and the clock has run out, using really old software is an option. But it still makes my skin crawl.