A survey of FreeBSD ZFS snapshot automation tools

Why automatically snapshot filesystems? Because snapshots let you magically fall back to older versions of files and even the operating system. Taking a manual snapshot before a system upgrade is laudable, but you need to easily recover files when everything goes bad. So I surveyed my Twitter followers to see what FreeBSD ZFS snapshot automation tools they use.
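
(Even that manual pre-upgrade snapshot is a single command. Here’s the sort of thing I mean–zroot and the snapshot name are placeholders for your own pool:)

    # zfs snapshot -r zroot@pre-upgrade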

The tools:

  • A few people use custom shell scripts of varying reliability and flexibility. I’m not going to write my own shell script. The people who write canned snapshot rotation tools have solved this problem, and I have no desire to re-solve it myself.
  • One popular choice was sysutils/zfs-snapshot-mgmt. This lets you create snapshots as often as once per minute, and retain them as long as you desire. Once a minute is a bit much for me. You can group snapshot creation and deletion pretty much arbitrarily, letting you keep, say, 867 per-minute snapshots, 22 every-seven-minute snapshots, and 13 monthlies, if that’s what you need. This is the Swiss army knife of ZFS snapshot tools. One possible complication with zfs-snapshot-mgmt is that it’s written in Ruby and configured in YAML. If you haven’t seen YAML yet, you will–it’s an increasingly popular configuration syntax. My existing automation is all in shell and Perl, however, plus Python for Ansible. Adding yet another interpreter to all of my ZFS systems doesn’t thrill me; Ruby isn’t a show-stopper, but it’s a mark against. The FreeBSD port is also outdated–the web site referenced by the port says that the newest code, with bug fixes, is on GitHub. If you’re looking for a FreeBSD porting project, this would be an easy one.
  • The zfs-periodic web page is down. NEC Energy Solutions owns the domain, so I’m guessing that the big corporate overlord claimed the blog and the site isn’t coming back. The code still lives at various mirrors, however. zfs-periodic is tightly integrated with FreeBSD’s periodic system, and can automatically create and delete hourly, daily, weekly, and monthly snapshots. It appears to be the least flexible of the snapshot systems, as it runs only when periodic does. If you want to take your snapshots at a time that periodic doesn’t run, too bad. I don’t get a very good feeling from zfs-periodic–if the code had an owner, it would have a web site somewhere.
  • sysutils/zfsnap can do hourly, daily, weekly, and monthly snapshots. It’s designed to run from periodic(8) or cron(8), and is written in /bin/sh. (A rough example of driving it from cron appears after this list.)
  • sysutils/zfstools includes a clone of OpenSolaris’ automatic snapshotting tools. I no longer run OpenSolaris-based systems, except on legacy servers that I’m slowly removing, but I never know what the future holds around the dayjob. (I’m waiting for the mission-critical Xenix deployment, I’m sure it’s not far off.) This looks highly flexible, being configured by a combination of cron scripts and ZFS attributes, and can snapshot every 15 minutes, hour, day, week, and month. It’s written in Ruby (yet another scripting language on my system? Oh, joy. Joy and rapture.) On the plus side, the author of zfstools is also a FreeBSD committer, so I can expect him to keep the port up to date.
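
    For flavor, here’s roughly what driving zfsnap from /etc/crontab looks like. This is a sketch from memory rather than a tested configuration–check the zfSnap documentation before deploying anything like it. The -a flag sets a time-to-live that’s encoded into the snapshot name, -r recurses into child datasets, and -d deletes snapshots whose TTL has expired.

    0 * * * * root /usr/local/sbin/zfSnap -a 1w -r tank
    0 3 * * * root /usr/local/sbin/zfSnap -d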

    In doing this survey I also came across sysutils/freebsd-snapshot, a tool for automatically scheduling and automounting UFS snapshots. While I’m not interested in UFS snapshots right now, this is certainly worth remembering.

    My choice?

    So, which ones will I try? I want a tool that’s still supported and has some flexibility. I want a FreeBSD-provided package of the current version of the software. I’m biased against adding another scripting language to my systems, but that’s not veto-worthy.

    If I want compatibility with OpenSolaris, I’ll use zfstools. I get another scripting language, yay!

    If I don’t care about OpenSolaris-derived systems, zfsnap is the apparent winner.

    Of course, I won’t know which is better until I try both… which will be the topic of a couple more blogs.

    UPDATE, 07-31-2014: I screwed up my research on zfsnap. I have rewritten that part of the article, and my conclusions. My apologies — that’s what happens when you try to do research after four hours sleep. Thanks to Erwin Lansing for pointing it out.

    (“Gee, I’m exhausted. Better not touch any systems today. What shall I do? I know, research and a blog post!” Sheesh.)

  • Installing and Using Tarsnap for Fun and Profit

    Well, “profit” is a strong word. Maybe “not losing money” would be a better description. Perhaps even “not screwing over readers.”

    I back up my personal stuff with a combination of snapshots, tarballs, rsync, and sneakernet. This is fine for my email and my personal web site. Chances are, if all four of my backup sites are simultaneously destroyed, I won’t care.

    A couple years ago I opened my own ecommerce site, so I could sell my self-published books directly to readers. For the record, I didn’t expect Tilted Windmill Press direct sales to actually, y’know, go anywhere. I didn’t expect that people would buy books directly from me when they could just go to their favorite ebookstore, hit a button, and have the book miraculously appear on their ereader.

    I was wrong. People buy books from me. Once every month or two, someone even throws a few bucks in the tip jar, or flat-out overpays for their books. I am pleasantly surprised.

    So: I was wrong about self-publishing, and now I was wrong about author direct sales. Pessimism is grand, because you’re either correct or you get a pleasant surprise. Thank you all.

    But now I find myself in a position where I actually have commercially valuable data, and I need to back it up. Like a real business. I need offsite backups. I need them automated. And I need to be able to recover them, so that people who have bought my books can continue to download them, on the off chance that the Detroit area is firebombed off the Earth while I’m at BSDCan.

    So it’s time for Tarsnap.

    Why Tarsnap?

  • It works very much like tar, so I don’t have to learn any new command-line arguments. (If you’re not familiar with tar, you need to be.)
  • The terms of service are readable by human beings and more reasonable than other backup services.
  • The code is open and auditable.
  • When Tarsnap’s author screws up, he admits it and handles it correctly.
  • It’s cheap. Any backup priced in picodollars gets my attention.

    I also see the author at BSD conferences regularly. I can slap him in person if he does anything truly daft.

    Tarsnap has a quick Getting Started page. We’ll do the easy things first. Sign up for a Tarsnap account. Once your account is active, put some money in it–$5 will suffice.

    Now let’s check your prerequisites. You need:

  • GnuPG

    BSD systems come with everything else you need.

    Linux users must install

  • a compiler, like gcc or clang
  • make
  • OpenSSL (including header files)
  • zlib (including header files)
  • system header files
  • the ext2fs/ext2_fs.h header (not linux/ext2_fs.h)

    The Tarsnap download page lists specific packages for Debian-based and Red Hat-based Linuxes.

    Go to the download page and get both the source code and the signed hash file. Tarsnap is only available as source code, so that you can verify the code integrity yourself. Let’s do that.

    Start by using GnuPG to verify the integrity of the Tarsnap code. If you’re not familiar with GnuPG and OpenPGP, some daftie wrote a whole book on PGP & GPG. Once you install GnuPG, run the gpg command to get the configuration files.

    # gpg
    gpg: directory `/home/mwlucas/.gnupg' created
    gpg: new configuration file `/home/mwlucas/.gnupg/gpg.conf' created
    gpg: WARNING: options in `/home/mwlucas/.gnupg/gpg.conf' are not yet active during this run
    gpg: keyring `/home/mwlucas/.gnupg/secring.gpg' created
    gpg: keyring `/home/mwlucas/.gnupg/pubring.gpg' created
    gpg: Go ahead and type your message ...
    ^C
    gpg: signal Interrupt caught ... exiting

    Hit ^C. I just wanted the configuration and key files.

    Now edit $HOME/.gnupg/gpg.conf. Set the following options.

    keyserver hkp://keys.gnupg.net
    keyserver-options auto-key-retrieve

    See if your GPG client can verify the signature file tarsnap-sigs-1.0.35.asc.

    # gpg --decrypt tarsnap-sigs-1.0.35.asc
    SHA256 (tarsnap-autoconf-1.0.35.tgz) = 6c9f6756bc43bc225b842f7e3a0ec7204e0cf606e10559d27704e1cc33098c9a
    gpg: Signature made Sun Feb 16 23:20:35 2014 EST using RSA key ID E5979DF7
    gpg: Good signature from "Tarsnap source code signing key (Colin Percival) " [unknown]
    gpg: WARNING: This key is not certified with a trusted signature!
    gpg: There is no indication that the signature belongs to the owner.
    Primary key fingerprint: 634B 377B 46EB 990B 58FF EB5A C8BF 43BA E597 9DF7

    Some interesting things happen here. The most important line is the statement ‘Good signature from “Tarsnap source code signing key”.’ Your GPG program grabbed the source code signing key from a public key server and used it to verify that the signature file has not been tampered with.

    As you’re new to OpenPGP, this is all you can do. You’re not attached to the Web of Trust, so you can’t verify the signature chain. (I do recommend that you get an OpenPGP key and collect a few signatures, so you can verify code signatures if nothing else.)

    Now that we know the signature file is good, we can use the cryptographic hash in the file to validate that the Tarsnap code we downloaded is what the author intended. Near the top of the signature file you’ll see the line:

    SHA256 (tarsnap-autoconf-1.0.35.tgz) = 6c9f6756bc43bc225b842f7e3a0ec7204e0cf606e10559d27704e1cc33098c9a

    Use the sha256(1) program (or sha256sum, or shasum -a 256, or whatever your particular Unix calls the SHA-256 checksum generator) to verify the source code’s integrity.

    # sha256 tarsnap-autoconf-1.0.35.tgz
    SHA256 (tarsnap-autoconf-1.0.35.tgz) = 6c9f6756bc43bc225b842f7e3a0ec7204e0cf606e10559d27704e1cc33098c9a

    The checksum in the signature file and the checksum you compute match. You have valid source code, and can proceed.

    Extract the source code.

    # tar -xf tarsnap-autoconf-1.0.35.tgz
    # cd tarsnap-autoconf-1.0.35
    # ./configure
    ...
    configure: creating ./config.status
    config.status: creating Makefile
    config.status: creating config.h
    config.status: executing depfiles commands
    #

    If the configure script ends any way other than this, you’re probably on Linux and didn’t install the necessary development packages. The libraries alone won’t suffice; you must have the development versions.

    If configure completed, run

    # make all install clean

    Tarsnap is now ready to use.

    Start by creating a Tarsnap key for this machine and attaching it to your Tarsnap account. Here I create a key for my machine www.

    # tarsnap-keygen --keyfile /root/tarsnap.key --user mwlucas@michaelwlucas.com --machine www
    Enter tarsnap account password:
    #

    I now have a tarsnap key file. /root/tarsnap.key looks like this:

    # START OF TARSNAP KEY FILE
    dGFyc25hcAAAAAAAAAAzY6MEAAAAAAEAALG8Ix2yYMu+TN6Pj7td2EhjYlGCGrRRknJQ8AeY
    uJsctXIEfurQCOQN5eZFLi8HSCCLGHCMRpM40E6Jc6rJExcPLYkVQAJmd6auGKMWTb5j9gOr
    SeCCEsUj3GzcTaDCLsg/O4dYjl6vb/he9bOkX6NbPomygOpBHqcMOUIBm2eyuOvJ1d9R+oVv
    ...

    This machine is now registered and ready to go.

    This key is important. If your machine is destroyed and you need access to your remote backup, you will need this key! Before you proceed, back it up somewhere other than the machine you’re backing up. There’s lots of advice out there on how to back up private keys. Follow it.
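
    One minimal approach–just a sketch, with a placeholder address–is to encrypt the key file to your personal OpenPGP key and stash the result on another machine:

    # gpg --encrypt --recipient you@example.org /root/tarsnap.key

    That leaves an encrypted /root/tarsnap.key.gpg you can copy pretty much anywhere without worrying about prying eyes.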

    Now let’s store some backups in the cloud. I’m going to play with my /etc/ directory, because it’s less than 3MB. Start by backing up a single directory.

    # tarsnap -c -f wwwetctest etc/
    Directory /usr/local/tarsnap-cache created for "--cachedir /usr/local/tarsnap-cache"
                                           Total size  Compressed size
    All archives                              1996713           382896
      (unique data)                           1946025           366495
    This archive                              1996713           382896
    New data                                  1946025           366495

    Nothing seems to happen on the local system. Let’s check and be sure that there’s a backup out in the cloud:

    # tarsnap --list-archives
    wwwetctest

    I then went into /etc and did some cleanup, removing files that shouldn’t have ever been there. This stuff grows in /etc on any long-lived system.

    # tarsnap -c -f wwwetctest-20140716-1508 etc/
                                           Total size  Compressed size
    All archives                              3986206           765446
      (unique data)                           2120798           403833
    This archive                              1989493           382550
    New data                                   174773            37338

    # tarsnap --list-archives
    wwwetctest
    wwwetctest-20140716-1508

    Note that the compressed size of this archive is much smaller than the first one. Tarsnap only stored the diffs between the two backups.

    If you want more detail about your listed backups, add -v to see the creation date. Add a second -v to see the command used to create the archive.

    # tarsnap --list-archives -vv
    wwwetctest 2014-07-16 15:02:41 tarsnap -c -f wwwetctest etc/
    wwwetctest-20140716-1508 2014-07-16 15:09:38 tarsnap -c -f wwwetctest-20140716-1508 etc/

    Let’s pretend that I need a copy of my backup. Here I extract the newest backup into /tmp/etc.

    # cd /tmp
    # tarsnap -x -f wwwetctest-20140716-1508

    Just for my own amusement, I’ll extract the older backup as well and compare the contents.

    # cd /tmp
    # tarsnap -x -f wwwetctest

    The files I removed during my cleanup are now present.

    What about rotating backups? I now have two backups. The second one is a differential backup against the first. If I blow away the first backup, what happens to the newer one?

    # tarsnap -d -f wwwetctest
                                           Total size  Compressed size
    All archives                              1989493           382550
      (unique data)                           1938805           366149
    This archive                              1996713           382896
    Deleted data                               181993            37684

    It doesn’t look like it deleted very much data. And indeed, a check of the remaining archive shows that all my files are there.

    And now, the hard part: what do I need to back up? That’s a whole separate class of problem…
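
    However you answer that question, the automation itself can be a one-line cron job. Something like this sketch–the archive name and path are examples only, and remember that cron wants % signs escaped:

    0 2 * * * /usr/local/bin/tarsnap -c -f www-$(date +\%Y\%m\%d) /etc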

  • LibreSSL at BSDCan

    Thanks to various airline problems, we had an open spot on the BSDCan schedule. Bob Beck filled in at the last moment with a talk on the first thirty days of LibreSSL. Here are some rough notes on Bob’s talk (slides now available).

    LibreSSL forked from OpenSSL 1.0.1g.

    Why did “we” let OpenSSL happen? Nobody looked. Or nobody admitted that they looked. We all did it. The code was too horrible to look at. This isn’t just an OpenSSL thing, or just an open source thing; it’s not unique in software development. It’s just the high-profile example of the moment.

    Heartbleed was not the final straw that caused the LibreSSL fork. The OpenSSL malloc replacement layer was the final straw. Default OpenSSL never frees memory, so tools can’t spot bugs. It uses LIFO recycling, so use-after-free bugs go undetected. The debugging malloc sends all memory information to a log. Lots more in Bob’s slides, but this all combines into a countermeasure against exploit mitigation techniques. Valgrind, Coverity, and OpenBSD’s randomized memory tools don’t catch this.

    Someone discovered all this four years ago and opened an OpenSSL bug. It’s still sitting there.

    LibreSSL started by ripping out features. VMS support, 16-bit Windows support, all gone.

    LibreSSL goals:

  • Preserve API/ABI compatibility – become a drop-in replacement.
  • Bring more people into working on the codebase, by making the code less horrible
  • Fix bugs and apply modern coding practices
  • Do portability right

    As an example, how does OpenSSH (not LibreSSL, but another OpenBSD product) do portable?

  • Assume a sane target OS, and code to that standard.
  • Build and maintain code on the above, using modern C
  • Provide portability shims to correctly do things that other OSes don’t provide, only for those who need it.
    – No ifdef maze
    – No compromise on what the intrinsic functions actually do
    – Standard intrinsics
    – Don’t reimplement libc

    How does OpenSSL do portable?

  • Assume the OS provides nothing, because you mustn’t break support for Visual C 1.52.
  • Spaghetti mess of #ifdef #ifndef horror nested 17 deep
  • Written in OpenSSL C – essentially its own dialect – to program to the worst common denominator
  • Implement own layers and force all platforms to use it

    The result? “Cthulhu sits in his house in #define OPENSSL_VMS and dreams”

    Removed scads of debugging malloc and other nasties.

    Which upstream packages call and use them? There’s no way to tell. LibreSSL makes some of the most dangerous options no-ops. Turn on memory debugging? Replace malloc wrappers at runtime? These do nothing. The library internally does not use them.

    Some necessary changes that were implemented in massive sweeps:

  • malloc+memset -> calloc
  • malloc (X*Y) -> reallocarray(X, Y)
  • realloc and free handle NULL, so stop testing everywhere

    OpenSSL used EGD for entropy, and faked random data. OpenSSL gathered entropy from the following sources:

  • Your RSA private key is pretty random
  • “string to give random number generator entropy”
  • getpid()
  • gettimeofday()

    In LibreSSL, entropy is the responsibility of the OS. If your OS cannot provide you with entropy, LibreSSL will not fake it.

    LibreSSL is being reformatted into KNF – the OpenBSD code style. OpenSSL uses whatever style seemed right at the moment. The reformatting makes other problems visible, which is the point. More readable code hopefully means more developer involvement.

    The OpenSSL bug tracking RT has been and continues to be a valuable resource.

    OpenSSL exposes just about everything via public header files. Lots of the API should probably not be used outside of the library, but who knows who calls what? OpenBSD is finding out through constant integration testing with their ports tree.

    The LibreSSL team wants to put the API on a diet so that they can remove potentially dangerous stuff. Their guys are being careful in this by testing against the OpenBSD ports tree. Yes, this conflicts with the “drop-in replacement” goal.

    Internally, LibreSSL uses only regular intrinsic functions provided by libc. OpenSSL’s custom APIs remain for now only to maintain compatibility with external apps.

    Things in OpenSSL that surprised the LibreSSL guys:

  • big endian amd64 support
  • Compile options NO_OLD_ASN1 and option NO_ASN1_OLD are not the same
  • You can turn off sockets, but you can’t turn off debugging malloc
  • socklen_t – if your OS doesn’t have socklen_t, it’s either int or size_t. But OpenSSL does horrible contortions to define its own. If the size of socklen_t changes while your program is running, openssl will cope.
  • OpenSSL also copes if /dev/null moves while openssl is running.

    So far:

  • OpenSSL 1.0.1g was a 388,000 line code base
  • As of yesterday, 90,000 lines of C source deleted, about 150,000 lines of files
  • Approximately 500,000 line unidiff from 1.0.1g at this point
  • Many bugs fixed
  • The cleaning continues, but they’ve started adding new features (ciphers)
  • Code has become more readable – portions remain scary

    LibreSSL has added the following cipher suites under acceptable licenses – Brainpool, ChaCha, poly1305, ANSSI FRP256v1, and several new ciphers based on the above.

    FIPS mode is gone. It is very intrusive. In other places, governments mandate the use of certain ciphers (Camellia, GOST, etc.). As long as those aren’t on by default, and are provided as clean implementations under an acceptable license, they will include them. They believe it’s better for people who must use these ciphers to use them in a sane library with a sane API than to roll their own.

    If you want to use the forthcoming portable LibreSSL, you need:

  • modern POSIX environment
  • OS must provide random data – readiness and quality are responsibility of OS
  • malloc/free/calloc/realloc (overflow checking)
  • modern C string capabilities (strlcat, strlcpy, asprintf, etc)
  • explicit_bzero, reallocarray, arc4random

    You can’t replace explicit_bzero with bzero, or arc4random with random. LibreSSL wants a portability team that understands how to make it work correctly.

    LibreSSL’s eventual goals:

  • provide better (replacement, reduced) api
  • reduce code base even more
  • split out non-crypto things from libcrypto
  • split libcrypto from libssl

    There are lots of challenges to this. The biggest is stable funding.

    The OpenBSD Foundation wants to fund several developers to rewrite key pieces of code. They want to sponsor efforts of the portability team, and have the ports people track the impact of proposed API changes.

    They will not do this at the expense of OpenSSH or OpenBSD.

    The OpenBSD Foundation has asked the Linux Foundation for support, but the Linux Foundation has not yet committed to supporting the effort. (I previously said that they hadn’t responded to the request, which is different. The LF has received Bob’s email and discussions are ongoing.)

    In Summary:

  • OpenSSL’s code is awful
  • LibreSSL can be done
  • They need support

    If you’re interested in supporting the effort, contact the OpenBSD Foundation. The Foundation is run by Bob Beck and Ken Westerback, and they manage all funding. (While Theo de Raadt leads the OpenBSD Project, he actually has nothing to do with allocating funding.)

  • Penguicon 2014 Schedule

    “Hey, where is Lucas? Why hasn’t he posted lately?”

    I’ve done nothing worth posting about. Most of this month I spent removing a pre-millennial switch from the core of the network, which was painstaking and annoying but not noteworthy. I then spent nine days at a writing workshop, which was fascinating, educational, and utterly exhausting. I could argue that the workshop was worth blogging about, but I was too busy writing to waste time writing. If you’re interested in writing, though, and you have a chance to do any of Dean or Kris’ workshops, go.

    So:

    Next weekend, I’ll be at Penguicon, appearing on various panels. You can see me at the following one-hour events.

    Friday

  • 5PM: BSD Operating Systems, a Tour – What it says on the label
    Saturday

  • 11AM: Sudo – You’re Doing It Wrong – Why your popular sudo configuration is incorrect, and how to do it safely
  • 1PM: Copyright versus Free Information – What happens when the concept of ‘information can’t be contained’ clashes with content creators who want monetary recompense for their hard work? Speakers include: Michael W. Lucas, Shetan Noir, Eva Galperin, Cory Doctorow
  • 6PM: SSH Key Authentication Tutorial – If you’re not doing SSH key authentication, show up here.
  • 8PM: Self-Publishing 101 – Do you? Should you? Various tools and techniques and recommendations.
    Sunday

  • 2PM: DNSSEC in 50 minutes – How DNSSEC works, and why you should care

    Now if you’ll excuse me, I have a whole great big heap of slides to do…

  • Book Review: “Applied Network Security Monitoring”

    Chris Sanders kindly sent me a review copy of Applied Network Security Monitoring, written by Sanders along with Jason Smith, David J Bianco, and Liam Randall. It’s a very solid work, with much to recommend it to IT people who either have been told to implement security monitoring or who think that they should.

    Some of Applied Network Security Monitoring will be very familiar to anyone who has read any other security book–I’ve read many times that risk equals impact times probability. Every book on this topic needs this information, however, and Sanders and company cover it in sufficient detail to ground a probie while letting the rest of us easily skim it as a refresher.

    Then they take us through selecting data collection points: deciding where to collect data and what kind of data to collect. Ideally, of course, you collect full packet data everywhere, but in my semi-rural gigabit ISP world I don’t have enough electricity to spin that much disk. Where can you get by with session data, and where do you need full packet capture? ANSM takes you through the choices and the advantages and disadvantages of each, along with some guidance on the hardware needs.

    Data is nice, but it’s what you do with the data that makes security analysis interesting. ANSM uses Security Onion as an underlying toolkit. Security Onion is huge, and contains myriad tools for any given purpose. There are reasons for this–no one NSM tool is a perfect fit for all environments. ANSM chooses their preferred tools, such as Snort, Bro, and SiLK, and takes you through configuring and using them on the SO platform. Their choices give you honeypots and log management and all the functionality you expect.

    Throughout the book you’ll find business and tactical advice. How do you organize a security team? How do you foster teamwork, retain staff, and deal with arrogant dweebs such as yours truly? (As an aside, ANSM contains the kindest and most business-driven description of the “give the arrogant guy enough rope to hang himself” tactic that I have ever read.) I’ve been working with the business side of IT for decades now, and ANSM taught me new tricks.

    The part of the book that I found most interesting was the section on analysis. What is analysis, anyway? ANSM takes you through both differential analysis and relational analysis, and illustrates them with actual scenarios, actual data. Apparently I’m a big fan of differential diagnosis. I use it everywhere. For every problem. Fortunately, Sanders and crew include guidelines for when to try each type of analysis. I’ll have to try this “relational analysis” thing some time and see what happens.

    Another interesting thing about ANSM is how it draws in lots of knowledge and examples from the medical field. Concepts like morbidity and mortality are very applicable to information technology in general, not just network security monitoring, and adding this makes the book both more useful and more interesting.

    Applied Network Security Monitoring is a solid overview of the state of security analysis in 2014, and was well worth my time to read. It’s worth your time as well.

    postscript

    Not long ago, I reviewed Richard Bejtlich’s The Practice of Network Security Monitoring. What’s more, I have corresponded with both Sanders and Bejtlich, and while they aren’t “help me hide a body” friends I’d happily share a meal with either.

    The obvious question people will ask is, how does Applied NSM compare to tPoNSM?

    Both books use Security Onion. Each book emphasizes different tools, different methodologies, and different techniques. The Practice of Network Security Monitoring shows Bejtlich’s military background. While Sanders has worked with the military, Applied NSM reads like it comes from an IT background.

    I can’t say either is a better book. Both are very very good.

    Personally, I have never implemented any plan from a book exactly as written. I read books, note their advice, and build a plan that suits my environment, my budget, and–most importantly–my staff. Reading them, I picked between tools and strategies until I found something that would work for my site. Security monitoring is a complex field. Maintaining, let alone building, a security monitoring infrastructure requires constant sharpening of your skills.

    I recommend anyone serious about the field read both books.

  • DNSSEC-verified SSL Certificates, the Standard Way

    DANE, or DNS-based Authentication of Named Entities, is a protocol for stuffing public keys and/or public key signatures into DNS. As standard DNS is easily forged, you can’t safely do this without DNSSEC. With DNSSEC, however, you now have an alternative way to verify public keys. Two obvious candidates for DANE data are SSH host keys and SSL certificate fingerprints. In this post I take you through using DNSSEC-secured DNS to verify web site SSL certificates (sometimes called DNSSEC-stapled SSL certificates).

    In DNSSEC Mastery I predicted that someone would release a browser plug-in to support validation of DNSSEC-stapled SSL certificates. This isn’t a very difficult prediction, as a few different people had already started down that road. One day browsers will support DANE automatically, but until then, we need a plug-in. I’m pleased to report that the fine folks at dnssec-validator.cz have completed their TLSA verification plugin. I’m using it without problems in Firefox, Chrome, and IE.

    DNS provides SSL certificate fingerprints with a TLSA record. (TLSA isn’t an acronym, it’s just a TLS record, type A. Presumably we’ll move on to TLSB at some point.)

    A TLSA record looks like this:

    _port._protocol.hostname TLSA ( 3 0 1 hash...)

    If you’ve worked with services like VOIP, this should look pretty familiar. For example, the TLSA record for port 443 on the host dnssec.michaelwlucas.com looks like this:

    _443._tcp.dnssec TLSA ( 3 0 1 4CB0F4E1136D86A6813EA4164F19D294005EBFC02F10CC400F1776C45A97F16C)

    Where do we get the hash? Run openssl(1) on your certificate file. Here I generate the SHA256 hash of my certificate file, dnssec.mwl.com.crt.

    # openssl x509 -noout -fingerprint -sha256 < dnssec.mwl.com.crt
    SHA256 Fingerprint=4C:B0:F4:E1:13:6D:86:A6:81:3E:A4:16:4F:19:D2:94:00:5E:BF:C0:2F:10:CC:40:0F:17:76:C4:5A:97:F1:6C

    Copy the fingerprint into the TLSA record. Remove the colons.
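
    If you’d rather not copy and strip by hand, openssl and awk can do it for you. A sketch, using the same certificate file as above:

    # openssl x509 -noout -fingerprint -sha256 < dnssec.mwl.com.crt | \
        awk -F= '{gsub(":", "", $2); print $2}'
    4CB0F4E1136D86A6813EA4164F19D294005EBFC02F10CC400F1776C45A97F16C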

    Interestingly, you can also use TLSA records to validate CA-signed certificates. Generate the hash the same way, but change the leading string to 1 0 1. I’m using a CA-signed certificate for https://www.michaelwlucas.com, but I also validate it via DNSSEC with a record like this.

    _443._tcp.www TLSA ( 1 0 1 DBB17D0DE507BB4DE09180C6FE12BBEE20B96F2EF764D8A3E28EED45EBCCD6BA )
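
    Once a record is in your zone, you can check it from anywhere with a reasonably current dig(1) that knows the TLSA type:

    # dig +dnssec _443._tcp.www.michaelwlucas.com TLSA

    Look for the TLSA record in the answer section; if your resolver validates DNSSEC, you’ll also see the ad flag in the header.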

    So: if you go to the trouble of setting this up, what does the client see?

    Start by installing the DNSSEC/TLSA Validator plugin in your browser. (Peter Wemm has built the Firefox version of the plugin on FreeBSD, and he has a patch and a binary. Use the binary at your own risk, of course, but if you’re looking for a BSD porting project, this would be very useful.)

    The plugin adds two new status icons. One turns green if the site’s DNS uses DNSSEC, and has a small gray-with-a-touch-of-red logo if the site does not. Not having DNSSEC is not cause for alarm. The second icon turns green if the SSL certificate matches a TLSA record, gray if there is no TLSA record, and red if the certificate does not match the TLSA record.

    So: should you worry about that self-signed certificate? Check the TLSA record status. If the domain owner says “Yes, I created this cert,” it’s probably okay. If the self-signed cert fails TLSA validation, don’t go to the site.

    You can use a variety of hashes with TLSA, and you can set a variety of conditions as well. Should all certificates in your company be signed with RapidSSL certs? You can specify that in a TLSA record. Do you have a private CA? Give its fingerprint in a TLSA record. If you want to play with these things, check out my DNSSEC book.

    TLSA gives you an alternate avenue of trust, outside of the traditional and expensive CA model. Spreading TLSA more widely means that you can protect more services with SSL without additional financial expenses.

  • NYCBSDCon 2014 Video, and 2014 appearances

    The video of my NYCBSDCon talk is now available on YouTube.

    This talk is a little rougher than most I give. I felt worn-out before I even spoke on Saturday night. I woke up Sunday morning with tonsils the size of tennis balls (which made airport security interesting, let me tell you. “No, those aren’t bombs, let me fly home dang it!”).

    So, on the day of NYCBSDCon I was obviously sliding down the ramp into illness.

    I don’t script my talks beforehand. Yes, I have bullet points on my slides, but they’re an outline. This leaves me free to shape what I say to fit the audience’s interests and reactions. This also means that if I’m on the verge of falling ill, phrases like “This sucks diseased moose wang” slip into the presentation. It’s not that I object to the term, but it’s stolen from a Harry Dresden novel. I prefer to hand-craft my insults, precisely tailoring each to fit the object of my derision. If you take the trouble to come see me, the least you can expect is originality.

    And speaking of speaking:

    Early in May, I’ll be at Penguicon. There I’ll be speaking and on panels covering BSD, sudo, SSH, DNSSEC, and writing.

    Later in May I’m teaching a four-hour sudo tutorial at BSDCan 2014.

    If you want to see me in 2014, these are your only opportunities short of coming to Detroit and joining my dojo. (That’s an option, of course, but there are better reasons for practicing martial arts than seeing me. Plus, at the dojo you’ll have to try to throw me. That gets tiring quickly.) I’ll have paper books available at both cons.

    I have no other public appearances planned for 2014. I intend to spend the rest of the year concentrating on home, writing, and martial arts.

    Come on. Hang out. I promise to not use the phrase “diseased moose wang” during any scheduled talk.

  • Running Ancient Rsync

    Another “write it down so I don’t forget what I did” post.

    Some of the systems I’m responsible for are file storage machines, running rsync 3.0 or 3.1 as a daemon. Every hour, an ancient Solaris machine sends files to it using rsync 2.3.1. The billing team uses these files to create bills.

    Thursday, I rebooted the machine. And the rsync stopped working with:

    rsyncd[3582]: rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
    rsyncd[3582]: rsync error: error in rsync protocol data stream (code 12) at io.c(226) [Receiver=3.1.0]

    The rsyncd server hadn’t changed. No security patches, no package updates, no nothing.

    We cannot change the software on the Solaris machine. It’s attached to a multimillion-dollar telco switch, and editing the software on it would invalidate the warranty. The whole point of buying a multimillion-dollar telco switch is so you get the warranty. If something goes wrong, a team of vendor experts descends on your facility with enough spare parts to rebuild the switch from the ground up. (Telephony is nothing like IT. Really.) I cannot use SSH to transfer the files. I do not administer this machine–actually, I don’t want to administer this machine. I’m so unfamiliar with warranties on operating systems that I would probably void it by copying my SSH public key to it or something.

    The Solaris box is running rsync 2.3.1, which runs rsync protocol version 20. My systems use newer rsync, running protocol version 30 or 31.

    Rsyncd isn’t easily debuggable. Packet analysis showed messages about protocol errors. The rsync FAQ has a whole bunch of troubleshooting suggestions. None of them worked. I ran rsync under truss and strace and painstakingly read system calls. I eventually sacrificed a small helpless creature in accordance with ancient forbidden rites under last weekend’s full moon.

    After a few days of running through a backup system (an old but not quite ancient OpenSolaris box), I absolutely had to get this working. So: protocol errors? Let’s try an older rsync.

    Rsync 2.9? Same problem. I saw myself progressively working my way through building older versions, solving weird problems one by one, and eventually finding something old enough to work. This is not how I wanted to spend my week. Given how well running FreeBSD 4 in a FreeBSD 10 jail works, I tried something similar.

    The host ftp-archive.freebsd.org hosts releases of every FreeBSD version, including packages. FreeBSD 10 includes compatibility with FreeBSD back to version 4. I installed the compatibility libraries from /usr/ports/misc/compat4.

    The oldest FreeBSD 4 rsync package I could find was 2.4.6, from FreeBSD 4.1.1. Original FreeBSD packages were just gzipped tar files. I extracted the files and checked that the binary could find all its libraries.
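
    The fetch-and-extract went something like this. The archive path here is from memory–browse ftp-archive.freebsd.org for the real location if it has moved:

    # fetch http://ftp-archive.freebsd.org/pub/FreeBSD-Archive/old-releases/i386/4.1.1-RELEASE/packages/net/rsync-2.4.6.tgz
    # tar -xzf rsync-2.4.6.tgz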

    # ldd rsync
    rsync:
    libc.so.4 => /usr/local/lib32/compat/libc.so.4 (0x2808a000)

    If this was more complicated software, with more libraries, I’d have to track down the missing ones. Rsync is very straightforward, however.

    I shut down the new rsync daemon and fired up the old one.

    It worked.

    I still want to know how a reboot broke this. I’m assuming that something changed and that I lack the sysadmin chops to identify it. It’s not the rsync binary, or libc; both have date stamps several months old.

    I don’t recommend this, as older rsync has all kinds of security problems. These particular hosts are behind several layers of firewalls. If an intruder gets this far, I’m basically doomed anyway.

    So: if you’re very very stuck, and the clock has run out, using really old software is an option. But it still makes my skin crawl.

  • ifup-local on bridge members on CentOS

    I run a bunch of CentOS 6 physical servers as QEMU virtualization devices. These hosts have two NICs, one for management and one for virtual machine bridges.

    When you use Linux for virtualization, it’s important to increase the amount of memory for network transmit and receive buffers. You also need to disable GSO and TSO, to improve performance and to avoid gigabytes of kernel error messages every day. You can do this with ethtool(8). First, let’s check the existing ring sizes.

    # ethtool -g eth0
    Ring parameters for eth0:
    Pre-set maximums:
    RX: 16384
    RX Mini: 0
    RX Jumbo: 0
    TX: 16384
    Current hardware settings:
    RX: 512
    RX Mini: 0
    RX Jumbo: 0
    TX: 512

    Similarly, use ethtool -k eth0 to check GSO and TSO settings.

    The card is using much less memory than it can. When you have a bunch of virtual machines pouring data through the card, you want the card to work as efficiently as possible. Fixing this on a running system is easy enough:

    # ethtool -G eth0 tx 16384 rx 16384
    # ethtool -K eth0 gso off tso off

    Repeat the process for eth1.

    How do you make this happen automatically at boot? Adding the commands to /etc/rc.local isn’t reliable. By the time the system gets that much stuff running, the ethtool command might fail with a “Cannot allocate memory” error. If you try again it’ll probably work, but it’s not deterministic. And I’m against running a single command four times in rc.local in the hopes that one of them will work.

    Enter /sbin/ifup-local. CentOS runs this script after bringing up an interface, with the interface name as an argument. The problem is, it doesn’t run this script on bridge member interfaces. We can adjust eth0 and br0 at boot just fine, but the script never runs for eth1 (the physical interface underlying br0).

    You can’t run ethtool -G br0 tx 16384 rx 16384; interface br0 doesn’t have any transmit or receive rings. It’s a logical interface. You can disable TSO and GSO on br0, but that won’t disable it on eth1. And you can’t postpone reconfiguring eth1 until rc.local, because increasing the buffer memory doesn’t reliably work once the system is running full-out multiuser. Red Hat says this is by design. Apparently network bridges on CentOS/Red Hat are supposed to perform poorly. That’s good to know.

    So, what to do?

    I adjust the eth1 ring size in ifup-local when bringing up br0, but before any processes send any traffic over the bridge. My /sbin/ifup-local looks like this:

    #!/bin/bash
    # CentOS runs this script after bringing up an interface,
    # passing the interface name as $1.

    case "$1" in
    eth0)
        echo "Configuring eth0..."
        /sbin/ethtool -G eth0 tx 16384 rx 16384
        /sbin/ethtool -K eth0 gso off tso off
        ;;

    br0)
        # br0 has no rings of its own, so size the underlying eth1
        # here, before anything pushes traffic across the bridge.
        echo "Configuring br0..."
        /sbin/ethtool -G eth1 tx 16384 rx 16384
        /sbin/ethtool -K eth1 gso off tso off
        /sbin/ethtool -K br0 gso off tso off
        ;;

    esac
    exit 0
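
    One last detail: the network scripts only run /sbin/ifup-local if it exists and is executable, so:

    # chmod 755 /sbin/ifup-local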

    This appears to work consistently. Of course, the values for the NIC need to be set on a per-machine basis. I have Ansible do that work for me.

    Hopefully, this will save someone else the pain I’ve been through trying to make this work…

  • Jan 2014 Java update broke me

    So I’m trying to upgrade my Ansible server to the newest OpenBSD snapshot, which involves working at the console. I go to my virtual server control panel, click on the link to the Java applet, and get told that Java won’t run this application.

    Turns out that Java has trusted self-signed certificates for applications until now, relying on blacklists rather than whitelists. I simultaneously applaud this move away from enumerating badness and condemn them for temporarily inconveniencing me.

    To whitelist a specific site, open the Java configuration applet. For Windows users, this is the Java Control Panel. Open the Security tab. About 2/3 of the way down, there’s an “Edit site list” option. Add the desired web site.
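
    If you’d rather script it than click through a GUI, the whitelist appears to be a plain text file of URLs. On my Unix systems it lives under the home directory (the path may differ on Windows, and the panel URL below is a placeholder):

    # echo "https://panel.example.org" >> ~/.java/deployment/security/exception.sites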

    Java will then run applets from that web site.