New autobiography chapter: The Thumbs

Lots of people are sad today, so it seemed a good time to put this up. And since I got yelled at by people the last time I didn’t mention this on the blog:

There’s a new autobiography chapter up.

Why am I writing an autobiography? I’m not. But these are stories I tell over and over again, so I put them up in a central place.

FreeBSD ZFS snapshots with zfstools

In my recent survey of ZFS snapshot automation tools, I short-listed zfstools and zfsnap. I’ll try both, but first we’ll cover FreeBSD ZFS snapshots with zfstools. Zfstools includes a script for creating snapshots, another for removing old snapshots, and one for snapshotting MySQL databases. The configuration uses only ZFS attributes and command line arguments via cron.

Start by deciding which ZFS filesystems you want to snapshot. The purpose of snapshotting is to let you get at older versions of a filesystem, so you can either roll back the entire filesystem or grab an older version of a file. I’m a fan of partitioning, and typically use many partitions on a ZFS system. I use a separate ZFS dataset for everything from /usr/ports/packages to /var/empty.

So, which of these partitions won’t need snapshots? I don’t snapshot the following, either because I don’t care about older versions or because the contents are easily replicable. (I would snapshot some of these on, say, my package-building machine.)

/tmp
/usr/obj
/usr/src
/usr/ports
/usr/ports/distfiles
/usr/ports/packages
/var/crash
/var/empty
/var/run
/var/tmp

Zfstools uses the ZFS property com.sun:auto-snapshot to determine if it should snapshot a filesystem. On a new system, this property should be totally unset, like so:


# zfs get com.sun:auto-snapshot
...
zroot/var/tmp com.sun:auto-snapshot - -
...

Set this property to false for datasets you want zfstools to never snapshot.

# zfs set com.sun:auto-snapshot=false zroot/var/empty

You should now see the attribute set:


# zfs get com.sun:auto-snapshot zroot/var/empty
NAME             PROPERTY               VALUE  SOURCE
zroot/var/empty  com.sun:auto-snapshot  false  local

Set this attribute for every filesystem you don’t want to snapshot, then activate snapshotting on the entire zpool.

# zfs set com.sun:auto-snapshot=true zroot

The other filesystems will inherit this property from their parent zpool.
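To confirm the inheritance worked, you can ask for the property recursively across the whole pool. A quick sanity check, using the zroot pool from this example:

```shell
# Datasets you explicitly set to false show a SOURCE of "local";
# everything else should show "inherited from zroot".
zfs get -r com.sun:auto-snapshot zroot
```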

Now to activate snapshots. Zfstools’ zfs-auto-snapshot tool expects to run out of cron. It requires two arguments: the name of the snapshot and how many of that snapshot to keep. So, to create a snapshot named “15min” and retain four of them, you would run

# zfs-auto-snapshot 15min 4

Zfstools lets you name your snapshot anything you want. You can call your hourly snapshots LucasIsADweeb if you like. Other snapshot tools are not so flexible.

The sample cron file included suggests retaining four 15-minute snapshots, 24 hourly snapshots, 7 daily snapshots, 4 weekly snapshots, and 12 monthly snapshots. That’s a decent place to start, so give root the following cron entries:

PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin
15,30,45 * * * * /usr/local/sbin/zfs-auto-snapshot 15min     4
0        * * * * /usr/local/sbin/zfs-auto-snapshot hourly   24
7        0 * * * /usr/local/sbin/zfs-auto-snapshot daily     7
14       0 * * 7 /usr/local/sbin/zfs-auto-snapshot weekly    4
28       0 1 * * /usr/local/sbin/zfs-auto-snapshot monthly  12

(The zfstools instructions call the 15-minute snapshots “frequent.” I’m choosing to use a less ambiguous name.)

One important thing to note is that zfstools is written in Ruby, and each script starts with an env(1) shebang to find the Ruby interpreter. You must set $PATH in your crontab, or cron’s minimal default path won’t find ruby.
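You can check both halves of that requirement before trusting cron with it. The paths below assume the FreeBSD port’s install locations:

```shell
# The first line of the script shows it locates ruby via env(1).
head -1 /usr/local/sbin/zfs-auto-snapshot

# Confirm that the PATH you give cron can actually find a ruby interpreter.
env PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin which ruby
```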

Now watch /var/log/cron for error messages. If zfs-auto-snapshot runs correctly, you’ll start to see snapshots like these:

# zfs list -t snapshot

NAME                                                         USED  AVAIL  REFER  MOUNTPOINT
zroot/ROOT/default@zfs-auto-snap_monthly-2014-08-01-00h28       0      -   450M  -
zroot/ROOT/default@zfs-auto-snap_weekly-2014-08-03-00h14        0      -   450M  -
zroot/ROOT/default@zfs-auto-snap_daily-2014-08-06-00h07         0      -   450M  -
zroot/ROOT/default@zfs-auto-snap_hourly-2014-08-06-11h00        0      -   450M  -
zroot/ROOT/default@zfs-auto-snap_15min-2014-08-06-11h30         0      -   450M  -
zroot/home@zfs-auto-snap_hourly-2014-07-31-16h00              84K      -   460K  -
...

Each snapshot is unambiguously named after the snapshot tool, the snapshot name, and the time the snapshot was made.

Snapshots consume an amount of disk space related to the amount of churn on the filesystem. The root filesystem on this machine doesn’t change, so the snapshots are tiny. Filesystems with a lot of churn will generate much larger snapshots. I would normally recommend not snapshotting these filesystems, but you can at least exclude the 15-minute snapshots by setting a property.

# zfs set com.sun:auto-snapshot:15min=false zroot/var/churn

Note that this won’t show up when you do a zfs get com.sun:auto-snapshot. You must specifically check for this exact property. To see all snapshot settings, I wound up using zfs get all and grep(1).

# zfs get -t filesystem all | grep auto-snapshot

zroot/ROOT/default  com.sun:auto-snapshot:frequent  false    local
zroot/ROOT/default  com.sun:auto-snapshot           true     inherited from zroot
zroot/ROOT/default  sun.com:auto-snapshot           true     inherited from zroot

If you disable a particular interval’s snapshots on a filesystem with existing snapshots, the older snapshots will gradually rotate away. That is, if you kept 4 15-minute snapshots, when you disable those 15-minute snapshots the old snapshots will disappear over the next hour.

Overall, zfstools works exactly as advertised. It creates new snapshots as scheduled, destroys old snapshots, and has very fine-grained control over what you’ll snapshot and when. The use of ZFS attributes to control which filesystems get snapshotted under which conditions doesn’t thrill me, but I’m biased towards configuration files and reading this configuration is really no worse than a config file.

I’ll try zfsnap next.

a survey of FreeBSD ZFS snapshot automation tools

Why automatically snapshot filesystems? Because snapshots let you magically fall back to older versions of files and even the operating system. Taking a manual snapshot before a system upgrade is laudable, but you need to easily recover files when everything goes bad. So I surveyed my Twitter followers to see what FreeBSD ZFS snapshot automation tools they use.

The tools:

  • A few people use custom shell scripts of varying reliability and flexibility. I’m not going to write my own shell script. The people who write canned snapshot rotation tools have solved this problem, and I have no desire to re-solve it myself.
  • One popular choice was sysutils/zfs-snapshot-mgmt. This lets you create snapshots as often as once per minute, and retain them as long as you desire. Once a minute is a bit much for me. You can group snapshot creation and deletion pretty much arbitrarily, letting you keep, say, 867 per-minute snapshots, 22 every-seven-minute snapshots, and 13 monthlies, if that’s what you need. This is the Swiss army knife of ZFS snapshot tools. One possible complication with zfs-snapshot-mgmt is that it is written in Ruby and configured in YAML. If you haven’t seen YAML yet, you will–it’s an increasingly popular configuration syntax. My existing automation is all in shell and Perl, however, and I added Python for Ansible. Ruby is not a show-stopper, but adding yet another interpreter to all of my ZFS systems doesn’t thrill me. The FreeBSD port is also outdated–the web site referenced by the port says that the newest code, with bug fixes, is on GitHub. If you’re looking for a FreeBSD porting project, this would be an easy one.
  • The zfs-periodic web page is down. NEC Energy Solutions owns the domain, so I’m guessing that the big corporate overlord claimed the blog and the site isn’t coming back. The code still lives at various mirrors, however. zfs-periodic is tightly integrated with FreeBSD’s periodic system, and can automatically create and delete hourly, daily, monthly, and weekly snapshots. It appears to be the least flexible of the snapshot systems, as it runs only when periodic does. If you want to take your snapshots at a time that periodic doesn’t run, too bad. I don’t get a very good feeling from zfs-periodic–if the code had an owner, it would have a web site somewhere.
  • sysutils/zfsnap can do hourly, daily, weekly, and monthly snapshots. It’s designed to run from periodic(8) or cron(8), and is written in /bin/sh.
  • sysutils/zfstools includes a clone of OpenSolaris’ automatic snapshotting tools. I no longer run OpenSolaris-based systems, except on legacy servers that I’m slowly removing, but I never know what the future holds around the dayjob. (I’m waiting for the mission-critical Xenix deployment, I’m sure it’s not far off.) This looks highly flexible, being configured by a combination of cron scripts and ZFS attributes, and can snapshot every 15 minutes, hour, day, week, and month. It’s written in Ruby (yet another scripting language on my system? Oh, joy. Joy and rapture.) On the plus side, the author of zfstools is also a FreeBSD committer, so I can expect him to keep the port up to date.

    In doing this survey I also came across sysutils/freebsd-snapshot, a tool for automatically scheduling and automounting UFS snapshots. While I’m not interested in UFS snapshots right now, this is certainly worth remembering.

    My choice?

    So, which ones will I try? I want a tool that’s still supported and has some flexibility. I want a FreeBSD-provided package of the current version of the software. I’m biased against adding another scripting language to my systems, but that’s not veto-worthy.

    If I want compatibility with OpenSolaris, I’ll use zfstools. I get another scripting language, yay!

    If I don’t care about OpenSolaris-derived systems, zfsnap is the apparent winner.

    Of course, I won’t know which is better until I try both… which will be the topic of a couple more blogs.

    UPDATE, 07-31-2014: I screwed up my research on zfsnap. I have rewritten that part of the article, and my conclusions. My apologies — that’s what happens when you try to do research after four hours sleep. Thanks to Erwin Lansing for pointing it out.

    (“Gee, I’m exhausted. Better not touch any systems today. What shall I do? I know, research and a blog post!” Sheesh.)

Google Play notes

    A couple months ago, I put my Tilted Windmill Press books up on Google Play. I firmly believe that having your books widely available is a good thing. Google Play let me be DRM-free, and while their discounting system is a pain to work around, I’d like people to be able to get my books easily. I’ve sold six books through Google Play, which isn’t great but hey, it’s six readers I wouldn’t have otherwise.

    Amazon is overwhelmingly my biggest reseller. I get over 90% of my self-publishing income from them. They provide truly impressive analytical tools. While sites like Smashwords provide you with spreadsheets that you can dump into whatever analytics tools you want, Amazon gives you the spreadsheets and a bunch of graphs and charts and other cool stuff.

    This made it really obvious that a day after my books went live on Google Play, my Amazon sales plummeted by about a third and have remained there.

    This is weird. And I really would like my sales back up where they were.

    I can think of lots of explanations, most of them involving computer algorithms. No conspiracy is required here. I’m certain Amazon didn’t de-prioritize my books just because they’re available on Google Play. Book sales fluctuate naturally, and there’s usually a dip during the summer. But the graphs (both Amazon’s and my own) make it really clear that this is an unusual slump.

    As an experiment, I’ve disabled my books in Google Play. People who bought the book will still have access to it, but nobody can purchase it now.

    If my Amazon sales recover, the Google Play store will remain off. The few Play sales don’t make up for the lost Amazon sales.

    I will report back on the results. But, if you’re wondering where my Google Play store went, the answer is: away.

    FreeBSD Mastery: Storage Essentials – discount pre-pub available

    You can now buy my next tech book, FreeBSD Mastery: Storage Essentials, for $7.99.

    This is an incomplete book. It has not been tech reviewed. The advice in it might eat your disks and sell your soul to a Manhattan hot dog vendor for use as a dish cloth. So think of it as a discount pre-order, or your opportunity to correct one of my books before it goes to print.

    I will have a tech review done when the first draft is complete.

    I had not originally planned to do pre-orders, but I’m now comfortable enough with the topic that I think I can do so without totally humiliating myself. Worse than usual, that is.

    And if you were on my mailing list, you would have known this earlier.

    Installing and Using Tarsnap for Fun and Profit

    Well, “profit” is a strong word. Maybe “not losing money” would be a better description. Perhaps even “not screwing over readers.”

    I back up my personal stuff with a combination of snapshots, tarballs, rsync, and sneakernet. This is fine for my email and my personal web site. Chances are, if all four of my backup sites are simultaneously destroyed, I won’t care.

    A couple years ago I opened my own ecommerce site, so I could sell my self-published books directly to readers. For the record, I didn’t expect Tilted Windmill Press direct sales to actually, y’know, go anywhere. I didn’t expect that people would buy books directly from me when they could just go to their favorite ebookstore, hit a button, and have the book miraculously appear on their ereader.

    I was wrong. People buy books from me. Once every month or two, someone even throws a few bucks in the tip jar, or flat-out overpays for their books. I am pleasantly surprised.

    So: I was wrong about self-publishing, and now I was wrong about author direct sales. Pessimism is grand, because you’re either correct or you get a pleasant surprise. Thank you all.

    But now I find myself in a position where I actually have commercially valuable data, and I need to back it up. Like a real business. I need offsite backups. I need them automated. And I need to be able to recover them, so that people who have bought my books can continue to download them, in the off chance that the Detroit area is firebombed off the Earth while I’m at BSDCan.

    So it’s time for Tarsnap.

    Why Tarsnap?

  • It works very much like tar, so I don’t have to learn any new command-line arguments. (If you’re not familiar with tar, you need to be.)
  • The terms of service are readable by human beings and more reasonable than other backup services.
  • The code is open and auditable.
  • When Tarsnap’s author screws up, he admits it and handles it correctly.
  • It’s cheap. Any backup priced in picodollars gets my attention.

    I also see the author regularly at BSD conferences. I can slap him in person if he does anything truly daft.

    Tarsnap has a quick Getting Started page. We’ll do the easy things first. Sign up for a Tarsnap account. Once your account is active, put some money in it–$5 will suffice.

    Now let’s check your prerequisites. You need:

  • GnuPG

    BSD systems come with everything else you need.

    Linux users must install

  • a compiler, like gcc or clang
  • make
  • OpenSSL (including header files)
  • zlib (including header files)
  • system header files
  • the ext2fs/ext2_fs.h header (not linux/ext2_fs.h)

    The Tarsnap download page lists specific packages for Debian-based and Red Hat-based Linuxes.

    Go to the download page and get both the source code and the signed hash file. Tarsnap is only available as source code, so that you can verify the code integrity yourself. So let’s do that.

    Start by using GnuPG to verify the integrity of the Tarsnap code. If you’re not familiar with GnuPG and OpenPGP, some daftie wrote a whole book on PGP & GPG. Once you install GnuPG, run the gpg command to get the configuration files.

    # gpg
    gpg: directory `/home/mwlucas/.gnupg' created
    gpg: new configuration file `/home/mwlucas/.gnupg/gpg.conf' created
    gpg: WARNING: options in `/home/mwlucas/.gnupg/gpg.conf' are not yet active during this run
    gpg: keyring `/home/mwlucas/.gnupg/secring.gpg' created
    gpg: keyring `/home/mwlucas/.gnupg/pubring.gpg' created
    gpg: Go ahead and type your message ...
    ^C
    gpg: signal Interrupt caught ... exiting

    Hit ^C. I just wanted the configuration and key files.

    Now edit $HOME/.gnupg/gpg.conf. Set the following options.

    keyserver hkp://keys.gnupg.net
    keyserver-options auto-key-retrieve

    See if our GPG client can verify the signature file tarsnap-sigs-1.0.35.asc.

    # gpg --decrypt tarsnap-sigs-1.0.35.asc
    SHA256 (tarsnap-autoconf-1.0.35.tgz) = 6c9f6756bc43bc225b842f7e3a0ec7204e0cf606e10559d27704e1cc33098c9a
    gpg: Signature made Sun Feb 16 23:20:35 2014 EST using RSA key ID E5979DF7
    gpg: Good signature from "Tarsnap source code signing key (Colin Percival) " [unknown]
    gpg: WARNING: This key is not certified with a trusted signature!
    gpg: There is no indication that the signature belongs to the owner.
    Primary key fingerprint: 634B 377B 46EB 990B 58FF EB5A C8BF 43BA E597 9DF7

    Some interesting things here. The most important line is the statement ‘Good signature from “Tarsnap source code signing key”.’ Your GPG program grabbed the source code signing key from a public key server and used it to verify that the signature file has not been tampered with.

    As you’re new to OpenPGP, this is all you can do. You’re not attached to the Web of Trust, so you can’t verify the signature chain. (I do recommend that you get an OpenPGP key and collect a few signatures, so you can verify code signatures if nothing else.)

    Now that we know the signature file is good, we can use the cryptographic hash in the file to validate that the tarsnap code we downloaded is what the Tarsnap author intended. Near the top of the signature file you’ll see the line:

    SHA256 (tarsnap-autoconf-1.0.35.tgz) = 6c9f6756bc43bc225b842f7e3a0ec7204e0cf606e10559d27704e1cc33098c9a

    Use the sha256(1) program (or sha256sum, or shasum -a 256, or whatever your particular Unix calls the SHA-256 checksum generator) to verify the source code’s integrity.

    # sha256 tarsnap-autoconf-1.0.35.tgz
    SHA256 (tarsnap-autoconf-1.0.35.tgz) = 6c9f6756bc43bc225b842f7e3a0ec7204e0cf606e10559d27704e1cc33098c9a

    The checksum in the signature file and the checksum you compute match. You have valid source code, and can proceed.
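Eyeballing two 64-character hashes is error-prone, so you can let the shell do the comparison. A sketch, assuming the filenames from this example; FreeBSD has sha256(1), while most Linuxes use sha256sum instead:

```shell
# Pull the expected hash out of the signature file (last field of the
# "SHA256 (...) = hash" line).
expected=$(awk '/^SHA256/ {print $NF}' tarsnap-sigs-1.0.35.asc)

# Compute our own hash: sha256 -q on FreeBSD, sha256sum on Linux.
actual=$(sha256 -q tarsnap-autoconf-1.0.35.tgz 2>/dev/null ||
    sha256sum tarsnap-autoconf-1.0.35.tgz | awk '{print $1}')

# Compare the two strings.
if [ "$expected" = "$actual" ]; then
    echo "checksum OK"
else
    echo "checksum MISMATCH" >&2
fi
```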

    Extract the source code.

    # tar -xf tarsnap-autoconf-1.0.35.tgz
    # cd tarsnap-autoconf-1.0.35
    # ./configure
    ...
    configure: creating ./config.status
    config.status: creating Makefile
    config.status: creating config.h
    config.status: executing depfiles commands
    #

    If the configure script ends any way other than this, you’re on Linux and didn’t install the necessary development packages. The libraries alone won’t suffice; you must have the development versions.

    If configure completed, run

    # make all install clean

    Tarsnap is now ready to use.

    Start by creating a Tarsnap key for this machine and attaching it to your Tarsnap account. Here I create a key for one of my machines.

    # tarsnap-keygen --keyfile /root/tarsnap.key --user mwlucas@michaelwlucas.com --machine pestilence
    Enter tarsnap account password:
    #

    I now have a tarsnap key file. /root/tarsnap.key looks like this:

    # START OF TARSNAP KEY FILE
    dGFyc25hcAAAAAAAAAAzY6MEAAAAAAEAALG8Ix2yYMu+TN6Pj7td2EhjYlGCGrRRknJQ8AeY
    uJsctXIEfurQCOQN5eZFLi8HSCCLGHCMRpM40E6Jc6rJExcPLYkVQAJmd6auGKMWTb5j9gOr
    SeCCEsUj3GzcTaDCLsg/O4dYjl6vb/he9bOkX6NbPomygOpBHqcMOUIBm2eyuOvJ1d9R+oVv
    ...

    This machine is now registered and ready to go.

    This key is important. If your machine is destroyed and you need access to your remote backup, you will need this key! Before you proceed, back it up somewhere other than the machine you’re backing up. There’s lots of advice out there on how to back up private keys. Follow it.
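As a minimal sketch of the off-machine copy, here over SSH; the destination host and path are made up, so substitute your own:

```shell
# Copy the key to a different machine...
scp /root/tarsnap.key backuphost.example.com:/secure/keys/www-tarsnap.key

# ...and confirm the copy is byte-identical before you trust it.
ssh backuphost.example.com cat /secure/keys/www-tarsnap.key | cmp - /root/tarsnap.key
```

A printout in a fire safe is also a perfectly respectable backup for a small key file.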

    Now let’s store some backups in the cloud. I’m going to play with my /etc/ directory, because it’s less than 3MB. Start by backing up a single directory.

    # tarsnap -c -f wwwetctest etc/
    Directory /usr/local/tarsnap-cache created for "--cachedir /usr/local/tarsnap-cache"
                                           Total size  Compressed size
    All archives                              1996713           382896
      (unique data)                           1946025           366495
    This archive                              1996713           382896
    New data                                  1946025           366495

    Nothing seems to happen on the local system. Let’s check and be sure that there’s a backup out in the cloud:

    # tarsnap --list-archives
    wwwetctest

    I then went into /etc and did some cleanup, removing files that shouldn’t have ever been there. This stuff grows in /etc on any long-lived system.

    # tarsnap -c -f wwwetctest-20140716-1508 etc/
                                           Total size  Compressed size
    All archives                              3986206           765446
      (unique data)                           2120798           403833
    This archive                              1989493           382550
    New data                                   174773            37338

    # tarsnap --list-archives
    wwwetctest
    wwwetctest-20140716-1508

    Note that the compressed size of this archive is much smaller than the first one. Tarsnap only stored the diffs between the two backups.

    If you want more detail about your listed backups, add -v to see the creation date. Add a second -v to see the command used to create the archive.

    # tarsnap --list-archives -vv
    wwwetctest 2014-07-16 15:02:41 tarsnap -c -f wwwetctest etc/
    wwwetctest-20140716-1508 2014-07-16 15:09:38 tarsnap -c -f wwwetctest-20140716-1508 etc/

    Let’s pretend that I need a copy of my backup. Here I extract the newest backup into /tmp/etc.

    # cd /tmp
    # tarsnap -x -f wwwetctest-20140716-1508

    Just for my own amusement, I’ll extract the older backup as well and compare the contents.

    # cd /tmp
    # tarsnap -x -f wwwetctest

    The files I removed during my cleanup are now present.

    What about rotating backups? I now have two backups. The second one is a differential backup against the first. If I blow away the first backup, what happens to the older backup?

    # tarsnap -d -f wwwetctest
                                           Total size  Compressed size
    All archives                              1989493           382550
      (unique data)                           1938805           366149
    This archive                              1996713           382896
    Deleted data                               181993            37684

    It doesn’t look like it deleted very much data. And indeed, a check of the archive shows that all my files are there.

    And now, the hard part: what do I need to back up? That’s a whole separate class of problem…
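Whatever the answer turns out to be, the mechanism will be a cron job. A minimal sketch, assuming a nightly run with a date-stamped archive name; the directory list is a placeholder for whatever I decide actually matters:

```shell
# root's crontab: nightly tarsnap backup at 02:00.
# Note that % must be escaped as \% inside a crontab entry.
0 2 * * * /usr/local/bin/tarsnap -c -f www-$(date +\%Y\%m\%d) /etc /usr/local/www
```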

Cloning a FreeBSD/ZFS Machine with ‘zfs send’

    My employer’s mail server runs DirectAdmin on FreeBSD, with ZFS. The mail server is important to the company. We want to be able to restore it quickly and easily. While we back it up regularly, having a “known good” base system with all the software installed, where we can restore the mail spools and account information, would be good.

    As it runs ZFS, let’s send the filesystems across the network to a blank machine.

    First I need an installation ISO for the same release and architecture of FreeBSD. (Could I use a different release? Perhaps. But if I have any errors, the first thing to try will be the correct installation media. Let’s skip the preliminaries.)

    I also need a web server, to store a user’s SSH public key file.

    I provision a new virtual machine with exactly the same amount of disk, memory, and processor as the original.

    Boot the install disk. Choose “Live CD” rather than install. Log in as root (no password).

    For FreeBSD 9.2 or earlier, take care of bug 168314. The live CD can’t configure resolv.conf out of the box because the directory /tmp/bsdinstall_etc isn’t present and /etc/resolv.conf is a symlink to /tmp/bsdinstall_etc/resolv.conf.

    # mkdir /tmp/bsdinstall_etc
    # touch /tmp/bsdinstall_etc/resolv.conf

    My disk is /dev/vtbd0. I format this disk exactly like the original server. The original install was scripted, which saves me the trouble of copying the partitioning from the original machine.

    # gpart destroy -F vtbd0
    # gpart create -s gpt vtbd0
    # gpart add -s 222 -a 4k -t freebsd-boot -l boot0 vtbd0
    # gpart add -s 8g -a 4k -t freebsd-swap -l swap0 vtbd0
    # gpart add -a 4k -t freebsd-zfs -l disk0 vtbd0
    # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 vtbd0
    # gnop create -S 4096 /dev/gpt/disk0
    # kldload zfs
    # zpool create -f -o altroot=/mnt -O canmount=off -m none zroot /dev/gpt/disk0.nop
    # zfs set checksum=fletcher4 zroot

    Our destination filesystem is all set. But we’ll need an SSH daemon to receive connections. The only writable directories are /tmp and /var, so I create a /tmp/sshd_config that contains:

    HostKey /tmp/ssh_host_ecdsa_key
    PermitRootLogin yes
    AuthorizedKeysFile /tmp/%u

    I need a host key. ssh-keygen has a -A flag to automatically create host keys, but it tries to put them in /etc/ssh. If you want to create keys in another location, you must create them manually.

    # ssh-keygen -f ssh_host_ecdsa_key -N '' -t ecdsa

    Now start sshd and the network.

    # /usr/sbin/sshd -f /tmp/sshd_config
    # dhclient vtnet0

    Your original machine should now be able to ping and SSH to the blank host. The blank host should be able to ping to your web server.

    This is also the point where you realize that the DHCP configured address is actually in use by another machine, and you need to reboot and start over.

    Copy an authorized_keys file from your web server to /tmp/root. Also set the permissions on /tmp so that sshd will accept that key file. (The sticky bit will make sshd reject the key file.)

    # fetch http://webserver/authorized_keys
    # mv authorized_keys root
    # chmod 755 /tmp

    At this point, the original machine should be able to SSH into the target machine as root. This was the hard part. Now recursively copy the ZFS snapshots across the network.

    # zfs snapshot -r zroot@backup
    # zfs send -R zroot@backup | ssh root@newhost zfs recv -F zroot

    Now I walk away and let gigs and gigs of data flow across the WAN. To check progress, go to the target machine and run:

    # zfs list -t snapshot

    When an individual snapshot finishes, it’ll appear on the list.

    Once the ZFS send completes, you can reboot and have a cloned system, right?

    Uh, not quite. (I suspect that all of the below could have been done before the previous reboot, but I didn’t know I had to do it. Readers, please do the below before your first reboot and let me know if it works or not.)

    Reboot at this stage and you’ll get messages like:

    Can't find /boot/zfsloader

    Can't find /boot/kernel

    What gives? For one thing, there’s no boot loader yet. I could have done this earlier, but I didn’t think of it. Boot back into the live CD and install the necessary loaders.

    # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 vtbd0
    bootcode written to vtbd0

    While you’re in the live CD, check the ZFS pool.

    # zpool status
    no pools available

    Hang on… I know there’s a pool here! I just saw it before rebooting. Running zpool import shows my zroot pool as an option, so let’s try it.

    # zpool import zroot
    cannot import 'zroot': pool may be in use from other system
    use '-f' to import anyway

    We know this pool isn’t in use. Let’s forcibly import it. As I’m booting off the live CD, I temporarily mount this at /mnt.

    # zpool import -o altroot=/mnt -f zroot

    Now that you can talk to the pool, you can tell FreeBSD which ZFS it should boot from.

    # zpool set bootfs=zroot/DEFAULT/root zroot

    I want to be sure this pool can import and export cleanly, so I do:

    # zpool export zroot
    # zpool import zroot

    At this point, I’ve mounted the ZFS filesystems over the live CD filesystem. Don’t mess around with it any more, just reboot.

    And your system is cloned.

    Now I get to figure out how to update this copy with incremental snapshots. But that’ll be a separate blog post.
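I suspect it will boil down to taking a second recursive snapshot and sending only the difference. An untested sketch, not the promised post; the snapshot names are placeholders:

```shell
# Take a new recursive snapshot on the source machine...
zfs snapshot -r zroot@backup2

# ...then send only the changes between @backup and @backup2 to the clone.
zfs send -R -i @backup zroot@backup2 | ssh root@newhost zfs recv -F zroot
```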

    mfiutil on FreeBSD

    I need to add drives to one of my FreeNAS 8.3.1 boxes. This machine has an “Intel RAID” card in it. I don’t want to use the Intel RAID, I want just a bunch of disks that I can plop a mirror on top of. The BIOS utility doesn’t give me the “just a bunch of disks” option. So I boot into FreeNAS, insert the drives, and the console shows:

    Jul 10 10:25:40 datastore5 kernel: mfi0: 6904 (458317539s/0x0002/info) - Inserted: PD 0e(e0xfc/s0)
    Jul 10 10:25:40 datastore5 kernel: mfi0: MFI_DCMD_PD_LIST_QUERY failed 2
    Jul 10 10:25:40 datastore5 kernel: mfi0: 6905 (458317539s/0x0002/info) - Inserted: PD 0e(e0xfc/s0) Info
    Jul 10 10:29:13 datastore5 kernel: mfi0: 6906 (458317752s/0x0002/info) - Inserted: PD 0f(e0xfc/s1)
    Jul 10 10:29:13 datastore5 kernel: mfi0: MFI_DCMD_PD_LIST_QUERY failed 2
    Jul 10 10:29:13 datastore5 kernel: mfi0: 6907 (458317752s/0x0002/info) - Inserted: PD 0f(e0xfc/s1) Info

    This “Intel RAID” is an MFI card, manageable with mfiutil(8). So let’s see what this adapter has to say for itself.

    # mfiutil show adapter
    mfi0 Adapter:
    Product Name: Intel(R) RAID Controller SRCSASBB8I
    Serial Number: P102404308
    Firmware: 8.0.1-0029
    RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
    Battery Backup: not present
    NVRAM: 32K
    Onboard Memory: 256M
    Minimum Stripe: 8k
    Maximum Stripe: 1M

    This card can do JBOD (Just a Bunch Of Disks). Tell the controller about them:

    # mfiutil create jbod 14
    # mfiutil create jbod 15

    We now have two new online drives. (I could also use the controller to create a mirror with these drives, or add more drives and have a RAID-5, or whatever I need.)

    # mfiutil show drives
    mfi0 Physical Drives:
    14 ( 1863G) ONLINE SATA E1:S0
    15 ( 1863G) ONLINE SATA E1:S1

    These show up as /dev/mfidX. FreeNAS can now see the drives.

    For FreeNAS, ZFS, and software RAID, this kind of hardware RAID controller isn’t really desirable. While the RAID controller claims that these drives are “just a bunch of disks,” the controller inserts itself in between the kernel and the drives. This means that certain error detection and correction, as well as SMART tools, won’t work. But it’s what I have in this old box, so I’ll use these drives for less vital data.

    FreeBSD Disk Partitioning

    A couple weeks ago, I monopolized the freebsd-hackers mailing list by asking a couple simple, innocent questions about managing disks using gpart(8) instead of the classic fdisk(8) and disklabel(8). This is my attempt to rationalize and summarize a small cup of the flood of information I received.

    The FreeBSD kernel understands several different disk partitioning schemes, including the traditional x86 MBR (slices), the current GPT, BSD disklabels, as well as schemes from Apple, Microsoft, NEC (PC98), and Sun. The gpart(8) tool is intended as a generic interface that lets you manage disk partitioning in all of these schemes, and abstract away all of the innards in favor of saying “Use X partitioning system on this disk, and put these partitions on it.” It’s a great goal.

    FreeBSD storage, and computing storage in general, is in a transitory state today. The older tools, fdisk and bsdlabel, aren’t exactly deprecated but they are not encouraged. x86 hardware is moving towards GPT, but there’s an awful lot of MBR-only gear deployed. Disks themselves are moving from the long-standing 512B sector size to 4KB, eight times larger.

    And as gravy on top of all this, disks lie about their sector size. Because why wouldn’t they?
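    You can ask FreeBSD what a disk claims about itself with diskinfo(8). As a sketch, assuming a SATA disk at /dev/ada0 (the device name is an assumption; substitute your own):

```shell
# Ask the disk what it reports. "sectorsize" is the logical sector
# size the disk advertises; "stripesize" is the physical sector size,
# if the disk admits to having one. A disk reporting a 512B sectorsize
# with a 4096B stripesize is a 4K disk lying about its sectors.
diskinfo -v /dev/ada0
```

    A stripesize of 0 means the disk claims its logical and physical sectors match. Trust that about as far as you can throw the disk.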

    Traditional disks have a geometry defined by cylinders, heads, and sectors. MBR-style partitions (aka slices) are expected to end on cylinder boundaries; that is, slices are measured in whole cylinders. Cylinders and heads aren't really relevant to modern disks, which use LBA. With flash drives, SSDs, and whatever other sort of storage people come up with, disk geometry is increasingly obsolete. Traditional BSD partitions expect to sit inside cylinder-aligned slices, however, and any BSD throws a wobbler if you use an MBR partition that doesn't respect this.

    gpart must handle those traditional cylinder boundaries as well as partitioning schemes without such boundaries. If you create a 1GB MBR partition, gpart will round the size to the nearest cylinder.

    The sector size changes create orthogonal problems. If you write to the disk in 512B sectors, but the underlying disk has 4K sectors, the disk will perform many more writes than necessary. If you write to the disk in 4K sectors, but the underlying disk uses 512B sectors, there’s no real harm done.

    But if your logical 4K sectors don't line up with the disk's physical 4K sectors, every logical write can straddle two physical sectors, and performance drops by half.

    For best results, align all partitions to 4K boundaries. If the underlying disk has 512B sectors, alignment costs you nothing; the disk breaks those writes into 512B sectors anyway. Use the -a 4k argument with gpart so that the partitions it creates are 4K-aligned.
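    The arithmetic behind alignment is simple: gpart show reports offsets in 512-byte sectors, and a partition is 4K-aligned when its starting byte offset divides evenly by 4096. A quick sketch of the check (the start offsets below are made-up examples, not from any real disk):

```shell
# Given a partition's start offset in 512-byte sectors (as reported
# by "gpart show"), report whether it lands on a 4K boundary.
check_alignment() {
    start=$1
    if [ $(( start * 512 % 4096 )) -eq 0 ]; then
        echo "sector $start: 4K-aligned"
    else
        echo "sector $start: misaligned"
    fi
}

check_alignment 2048   # the 1MB mark: 2048 * 512 = 1048576, divisible by 4096
check_alignment 63     # the classic MBR start sector: not divisible by 4096
```

    That sector 63 start is exactly why old-style MBR layouts perform badly on 4K disks.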

    How do you do this? That depends on whether you're using GPT or MBR partitions.

    For GPT partitions, you must start partitioning the disk at a multiple of 4K. The front of your disk might have all kinds of boot code or boot managers on it, however. Start your first partition at the 1MB mark, and create only partitions that are whole multiples of a megabyte. Today you'd have to go out of your way to create a partition that was 1.5MB, so this isn't a huge constraint.

    For MBR partitions, it's slightly more difficult. Use the -a 4k command-line argument to gpart when creating BSD partitions inside an MBR slice. This tells gpart that even if the slice isn't 4K-aligned, the BSD partitions inside it must be.
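    As a sketch of what this looks like in practice, assuming a blank scratch disk at ada0 (the device name, sizes, and partition types here are illustrative, and these commands destroy existing data, so run them only on a disk you can afford to lose):

```shell
# GPT: start the first partition at the 1MB mark and stick to whole
# megabytes, so everything stays 4K-aligned.
gpart create -s gpt ada0
gpart add -t freebsd-ufs -a 1m -s 10g ada0

# Or, for MBR: wipe the disk, create a slice, then create a BSD
# label with 4K-aligned partitions inside the slice.
gpart destroy -F ada0
gpart create -s mbr ada0
gpart add -t freebsd -a 4k ada0
gpart create -s bsd ada0s1
gpart add -t freebsd-ufs -a 4k ada0s1
```

    Run gpart show afterward and check the start offsets against 4K boundaries to confirm the alignment took.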

    I could put a bunch of instructions here, but Warren Block has a nice detailed walk-through of the actual commands used to partition disks with these standards.

    next book(s): FreeBSD storage

    I’m writing about FreeBSD disk and storage management. (The folks on my mailing list already knew this.) For the last few months, I’ve been trying to assimilate and internalize GEOM.

    I’ve always used GEOM in a pretty straightforward way: decide what I want to achieve, read a couple of man pages, find an archived discussion where someone achieved my goal, blindly copy their commands, and poof! I have deployed an advanced GEOM feature. I figured GEOM was mostly for developers who invented cool new features.

    Turns out that GEOM is for systems administrators. It lets us do all sorts of cool things.

    GEOM is complicated because the world is complicated. It lets you configure your storage any way you like, which is grand. But in general, I’ve approached GEOM like I would any other harmless-looking but deadly thing. Now I’m using a big multi-drive desktop from iX Systems to fearlessly test GEOM to destruction.

    I’m learning a lot. The GEOM book will be quite useful. But it’s taking longer than I thought. Everything else flows out of GEOM. I’ve written some non-GEOM parts, but I’m holding off writing anything built on top of GEOM. Writing without understanding means rewriting, and rewriting leads to fewer books.

    My GEOM comprehension is expanding, and many developers are giving me very good insight into the system. GEOM is an underrated feature, and I think my work will help people understand just how powerful it is and what a good selling point it is for FreeBSD.

    My research has gone as far as the man pages can take me. Now I need to start pestering the mailing lists for answers. Apparently my innocuous questions can blow up mailing lists. I would apologize, but an apology might imply that I won’t do it again.

    FreeBSD storage is a big topic. I suspect it’s going to wind up as three books: one on GEOM and UFS, one on ZFS, and one on networked storage. I wouldn’t be shocked if I can get it into two. I would be very surprised if it takes four. (I’m assuming each book is roughly the size of SSH Mastery — people appear to like that length and price point.) I will adjust book lengths and prices as needed to make them a good value.

    The good thing about releasing multiple books is that you buy only the ones you need. You need to learn about iSCSI and NFS? Buy the third book. You want everything but ZFS? Skip that one. And so on.

    As I don’t know the final number of books or how they will be designed, I’m not planning an advance purchase program.

    I am planning to release all the books almost simultaneously, or at least very close together.

    So, a mini-FAQ:

  • When will they be released?
    When I’m done writing them.

  • How much will they cost?
    Dunno.

  • How many will there be?
    “Five.” “Three, sir.” Or four. Or two. Definitely a positive integer.

  • Do you know anything?
    I like pie.

    I’m pondering how to give back to FreeBSD on this project.

    I auctioned off the first copy of Absolute FreeBSD to support the FreeBSD Foundation. That raised $600 and was rather fun. These books will be print-on-demand, though, so “first print” is a little more ambiguous. An auction also has a ceiling, whereas OpenBSD’s ongoing cut of SSH Mastery sales keeps on giving.

    I’ve had tentative discussions with Ed Maste over at the FreeBSD Foundation about using those books as fundraisers. I’d let the FF have the books at my cost, and they could include them as rewards for larger donations. A million and ten things could go wrong with that, so it might not work out. If nothing else, shipping stuff is a lot of work, and the FF folks might decide that their time is better spent knocking on big corporate doors than playing PBS. I couldn’t blame them — that’s why I don’t ship paper books.

    If that fails for whatever reason, I’ll sponsor a FreeBSD devsummit or something.