a survey of FreeBSD ZFS snapshot automation tools

Why automatically snapshot filesystems? Because snapshots let you magically fall back to older versions of files and even the operating system. Taking a manual snapshot before a system upgrade is laudable, but you need to easily recover files when everything goes bad. So I surveyed my Twitter followers to see what FreeBSD ZFS snapshot automation tools they use.

The tools:

  • A few people use custom shell scripts of varying reliability and flexibility. I’m not going to write my own shell script. The people who write canned snapshot rotation tools have solved this problem, and I have no desire to re-solve it myself.
  • One popular choice was sysutils/zfs-snapshot-mgmt. This lets you create snapshots as often as once per minute, and retain them as long as you desire. Once a minute is a bit much for me. You can group snapshot creation and deletion pretty much arbitrarily, letting you keep, say, 867 per-minute snapshots, 22 every-seven-minute snapshots, and 13 monthlies, if that’s what you need. This is the Swiss army knife of ZFS snapshot tools. One possible complication with zfs-snapshot-mgmt is that it is written in Ruby and configured in YAML. If you haven’t seen YAML yet, you will–it’s an increasingly popular configuration syntax. My existing automation is all in shell and Perl, however, and I added Python for Ansible. Adding yet another interpreter to all of my ZFS systems doesn’t thrill me. Ruby is not a show-stopper, but it’s a strike against. The FreeBSD port is outdated, however–the web site referenced by the port says that the newest code, with bug fixes, is on github. If you’re looking for a FreeBSD porting project, this would be an easy one.
  • The zfs-periodic web page is down. NEC Energy Solutions owns the domain, so I’m guessing that the big corporate overlord claimed the blog and the site isn’t coming back. The code still lives at various mirrors, however. zfs-periodic is tightly integrated with FreeBSD’s periodic system, and can automatically create and delete hourly, daily, monthly, and weekly snapshots. It appears to be the least flexible of the snapshot systems, as it runs with periodic. If you want to take your snapshots at a time that periodic doesn’t run, too bad. I don’t get a very good feeling from zfs-periodic–if the code had an owner, it would have a web site somewhere.
  • sysutils/zfsnap can do hourly, daily, weekly, and monthly snapshots. It’s designed to run from periodic(8) or cron(8), and is written in /bin/sh.
  • sysutils/zfstools includes a clone of OpenSolaris’ automatic snapshotting tools. I no longer run OpenSolaris-based systems, except on legacy servers that I’m slowly removing, but I never know what the future holds around the dayjob. (I’m waiting for the mission-critical Xenix deployment, I’m sure it’s not far off.) This looks highly flexible, being configured by a combination of cron scripts and ZFS attributes, and can snapshot every 15 minutes, hour, day, week, and month. It’s written in Ruby (yet another scripting language on my system? Oh, joy. Joy and rapture.) On the plus side, the author of zfstools is also a FreeBSD committer, so I can expect him to keep the port up to date.
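
    For the curious, neither zfstools nor zfsnap looks hard to set up. Going by the zfstools port’s own documentation, configuration is roughly a ZFS property plus crontab entries along these lines (the paths, intervals, and retention counts here are illustrative, not gospel):

    # zfs set com.sun:auto-snapshot=true zroot

    15,30,45 * * * * root /usr/local/sbin/zfs-auto-snapshot frequent  4
    0        * * * * root /usr/local/sbin/zfs-auto-snapshot hourly   24
    7        0 * * * root /usr/local/sbin/zfs-auto-snapshot daily     7
    14       0 * * 7 root /usr/local/sbin/zfs-auto-snapshot weekly    4
    28       0 1 * * root /usr/local/sbin/zfs-auto-snapshot monthly  12

    zfsnap is similarly cron-driven. As I read the zfSnap options, something like this would take hourly recursive snapshots with a one-week lifetime and expire dead ones overnight:

    0 * * * * root /usr/local/sbin/zfSnap -a 1w -r zroot
    0 3 * * * root /usr/local/sbin/zfSnap -d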

    In doing this survey I also came across sysutils/freebsd-snapshot, a tool for automatically scheduling and automounting UFS snapshots. While I’m not interested in UFS snapshots right now, this is certainly worth remembering.

    My choice?

    So, which ones will I try? I want a tool that’s still supported and has some flexibility. I want a FreeBSD-provided package of the current version of the software. I’m biased against adding another scripting language to my systems, but that’s not veto-worthy.

    If I want compatibility with OpenSolaris, I’ll use zfstools. I get another scripting language, yay!

    If I don’t care about OpenSolaris-derived systems, zfsnap is the apparent winner.

    Of course, I won’t know which is better until I try both… which will be the topic of a couple more blogs.

    UPDATE, 07-31-2014: I screwed up my research on zfsnap. I have rewritten that part of the article, and my conclusions. My apologies — that’s what happens when you try to do research after four hours sleep. Thanks to Erwin Lansing for pointing it out.

    (“Gee, I’m exhausted. Better not touch any systems today. What shall I do? I know, research and a blog post!” Sheesh.)

    Google Play notes

    A couple months ago, I put my Tilted Windmill Press books up on Google Play. I firmly believe that having your books widely available is a good thing. Google Play let me be DRM-free, and while their discounting system is a pain to work around, I’d like people to be able to get my books easily. I’ve sold six books through Google Play, which isn’t great but hey, it’s six readers I wouldn’t have otherwise.

    Amazon is overwhelmingly my biggest reseller. I get over 90% of my self-publishing income from them. They provide truly impressive analytical tools. While sites like Smashwords provide you with spreadsheets that you can dump into whatever analytics tools you want, Amazon gives you the spreadsheets and a bunch of graphs and charts and other cool stuff.

    This made it really obvious that a day after my books went live on Google Play, my Amazon sales plummeted by about a third and have remained there.

    This is weird. And I really would like my sales back up where they were.

    I can think of lots of explanations, most of them involving computer algorithms. No conspiracy is required here. I’m certain Amazon didn’t de-prioritize my books just because they’re available on Google Play. Book sales fluctuate naturally, and there usually is a dip during the summer. But the graphs (both Amazon’s and my own) make it really clear that this is an unusual slump.

    As an experiment, I’ve disabled my books in Google Play. People who bought the book will still have access to it, but nobody can purchase it now.

    If my Amazon sales recover, the Google Play store will remain off. The few Play sales don’t make up for the lost Amazon sales.

    I will report back on the results. But, if you’re wondering where my Google Play store went, the answer is: away.

    FreeBSD Mastery: Storage Essentials – discount pre-pub available

    You can now buy my next tech book, FreeBSD Mastery: Storage Essentials, for $7.99.

    This is an incomplete book. It has not been tech reviewed. The advice in it might eat your disks and sell your soul to a Manhattan hot dog vendor for use as a dish cloth. So think of it as a discount pre-order, or your opportunity to correct one of my books before it goes to print.

    I will have a tech review done when the first draft is complete.

    I had not originally planned to do pre-orders, but I’m now comfortable enough with the topic that I think I can do so without totally humiliating myself. Worse than usual, that is.

    And if you were on my mailing list, you would have known this earlier.

    Installing and Using Tarsnap for Fun and Profit

    Well, “profit” is a strong word. Maybe “not losing money” would be a better description. Perhaps even “not screwing over readers.”

    I back up my personal stuff with a combination of snapshots, tarballs, rsync, and sneakernet. This is fine for my email and my personal web site. Chances are, if all four of my backup sites are simultaneously destroyed, I won’t care.

    A couple years ago I opened my own ecommerce site, so I could sell my self-published books directly to readers. For the record, I didn’t expect Tilted Windmill Press direct sales to actually, y’know, go anywhere. I didn’t expect that people would buy books directly from me when they could just go to their favorite ebookstore, hit a button, and have the book miraculously appear on their ereader.

    I was wrong. People buy books from me. Once every month or two, someone even throws a few bucks in the tip jar, or flat-out overpays for their books. I am pleasantly surprised.

    So: I was wrong about self-publishing, and now I was wrong about author direct sales. Pessimism is grand, because you’re either correct or you get a pleasant surprise. Thank you all.

    But now I find myself in a position where I actually have commercially valuable data, and I need to back it up. Like a real business. I need offsite backups. I need them automated. And I need to be able to recover them, so that people who have bought my books can continue to download them, in the off chance that the Detroit area is firebombed off the Earth while I’m at BSDCan.

    So it’s time for Tarsnap.

    Why Tarsnap?

  • It works very much like tar, so I don’t have to learn any new command-line arguments. (If you’re not familiar with tar, you need to be.)
  • The terms of service are readable by human beings and more reasonable than other backup services.
  • The code is open and auditable.
  • When Tarsnap’s author screws up, he admits it and handles it correctly.
  • It’s cheap. Any backup priced in picodollars gets my attention.

    I also see the author regularly at BSD conferences. I can slap him in person if he does anything truly daft.

    Tarsnap has a quick Getting Started page. We’ll do the easy things first. Sign up for a Tarsnap account. Once your account is active, put some money in it–$5 will suffice.

    Now let’s check your prerequisites. You need:

  • GnuPG

    BSD systems come with everything else you need.

    Linux users must install

  • a compiler, like gcc or clang
  • make
  • OpenSSL (including header files)
  • zlib (including header files)
  • System header files
  • The ext2fs/ext2_fs.h header (not the linux/ext2_fs.h header)

    The Tarsnap download page lists specific packages for Debian-based and Red Hat-based Linuxes.
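
    On a Debian-derived box that works out to something like the following; package names vary by distribution and release, so treat this as a sketch and trust the download page over me:

    # apt-get install gcc make libssl-dev zlib1g-dev e2fslibs-dev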

    Go to the download page and get both the source code and the signed hash file. Tarsnap is only available as source code, so that you can verify the code integrity yourself. So let’s do that.

    Start by using GnuPG to verify the integrity of the Tarsnap code. If you’re not familiar with GnuPG and OpenPGP, some daftie wrote a whole book on PGP & GPG. Once you install GnuPG, run the gpg command to get the configuration files.

    # gpg
    gpg: directory `/home/mwlucas/.gnupg' created
    gpg: new configuration file `/home/mwlucas/.gnupg/gpg.conf' created
    gpg: WARNING: options in `/home/mwlucas/.gnupg/gpg.conf' are not yet active during this run
    gpg: keyring `/home/mwlucas/.gnupg/secring.gpg' created
    gpg: keyring `/home/mwlucas/.gnupg/pubring.gpg' created
    gpg: Go ahead and type your message ...
    ^C
    gpg: signal Interrupt caught ... exiting

    Hit ^C. I just wanted the configuration and key files.

    Now edit $HOME/.gnupg/gpg.conf. Set the following options.

    keyserver hkp://keys.gnupg.net
    keyserver-options auto-key-retrieve

    See if our GPG client can verify the signature file tarsnap-sigs-1.0.35.asc.

    # gpg --decrypt tarsnap-sigs-1.0.35.asc
    SHA256 (tarsnap-autoconf-1.0.35.tgz) = 6c9f6756bc43bc225b842f7e3a0ec7204e0cf606e10559d27704e1cc33098c9a
    gpg: Signature made Sun Feb 16 23:20:35 2014 EST using RSA key ID E5979DF7
    gpg: Good signature from "Tarsnap source code signing key (Colin Percival) " [unknown]
    gpg: WARNING: This key is not certified with a trusted signature!
    gpg: There is no indication that the signature belongs to the owner.
    Primary key fingerprint: 634B 377B 46EB 990B 58FF EB5A C8BF 43BA E597 9DF7

    Some interesting things here. The most important line is the statement ‘Good signature from “Tarsnap source code signing key”.’ Your GPG program grabbed the source code signing key from a public key server and used it to verify that the signature file has not been tampered with.

    As you’re new to OpenPGP, this is all you can do. You’re not attached to the Web of Trust, so you can’t verify the signature chain. (I do recommend that you get an OpenPGP key and collect a few signatures, so you can verify code signatures if nothing else.)

    Now that we know the signature file is good, we can use the cryptographic hash in the file to validate that the tarsnap code we downloaded is what the Tarsnap author intended. Near the top of the signature file you’ll see the line:

    SHA256 (tarsnap-autoconf-1.0.35.tgz) = 6c9f6756bc43bc225b842f7e3a0ec7204e0cf606e10559d27704e1cc33098c9a

    Use the sha256(1) program (or sha256sum, or shasum -a 256, or whatever your particular Unix calls the SHA-256 checksum generator) to verify the source code’s integrity.

    # sha256 tarsnap-autoconf-1.0.35.tgz
    SHA256 (tarsnap-autoconf-1.0.35.tgz) = 6c9f6756bc43bc225b842f7e3a0ec7204e0cf606e10559d27704e1cc33098c9a

    The checksum in the signature file and the checksum you compute match. You have valid source code, and can proceed.

    Extract the source code.

    # tar -xf tarsnap-autoconf-1.0.35.tgz
    # cd tarsnap-autoconf-1.0.35
    # ./configure
    ...
    configure: creating ./config.status
    config.status: creating Makefile
    config.status: creating config.h
    config.status: executing depfiles commands
    #

    If the configure script ends any way other than this, you’re on Linux and didn’t install the necessary development packages. The libraries alone won’t suffice; you must have the development versions.

    If configure completed, run

    # make all install clean

    Tarsnap is now ready to use.

    Start by creating a Tarsnap key for this machine and attaching it to your Tarsnap account. Here I create a key for my machine www.

    # tarsnap-keygen --keyfile /root/tarsnap.key --user mwlucas@michaelwlucas.com --machine pestilence
    Enter tarsnap account password:
    #

    I now have a tarsnap key file. /root/tarsnap.key looks like this:

    # START OF TARSNAP KEY FILE
    dGFyc25hcAAAAAAAAAAzY6MEAAAAAAEAALG8Ix2yYMu+TN6Pj7td2EhjYlGCGrRRknJQ8AeY
    uJsctXIEfurQCOQN5eZFLi8HSCCLGHCMRpM40E6Jc6rJExcPLYkVQAJmd6auGKMWTb5j9gOr
    SeCCEsUj3GzcTaDCLsg/O4dYjl6vb/he9bOkX6NbPomygOpBHqcMOUIBm2eyuOvJ1d9R+oVv
    ...

    This machine is now registered and ready to go.

    This key is important. If your machine is destroyed and you need access to your remote backup, you will need this key! Before you proceed, back it up somewhere other than the machine you’re backing up. There’s lots of advice out there on how to back up private keys. Follow it.
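
    A quick-and-dirty sketch, assuming you have some other box to hold it (here a hypothetical host called backuphost), is simply copying the key off the machine; a printed copy or an encrypted USB stick works too:

    # scp /root/tarsnap.key backuphost:tarsnap-www.key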

    Now let’s store some backups in the cloud. I’m going to play with my /etc/ directory, because it’s less than 3MB. Start by backing up a single directory.

    # tarsnap -c -f wwwetctest etc/
    Directory /usr/local/tarsnap-cache created for "--cachedir /usr/local/tarsnap-cache"
                            Total size  Compressed size
    All archives               1996713           382896
      (unique data)            1946025           366495
    This archive               1996713           382896
    New data                   1946025           366495

    Nothing seems to happen on the local system. Let’s check and be sure that there’s a backup out in the cloud:

    # tarsnap --list-archives
    wwwetctest

    I then went into /etc and did some cleanup, removing files that shouldn’t have ever been there. This stuff grows in /etc on any long-lived system.

    # tarsnap -c -f wwwetctest-20140716-1508 etc/
                            Total size  Compressed size
    All archives               3986206           765446
      (unique data)            2120798           403833
    This archive               1989493           382550
    New data                    174773            37338

    # tarsnap --list-archives
    wwwetctest
    wwwetctest-20140716-1508

    Note that the new data for this archive is far smaller than for the first one. Tarsnap only stored the differences between the two backups.

    If you want more detail about your listed backups, add -v to see the creation date. Add a second -v to see the command used to create the archive.

    # tarsnap --list-archives -vv
    wwwetctest 2014-07-16 15:02:41 tarsnap -c -f wwwetctest etc/
    wwwetctest-20140716-1508 2014-07-16 15:09:38 tarsnap -c -f wwwetctest-20140716-1508 etc/
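
    Since the whole point is automated offsite backups, a crontab entry that builds a dated archive name does the job. A minimal sketch (the schedule, the archive name, and what you back up are all up to you, and remember that % must be escaped in crontab):

    0 2 * * * root /usr/local/bin/tarsnap -c -f www-etc-$(date +\%Y\%m\%d-\%H\%M) /etc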

    Let’s pretend that I need a copy of my backup. Here I extract the newest backup into /tmp/etc.

    # cd /tmp
    # tarsnap -x -f wwwetctest-20140716-1508

    Just for my own amusement, I’ll extract the older backup as well and compare the contents.

    # cd /tmp
    # tarsnap -x -f wwwetctest

    The files I removed during my cleanup are now present.

    What about rotating backups? I now have two backups. The second one is a differential backup against the first. If I blow away the first backup, what happens to the newer one?

    # tarsnap -d -f wwwetctest
                            Total size  Compressed size
    All archives               1989493           382550
      (unique data)            1938805           366149
    This archive               1996713           382896
    Deleted data                181993            37684

    It doesn’t look like it deleted very much data. And indeed, a check of the remaining archive shows that all my files are there.

    And now, the hard part: what do I need to back up? That’s a whole separate class of problem…

    Cloning a FreeBSD/ZFS Machine with ‘zfs send’

    My employer’s mail server runs DirectAdmin on FreeBSD, with ZFS. The mail server is important to the company. We want to be able to restore it quickly and easily. While we back it up regularly, having a “known good” base system with all the software installed, where we can restore the mail spools and account information, would be good.

    As it runs ZFS, let’s send the filesystems across the network to a blank machine.

    First I need an installation ISO for the same release and architecture of FreeBSD. (Could I use a different release? Perhaps. But if I have any errors, the first thing to try will be the correct installation media. Let’s skip the preliminaries.)

    I also need a web server, to store a user’s SSH public key file.

    I provision a new virtual machine with exactly the same amount of disk, memory, and processor as the original.

    Boot the install disk. Choose “Live CD” rather than install. Log in as root (no password).

    For FreeBSD 9.2 or earlier, take care of bug 168314. The live CD can’t configure resolv.conf out of the box because the directory /tmp/bsdinstall_etc isn’t present and /etc/resolv.conf is a symlink to /tmp/bsdinstall_etc/resolv.conf.

    # mkdir /tmp/bsdinstall_etc
    # touch /tmp/bsdinstall_etc/resolv.conf

    My disk is /dev/vtbd0. I format this disk exactly like the original server. The original install was scripted, which saves me the trouble of copying the partitioning from the original machine.

    # gpart destroy -F vtbd0
    # gpart create -s gpt vtbd0
    # gpart add -s 222 -a 4k -t freebsd-boot -l boot0 vtbd0
    # gpart add -s 8g -a 4k -t freebsd-swap -l swap0 vtbd0
    # gpart add -a 4k -t freebsd-zfs -l disk0 vtbd0
    # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 vtbd0
    # gnop create -S 4096 /dev/gpt/disk0
    # kldload zfs
    # zpool create -f -o altroot=/mnt -O canmount=off -m none zroot /dev/gpt/disk0.nop
    # zfs set checksum=fletcher4 zroot

    Our destination filesystem is all set. But we’ll need an SSH daemon to receive connections. The only writable directories are /tmp and /var, so I create a /tmp/sshd_config that contains:

    HostKey /tmp/ssh_host_ecdsa_key
    PermitRootLogin yes
    AuthorizedKeysFile /tmp/%u

    I need a host key. ssh-keygen has a -A flag to automatically create host keys, but it tries to put them in /etc/ssh. If you want to create keys in another location, you must create them manually.

    # ssh-keygen -f ssh_host_ecdsa_key -N '' -t ecdsa

    Now start sshd and the network.

    # /usr/sbin/sshd -f /tmp/sshd_config
    # dhclient vtnet0

    Your original machine should now be able to ping and SSH to the blank host. The blank host should be able to ping to your web server.

    This is also the point where you realize that the DHCP-configured address is actually in use by another machine, and you need to reboot and start over.

    Copy an authorized_keys file from your web server to /tmp/root. Also set the permissions on /tmp so that sshd will accept that key file. (The sticky bit will make sshd reject the key file.)

    # fetch http://webserver/authorized_keys
    # mv authorized_keys root
    # chmod 755 /tmp

    At this point, the original machine should be able to SSH into the target machine as root. This was the hard part. Now recursively copy the ZFS snapshots across the network.

    # zfs snapshot -r zroot@backup
    # zfs send -R zroot@backup | ssh root@newhost zfs recv -F zroot

    Now I walk away and let gigs and gigs of data flow across the WAN. To check progress, go to the target machine and run:

    # zfs list -t snapshot

    When an individual snapshot finishes, it’ll appear on the list.

    Once the ZFS send completes, you can reboot and have a cloned system, right?

    Uh, not quite. (I suspect that all of the below could have been done before the previous reboot, but I didn’t know I had to do it. Readers, please do the below before your first reboot and let me know if it works or not.)

    Reboot at this stage and you’ll get messages like:

    Can't find /boot/zfsloader

    Can't find /boot/kernel

    What gives? For one thing, there’s no boot loader yet. I could have done this earlier, but I didn’t think of it. Boot back into the live CD and install the necessary loaders.

    # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 vtbd0
    bootcode written to vtbd0

    While you’re in the live CD, check the ZFS pool.

    # zpool status
    no pools available

    Hang on… I know there’s a pool here! I just saw it before rebooting. Running zpool import shows my zroot pool as an option, so let’s try it.

    # zpool import zroot
    cannot import 'zroot': pool may be in use from other system
    use '-f' to import anyway

    We know this pool isn’t in use. Let’s forcibly import it. As I’m booting off the live CD, I temporarily mount this at /mnt.

    # zpool import -o altroot=/mnt -f zroot

    Now that you can talk to the pool, you can tell FreeBSD which ZFS it should boot from.

    # zpool set bootfs=zroot/DEFAULT/root zroot

    I want to be sure this pool can import and export cleanly, so I do:

    # zpool export zroot
    # zpool import zroot

    At this point, I’ve mounted the ZFS filesystems over the live CD filesystem. Don’t mess around with it any more, just reboot.

    And your system is cloned.

    Now I get to figure out how to update this copy with incremental snapshots. But that’ll be a separate blog post.

    mfiutil on FreeBSD

    I need to add drives to one of my FreeNAS 8.3.1 boxes. This machine has an “Intel RAID” card in it. I don’t want to use the Intel RAID, I want just a bunch of disks that I can plop a mirror on top of. The BIOS utility doesn’t give me the “just a bunch of disks” option. So I boot into FreeNAS, insert the drives, and the console shows:

    Jul 10 10:25:40 datastore5 kernel: mfi0: 6904 (458317539s/0x0002/info) - Inserted: PD 0e(e0xfc/s0)
    Jul 10 10:25:40 datastore5 kernel: mfi0: MFI_DCMD_PD_LIST_QUERY failed 2
    Jul 10 10:25:40 datastore5 kernel: mfi0: 6905 (458317539s/0x0002/info) - Inserted: PD 0e(e0xfc/s0) Info
    Jul 10 10:29:13 datastore5 kernel: mfi0: 6906 (458317752s/0x0002/info) - Inserted: PD 0f(e0xfc/s1)
    Jul 10 10:29:13 datastore5 kernel: mfi0: MFI_DCMD_PD_LIST_QUERY failed 2
    Jul 10 10:29:13 datastore5 kernel: mfi0: 6907 (458317752s/0x0002/info) - Inserted: PD 0f(e0xfc/s1) Info

    This “Intel RAID” is an MFI card, manageable with mfiutil(8). So let’s see what this adapter has to say for itself.

    # mfiutil show adapter
    mfi0 Adapter:
    Product Name: Intel(R) RAID Controller SRCSASBB8I
    Serial Number: P102404308
    Firmware: 8.0.1-0029
    RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
    Battery Backup: not present
    NVRAM: 32K
    Onboard Memory: 256M
    Minimum Stripe: 8k
    Maximum Stripe: 1M

    This card can do JBOD (Just a Bunch Of Disks). Tell the controller about them:

    # mfiutil create jbod 14
    # mfiutil create jbod 15

    We now have two new online drives. (I could also use the controller to create a mirror with these drives, or add more drives and have a RAID-5, or whatever I need.)

    # mfiutil show drives
    mfi0 Physical Drives:
    14 ( 1863G) ONLINE SATA E1:S0
    15 ( 1863G) ONLINE SATA E1:S1
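
    Had I wanted the controller-level mirror mentioned above instead, my reading of mfiutil(8) says it would be a single command along these lines, using the drive IDs from “mfiutil show drives”:

    # mfiutil create raid1 14,15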

    These show up as /dev/mfidX. FreeNAS can now see the drives.

    For FreeNAS, ZFS, and software RAID, this kind of hardware RAID controller isn’t really desirable. While the RAID controller claims that these drives are “just a bunch of disks,” the controller inserts itself in between the kernel and the drives. This means that certain error detection and correction, as well as SMART tools, won’t work. But it’s what I have in this old box, so I’ll use these drives for less vital data.

    FreeBSD Disk Partitioning

    A couple weeks ago, I monopolized the freebsd-hackers mailing list by asking a couple simple, innocent questions about managing disks using gpart(8) instead of the classic fdisk(8) and disklabel(8). This is my attempt to rationalize and summarize a small cup of the flood of information I received.

    The FreeBSD kernel understands several different disk partitioning schemes, including the traditional x86 MBR (slices), the current GPT, BSD disklabels, as well as schemes from Apple, Microsoft, NEC (PC98), and Sun. The gpart(8) tool is intended as a generic interface that lets you manage disk partitioning in all of these schemes, and abstract away all of the innards in favor of saying “Use X partitioning system on this disk, and put these partitions on it.” It’s a great goal.

    FreeBSD storage, and computing storage in general, is in a transitory state today. The older tools, fdisk and bsdlabel, aren’t exactly deprecated but they are not encouraged. x86 hardware is moving towards GPT, but there’s an awful lot of MBR-only gear deployed. Disks themselves are moving from the long-standing 512B sector size to 4KB, eight times larger.

    And as gravy on top of all this, disks lie about their sector size. Because why wouldn’t they?

    Traditional disks have a geometry defined by cylinders, heads, and sectors. MBR-style partitions (aka slices) are expected to end on cylinder boundaries — that is, slices are measured in full cylinders. Cylinders and heads aren’t really relevant to modern disks–they use LBA. With flash drives, SSDs, and whatever other sort of storage people come up with, disk geometry is increasingly obsolete. Traditional MBR partitioning still expects cylinder-aligned slices, however, and any BSD throws a wobbler if you use an MBR partition that doesn’t respect this.

    gpart must handle those traditional cylinder boundaries as well as partitioning schemes without such boundaries. If you create a 1GB MBR partition, it will round the size to the nearest cylinder.

    The sector size changes create orthogonal problems. If you write to the disk in 512B sectors, but the underlying disk has 4K sectors, the disk will perform many more writes than necessary. If you write to the disk in 4K sectors, but the underlying disk uses 512B sectors, there’s no real harm done.

    But if your logical 4K sectors don’t line up with the disk’s physical 4K sectors, performance will be cut in half.

    For best results, create all partitions aligned to a 4K sector. If the underlying disk has 512B sectors, it won’t matter; you must do more writes to fill those sectors anyway. Use the -a 4k argument with gpart to have it create partitions aligned to 4K sectors.

    How do you do this? It depends on if you’re using GPT or MBR partitions.

    For GPT partitions, you must start partitioning the disk at a multiple of 4K. The front of your disk might have all kinds of boot code or boot managers in it, however. Start your first partition at the 1MB mark, and only create partitions that are even multiples of a megabyte. Today you’d have to go out of your way to create a partition that was 1.5MB, so this isn’t a huge constraint.
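
    Put together, a 4K-friendly GPT layout on a hypothetical disk da0 would run roughly like this (the sizes are examples):

    # gpart create -s gpt da0
    # gpart add -a 4k -s 512k -t freebsd-boot da0
    # gpart add -a 1m -s 8g -t freebsd-swap da0
    # gpart add -a 1m -t freebsd-ufs da0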

    For MBR partitions, it’s slightly more difficult. Use the -a 4k command-line arguments to gpart when creating BSD partitions inside a MBR slice. This tells gpart that even if the slice isn’t 4k aligned, the BSD partitions must be.
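
    Again on a hypothetical da0, the MBR version would be something like this: create the slice, put a BSD label inside it, and force 4K alignment on the partitions within the slice:

    # gpart create -s mbr da0
    # gpart add -t freebsd da0
    # gpart create -s bsd da0s1
    # gpart add -a 4k -t freebsd-ufs da0s1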

    Beyond these rough sketches, Warren Block has a nice detailed walk-through of the actual commands used to partition disks with these standards.

    next book(s): FreeBSD storage

    I’m writing about FreeBSD disk and storage management. (The folks on my mailing list already knew this.) For the last few months, I’ve been trying to assimilate and internalize GEOM.

    I’ve always used GEOM in a pretty straightforward way: decide what I want to achieve, read a couple man pages, find an archived discussion where someone achieved my goal, blindly copy their commands, and poof! I have deployed an advanced GEOM feature. I figured GEOM was mostly for developers who invented cool new features.

    Turns out that GEOM is for systems administrators. It lets us do all sorts of cool things.

    GEOM is complicated because the world is complicated. It lets you configure your storage any way you like, which is grand. But in general, I’ve approached GEOM like I would any other harmless-looking but deadly thing. Now I’m using a big multi-drive desktop from iX Systems to fearlessly test GEOM to destruction.

    I’m learning a lot. The GEOM book will be quite useful. But it’s taking longer than I thought. Everything else flows out of GEOM. I’ve written some non-GEOM parts, but I’m holding off writing anything built on top of GEOM. Writing without understanding means rewriting, and rewriting leads to fewer books.

    My GEOM comprehension is expanding, and many developers are giving me very good insight into the system. GEOM is an underrated feature, and I think my work will help people understand just how powerful it is and what a good selling point it is for FreeBSD.

    My research has gone as far as the man pages can take me. Now I need to start pestering the mailing lists for answers. Apparently my innocuous questions can blow up mailing lists. I would apologize, but an apology might imply that I won’t do it again.

    FreeBSD storage is a big topic. I suspect it’s going to wind up as three books: one on GEOM and UFS, one on ZFS, and one on networked storage. I wouldn’t be shocked if I can get it into two. I would be very surprised if it takes four. (I’m assuming each book is roughly the size of SSH Mastery — people appear to like that length and price point.) I will adjust book lengths and prices as needed to make them a good value.

    The good thing about releasing multiple books is that you only need to buy the ones you need. You need to learn about iSCSI and NFS? Buy the third book. You want everything but ZFS? Skip that one. And so on.

    As I don’t know the final number of books or how they will be designed, I’m not planning an advance purchase program.

    I am planning to release all the books almost simultaneously, or at least very close together.

    So, a mini-FAQ:

  • When will they be released?
    When I’m done writing them.

  • How much will they cost?
    Dunno.

  • How many will there be?
    “Five.” “Three, sir.” Or four. Or two. Definitely a positive integer.

  • Do you know anything?
    I like pie.

    I’m pondering how to give back to FreeBSD on this project.

    I auctioned off the first copy of Absolute FreeBSD to support the FreeBSD Foundation. That raised $600 and was rather fun. These books will be print-on-demand, though, so “first print” is a little more ambiguous. An auction also has a ceiling, whereas OpenBSD’s ongoing SSH Mastery sales keep on giving.

    I’ve had tentative discussions with Ed Maste over at the FreeBSD Foundation about using those books as fundraisers. I’d let the FF have the books at my cost, and they could include them as rewards for larger donations. A million and ten things could go wrong with that, so it might not work out. If nothing else, shipping stuff is a lot of work, and the FF folks might decide that their time is better spent knocking on big corporate doors than playing PBS. I couldn’t blame them — that’s why I don’t ship paper books.

    If that fails for whatever reason, I’ll sponsor a FreeBSD devsummit or something.

    virtio NIC on OpenBSD 5.5-current

    My Ansible host is OpenBSD. Because if I’m going to have a host that can manage my network, it needs to be ridiculously secure. The OpenBSD host runs on KVM (through the SolusVM virtualization management system).

    During heavy data transfers, the network card would occasionally stop passing traffic. I could run any Ansible command without issue, but downloading an ISO caused hangs. This was most obvious during upgrades. Downloads would stall. I could restart them with ^Z, then an “ifconfig vio0 down && ifconfig vio0 up && fg”, but this still isn’t desirable.

    The vio(4) man page includes the following text:

         Setting flags to 0x02 disables the RingEventIndex feature.  This can be
         tried as a workaround for possible bugs in host implementations or vio at
         the cost of slightly reduced performance.

    (Thanks to Philip Guenther for pointing that out. I would kind of expect this to have been in the BUGS section, or maybe say “Try this if you have weird problems,” but at least the info is there.)

    So: download the new bsd.rd kernel, set the flag, and try to upgrade.

    # config -ef /bsd.rd
    OpenBSD 5.5-current (RAMDISK_CD) #147: Wed May 28 13:56:39 MDT 2014
    deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
    Enter 'help' for information
    ukc> find vio
    146 vio* at virtio* flags 0x0
    ukc> change 146
    146 vio* at virtio* flags 0x0
    change [n] y
    flags [0] ? 2
    146 vio* changed
    146 vio* at virtio* flags 0x2
    ukc> quit
    Saving modified kernel.

    The upgrade now runs flawlessly, and I can no longer reproduce the hangs.

    Be sure to repeat this on the new kernel.
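
    That means running the same find/change/quit sequence against the installed kernel once the upgrade finishes, roughly as below; the vio device number may differ from the ramdisk kernel, so check with “find vio” first and answer the prompts as above.

    # config -ef /bsd
    ukc> find vio
    ukc> change 146
    ukc> quit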