Moving Virtual Machines to Jails

I recently learned that I could rent a dedicated machine from bloom.host for less than I’ve been paying for my virtual machines. Time to move some VMs to jails! Here are the notes I’ve left for myself. All of my VMs run ZFS.

First, clean up unneeded boot environments, remove any unnecessary crap that lingered on the VM, apply all security updates, and in general tidy up the source VM.
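
That cleanup is mostly boot environment and package hygiene. A rough sketch, assuming freebsd-update and pkg; the boot environment name is a placeholder for whatever bectl list shows on your VM.

# bectl list
# bectl destroy 13.1-RELEASE-p3
# freebsd-update fetch install
# pkg upgrade
# pkg autoremove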

Then decide how you want to flip services over. The cleanest way is to shut down all services and start the migration, but you might need to guarantee uptime. It’s up to you. I chose to leave services running during an initial replication, then shut down services, take a final snapshot and do an incremental replication, start the new jail, and change DNS to the new addresses. Figure out your own uptime requirements.

Start by creating a recursive snapshot of the system.

# zfs snapshot -r zroot@bloom

At a convenient time, I’d go to the destination host and pull the snapshots over. The snapshots need to go into a dataset under zroot/jails, named after the VM the jail will replace.

$ ssh mwlucas@www.mwl.io zfs send -Rc zroot@bloom | zfs recv -v -o mountpoint=/www zroot/jails/www

This might take a while, so follow up with an incremental send right before the actual migration.
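
Each incremental needs a newer recursive snapshot on the source to send from. This example assumes an earlier pass already created zroot@bloom2, so the final pass only needs bloom3.

# zfs snapshot -r zroot@bloom3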

$ ssh mwlucas@www.mwl.io zfs send -Rci zroot@bloom2 zroot@bloom3 | zfs recv -v -o mountpoint=/jails/www zroot/jails/www

If you’ve tampered with the new datasets between copies, you’ll get an error.

receiving incremental stream of www/ROOT@bloom3 into zroot/jails/www/ROOT@bloom3
cannot receive incremental stream: destination zroot/jails/www/ROOT has been modified
since most recent snapshot
warning: cannot send 'www/ROOT/default@bloom3': signal received
Broken pipe

Roll back the problem dataset.

# zfs rollback zroot/jails/www/ROOT@bloom2

Data’s moved over, but there’s trouble.

$ zfs list
...
zroot/www 39.6G 776G 132K /www
zroot/www/ROOT 22.5G 776G 132K /www/ROOT
zroot/www/ROOT/default 22.5G 776G 21.8G /www/ROOT/default
zroot/www/usr 10.9G 776G 132K /www/usr
zroot/www/usr/home 9.37G 776G 384K /www/usr/home
zroot/www/usr/home/acme 7.10M 776G 7.10M /www/usr/home/acme ...

The jail boots from the boot environment /www/ROOT/default, but the jail’s root dataset, zroot/www, is empty. Shuffling datasets and rearranging inheritance is a pain, so I just duplicated the contents.

# zfs mount zroot/jails/www/ROOT/default

$ tar cfC - /jails/www/ROOT/default/ . | tar xvpfC - /jails/www/

# zfs list zroot/www
NAME USED AVAIL REFER MOUNTPOINT
zroot/www 41.4G 774G 132K /www
zroot/www/ROOT 22.5G 774G 132K /www/ROOT
zroot/www/usr 10.9G 774G 132K /www/usr
zroot/www/var 7.96G 774G 132K /www/var

Go into the jail’s root directory and edit its etc/sysctl.conf to remove settings that don’t apply in a jail. You can also edit its etc/rc.conf for the new network interface and the new IP.
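
Inside the jail, the network settings in etc/rc.conf end up looking something like this. The interface name comes from the VNET epair configured in jail.conf below (epair80b for a jail with jid 80), and the address and gateway are documentation placeholders, not real ones.

ifconfig_epair80b="inet 203.0.113.80/24"
defaultrouter="203.0.113.1"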

I’m using VNET, because otherwise I must configure on-system daemons to avoid binding to localhost. (Remember, in a non-VNET jail localhost is aliased to the public IP!) That means I need a bridge interface. This host has one live Ethernet interface, igb0, so I attach it to a bridge.

autobridge_interfaces="bridge0"
autobridge_bridge0="igb*"
cloned_interfaces="bridge0"
ifconfig_igb0="up"

I then add a public IP to the bridge, for the host’s use.
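
On the host, that’s one more rc.conf line. The address below is a placeholder for the host’s real public IP; the existing defaultrouter entry stays as it was.

ifconfig_bridge0="inet 203.0.113.2/24"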

Now for jail.conf for a VNET install. I need to allow devfs for running named(8) on some of the VMs, and I want raw sockets.

path = "/jails/$name";
mount.devfs;
devfs_ruleset=5;
exec.clean;
allow.mount.devfs=1;
allow.raw_sockets=1;

exec.consolelog="/jails/$name/var/log/console.log";

vnet;
exec.prestart += "/sbin/ifconfig epair${jid} create up";
exec.prestart += "/sbin/ifconfig epair${jid}a descr 'vnet-${name}'";
exec.prestart += "/sbin/ifconfig bridge0 addm epair${jid}a up";
vnet.interface="epair${jid}b";

exec.start = "sh /etc/rc";

exec.created="logger jail $name has started";

exec.stop = "sh /etc/rc.shutdown";
exec.poststop += "ifconfig epair${jid}a destroy";
exec.poststop += "logger jail $name has stopped";

.include "/etc/jail.conf.d/*.conf";

This reduces each jail’s individual entry in /etc/jail.conf.d to this.

www {
	jid = 80;
}

At this point, I could start the jail and see what broke. Some common errors included /tmp losing the sticky bit and MariaDB directories being owned by root rather than mysql.
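
Starting the jail and fixing those two particular breakages looks roughly like this. The MariaDB data directory is the FreeBSD default and might differ on your system.

# jail -c www
# chmod 1777 /jails/www/tmp
# chown -R mysql:mysql /jails/www/var/db/mysql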

Change the DNS, and watch traffic shift to the new host.

Am I confident in this process? No. That’s why I make sure I have a last backup in Tarsnap, and wait 30 days before deleting the source VM.
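
That last backup is an ordinary Tarsnap run against the source VM before shutting it down; the archive name and path list here are placeholders for whatever your existing Tarsnap job covers.

# tarsnap -c -f www-final /etc /usr/local/etc /usr/home /var/db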

OpenBSD PF versus FreeBSD PF

I encountered yet another discussion about OpenBSD PF versus FreeBSD PF. For those who are new to the discussion: OpenBSD developers created PF in 2001, and it rapidly improved to become the most approachable open source packet filter. FreeBSD ported PF over to its kernel in 2004, with occasional updates since. Today a whole bunch of folks who don’t program echo cultish wisdom that one or the other version of PF has fallen behind, not kept up on improvements, or otherwise betrayed their community. My subtler comments have been misinterpreted, so let’s try this.

These claims are garbage.

First, and most importantly: FreeBSD PF developers work with OpenBSD devs all the time, and OpenBSD PF developers pull stuff from FreeBSD. You get a lot of noise about certain people being jerks about the other project–and both projects absolutely have jerks. (And yes, anyone who has read my books knows that I am a cross-platform jerk.) But for the most part, folks want to work together.

PF is absolutely an OpenBSD creation, though, so why isn’t the OpenBSD version the Single Source of Truth? Why doesn’t FreeBSD just consider OpenBSD a vendor and pull that code in? Because the OpenBSD and FreeBSD kernels are wholly different.

Back when I wrote Absolute BSD, I could realistically write a single book that would basically apply to the three major open source BSDs. Yes, the various projects objected to being lumped together, but if you knew any one of them you could stumble through the others. This is no longer true. FreeBSD’s kernel uses a wholly different locking model than OpenBSD. OpenBSD’s memory protections have no equivalent in FreeBSD. These are not things you can manage with a shim layer or kernel ABI. These are big, complicated, intrusive differences. You can’t tar up one version and dump it in the other’s kernel. It won’t work. If you do a hack job of making it work, it will perform badly.

Yes, you can find “proof” that one PF or the other is faster under particular workloads on specific hardware. I have no doubt that some of them are not only accurate, but honest. There are other workloads, though, and other hardware, and other conditions. Regardless of who wins a particular race, the constant competition to achieve peak performance benefits everyone. I’m not going to link to any of the benchmarks, because I have made my opinions on benchmarking very clear elsewhere. Pick what you want and roll with it.

Every PF developer is trundling along, doing their best to make things work.

Are features missing from one or the other? Yep. I’m not going to list examples because, as the above links show, each project plucks what they find useful from the other. These things are freely given, with glad hearts, but they take time to integrate. Filling message boards with staunch declarations that my team’s PF is better is not only tedious, it wholly misses the point.

People are working together to improve the world.

And the PF syntax is the most approachable in all of open source Unix.
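
For anyone who hasn’t stared at a ruleset: a trivial example, with igb0 standing in for whatever your real interface is called.

set skip on lo0
block in all
pass out all
pass in on igb0 proto tcp from any to any port { 22 80 443 }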

(Partisan fanboy comments will be mercilessly whacked.)

First BSDCan Operations Team Meeting

I’m posting this here because I’m posting it everywhere.

I’ve just sent an email to everyone who volunteered to help make BSDCan 2024 happen. I suspect some of you have not received that email. If you haven’t seen it, please check your spam folder. We need to start organizing now to make 2024 go smoothly. Mostly smoothly.

If you’re on the operations team and didn’t get the email, please poke me directly so I can update your email address.

Tomorrow night: mug.org talk on OpenBSD Filesystems

I’ll be doing my talk about OpenBSD filesystems tomorrow night, for mug.org’s online meeting.

I expect this will be the last time I give this talk, but MUG does a decent job of recording so you can catch it later on YouTube or wherever. But if you show up in person, you can ask inconvenient questions like “when are you going to learn to write?” or “have you considered truck driving school?”

BSDCan 2024 Reorganization

Dan Langille has spent a good part of the last twenty years on BSDCan. We’ve had other BSD conferences in the Western Hemisphere, but BSDCan is the most consistent. Covid interrupted it, but only because Dan coordinated with EuroBSDCon to have a single online conference in 2021.

This is a lot of work. Dan’s life has changed.

Dan is stepping back from organizing BSDCan. I am taking over coordinating 2024.

Note I did not say “running.” Running an international conference is a job best accomplished by a team. A large team. Dan set up BSDCan 2023 with himself and Adam Thompson, and ran it with assistance from Dru Lavigne and Warren Block in registration, and Patrick McEvoy and Andrew Fengler in streaming. I am not nearly that tough.

Instead, we have assembled a team of seventeen people to be the BSDCan 2024 Operations Team. Dan will become our source of knowledge, telling us who to talk to at the University of Ottawa and where to reliably get T-shirts locally and which caterers are most likely to cause indigestion. I am pleased that Adam, Dru, Warren, Patrick, and Andrew all cheerfully agreed to continue in their roles. (Adam’s work of coordinating travel and accommodations for the speakers will be split among a few folks, led by Adam.)

My entire job will be to coordinate the team, help them gather resources, and mediate conflicts between them. Every person on this list is motivated to make BSDCan 2024 happen, but even the best-intentioned folks can disagree. If required I’ll make final decisions, sure, but if my decision makes people unhappy, I have no doubt that the esteemed program committee will tell me I’m being an idiot. They have final say over the con, after all.

This means I need to be staunch on not doing anything myself–although I admit, I might make a call to the Diefenbunker to see how much it would cost to host the closing social event there, and to Pili Pili Chicken for a price quote to cater it. This is mostly for my peace of mind, however. Confirming it’s too expensive will put the idea to rest.

Organizing BSDCan with two folks is a monumental achievement. I have seventeen, which I consider barely adequate for a Redundant Array of Independent Dans. We’ll need folks to handle a variety of smaller tasks, from checking video times to hauling boxes. (If you haven’t hauled a box for the con staff, have you really been to BSDCan?) I intend to make use of the bsdcan-volunteers mailing list to gather those folks. Folks on our team can ask for specific help on that list, whether it be figuring out a balky database or showing up at 7AM opening day with a roll of duct tape and a cattle prod. (“Cattle prod? Really?” Hey, I don’t know the details. Knowing the details would get in the way of me doing my job. I trust the various team members to know what they need and to ask for it.)

One thing that the BSD community has historically excelled at is passing leadership from one generation to the next. BSDCan’s operations team will follow that example. We have old hands taking the lead on parts of BSDCan, but we have at least two people covering each responsibility and at least one of them should be comparatively young.

I don’t intend to coordinate BSDCan for more than a couple years. My goal is to set up a self-perpetuating structure, make sure it runs, and walk away. Normally, I wouldn’t take on anything like this, but BSDCan is important to my people and it deserves my time and attention.

The BSDCan 2024 goal is to largely reimplement BSDCan 2023, but supported by different people. Yes, there’s room to change things. Yes, I have ideas. Many people have ideas. You probably have ideas. Our overwhelming goal is to make the conference happen. Perhaps that’s unambitious but extracting knowledge from Dan’s head, documenting it, and reimplementing it will take time and energy. I don’t want to burn anyone out. I intend to retain the location, the mask policy, the papers committee, the social event, everything, until we have BSDCan down cold.

Dan has done well. He’s earned a break.

We are drafting him to run the auction, however.

BSDCan 2023 Tutorial: OpenBSD Storage Management

I’ll be teaching a four-hour tutorial on OpenBSD storage management at BSDCan 2023. As you might imagine, it’s based on OpenBSD Mastery: Filesystems.

I am pleased to see BSDCan returning to meatspace. Yes, the pandemic is ongoing, and I don’t blame folks who decide not to attend. The main reason I chose to attend is that the concom (which I’m a member of) has chosen to enforce a stringent mask policy. Yes, I know you have the right to not wear a mask in public. BSDCan is a private event for a community, however, and communities have a responsibility to protect their weakest members. (People who think that your rights outweigh your responsibilities, I will delete your comments.)

I hope to see many of you at the con, if not at my tutorial. It will be good to see many old friends. Well, at least their eyes. Faces, in one of the many fine outdoor dining establishments the Byward Market offers.

Jan/Feb 2023 Column Out in the FreeBSD Journal

Somehow, I’ve written 28 “We Get Letters” columns for the FreeBSD Journal. The latest is out. I’m amazed that they haven’t given me the boot yet, especially as I’m attempting to channel Michael Bywater’s brilliant “Bargepole” column from the unforgivably murdered weekly “Punch.” At best I’ve achieved a tenth of Bargepole’s vitriolic geezerliciousness, but seeing as Bargepole contained enough vitriol to kill a beluga whale I expected them to ban me years ago.

You can get all the columns for free by trawling through the old issues, but you could also grab the collection and save yourself three years of trouble.

“OpenBSD Mastery: Filesystems” Print Status

I got the print proofs of OpenBSD Mastery: Filesystems yesterday, and they came out lovely.

I’ve gone through them and fixed some less-than-optimal layout decisions. You never know what a print book will look like until you hold it in your hands. Several folks read the ebook already and sent some corrections that I, the tech editors, and the copyeditor missed, so I fixed them. The ebook has been updated to match. The printer now has the print files.

So, when will the print book be in stores?

That depends on the printer. I expect they’ll approve the print in the next few days, unless they pull another Orcibus and sit on it for a month. After a week or so, though, I have nothing better to do than give them pain every. single. day.

I will release the print book to the public when I can. I’ll be ordering the pre-ordered copies directly from the printer at the same time.