panic: _mtx_lock_sleep: recursed on non-recursive mutex iscsi-io @ /usr/src/sys/modules/iscsi/initiator/../../../dev/iscsi/initiator/isc_sm.c:324
The initiator is a FreeBSD-current amd64 from 8 May 2011. The iSCSI target is an inexpensive iomega NAS. Other hosts attached to this iSCSI NAS have also had errors, though. The errors clear when I reboot the NAS.
Unfortunately, the FreeBSD box is a diskless system. Dumps aren’t exactly simple. While I heard some rumours about a network dump facility coming soon at the FreeBSD BSDCan devsummit, that’s the future.
How to fix this?
I attended the High Performance FreeBSD Clusters talk at BSDCan 2011. The presenter had originally used FreeBSD servers, then tried OpenSolaris to get better performance. He had OpenSolaris problems, but found that they could not access the bug information without a support contract. They’re now moving towards FreeBSD with EIT, and are happier.
I intend to learn from their mistakes, and replace the iomega with a FreeBSD EIT server. I’ll keep the iomega for, say, a central ports and packages NFS server, where a reboot won’t impact my uptime.
Why bother to blog this? So that the next poor bugger who gets this panic message gets at least one search engine hit.
BSDCan 2011 was great. The problem with a conference that’s routinely great is that great becomes routine, and hence boring. Several presentations struck me as notably interesting for a variety of reasons, and I wanted to comment on three of them. These are only my personal opinions, of course. BSDCan had three tracks, and I could only be in one talk at a time.
Mark Linimon’s talk on How not to build a lights-out facility discussed the FreeBSD Project’s efforts to mirror its core infrastructure in datacenter space donated by New York Internet. As a chronicle of lessons learned and things that should be done differently next time, it’s valuable listening for anyone who thinks that building heavy-duty project infrastructure is easy.
I’m not going to name the people, the projects, or the code involved in the second talk, because the talk itself is less important than what happened during it. A committer from one large BSD project presented on a new piece of infrastructure he had developed. The audience included people associated with a variety of BSD projects. At the end of the talk, a senior developer from a different BSD project asked a few questions. The presenter and the developer had several rounds of completely civil back-and-forth technical discussion, and at the end the presenter agreed that the developer had some strong points and that some parts of his infrastructure needed additional work. I’m told that this happened in more than one talk. Despite discussion of disagreements between various BSDs projects, it’s clear that technical correctness is still most important.
The presentation I found most technically interesting was Randall Stewart’s work in data center congestion control. Stewart did real-world testing of data center congestion control with ECN and SCTP, and presented his results. It wasn’t until hours later that I realized exactly why I found the talk so interesting: he had essentially done “Mythbusters” for a specific part of TCP/IP. He’d bought a bunch of $50 servers on eBay, repeatedly adjusted SCTP’s response to packets with ECN set, and graphed the results. This was real-world stuff suspiciously close to academic research, done in a basement. And this sort of research is something that almost anyone could do. Lots of claims are made for our network stacks, but very few people actually experiment to measure performance with their workloads.
I’m glad to see open source projects learning lessons. I’m glad to see different BSD camps politely testing their ideas against each other, creating better software for everyone. But I’m really really happy to see real-world experiments.
I see all sorts of claims for different BSD’s network stacks, disk performance, and so on. Please, put them to the test. Make changes. Measure the results. While this work requires real hardware rather than virtualization, it’s something that anyone can do. You know your workload. Read about benchmarking. While naive benchmarks aren’t useful, it’s not that hard to design valid benchmarks. Buy used hardware, run your own tests. Make changes, and test again. Measure and document everything. Capture packets, and keep the pcap files so that you can go back and answer interesting questions. Publish your results. You’ll get interest. Perhaps your results will be as you expect. Maybe they won’t. But you’ll never know until you try.
As a BSDCan committee member, I would love to see more work like this. I can’t guarantee that your paper would be accepted, but I can say I’m much more likely to vote for a paper with a real investigation than yet another talk on well-understood features. Even if your results say “Yes, the fooBSD disk I/O system works exactly as expected,” it’s still interesting. And if you discover weak spots, you’ll have evidence the developers will need to improve performance.
I need to confine the user jrlodden to his home directory on this OpenBSD 4.9/i386 system, but give him a shell prompt and access to a couple of specific commands. While the SFTP server has built-in chroot support, a shell environment is more complicated. The /etc/ssh/sshd_config part is pretty simple…
...
#ChrootDirectory none
...
Match User jrlodden
ChrootDirectory %h
This chroot directory is nonfunctional. I must create device nodes and add necessary programs. Start by creating the the user account with adduser(8), creating standard device nodes, and removing unnecessary nodes.
# cd ~jrlodden
# mkdir dev
# cd dev
# /dev/MAKEDEV std
# ls
arandom klog ksyms null stdin tty zero
console kmem mem stderr stdout xf86
# rm console klog kmem ksyms mem xf86
# ls
arandom null stderr stdin stdout tty zero
He’ll need a statically-linked shell, such as /bin/ksh.
# cd ~jrlodden
# mkdir bin
# cd bin/
# file /bin/ksh
/bin/ksh: ELF 32-bit LSB executable, Intel 80386, version 1, for OpenBSD, statically linked, stripped
# cp /bin/ksh .
A chrooted user should not have write access to his own root directory. He will need a home directory in the chroot, however.
# ssh jrlodden@chroothost
ksh: No controlling tty (open /dev/tty: Device not configured)
ksh: warning: won't have full job control
$
jrlodden is logged in and cannot access anything beyond his cell. While I’d like to clean up the /dev/tty warning, I can’t seem to create /dev/ttypc in the chroot’ed /dev. For now, I can copy statically-linked versions of his necessary programs into /home/jrlodden/bin and get on with my life.
I promised I’d announce the title of my next No Starch Press book in my BSDCan talk. That happened. The rest of you had to wait until now to hear that I’m rewriting Absolute OpenBSD. The technical reviewer is Peter Hansteen, author of The Book of PF.
Most of the book does not exist yet. Best guess for a release date is some time in 2012.
Why did a second edition take so long?
I will only write books about tools I use in production, out in the real world. (Desktop use does not count.) In my previous job, senior network engineer at a global automotive supplier, I had no opportunity to use OpenBSD. That meant I couldn’t offer advice about using OpenBSD, or discuss how it fit into my infrastructure. I could have written the book, but it would have sucked.
I’m also working on a second nonfiction project, but I’ll announce that separately.
The ports team has developed new package management tools and methods to simplify FreeBSD package management. The hope is to have these as the default in FreeBSD 10. Erwin Lansing has posted slides from his brief presentation, and a Web search for “pkgng FreeBSD” will get you all sorts of details.
BSDCan! Are you going? Why not? Sorry, that excuse isn’t good enough. Get there. I arrive Tuesday. I will be looking for you. Do not make me come looking.
As a result of BSDCan, as well as preparing to sell my house, various stuff has been delayed. If you’re waiting on me, I’ll get to you soon. Really.
The good news is, the house painting is finished. All that remains is to pack. While not fun, packing can be done in smaller chunks of time than painting. Hoping to get book writing back on track as a result.
For those who missed the announcement, FreeBSD 9 has replaced sysinstall with a new installer. It’s based on the PC-BSD installer’s back end and a text-based front end by Nathan Whitehorn. So, what is this new installer we’ve waited sixteen years for? I decided to find out.
As an official snapshot with the BSD installer isn’t yet available, I grabbed one of nwhitehorn’s recent snapshots. The back end is not integrated, but let’s see what the front end looks like. I assume there will be changes between this February snapshot and the next official snapshot. For my testing, I used a VM running Microsoft Virtual PC (for ease of screenshots) with ACPI disabled (so the install would finish).
Booting the image, I saw:
A recovery shell and a live CD are great, but I want to install.
The next screen tells me to select a keyboard. By virtue of alphabetical order, the default is Armenian. When I arrow down to find the US keyboard, there’s several different options. I know exactly what kind of keyboard I have, it’s an Endurapro with the mouse pointer between the G and the H. But that’s not an option. Fortunately, I use a Dvorak layout, and that is an option.
I’m now asked for a hostname, and then allowed to choose optional components.
Of course I want source code!
I will miss the ability to choose “the smallest distribution possible,” but in reality, upgrading added the entire base system anyway.
Now we partition the disk.
Ooooh, a “guided” method. That sounds newbie-friendly. Let’s try that.
We want to use the whole disk.
Wait a minute… where are /var, /tmp, /usr, and all the other partitions? Don’t tell me that one of the last survivors of proper partitioning is stumbling into the one large root filesystem trap? Ick. (I assume that the other partitioning methods will let you create proper partitions.)
Note that these are GPT partitions, not MBR partitions (slices).
Highlighting a partition and pressing ENTER shows details.
We can set a label in the installer, which is a nice feature. And it appears that two ESC takes you out of this submenu into the main menu. That seems odd, but perhaps it’s related to this being in Virtual PC. Arrow over to EXIT and hit ENTER to get the final confirmation screen.
“Are you sure?” Save and Abort aren’t exactly answers to that question – actually “save” is kind of ambiguous in this context. Save your existing data? Yes, we’re sure, choose Save. The filesystem is initialized:
The installer verifies the files on disk and begins installing.
We must watch for a panic in the background, versus a simple lock order reversal. This is -current, either one can happen and both are displayed by default.
While watching this, I wanted to hit ALT-F2 and see just how well the ports were extracting. That’s apparently no longer an option.
After extracting everything, I was prompted for the root password, then for a network interface.
I took the default.
Why yes, I would like DHCP with my host, thank you.
Now I’m asked for the services I want enabled at boot.
Compared to sysinstall there aren’t many choices, but I’m all right with being offered only the lowest common denominators. Far more vexing is how the network configuration gracelessly handled the unplugged network interface. While CTRL-L redrew the screen, a new link down message reappeared every second or so. I’d never tried Virtual PC before, and apparently I need to decipher the network configuration.
Once I choose the essential services, I’m asked if I want to add users. When I say yes, it drops me to text mode for the usual adduser(8) dialog.
All right, the network message is getting pretty annoying now.
After adding my user, I get the final screen.
You can change many of your optional settings, plus set the time zone. You get the usual warnings about removing the CD-ROM from the drive, and then reboot into the new install.
Overall, I like many of the changes. It doesn’t have many of the libdialog annoyances that plague sysinstall. I dislike the guided partitioning system, or more exactly the lack of actual partitioning therein. I do not want /tmp files filling my hard disk!
The good news is, with the separation between the front end and the back end, changing the installer will be much simpler than it ever was with sysinstall. For all the petty annoyances in this installer, it’s a big step in the right direction. I want to commend the creators for actually pushing Sysinstall Replacement Attempt #82,319 to completion.
I installed Apache on a diskless FreeBSD-9/amd64 server. Once I added SSL, the web server wouldn’t start. It died with:
[Mon Mar 21 15:37:16 2011] [emerg] (13)Permission denied: couldn't grab the accept mutex
[Mon Mar 21 15:37:16 2011] [emerg] (13)Permission denied: couldn't grab the accept mutex
[Mon Mar 21 15:37:16 2011] [emerg] (13)Permission denied: couldn't grab the accept mutex
[Mon Mar 21 15:37:16 2011] [emerg] (13)Permission denied: couldn't grab the accept mutex
[Mon Mar 21 15:37:16 2011] [emerg] (13)Permission denied: couldn't grab the accept mutex
[Mon Mar 21 15:37:17 2011] [alert] Child 1733 returned a Fatal error... Apache is exiting!
Google says that the fix for this is to define AcceptMutex flock somewhere in httpd.conf. Google shows dozens of mailing list discussions that give this advice, but without explanation. Every so often, someone whinges about it not working, and doesn’t get a response.
This is an example of what I call “Occult IT.” There are certain recipes that people just follow, without understanding why. We stumble around, grasping for invocations and incantations that will fix our problem. I don’t just want the ritual that will solve the problem; I want to know why the problem happened and deepen my understanding of the issue.
Besides, adding AcceptMutex flock didn’t work.
To identify the problem, I set LogLevel to debug in httpd.conf and tried starting httpd again.
I didn’t have that configuration setting in my httpd.conf, but it’s the Apache default anyway. So, what are my other choices?
Apache documentation says that the AcceptMutex setting determines how Apache serializes incoming requests. The flock setting dictates how Apache locks the LockFile.
My configuration doesn’t have a defined LockFile, so choosing how we lock it isn’t going to help.
I don’t like lock files. Bad stuff happens to them. Let’s try a locking method without a lockfile. The documentation lists two different classes of locking mechanisms. flock and fcntl work on lock files. posixsem, pthread, and sysvsem use the in-memory mechanisms of semaphores and/or mutexes to provide locking.
As I don’t like lock files, I’ll try one of the in-memory mechanisms.
AcceptMutex posixsem
And Apache starts and runs perfectly.
I can’t find any details on the differences between these in-memory mechanisms, from a system administrator’s point of view. I imagine that the System V mechanism wouldn’t work if you’d removed that support from your kernel. But the point is:
Yes, The Great Committer was an April Fools’ gag. I didn’t expect anyone to actually believe it, but it gathered a few laughs, so I’m content.
Far more successful was the FreeBSD and NetBSD to merge gag that Dan Langille and I pulled back in 2003. The United Nations Fretbsd.org site has long since expired, but Dan archived it. (Eight years later, and I still giggle madly about the moose in that one.) That one actually sucked in a reporter from an reputable tech site, who pestered Dan and I for details all day.
John Baldwin really did surpass PHK’s commit total, however, and deserves credit for working so hard on FreeBSD. I heard from John on Friday, and while I’m not going to be so gauche as to copy his email here, I can honestly say that I am relieved that I did not displease The Great Committer.
I’m pleased to be the first to announce that the FreeBSD Project is reorganizing. This will appear on the FreeBSD home page next week, but journalistic ethics decree that I must act when I get a scoop.
FreeBSD has always been something of a meritocracy, where the respect given to committers is directly proportional to the amount and quality of code they commit. This is tracked in the commit statistics. For many years, Poul-Henning Kamp (phk@) was the undisputed leader, but in 2010 his total was exceeded by relative newcomer John Baldwin (jhb@). Baldwin’s commits show no sign of slowing, and if he continues at this pace, he will soon overtake the combined totals of Foundation President Robert Watson and embedded emeritus Warner Losh.
The FreeBSD Project has a variety of leadership roles and teams, such as the Core Team, the Ports Management team portmgr, and so on. These teams have helped FreeBSD’s broad developer base coordinate their efforts, work together, and streamline FreeBSD’s internal processes.
Effective immediately, all such teams are disbanded.
John Baldwin is to be henceforth known as The Great Committer. He alone will dictate the FreeBSD Project’s direction and where its resources will be allocated. All commits will be credited to The Great Committer, as all commits will touch his work in some way and hence would not be possible without His knowledge, experience, and all-around wisdom.
“It’s clear that our current model just doesn’t work,” said Robert Watson. “In the sixteen years the FreeBSD Project has existed, and the decades BSD had before it, we’ve had at least four distinct generations of leadership. We’ve developed processes for mentoring and grooming our own leaders, letting people move on when they wanted. Frankly, it’s a lot of work, and I don’t know how many more decades we could keep that up. The Great Committer won’t put up with that sort of churn, I’m sure. Once you’re a FreeBSD committer, you’re in for life. However long that is.”
“I’m really glad that The Great Committer has taken this step,” said Wilko Bulte, FreeBSD core team member. “People think that the FreeBSD Core Team made all these high-level decisions, when really we broke up fights and inducted new members. All that email and discussion took a lot of time. Now that The Great Committer has condescended to claim his rightful place, we can get back to doing what’s important — namely, doing everything we can to increase the already bounteous glory of The Great Committer.”
Security Officer Colin Percival said, “We’ve worked tirelessly to ensure FreeBSD’s security. I don’t even know how many man-months I’ve spent auditing code and investigating reports, let alone other members of the FreeBSD Security Team. But now, The Great Committer has decreed that FreeBSD is secure. All praise to The Great Committer!”
The Public FreeBSD Developer Track at BSDCan 2011 will now alternate between technical talks and praising The Great Committer. In addition to learning about exciting features such as Capcisum, attendees will practice making index-finger-and-pinky “Beastie Horns” to salute The Great Committer. Attendees will also witness the unveiling of the official The Great Committer banners, forty feet tall and hand-painted on silk.
No active OpenBSD developers could be bothered to comment, but Jason Dixon, OpenBSD slacker, stated “It’s about time the FreeBSD bums realized that they needed a benevolent despot. Maybe they’ll do something useful now.” A phone call to the NetBSD Foundation got a recording that the Board of Directors could be reached at the pub.
The FreeBSD Foundation is raising funds for a 200-foot golden statue of The Great Committer, to be erected near the University of California Berkeley campus. Its stern visage will remind the university denizens of their second-greatest three-letter claim to fame. The Great Committer has declared that there are no rumors that the statue’s eyes will be lasers that will automatically target users of lesser operating systems.
Meanwhile, The Great Committer has ordered that someone bring him a sandwich. And a beer. And Bill Gates’ lunch money.
You can contribute to the Foundation through DonateNow or in person at BSDCan. There’s no guarantee that the Foundation will use your particular donation to fund the statue, but they will assuredly use it for FreeBSD.
NOON UPDATE: It seems that The Great Committer has declared that HAMMER will be the new default filesystem.