I woke up today to find this on a console:
panic: _mtx_lock_sleep: recursed on non-recursive mutex iscsi-io @ /usr/src/sys/modules/iscsi/initiator/../../../dev/iscsi/initiator/isc_sm.c:324
The initiator is a FreeBSD-current amd64 from 8 May 2011. The iSCSI target is an inexpensive iomega NAS. Other hosts attached to this iSCSI NAS have also had errors, though. The errors clear when I reboot the NAS.
Unfortunately, the FreeBSD box is a diskless system, so dumps aren’t exactly simple. I heard some rumours at the FreeBSD BSDCan devsummit about a network dump facility coming soon, but that’s the future.
How to fix this?
I attended the High Performance FreeBSD Clusters talk at BSDCan 2011. The presenter had originally used FreeBSD servers, then tried OpenSolaris to get better performance. He ran into OpenSolaris problems, and found that his team could not access the bug information without a support contract. They’re now moving towards FreeBSD with EIT, and are happier.
I intend to learn from their mistakes, and replace the iomega with a FreeBSD EIT server. I’ll keep the iomega for, say, a central ports and packages NFS server, where a reboot won’t impact my uptime.
Why bother to blog this? So that the next poor bugger who gets this panic message gets at least one search engine hit.
If you have any questions I can answer, let me know. Also be sure to report back on how well it all works after your migration. I expect to do the same after we finish migrating back to FreeBSD.
Out of curiosity, why EIT and not the istgt port or FreeNAS?
David: because of the aforementioned talk. Lars didn’t get the performance he’d hoped for out of istgt. And I’m not doing FreeNAS because I have some seriously daft ideas, which will require a full FreeBSD install for dev purposes.
Lars: Thanks, I’m sure I’ll be in touch.
It might be the case that the locking issue can be ferreted out by inspection, even without a full dump.
Perhaps there simply needs to be a test to see if it’s already locked before locking.
Even without a lot of data, it may be worth getting into the bug system, simply to have scottl or whoever look at it.
Just random thoughts…
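For anyone who lands here wondering what "recursed on non-recursive mutex" means: a thread that already holds the lock tried to take it again. Below is a rough user-space analogue using POSIX threads, not the kernel's mtx(9) code and certainly not the actual isc_sm.c logic. The helper name relock_result is made up for illustration. An error-checking pthread mutex reports the second lock attempt with EDEADLK instead of panicking, which is roughly the "test before locking" idea from the comment above.

```c
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK on strict compilers */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper (illustration only): lock an error-checking mutex
 * twice from the same thread and return the result of the second attempt.
 * POSIX requires that second attempt to fail with EDEADLK.
 */
int relock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc, second;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&m);      /* first lock succeeds */
    assert(rc == 0);

    second = pthread_mutex_lock(&m);  /* same thread, same mutex: refused */

    pthread_mutex_unlock(&m);         /* release the one lock we do hold */
    pthread_mutex_destroy(&m);
    return second;
}
```

Calling relock_result() returns EDEADLK. In the FreeBSD kernel the rough equivalents would be checking mtx_owned(9) before locking, or creating the mutex with MTX_RECURSE. Either one only hides the symptom if the code path was never meant to re-enter, so the bug is still worth reporting.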