I use KVM and OpenNebula on Ubuntu for virtualization. Getting such a cluster up and running is easy, but making it perform well takes much more work. Many times, the statement “my virtualization cluster works well” is equivalent to “I’m not paying attention.” My FreeBSD hosts help point out problems, though. All of my FreeBSD servers send me a daily email to tell me they’re still alive and to point out potential issues. That’s how I found out I was getting network collisions on my virtualized hosts, and here’s how I investigated them.

The daily emails include the output of netstat -i, like so:
# netstat -i
Name   Mtu Network        Address            Ipkts Ierrs Idrop  Opkts Oerrs   Coll
re0   1500 <Link#1>       00:16:8b:ab:c7:16 279288     0     0 224253     0 223971
re0   1500 220.127.116.11 knocker           210442     -     - 223976     -      -
re0   1500 fe80:1::216:8  fe80:1::216:8bff:      0     -     -      2     -      -
lo0  16384 <Link#2>                            206     0     0    206     0      0
lo0  16384 fe80:2::1      fe80:2::1              0     -     -      0     -      -
lo0  16384 localhost      ::1                    0     -     -      6     -      -
lo0  16384 your-net       localhost             47     -     -     47     -      -
Note the first re0 line, the <Link#1> entry. Network interface re0 transmitted 224253 packets, and 223971 of them collided. That’s a collision rate of better than 99%. Ick. Plus, most of my servers run diskless over NFS, so every disk I/O operation is delayed by collisions. Ick ick ick. So, what to do?
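To put a number on that, here is a minimal Python sketch that pulls the collision rate out of netstat -i output. It is not part of my daily email script; the function name collision_rates and the trimmed sample are made up for illustration, and it assumes the FreeBSD column layout shown above, where only the <Link#N> rows carry numeric error and collision counters.

```python
def collision_rates(netstat_output):
    """Return {interface: (opkts, coll, rate)} for link-level rows.

    Assumes FreeBSD-style `netstat -i` output whose last six columns are
    Ipkts Ierrs Idrop Opkts Oerrs Coll.
    """
    rates = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only the <Link#N> rows carry numeric error and collision counters;
        # the header and per-address rows are skipped.
        if len(fields) < 9 or not fields[2].startswith("<Link#"):
            continue
        # Count columns from the right, because the Address column is
        # empty on some rows (lo0 here).
        ipkts, ierrs, idrop, opkts, oerrs, coll = (int(f) for f in fields[-6:])
        rate = coll / opkts if opkts else 0.0
        rates[fields[0]] = (opkts, coll, rate)
    return rates


# Trimmed copy of the output shown above.
SAMPLE = """\
Name   Mtu Network        Address            Ipkts Ierrs Idrop  Opkts Oerrs   Coll
re0   1500 <Link#1>       00:16:8b:ab:c7:16 279288     0     0 224253     0 223971
re0   1500 220.127.116.11 knocker           210442     -     - 223976     -      -
lo0  16384 <Link#2>                            206     0     0    206     0      0
"""

for name, (opkts, coll, rate) in collision_rates(SAMPLE).items():
    print(f"{name}: {coll}/{opkts} = {rate:.2%}")
# re0: 223971/224253 = 99.87%
# lo0: 0/206 = 0.00%
```

Feeding it the real output of netstat -i (for example via subprocess) and flagging any interface above some threshold would be an easy addition to a daily report.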
Each physical server in the VM cluster has two NICs. One is for server management, the other is a bridge for VMs. Bridges aren’t as good at sorting out traffic as a switch. I tried Open vSwitch, to see if that would improve things. Open vSwitch has lots of stuff to recommend it — its netflow features are sufficient to get me to use it — but it didn’t stop the collisions.
Well, maybe it was a virtual hardware issue. KVM offers five models of network interfaces: e1000 (Intel gigabit Ethernet), ne2k_pci (the classic NE2000), pcnet (LANCE), rtl8139 (RealTek), and virtio (the KVM virtual driver). I tried all of these.
- e1000: cannot boot from DHCP
- ne2k_pci: no collisions, but only 10Mbs, also generates “invalid packet length” messages
- pcnet: no collisions, but only 10Mbs, also generates “dropping chained buffer” messages
- rtl8139: works, with collisions
- virtio: no FreeBSD driver, and driver is not likely to appear
The warning messages from the two 10Mbs NICs indicate that the NIC has dropped packets it considers impossible. I suspect that a 10Mbs NIC on any modern network might generate similar messages, but as the errors are only intermittent I’m not terribly concerned.
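For anyone who wants to repeat the model comparison outside OpenNebula, the following Python sketch builds the NIC part of a QEMU command line for each model. It is an assumption-laden illustration, not how my cluster is configured: OpenNebula and libvirt set the model in their own templates, and the bridge name br0, the MAC address, and the helper name nic_args are all hypothetical. It also assumes a QEMU recent enough to support the -netdev bridge and -device options.

```python
# NIC models named in the text, mapped to QEMU -device names.
# Assumption: with the -device form, virtio is spelled "virtio-net-pci".
NIC_MODELS = {
    "e1000": "e1000",
    "ne2k_pci": "ne2k_pci",
    "pcnet": "pcnet",
    "rtl8139": "rtl8139",
    "virtio": "virtio-net-pci",
}


def nic_args(model, bridge="br0", mac="52:54:00:12:34:56"):
    """Return the QEMU arguments for one bridged NIC of the given model.

    The bridge name and MAC address defaults are placeholders.
    """
    device = NIC_MODELS[model]
    return [
        "-netdev", f"bridge,id=net0,br={bridge}",
        "-device", f"{device},netdev=net0,mac={mac}",
    ]


# Print the NIC arguments for every model, one line per test boot.
for model in NIC_MODELS:
    print(model, " ".join(nic_args(model)))
```

Booting the same guest once per model and comparing the Coll column of netstat -i after a fixed amount of traffic would reproduce the comparison in the list above.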
Is this a KVM/Ubuntu issue, or a FreeBSD issue? I lack the skills to say conclusively. Web searches for this kind of problem are difficult, as the word “collision” appears in all sorts of diagnostic output. I did find other people who had noticed this behavior on a variety of guest operating systems, though. Personally, I’d say that the goal of virtualization is to present an interface that looks exactly like hardware. While the rtl8139 isn’t an awe-inspiring card, the real hardware doesn’t suffer collisions at this rate. For that reason, a bug report to the KVM folks seems like the way to go.
Having only 10Mbs isn’t horrible (most of these hosts “cruise” at less than 50kbs), but it’s certainly not ideal. Is gigabit from the rtl8139 better than 10Mbs? That’s another investigation.