I use KVM and OpenNebula on Ubuntu for virtualization. Getting such a cluster up and running is easy, but making it perform well takes much more work. Often, the statement “my virtualization cluster works well” really means “I’m not paying attention.” My FreeBSD hosts help point out problems, though. Every one of my FreeBSD servers sends me a daily email to tell me it’s still alive and to point out potential issues. That’s how I found out I was getting network collisions on my virtualized hosts, and here’s how I investigated them.
The daily emails include the output of netstat -i, like so:
# netstat -i
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs   Coll
re0    1500 <Link#1>      00:16:8b:ab:c7:16   279288     0     0   224253     0 223971
re0    1500 139.171.199.0 knocker             210442     -     -   223976     -      -
re0    1500 fe80:1::216:8 fe80:1::216:8bff:        0     -     -        2     -      -
lo0   16384 <Link#2>                             206     0     0      206     0      0
lo0   16384 fe80:2::1     fe80:2::1                0     -     -        0     -      -
lo0   16384 localhost     ::1                      0     -     -        6     -      -
lo0   16384 your-net      localhost               47     -     -       47     -      -
Note the first re0 line, which holds the link-level counters. Network interface re0 transmitted 224253 packets, and 223971 of them collided. That’s a collision rate above 99%. Ick. Plus, most of my servers run diskless over NFS, so every disk I/O operation is delayed by collisions. Ick ick ick. So, what to do?
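If you want a daily report to flag this automatically, the rate can be computed straight from the netstat -i output. The following is a minimal sketch, not part of the author’s setup: it assumes the column layout shown above (Coll is the last column, Opkts is third from the end) and looks only at link-level rows, whose Network column starts with “<Link”. The function name collision_rates and the SAMPLE text are illustrative. Note that Coll counts collision events as reported by the driver, so the result is collisions per transmitted packet.

```python
"""Compute per-interface collision rates from FreeBSD `netstat -i` output.

A sketch only: it assumes the column layout shown in the post (Coll is the
last column, Opkts is third from the end) and skips everything except the
link-level rows, whose Network column starts with "<Link".
"""

SAMPLE = """\
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs   Coll
re0    1500 <Link#1>      00:16:8b:ab:c7:16   279288     0     0   224253     0 223971
re0    1500 139.171.199.0 knocker             210442     -     -   223976     -      -
lo0   16384 <Link#2>                             206     0     0      206     0      0
"""


def collision_rates(netstat_output: str) -> dict[str, float]:
    """Map interface name -> collisions per output packet (link-level rows only)."""
    rates: dict[str, float] = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header, and per-address rows (their counters are "-").
        if len(fields) < 6 or fields[0] == "Name" or not fields[2].startswith("<Link"):
            continue
        name = fields[0]
        opkts = int(fields[-3])  # Opkts: packets transmitted
        coll = int(fields[-1])   # Coll: collisions counted by the driver
        rates[name] = coll / opkts if opkts else 0.0
    return rates


if __name__ == "__main__":
    for iface, rate in collision_rates(SAMPLE).items():
        print(f"{iface}: {rate:.2%} collisions per output packet")
```

Because the sketch indexes from the end of each row, it does not care whether the Address column is empty, as it is on the lo0 link-level row.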
Each physical server in the VM cluster has two NICs. One is for server management; the other is a bridge for VMs. A bridge isn’t as good at sorting out traffic as a switch, so I tried Open vSwitch to see if that would improve things. Open vSwitch has lots to recommend it (its NetFlow features alone are enough to get me to use it), but it didn’t stop the collisions.
Well, maybe it was a virtual hardware issue. KVM offers five models of network interfaces: e1000 (Intel gigabit Ethernet), ne2k_pci (the classic NE2000), pcnet (LANCE), rtl8139 (RealTek), and virtio (the KVM virtual driver). I tried all of these.
- e1000: cannot boot from DHCP
- ne2k_pci: no collisions, but only 10 Mbps; also generates “invalid packet length” messages
- pcnet: no collisions, but only 10 Mbps; also generates “dropping chained buffer” messages
- rtl8139: works, with collisions
- virtio: no FreeBSD driver, and one is not likely to appear
The warning messages from the two 10 Mbps NICs indicate that each NIC dropped packets it considered impossible. I suspect that a 10 Mbps NIC on any modern network might generate similar messages, but as the errors are only intermittent, I’m not terribly concerned.
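For readers who want to repeat this comparison, the emulated NIC is selected by model name. As a minimal sketch, assuming you launch QEMU/KVM by hand rather than through OpenNebula (which generates this configuration itself), each of the five models above maps to QEMU’s legacy “-net nic,model=NAME” option. The tap device name tap0, the MAC address, and the helper nic_args are hypothetical placeholders, and the tap device is assumed to be attached to your bridge already.

```python
"""Build QEMU/KVM command-line fragments for each emulated NIC model.

A sketch only: it uses QEMU's legacy `-net nic,...` / `-net tap,...` syntax.
The tap interface name and MAC address are hypothetical placeholders.
"""

NIC_MODELS = ["e1000", "ne2k_pci", "pcnet", "rtl8139", "virtio"]


def nic_args(model: str, tap: str = "tap0", mac: str = "52:54:00:12:34:56") -> list[str]:
    """Return QEMU arguments that attach one NIC of `model` to an existing tap device."""
    if model not in NIC_MODELS:
        raise ValueError(f"unknown NIC model: {model}")
    return [
        "-net", f"nic,model={model},macaddr={mac}",
        # script=no/downscript=no: assume the tap device is already bridged.
        "-net", f"tap,ifname={tap},script=no,downscript=no",
    ]


if __name__ == "__main__":
    # Print one invocation fragment per model; the rest of the command line is elided.
    for model in NIC_MODELS:
        print("qemu-system-x86_64 ... " + " ".join(nic_args(model)))
```

Keeping the tap and MAC settings fixed while only the model changes means any difference in collisions or speed can be attributed to the emulated hardware.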
Is this a KVM/Ubuntu issue, or a FreeBSD issue? I lack the skills to say conclusively. Web searches for this kind of problem are difficult, as the word “collision” appears in all sorts of diagnostic output. I did, however, find other people who had noticed this behavior on a variety of guest operating systems. Personally, I’d say that the goal of virtualization is to present an interface that looks exactly like hardware. While the rtl8139 isn’t an awe-inspiring card, the real hardware doesn’t suffer collisions at this rate. For that reason, a bug report to the KVM folks seems like the way to go.
Having only 10 Mbps isn’t horrible (most of these hosts “cruise” at less than 50 kbps), but it’s certainly not ideal. Is gigabit from the rtl8139, collisions and all, better than a clean 10 Mbps? That’s another investigation.
In the context of a traditional wired half-duplex Ethernet, collisions by themselves (without input/output errors) don’t matter. Throughput does. Contrary to what many people envision, no packets actually ram into each other. While sending, the sender listens on the wire. If someone else is sending at the same time, the sender backs off for 50-odd microseconds and tries again. Assuming the NIC actually counts collisions properly (not necessarily a valid assumption), your ~100% collision rate would still cost only about 6% of optimal throughput.
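The back-of-the-envelope arithmetic behind that kind of estimate can be sketched as follows. This is a simplified model and an assumption, not a CSMA/CD simulation: each frame suffers a fixed number of collisions, each collision costs one fixed backoff delay (defaulting to one 10 Mbps slot time, 512 bit times or 51.2 microseconds, matching the “50-odd” figure above), and jam signals, wasted partial frames, and exponential backoff growth are ignored. The names frame_time_us and throughput_loss are illustrative.

```python
"""Rough throughput cost of collisions on half-duplex 10 Mbps Ethernet.

Simplified model (an assumption): every frame suffers `collisions_per_frame`
collisions, and each collision costs one fixed backoff delay. Jam signals,
wasted partial frames, and exponential backoff growth are ignored.
"""

LINK_BPS = 10_000_000              # 10 Mbps link
SLOT_US = 512 / LINK_BPS * 1e6     # one slot time = 512 bit times = 51.2 us
WIRE_OVERHEAD_BYTES = 8 + 12       # preamble/SFD plus inter-frame gap


def frame_time_us(frame_bytes: int) -> float:
    """Time on the wire for one frame, including preamble and inter-frame gap."""
    return (frame_bytes + WIRE_OVERHEAD_BYTES) * 8 / LINK_BPS * 1e6


def throughput_loss(frame_bytes: int, collisions_per_frame: float = 1.0,
                    backoff_us: float = SLOT_US) -> float:
    """Fraction of throughput lost to backoff, relative to a collision-free link."""
    clean = frame_time_us(frame_bytes)
    extra = collisions_per_frame * backoff_us
    return extra / (clean + extra)


if __name__ == "__main__":
    for size in (1518, 512, 64):
        print(f"{size:>5}-byte frames: {throughput_loss(size):.1%} lost")
```

With full-size 1518-byte frames and one backoff per frame, this model gives roughly 4%, in the same single-digit range as the estimate above. With minimum-size 64-byte frames the same backoff costs over 40%, so the estimate depends heavily on the traffic’s frame sizes.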
If I understand things right, the FreeBSD machine is plugged directly into the Ubuntu machine? If that’s the case, it seems your problem isn’t so much “you’re seeing collisions” as “you’re talking to it at the lowest common denominator (10/half)”. Have there been any attempts to configure or force speed and duplex at either end?
The rtl8139 is a gigE driver; it simulates gigE full duplex.
The FreeBSD system is a VM bridged over a physical interface on the Ubuntu/KVM box. So, it’s sort of plugged into the Ubuntu box…
Same issue here, FreeBSD 8.1 and 8.2 under Xen.
srv# netstat -ni
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
re0 1500 00:16:3e:f0:8b:6c 42731 0 0 4788 0 4784
re0 1500 109.*.*.* 109.*.*.* 6010 - - 4784 - -
Not likely to be a KVM issue. Perhaps it’s a problem with the re driver in virtualized environments.
See also http://blog.philippklaus.de/2011/02/install-pfsense-in-kvm-on-ubuntu-10-10/
No network collisions here, but limited bandwidth.