Network collisions running hosts under KVM

I use KVM and OpenNebula on Ubuntu for virtualization. Getting such a cluster up and running is easy, but making it perform well takes much more work.  Many times, the statement “my virtualization cluster works well” is equivalent to “I’m not paying attention.”  My FreeBSD hosts help point out problems, though.  All of my FreeBSD servers send me a daily email to tell me they’re still alive and to point out potential issues.  That’s how I found out I was getting network collisions on my virtualized hosts, and here’s how I investigated them.

The daily emails include the output of netstat -i, like so:

# netstat -i
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
re0    1500 <Link#1>      00:16:8b:ab:c7:16   279288     0     0   224253     0 223971
re0    1500 139.171.199.0 knocker             210442     -     -   223976     -     -
re0    1500 fe80:1::216:8 fe80:1::216:8bff:        0     -     -        2     -     -
lo0   16384 <Link#2>                             206     0     0      206     0     0
lo0   16384 fe80:2::1     fe80:2::1                0     -     -        0     -     -
lo0   16384 localhost     ::1                      0     -     -        6     -     -
lo0   16384 your-net      localhost               47     -     -       47     -     -

Note the Opkts and Coll columns on the first re0 line.  Network interface re0 transmitted 224253 packets, and 223971 of them collided.  That’s a collision rate of better than 99%.  Ick.  Plus, most of my servers run diskless over NFS, so every disk I/O operation is delayed by collisions.  Ick ick ick.  So, what to do?
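If you'd rather not do that division by eye every morning, the arithmetic is easy to script. What follows is only a rough sketch, not part of my actual daily email job: the function name collision_ratios and the SAMPLE text are made up for illustration, and it assumes the FreeBSD netstat -i layout shown above, where the last three columns are Opkts, Oerrs, and Coll and only the <Link#N> row carries the collision counter.

```python
# Sample input copied from the netstat -i output above (illustration only).
SAMPLE = """\
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
re0    1500 <Link#1>      00:16:8b:ab:c7:16   279288     0     0   224253     0 223971
re0    1500 139.171.199.0 knocker             210442     -     -   223976     -     -
lo0   16384 <Link#2>                             206     0     0      206     0     0
"""


def collision_ratios(netstat_output):
    """Return {interface: collisions / output packets} for link-level rows."""
    ratios = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only the <Link#N> row has real error and collision counters;
        # the per-address rows show "-" in those columns.
        if len(fields) < 9 or not fields[2].startswith("<Link#"):
            continue
        # Count from the right, because the Address column may be empty
        # (as it is for lo0), which shifts the columns when splitting.
        opkts, coll = int(fields[-3]), int(fields[-1])
        ratios[fields[0]] = coll / opkts if opkts else 0.0
    return ratios


if __name__ == "__main__":
    for name, ratio in collision_ratios(SAMPLE).items():
        print(f"{name}: {ratio:.2%} of transmitted packets collided")
```

For the sample above this reports 99.87% for re0 and 0.00% for lo0. Counting fields from the right is a deliberate choice here, since the loopback row has no link-level address.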

Each physical server in the VM cluster has two NICs.  One is for server management, the other is a bridge for VMs.  Bridges aren’t as good at sorting out traffic as a switch.  I tried Open vSwitch, to see if that would improve things.  Open vSwitch has lots of stuff to recommend it — its netflow features are sufficient to get me to use it — but it didn’t stop the collisions.

Well, maybe it was a virtual hardware issue.  KVM offers five models of network interfaces: e1000 (Intel gigabit Ethernet), ne2k_pci (the classic NE2000), pcnet (LANCE), rtl8139 (RealTek), and virtio (the KVM virtual driver).  I tried all of these.

  • e1000: cannot boot from DHCP
  • ne2k_pci: no collisions, but only 10Mbs, also generates “invalid packet length” messages
  • pcnet: no collisions, but only 10Mbs, also generates “dropping chained buffer” messages
  • rtl8139: works, with collisions
  • virtio: no FreeBSD driver, and driver is not likely to appear

The warning messages from the two 10Mbs NICs indicate that the NIC has dropped packets it considers impossible.  I suspect that a 10Mbs NIC on any modern network might generate similar messages, but as the errors are only intermittent I’m not terribly concerned.

Is this a KVM/Ubuntu issue, or a FreeBSD issue?  I lack the skills to say conclusively.  Web searches for this kind of problem are difficult, as the word “collision” appears in all sorts of diagnostic output.  I did find other people who had noticed this behavior on a variety of guest operating systems, however.  Personally, I’d say that the goal of virtualization is to present an interface that looks exactly like hardware.  While the rtl8139 isn’t an awe-inspiring card, the real hardware doesn’t suffer collisions at this rate.  For that reason, it seems that a bug report to the KVM folks would be the way to go.

Having only 10Mbs isn’t horrible (most of these hosts “cruise” at less than 50kbs), but it’s certainly not ideal.  Is gigabit with collisions from the rtl8139 better than 10Mbs without them?  That’s another investigation.

5 Replies to “Network collisions running hosts under KVM”

  1. In the context of a traditional wired half-duplex ethernet, collisions all by themselves (without input/output errors) don’t matter. Throughput does. Contrary to what many people envision, no packets rammed into each other. As part of sending, the sender listens on the net. If there’s someone else sending at the same time, the sender backs off for 50-odd usecs and reattempts. Assuming the NIC actually knows how to count collisions properly (not necessarily a valid assumption), your ~100% collision rate would still result in only about 6% overall degradation off of optimal throughput.

    If I understand things right, the FreeBSD machine is plugged into the Ubuntu machine directly? If that’s the case, it would seem like your problem isn’t so much “you’re seeing collisions” but “you’re talking to it at the lowest common denominator (10/half)”. Have there been any attempts to configure/force speed/duplex at either end?

  2. The rtl8139 is a gigE driver; it simulates gigE full duplex.

    The FreeBSD system is a VM bridged over a physical interface on the Ubuntu/KVM box. So, it’s sort of plugged into the Ubuntu box…

  3. Same issue here, FreeBSD 8.1 and 8.2 under Xen.

    srv# netstat -ni
    Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
    re0 1500 00:16:3e:f0:8b:6c 42731 0 0 4788 0 4784
    re0 1500 109.*.*.* 109.*.*.* 6010 - - 4784 - -

    Not likely to be a KVM issue. Perhaps a problem with the “re” driver in virtualized environments.
