Kernel Log: Coming in 3.3 (Part 1) - Networking
by Thorsten Leemhuis
Version 3.3 of the Linux kernel offers another way to team multiple Ethernet devices. Support for "Open vSwitch", a virtual network switch that was specifically developed for virtualised environments, has also been added. Byte Queue Limits are designed to reduce the latencies caused by the much-discussed "buffer bloat".
With the release of Linux 3.3-rc1 last week, Linus Torvalds closed the merge window of version 3.3 of the Linux kernel. From now until this version is published, the developers will mainly integrate bug fixes and minor improvements – so the first release candidate of Linux 3.3 should already include all the major new features of this kernel version, which is expected to arrive in March. Modifications added by the kernel developers in the first phase of the development cycle are rarely removed or disabled in the currently ongoing stabilisation phase.
Therefore the Kernel Log is already in a position to provide a comprehensive overview of the most important new features of Linux 3.3. As usual, this overview will be presented within a "Coming in 3.3" series of articles that will gradually cover the kernel's various functional areas. Part 1 of the series describes the most important changes to the network stack and the LAN and Wi-Fi hardware drivers. Over the coming weeks, further articles will cover the kernel's storage support, filesystems, architecture code, infrastructure and other hardware drivers.
The kernel developers have added an Ethernet teaming driver that combines multiple Ethernet devices into one virtual device (link aggregation/802.1AX). This virtual network device can use a round-robin technique to share network load across multiple ports. Alternatively, in "active backup" mode, a designated backup port takes over if there are problems with the primary network connection. The developers describe the driver as a very fast, simple and scalable alternative to the bonding driver, which has provided similar functionality for some time. However, the team driver doesn't do all the work itself; it co-operates with the libteam userspace library.
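The two port-selection policies described above can be modelled in a few lines. The following is a minimal, hypothetical Python sketch, not the team driver's code: the `Port`, `RoundRobinTeam` and `ActiveBackupTeam` names are invented for illustration. In the real driver, link monitoring and mode control are handled partly in userspace via libteam.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Port:
    name: str
    link_up: bool = True


class RoundRobinTeam:
    """Hypothetical model: hand each outgoing frame to the next port in turn,
    skipping ports whose link is down."""

    def __init__(self, ports: List[Port]):
        self.ports = ports
        self._next = 0

    def select_port(self) -> Optional[Port]:
        for _ in range(len(self.ports)):
            port = self.ports[self._next]
            self._next = (self._next + 1) % len(self.ports)
            if port.link_up:
                return port
        return None  # no usable port at all


class ActiveBackupTeam:
    """Hypothetical model: send everything via one active port and fail over
    to the first working port when the active link goes down."""

    def __init__(self, ports: List[Port]):
        self.ports = ports
        self.active: Optional[Port] = ports[0] if ports else None

    def select_port(self) -> Optional[Port]:
        if self.active is not None and self.active.link_up:
            return self.active
        for port in self.ports:  # failover: pick the first port with link
            if port.link_up:
                self.active = port
                return port
        self.active = None
        return None
```

With ports `eth0` and `eth1`, the round-robin team alternates between them. The active-backup team uses `eth0` until its link drops, then switches to `eth1` and stays there even after `eth0` recovers. This sketch chooses not to fail back, to avoid flapping between ports.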
Linux 3.3 includes the kernel components that are required for the Open vSwitch. This multi-layer virtual switch can operate on layers 2, 3 or 4 and was specially developed for virtualisation environments; it is used, for example, in XenServer 6.0 to control the network traffic between the host, its guests and the outside world. Background information on this technology can be found on the Open vSwitch web site, in the documentation available there, in the kernel documentation, and in the video of a presentation Simon Horman gave at the linux.conf.au 2012 open source conference in mid-January.
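Matching on layers 2, 3 or 4 means a switch rule can key on MAC addresses, IP addresses or TCP/UDP ports. The following Python sketch is a hypothetical illustration of such a flow table and does not reproduce Open vSwitch's data structures. The field names (`eth_dst`, `ip_dst`, `ip_proto`, `tcp_dst`) and action strings are invented. Each flow lists only the fields it cares about, all other fields act as wildcards, and the highest-priority matching flow wins. The sketch assumes, loosely following Open vSwitch's split design, that a packet no flow matches is passed up to userspace.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Flow:
    priority: int
    match: Dict[str, object]  # fields absent from this dict are wildcards
    action: str


def lookup(flows: List[Flow], packet: Dict[str, object]) -> str:
    """Return the action of the highest-priority flow matching the packet."""
    best: Optional[Flow] = None
    for flow in flows:
        if all(packet.get(key) == value for key, value in flow.match.items()):
            if best is None or flow.priority > best.priority:
                best = flow
    if best is None:
        return "miss: send to userspace"
    return best.action


flows = [
    Flow(10, {"eth_dst": "52:54:00:00:00:01"}, "output:vm1"),  # layer 2 rule
    Flow(20, {"ip_dst": "10.0.0.2"}, "output:vm2"),            # layer 3 rule
    Flow(30, {"ip_proto": 6, "tcp_dst": 22}, "drop"),          # layer 4 rule
]
```

A TCP packet to port 22 matches all three rules, and the layer-4 rule wins because of its priority. A UDP packet to another IP address only hits the layer-2 rule.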
In Linux 3.3, the network priority cgroup infrastructure allows administrators to dynamically set, per network interface, the priority of network traffic generated by the processes in a control group (cgroup); details are provided in the documentation. Another new feature, the "TCP buffer size controller", allows the memory controller to limit the amount of RAM that is available to the buffers used in TCP communication. As described in an article on LWN.net that discusses a previous version of the now integrated patch, these buffers can become rather large and cause disruptions on systems where RAM is scarce.
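The core idea behind the TCP buffer size controller is per-group accounting: memory used for TCP buffers is charged to the cgroup that owns the socket and checked against that group's limit. The following Python sketch is a simplified, hypothetical model. `TcpBufferAccount`, `try_charge` and `uncharge` are invented names. It refuses any charge that would exceed the limit, whereas the kernel's actual reaction to reaching the limit is more nuanced.

```python
class TcpBufferAccount:
    """Hypothetical per-cgroup accounting of TCP buffer memory."""

    def __init__(self, limit_bytes: int):
        self.limit = limit_bytes
        self.usage = 0

    def try_charge(self, nbytes: int) -> bool:
        """Charge nbytes of buffer memory; refuse if the group's limit would be exceeded."""
        if self.usage + nbytes > self.limit:
            return False  # caller must shrink or drop instead of allocating
        self.usage += nbytes
        return True

    def uncharge(self, nbytes: int) -> None:
        """Return memory to the group when buffers are freed."""
        self.usage = max(0, self.usage - nbytes)


# Two groups with independent limits: one busy group cannot eat the other's budget.
web = TcpBufferAccount(64 * 1024)
batch = TcpBufferAccount(64 * 1024)
```

Here a second 40 KiB buffer in `web` is refused because it would push the group past 64 KiB, while `batch` can still allocate its own buffers.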
The "Dynamic Queue Limits" (DQL) introduced by a Google developer, and the "Byte Queue Limits" (BQL) based on them, allow the kernel to control how much data can accumulate in a network device's transmit queue. This is designed to reduce the network latencies that can occur when modern network chips buffer too much data, without impacting throughput; the aim is to mitigate the "buffer bloat" problem that was mentioned in a previous Kernel Log. The developers have modified the most important network drivers, including bnx2, bnx2x, forcedeth, e1000e, igb, niu, tg3, sfc and sky2, so that they support Byte Queue Limits. The article "Network transmit queue limits", published on LWN.net in August 2011, offers some background on this technology.
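The basic feedback loop behind Byte Queue Limits works as follows. The driver reports how many bytes it hands to the hardware and how many the hardware has finished sending. The limit on in-flight bytes then grows if the hardware ran dry while the queue was throttled, and shrinks if data sat waiting unnecessarily. The following Python sketch is a deliberately simplified, hypothetical model of that idea, not the algorithm in the kernel's DQL library. The class name, the starting values and the "shrink by half the slack" rule are assumptions. In real drivers, the two reporting points correspond to calls such as `netdev_tx_sent_queue()` and `netdev_tx_completed_queue()`.

```python
class SimpleByteQueueLimit:
    """Hypothetical, simplified model of a self-tuning transmit byte limit."""

    def __init__(self, limit: int = 3000, min_limit: int = 1514,
                 max_limit: int = 1_000_000):
        self.limit = limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.inflight = 0     # bytes handed to the NIC but not yet completed
        self.stopped = False  # True while the stack must not queue more data

    def queued(self, nbytes: int) -> None:
        """Driver handed nbytes to the hardware."""
        self.inflight += nbytes
        self.stopped = self.inflight >= self.limit

    def completed(self, nbytes: int) -> None:
        """Hardware reported nbytes as transmitted."""
        was_stopped = self.stopped
        self.inflight -= nbytes
        if was_stopped and self.inflight == 0:
            # The queue drained while we were throttling: the limit starved the NIC.
            self.limit = min(self.max_limit, self.limit + nbytes)
        elif not was_stopped and self.inflight > 0:
            # Data is still waiting although the limit was never hit: shrink by half the slack.
            self.limit = max(self.min_limit, self.limit - self.inflight // 2)
        self.stopped = self.inflight >= self.limit
```

In this model, a queue that keeps hitting its limit and then draining completely will raise the limit until the hardware stays busy. A queue that always has bytes left over will lower it, keeping only as much data in the device as is needed to avoid starving it.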