As virtualization experts head to VMworld, the mecca of virtualization conferences, next week, we wanted to take a closer look at how network monitoring and analysis change in the virtual world. As more companies virtualize their data centers, problems can arise because the architecture of a virtual environment is vastly different from that of a physical one. While this creates challenges for network monitoring, tools and techniques already exist to address them and ease the pain of implementation.
On the surface, virtual networks work in much the same way as physical networks. Features like promiscuous mode, NIC teaming and load balancing still exist in the virtual world, and switches and network interface cards live on as virtual switches (vSwitches) and virtual NICs (vNICs).
Looking deeper, virtual networks have a unique element: vSwitches are typically controlled through the server virtualization tools, meaning that most vSwitches are managed by server admins rather than network admins. While VMware and others have tried to make vNetworks easy to configure, the reality is that vSwitches have to interface with physical switches. Misconfigurations between the virtual and physical sides are more likely because the devices are configured by two different teams. These misconfigurations can cause serious problems like network loops, or can remain hidden until exposed by an event like a VM migration from one host to another. Ideally, the network team works with the server team to implement monitoring before one of these problems occurs.
Fortunately, the semi-exotic technology of virtual networking does not require equally exotic monitoring tools. Neither the practice of network monitoring nor the architecture of the tools used for it has had to change drastically. A flow-based tool is still a great choice for solving a high-level network problem like bottleneck identification, and for more granular issues, deep-packet analysis is still the best option.
Just as in a physical network infrastructure, you use a flow-based or packet-based solution to collect data at multiple points in the infrastructure and then analyze it with a network performance management and monitoring tool. The major difference is how you gain visibility into the traffic.
The most common complaint about a vSwitch in network monitoring is that hardware-based monitoring equipment can’t see “inside” the vSwitch. Given the multi-level architecture of modern network applications, the so-called East-West traffic between servers may exist entirely within a single VM host. Fortunately, just like physical switches, vSwitches support port mirroring or spanning. It’s relatively simple to deploy a lightweight VM guest running network monitoring or packet capture software to capture packets by adding a second vNIC connected to a port mirror. The software in the VM guest will capture all traffic across the vSwitch in a manner very familiar to network admins.
Things appear to get a little trickier with a distributed vSwitch, which runs on multiple VM hosts but acts like a single switch. To capture the traffic between instances of the vSwitch on different VM hosts, remember that the hosts are connected to each other via physical switches, which support port mirroring. This design complication is therefore fairly simple to monitor.
There is one complication in virtual networks which does not exist in physical networks. Physical switches use custom hardware to forward traffic at line rate, but vSwitches are software in the VM host. Therefore, vSwitches increase the host’s IO load, which may cause complications if the hosted apps are primarily IO bound. With that restriction in mind, the question from a network monitoring perspective is whether to perform real-time analysis or post-capture analysis.
If you want to monitor in real time, the main item to consider on your monitoring host is capture buffer size, because the host has a fixed amount of resources that many different guest applications must share. This creates resource contention, so it is very important to set a buffer limit for network monitoring that is large enough to get the job done without compromising the execution of the other applications sharing the host.
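One way to picture a buffer limit is as a byte-capped queue that discards the oldest packets once the cap is reached. The sketch below is only an illustration under that assumption; the class name `BoundedCaptureBuffer` and its fields are hypothetical and do not correspond to any particular monitoring product.

```python
from collections import deque


class BoundedCaptureBuffer:
    """Hypothetical capture buffer that never holds more than max_bytes."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.dropped = 0          # packets discarded to stay under the cap
        self._packets = deque()   # oldest packet sits at the left

    def add(self, packet: bytes) -> None:
        # A packet larger than the whole buffer can never fit; count it as dropped.
        if len(packet) > self.max_bytes:
            self.dropped += 1
            return
        # Evict the oldest packets until the new one fits under the cap.
        while self.used_bytes + len(packet) > self.max_bytes:
            oldest = self._packets.popleft()
            self.used_bytes -= len(oldest)
            self.dropped += 1
        self._packets.append(packet)
        self.used_bytes += len(packet)

    def snapshot(self) -> list:
        """Return the buffered packets, oldest first."""
        return list(self._packets)


if __name__ == "__main__":
    buf = BoundedCaptureBuffer(max_bytes=10)
    for payload in (b"aaaa", b"bbbb", b"cccc"):
        buf.add(payload)
    print(buf.snapshot(), buf.used_bytes, buf.dropped)
```

A rising `dropped` count is the signal that the limit is too small for the traffic being observed, which is the trade-off to weigh against the memory the other guests need.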
It is best to use limited real-time analysis: either monitor with lightweight tools, or use packet capture only when you are facing immediate issues. For ongoing analysis, flow-based solutions are the best choice, as they typically don’t increase the IO requirements. Most of these will rely on the vSwitch to provide a statistical overview of the traffic it’s forwarding. Another alternative is to use a packet capture agent on critical VM guests, especially if that agent doesn’t continually forward the captured packets across the network. When a deeper dive is needed for troubleshooting, either connect to the packet capture agent, or use a packet capture VM guest on the host with a vNIC connected to a mirrored port on the vSwitch. Again, be aware that if the packet capture VM guest forwards the packets off-box, it will increase the IO load on the physical NIC on the host.
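To make the idea of a statistical overview concrete, here is a minimal sketch of how per-packet observations can be rolled up into flow records and ranked by volume. The `Packet` type, the field names and the sample addresses are assumptions made for illustration; real flow exporters define their own record formats.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    """Hypothetical per-packet summary; field names are illustrative."""
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    length: int


def aggregate_flows(packets: Iterable[Packet]) -> dict:
    """Group packets by 5-tuple and return {flow_key: (packet_count, byte_count)}."""
    totals = defaultdict(lambda: [0, 0])
    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        totals[key][0] += 1
        totals[key][1] += p.length
    return {key: (count, size) for key, (count, size) in totals.items()}


def top_talkers(flows: dict, n: int = 3) -> list:
    """Return the n flows carrying the most bytes, largest first."""
    return sorted(flows.items(), key=lambda kv: kv[1][1], reverse=True)[:n]


if __name__ == "__main__":
    sample = [
        Packet("10.0.0.1", "10.0.0.2", 40000, 443, "tcp", 1500),
        Packet("10.0.0.1", "10.0.0.2", 40000, 443, "tcp", 1500),
        Packet("10.0.0.3", "10.0.0.2", 40001, 80, "tcp", 400),
    ]
    for key, (count, size) in top_talkers(aggregate_flows(sample)):
        print(key, count, size)
```

Because only the counters are kept, the overview stays small no matter how much traffic crosses the vSwitch, which is why this style of monitoring suits ongoing analysis.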
If you are performing post-capture analysis, it is essential to collect all the packets traversing the network at the time of the incident. This significantly reduces the need to buffer data in RAM for immediate analysis, but increases the need for disk space, which depends on the overall throughput of the virtual network and how much time the post-capture analysis needs to cover.
Post-capture forensic searches are CPU and RAM intensive in a virtual environment, so it is best to perform this type of analysis when the virtual machines are under light load, or to perform it outside the virtual environment on a regular PC. Be judicious with the types of searches you perform, as this will save you time and resources.
Post-capture analysis is best for long-term monitoring of vNetworks, just as it is in the physical environment. You never know when a virtual environment might go awry, so continuous capture is necessary to ensure that you have the right packets. Having the data from the moment of failure lets you diagnose the issue without having to tediously reproduce the circumstances or nervously wait for it to happen again.
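As a rough guide, the disk requirement can be estimated by multiplying average throughput by the retention window. The function below is a back-of-the-envelope sketch; the name `capture_disk_bytes` and the `overhead` allowance for capture-file metadata are assumptions, and the figures in the example are illustrative rather than measured.

```python
def capture_disk_bytes(avg_throughput_mbps: float,
                       retention_hours: float,
                       overhead: float = 0.05) -> int:
    """Estimate the disk space, in bytes, needed to retain a full packet capture.

    avg_throughput_mbps: average traffic on the monitored vSwitch, in megabits per second.
    retention_hours:     how far back the stored capture must reach.
    overhead:            assumed fractional allowance for per-packet file metadata.
    """
    bytes_per_second = avg_throughput_mbps * 1_000_000 / 8
    seconds = retention_hours * 3600
    return int(bytes_per_second * seconds * (1 + overhead))


if __name__ == "__main__":
    # Illustrative inputs: 100 Mbps average, kept for 24 hours, ignoring overhead.
    needed = capture_disk_bytes(100, 24, overhead=0.0)
    print(f"{needed / 1e12:.2f} TB")
```

Even a modest 100 Mbps average comes to roughly a terabyte per day, so the retention window is usually the setting to tune first.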
Although the virtual environment may be shaking up the rest of the IT world, network administrators already have tried-and-true methods for analysis in the virtual world. Certain network devices may have changed names, but network administrators need only minor adaptations to keep their networks up.