Tag Archives: NetFlow Collector

The Basics of Flow Analysis

When it comes to enterprise network monitoring, flow-based solutions are by far the most popular, with 30-40 major flow-based network monitoring solutions on the market today. With that many solutions, how do they differ from one another, and which one will be best for your network? To determine this, let’s start at the beginning, with the basics. How does a flow-based solution work?

The Data Source

Switches and routers are the primary sources of flow data. Since every packet traverses the device, it is relatively easy for the device to extract key data from the packets, although doing so requires extra processing. Depending on the protocol being used to analyze the packets and the current load on the router or switch, sampling may be employed, which can lower the accuracy of the data being reported. All flow-based reporting protocols categorize packets into a flow based on the following seven characteristics: source IP address, destination IP address, source port, destination port, layer 3 protocol type, ToS byte, and input logical interface. The device keeps track of all the flows, storing the information in available RAM. At every configured interval it packages the data into a stream of UDP packets following a predefined format (like NetFlow or sFlow) and transmits these packets to a user-configured IP address, known as the Collector.
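As an illustration of how packets are grouped by those seven characteristics, here is a minimal Python sketch. The FlowKey type, the aggregate function, and the sample addresses are hypothetical names chosen for this example; a real device does this in its forwarding hardware or firmware, not in Python.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The seven fields the text lists as identifying a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int   # IP protocol number, e.g. 6 = TCP, 17 = UDP
    tos: int        # Type of Service byte
    input_if: int   # index of the input logical interface


def aggregate(packets):
    """Group (FlowKey, byte_count) pairs into per-flow packet and byte totals."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for key, size in packets:
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)


# Hypothetical traffic: two packets of one flow, plus one packet of the reply.
web = FlowKey("10.0.0.5", "192.0.2.10", 51000, 443, 6, 0, 1)
reply = FlowKey("192.0.2.10", "10.0.0.5", 443, 51000, 6, 0, 2)
flows = aggregate([(web, 1500), (web, 400), (reply, 60)])
```

Because the key includes direction-specific fields, the request and the reply land in two separate flows even though they belong to the same conversation.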

The Data Collector

Once the UDP data stream with the flow-based information leaves the switch or router, the exported data is purged from the device and forgotten. It is now the responsibility of the Collector to receive, process, and store the flow-based information. Keep in mind that the original delivery to the Collector is over UDP, which is not a reliable transport, so dropped packets between the switch and the Collector can be a problem (a protocol analyzer like the OmniPeek Network Analyzer can help to identify if this is an issue on your network). Also, the packet stream from the switch adds to the traffic load on your network, so this should be taken into consideration. Each packet typically contains information on five to ten flows, so a busy network segment can generate a significant number of packets. How often data is pushed from the switch to the Collector is configured on the switch. The interval is typically set to one minute, though you may find a different interval works best in your specific environment.
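Because UDP gives no delivery guarantee, a Collector has to detect loss on its own. The sketch below assumes NetFlow version 5, where each export packet header carries a record count and a running flow sequence number (the total number of flows exported before that packet). The function name and sample values are hypothetical, and counter wraparound and reordering are ignored for brevity.

```python
def count_missing_flows(datagrams):
    """Return how many flow records appear to have been lost in transit.

    `datagrams` is an iterable of (flow_sequence, record_count) pairs in
    arrival order, as read from each export packet header.
    """
    missing = 0
    expected = None  # sequence number the next packet should carry
    for flow_sequence, record_count in datagrams:
        if expected is not None and flow_sequence > expected:
            # A jump in the sequence means earlier records never arrived.
            missing += flow_sequence - expected
        expected = flow_sequence + record_count
    return missing


# Hypothetical headers: the packet that should start at sequence 20 is absent.
headers = [(0, 10), (10, 10), (30, 10), (40, 5)]
lost = count_missing_flows(headers)
```

A Collector that tracks this number per exporting device can report loss without any help from the network device itself.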

The Collector becomes the central repository for all data from that switch or router, and from many others, because a single Collector is designed to support multiple data sources. A Collector employs either a proprietary data structure or a database to store the large volume of data that accumulates from the flow-based sources, and retains the data for long periods of time (months, at least) for reporting. A flow-based monitoring solution is a combination of a Collector, or set of Collectors, and a central server which processes user requests, communicates with Collectors, and returns the desired results to the user.
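To make the storage and reporting roles concrete, here is a small sketch that uses SQLite as a stand-in for the Collector's database. The table layout, the exporter names, and the top_talkers function are assumptions made for illustration; commercial Collectors use their own schemas and are built for far larger volumes.

```python
import sqlite3

# In-memory database standing in for the Collector's long-term store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flows (
        exporter TEXT, end_time INTEGER, src_ip TEXT, dst_ip TEXT,
        src_port INTEGER, dst_port INTEGER, protocol INTEGER,
        packets INTEGER, bytes INTEGER)
""")

# Hypothetical flow records received from two different network devices.
records = [
    ("router-a", 1700000000, "10.0.0.5", "192.0.2.10", 51000, 443, 6, 12, 9000),
    ("router-a", 1700000060, "10.0.0.7", "192.0.2.10", 51001, 443, 6, 3, 1200),
    ("switch-b", 1700000060, "10.0.0.5", "198.51.100.4", 51002, 53, 17, 1, 80),
]
conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", records)


def top_talkers(conn, limit=10):
    """Return (src_ip, total_bytes) rows, largest senders first."""
    return conn.execute(
        "SELECT src_ip, SUM(bytes) AS total FROM flows "
        "GROUP BY src_ip ORDER BY total DESC LIMIT ?",
        (limit,),
    ).fetchall()


talkers = top_talkers(conn)
```

The query combines records from both exporters, which is the point of a central repository: one report can span every device that sends flow data.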

What is the Difference Between Flow-Based Solutions?

Differences between network monitoring solutions based on flow data come in two forms. The first is the type of flow data. Different network device vendors support different flow-based protocols. The most common protocols are NetFlow (Cisco), sFlow (developed by InMon and adopted early by Foundry), J-Flow (Juniper), and IPFIX, the IETF standard derived from NetFlow version 9. Each protocol deals with the generation of flow records just a bit differently, with the major difference centered on whether or not sampling is used and how aggressively it is used. The second is how the vendor presents (displays) the data, and any distinctive ways the vendor processes the data to produce results that others do not. Data processing and presentation are really the only way for vendors to differentiate themselves, since the source and format of the data are essentially the same regardless of the underlying flow-based protocol.
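The effect of sampling on accuracy can be shown with a little arithmetic. The sketch below assumes simple 1-in-N systematic sampling and made-up packet sizes; the function names are hypothetical, and real devices may sample randomly or per flow instead.

```python
def sample_every_nth(packet_sizes, n):
    """Keep every n-th packet, as a device doing 1-in-n sampling might."""
    return packet_sizes[n - 1::n]


def estimate_totals(sampled_sizes, n):
    """Scale the sampled counts back up by the sampling rate."""
    return len(sampled_sizes) * n, sum(sampled_sizes) * n


# Hypothetical traffic: 95 large packets followed by 5 small ones.
sizes = [1500] * 95 + [64] * 5
sampled = sample_every_nth(sizes, 10)
est_packets, est_bytes = estimate_totals(sampled, 10)

actual_bytes = sum(sizes)
error_bytes = actual_bytes - est_bytes
```

In this example the packet count is estimated exactly, but the byte estimate is off by several thousand bytes because the sample over-represents the small packets. The more aggressive the sampling, the larger such errors can become, especially for short flows.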

What solution would you find most helpful for your company and why? We always suggest that enterprises have something more than just a flow-based solution, as flow-based solutions tend to lack the details required for root-cause analysis on your network. If you are interested in learning more about these issues, check out our blog post, “Is A Flow-Based Solution, A Whole-Based Solution?”.

What is a NetFlow Analyzer?

Before we address this question, we must address an even more basic question: “What is NetFlow?” NetFlow, and other flow-based technologies like sFlow, J-Flow, and IPFIX, are simply specifications for collecting certain types of network data for monitoring and reporting. The data sources are the network devices themselves, like switches and routers, the idea being to leverage existing resources in the network to provide data that is for the most part already being processed by these devices. To that end, flow-based systems provide an economical source for network monitoring data.

All flow-based systems start with flows as their basic element. A flow is a sequence of packets that share the same seven characteristics: source IP address, destination IP address, source port, destination port, layer 3 protocol type, ToS byte, and input logical interface. By definition, a flow is unidirectional. Flows are processed and stored by supported network devices as flow records, and it is these flow records that vary from specification to specification. For example, a NetFlow flow record does not take quite the same form as an sFlow flow record. This requires different parsing and processing techniques for each flow-based specification. It is at this step, where flow records are consumed, that the term NetFlow Analyzer is introduced.
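To show what parsing one specific record format involves, here is a sketch for NetFlow version 5, which uses a fixed 24-byte header followed by 48-byte records. The field layout reflects the published version 5 export format as best understood here; the parse_v5 function and the test datagram are hypothetical, and other specifications (NetFlow version 9, IPFIX, sFlow) need different parsers.

```python
import socket
import struct

# Network byte order, no padding. Header: version, count, sys_uptime,
# unix_secs, unix_nsecs, flow_sequence, engine_type, engine_id, sampling.
V5_HEADER = struct.Struct("!HHIIIIBBH")                 # 24 bytes
# Record: srcaddr, dstaddr, nexthop, input, output, packets, bytes, first,
# last, srcport, dstport, pad, tcp_flags, protocol, tos, src_as, dst_as,
# src_mask, dst_mask, pad.
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")   # 48 bytes


def parse_v5(datagram):
    """Return the flow records in one NetFlow v5 export packet as dicts."""
    version, count = V5_HEADER.unpack_from(datagram, 0)[:2]
    if version != 5:
        raise ValueError(f"not a NetFlow v5 packet (version {version})")
    flows = []
    for i in range(count):
        f = V5_RECORD.unpack_from(datagram, V5_HEADER.size + i * V5_RECORD.size)
        flows.append({
            "src_ip": socket.inet_ntoa(f[0]), "dst_ip": socket.inet_ntoa(f[1]),
            "input_if": f[3], "packets": f[5], "bytes": f[6],
            "src_port": f[9], "dst_port": f[10],
            "protocol": f[13], "tos": f[14],
        })
    return flows


# Hand-built test packet with a single made-up record.
header = V5_HEADER.pack(5, 1, 0, 0, 0, 0, 0, 0, 0)
record = V5_RECORD.pack(
    socket.inet_aton("10.0.0.5"), socket.inet_aton("192.0.2.10"),
    socket.inet_aton("0.0.0.0"), 1, 2, 12, 9000, 0, 0,
    51000, 443, 0, 0x18, 6, 0, 0, 0, 24, 24, 0)
parsed = parse_v5(header + record)
```

Note that the seven key fields from the flow definition all appear in the record, alongside the packet and byte counters that make the record useful for reporting.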

Basic flow analysis is a multistep process, requiring several different elements to be present. Packets enter a switch or router, just as they would as part of normal network operation. If the network device is flow-enabled and the feature is active, additional processing will take place to identify individual flows in the packet stream per the seven characteristics mentioned above. Depending on the configuration of the network device and how busy the network is at any given time, this processing may be done on every packet, or just a sampling of the packets. As flows are identified, flow records are created per the specification supported by the network device, for our purposes NetFlow, and the records are stored locally in the network device. As flows are completed, the records associated with those flows are exported to an external NetFlow Collector, where they are archived for further analysis and reporting. Once the flow record leaves the network device it is deleted from memory to make room for other flow records. Though efficient since the packets already must be processed by the network device, NetFlow does put an additional strain on the network device since it requires additional processing beyond that required for only switching or routing, and it requires additional storage on the switch for the flow records being processed and exported.
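The cache-then-export behavior described above can be sketched as follows. The FlowCache class and its timeout values are assumptions for illustration: here a flow counts as completed once it has been idle for a while or has been active for too long. Real devices apply further rules, such as exporting when a TCP connection closes or when the cache fills up.

```python
class FlowCache:
    """Toy model of the flow table a network device keeps in memory."""

    def __init__(self, inactive_timeout=15, active_timeout=1800):
        self.inactive_timeout = inactive_timeout  # seconds without traffic
        self.active_timeout = active_timeout      # maximum age of an entry
        self.entries = {}                         # flow key -> counters

    def observe(self, key, size, now):
        """Account for one packet of `size` bytes seen at time `now`."""
        entry = self.entries.get(key)
        if entry is None:
            entry = {"first": now, "last": now, "packets": 0, "bytes": 0}
            self.entries[key] = entry
        entry["last"] = now
        entry["packets"] += 1
        entry["bytes"] += size

    def expire(self, now):
        """Remove and return the records of flows considered completed."""
        exported = []
        for key in list(self.entries):
            entry = self.entries[key]
            idle = now - entry["last"] >= self.inactive_timeout
            too_old = now - entry["first"] >= self.active_timeout
            if idle or too_old:
                # Exported records are deleted to make room for new flows.
                exported.append((key, self.entries.pop(key)))
        return exported


cache = FlowCache()
cache.observe("flow-a", 1500, now=0)
cache.observe("flow-a", 400, now=5)
cache.observe("flow-b", 60, now=18)
exported = cache.expire(now=20)
```

After the call to expire, the idle flow has been handed off for export and removed from the cache, while the recently active flow remains in memory. This is the additional storage and processing burden the paragraph above refers to.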


A NetFlow Analyzer includes the NetFlow Collector, which accepts and stores the completed flow records; a storage system to allow for long-term storage of large volumes of flow-based data; and analysis software to mine, aggregate, and report on the collected data per user requests through a customized UI, often web-based but sometimes client-server. The NetFlow Analyzer can be software-only or appliance-based, but most systems are appliance-based, and the system often includes multiple appliances.

So what are the advantages? NetFlow data comes “for free” from NetFlow-enabled network devices, eliminating the need for additional network probes to collect the flow-based data. But remember, it’s not entirely free, since it requires processing and storage resources on the network device, thereby competing with the prime directive of the device: routing packets. Given the seven characteristics of a flow, NetFlow Analyzers can provide a relatively detailed set of network performance data, and given enough storage this data can be archived for quite a long time, providing a long-term record of network behavior.

But there’s no such thing as a free lunch. NetFlow Analyzers may not always be 100% accurate, since the flow data may come from sampling rather than from an analysis of each and every packet. NetFlow Analyzers also create additional network traffic by moving flow records from the network device to the NetFlow Collector, possibly impacting performance on an already busy network. And NetFlow Analyzers can report on nothing more than the information they can infer from the seven flow characteristics, making them excellent network monitors but poor network analysis solutions, because they often lack the data to perform root-cause analysis once a network anomaly is detected.

Network analysis systems that derive data from independent interrogation of each and every packet, like the OmniPeek Distributed Analysis Solution, provide all the data necessary not only for detailed network reporting, but for advanced, root-cause analysis as well. No sampling, no need to move data across the network for storage and analysis. All analysis is done at the source, by tapping into a network device and processing all the data locally.

Each system has its place, but when the time comes for root-cause analysis, and it always does, a packet-based analysis solution like the OmniPeek Distributed Analysis Solution is what you need.