Best Practices for Managing Colossal Networks

40G is more than just a bigger pipe; it introduces significant new challenges in monitoring and analyzing data traversing the network. You can no longer employ the “break/fix” or “point and shoot” troubleshooting techniques used in the past after problems have already been reported. These high-speed networks require proactive, ongoing network monitoring and analysis to keep them performing as designed. And of course your tools must evolve just as rapidly as your network, which is certainly not always the case.

Monitoring and analysis of 40G networks requires updated tools as well as new strategies and approaches. Let’s take a look at some of the key, though perhaps not so new, strategies that must be employed when considering how to monitor, analyze, and troubleshoot a 40G network.

Capturing All of the Data – All of the Time
Performing on-the-fly analysis or trying to recreate problems that you missed the first time around is no longer feasible on high-speed networks. It is essential to capture network data 24×7 and to store as much detailed data, down to the packet level, as you can. By doing so, you have a recording of everything that happened on the network, and you can rewind the data at any time to analyze a specific period of time, usage of a specific application, activity on a particular subnet, or even the details of a specific network flow. To do this effectively, we suggest purchasing a purpose-built network forensics solution, one that is specifically designed for high-speed networks and that also includes a rich set of real-time statistics. This will help keep all of your data in a single repository for easy post-capture analysis.

Your network forensics solution may not be the only appliance that needs access to the 40G network stream. One way to simplify the collection of 40G network data for detailed analysis is by using an aggregation tap instead of connecting an appliance directly to the 40G network via a dedicated tap. This will provide significant flexibility when dealing with the 40G stream. You can just replicate the 40G feed to multiple network tools, or you can use the built-in filtering to send subsets of the traffic to different network tools, depending on your data analysis needs.

Storage capacity is a primary concern when performing network recording. Let’s say your average usage on your 40G link is 25%, or 10Gbps. At this data rate, which works out to 1.25 gigabytes per second, a network recording appliance with 32TB of storage can hold roughly 7 hours of network data. An aggregation tap can also help here, allowing you to split the data stream among multiple network recorders to achieve higher overall capture and storage capacity. Another option is to connect your network recorder to a SAN for additional data storage.
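To make the storage arithmetic easy to repeat for your own link, here is a minimal Python sketch. The function name `retention_hours` and its parameters are our own, purely for illustration, and it assumes decimal terabytes, a steady average rate, and no compression or per-packet storage overhead, so treat the result as a rough upper bound rather than a vendor specification.

```python
def retention_hours(storage_tb: float, link_gbps: float, utilization: float) -> float:
    """Estimate how many hours of traffic fit in storage at a steady average rate.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gbps = 1e9 bits/s) and ignores
    compression, indexing, and per-packet metadata overhead.
    """
    avg_bytes_per_sec = link_gbps * 1e9 * utilization / 8  # bits/s -> bytes/s
    storage_bytes = storage_tb * 1e12
    return storage_bytes / avg_bytes_per_sec / 3600  # seconds -> hours


if __name__ == "__main__":
    # The example from the text: 40G link at 25% average usage, 32TB recorder.
    print(f"{retention_hours(32, 40, 0.25):.1f} hours")
    # Splitting the stream across three such recorders triples the storage.
    print(f"{retention_hours(3 * 32, 40, 0.25):.1f} hours")
```

Plugging in a higher utilization figure shows how quickly the recording window shrinks as the link fills up.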

Understanding What is Normal
Knowing how you expect your network to perform is all the more critical when trying to analyze colossal networks. In advance of an investigation, you’ll want to establish clear baselines for your network. If you’re already embroiled in a complex network analysis firefight, it is too late to realize that your ability to assess “normal” conditions on the network may be lacking.
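As a rough illustration of what a baseline can look like in practice, the Python sketch below summarizes historical throughput samples as a mean and standard deviation, then flags a new reading that strays too far from that norm. The function names, the sample values, and the three-standard-deviation threshold are all assumptions made for this example. A real baseline would usually also account for time of day, day of week, and per-application behavior.

```python
from statistics import mean, stdev


def build_baseline(samples_gbps):
    """Summarize historical throughput samples as (mean, standard deviation)."""
    return mean(samples_gbps), stdev(samples_gbps)


def is_anomalous(value_gbps, baseline, threshold=3.0):
    """Return True if a reading is more than `threshold` deviations from the mean."""
    avg, sd = baseline
    if sd == 0:
        # A perfectly flat history: any different value is a deviation.
        return value_gbps != avg
    return abs(value_gbps - avg) / sd > threshold


if __name__ == "__main__":
    # Hypothetical average-throughput samples (Gbps) collected during normal operation.
    history = [9.8, 10.1, 10.0, 10.2, 9.9]
    baseline = build_baseline(history)
    for reading in (10.1, 18.0):
        status = "anomalous" if is_anomalous(reading, baseline) else "normal"
        print(f"{reading} Gbps: {status}")
```

The point is not the specific statistics, but that the numbers are collected before the firefight starts, so there is something to compare against when it does.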

Analyzing the Essentials
When faced with an issue on your network, you’ll want to first analyze the essentials. The temptation is to try to capture and analyze everything, especially when the source of the problem is not immediately known. You do, however, know certain things about your network, which allows you to be selective in the analysis options you choose. Often a variety of conditions can be immediately ruled out, and using these clues to limit the collection and analysis to only what is necessary dramatically improves network analysis performance. For example, if you’re looking at a 40G network link, you’re probably not capturing wireless traffic, so you can turn off the wireless analysis. Turning off analyses that aren’t relevant to your investigation refines your search, making it more specific, and increases the processing power and throughput of the appliance you’re using.

Knowing the Limits
Even after analysis has been streamlined to only essential areas of the network, data capture for network analysis on 40G networks generates a great deal of data quickly, and managing the data becomes a significant challenge. Effective analysis requires that you know the limits of your tools, not just the available space for storage, but the processing limits of your appliance as well as how many users can access the appliance concurrently and perform analysis.

Moving from 1G to 10G to 40G introduces new challenges that are still being worked out in the industry, especially when it comes to support for network monitoring, analysis, troubleshooting, and security tools.

If you are in the midst of an upgrade or are thinking about upgrading to 40G, be sure to include the correct tools in the upgrade plan and budget, including solutions for establishing network baselines, capturing and storing the data 24×7, and performing network forensics as needed. It’s easy to continue to treat these networks like 1G, but they’re vastly different and require new strategies for analysis.
