There is nothing more frustrating than a slow network connection, as your users are quick to point out, and their frustration can quickly become yours. Although not every report of a slow network connection is due to a bandwidth issue (or even a network issue, for that matter), just about every bandwidth issue results in slow network performance for at least some users. Furthermore, determining what is causing the reported problem can be daunting. There are many layers to peel back to decide whether an issue is even bandwidth related and, if so, what the root cause is.
When bandwidth issues are suspected, especially if a problem has been ongoing, there’s a temptation to add bandwidth in the hope that this will solve the problem. But this approach can waste time and money, since it addresses the symptom rather than identifying and fixing the root cause. Without a clear understanding of the real issue, the problem will remain and will likely resurface.
As with most complex issues, let’s break down the diagnosis of slow network performance due to a bandwidth problem into three more manageable steps.
1. Confirm That It’s a Bandwidth Problem
As noted above, slow network performance is not always a bandwidth problem, so when the initial report is slow performance, we must first determine whether it is related to bandwidth oversubscription. Fortunately, with the wide range of network monitoring solutions available, some based on flow technologies (NetFlow, sFlow, etc.) and some packet-based, correlating a problem report with overall network activity has become fairly straightforward. One simply needs to “rewind” network activity to the period when slow performance was reported, isolate the flow where the event was experienced, and then expand the view to include data collected at the router or switch level for that flow to see whether the link was oversubscribed. If it was, a bandwidth issue of some sort is probably the cause, and a more detailed investigation is in order. Alternatively, most network monitoring solutions provide flexible alarms and notifications based on measurements such as the percentage of bandwidth utilization on a link. If a triggered alarm coincides with the time of the poor-performance report, you’ve already answered the first question.
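To make the “was the link oversubscribed?” check concrete, here is a minimal Python sketch that converts cumulative interface byte counters (such as the 64-bit SNMP IF-MIB `ifHCInOctets` counter a router or switch exposes) into per-interval utilization percentages, then checks whether the interval containing a problem report crossed a threshold. The names `CounterSample`, `utilization`, and `oversubscribed_at`, the 90% default threshold, and the assumption that samples arrive in time order with at most one counter wrap (and no device reboot) between them are all illustrative assumptions, not part of any particular monitoring product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CounterSample:
    timestamp: float  # seconds since an arbitrary epoch
    octets: int       # cumulative byte counter read from the interface


def utilization(samples: List[CounterSample], link_bps: float,
                counter_bits: int = 64) -> List[Tuple[float, float, float]]:
    """Return (start, end, percent_utilized) for each pair of consecutive samples."""
    wrap = 2 ** counter_bits
    intervals = []
    for prev, cur in zip(samples, samples[1:]):
        seconds = cur.timestamp - prev.timestamp
        # Modulo handles a single counter wrap; a device reboot would still skew this.
        delta_bytes = (cur.octets - prev.octets) % wrap
        percent = delta_bytes * 8 * 100 / (seconds * link_bps)
        intervals.append((prev.timestamp, cur.timestamp, percent))
    return intervals


def oversubscribed_at(report_time: float,
                      intervals: List[Tuple[float, float, float]],
                      threshold_pct: float = 90.0) -> bool:
    """True if the interval containing report_time met or exceeded threshold_pct."""
    for start, end, percent in intervals:
        if start <= report_time < end:
            return percent >= threshold_pct
    return False  # no data covering the report time


# Hypothetical 100 Mbps link sampled once a minute.
samples = [CounterSample(0.0, 0),
           CounterSample(60.0, 450_000_000),
           CounterSample(120.0, 1_200_000_000)]
intervals = utilization(samples, link_bps=100_000_000)
print(oversubscribed_at(90.0, intervals))  # prints True: the second minute ran at 100%
```

Utilization here is bytes transferred × 8, divided by (interval length × link speed in bits per second). A real tool would usually track both directions and use short sampling intervals, since a long average can hide brief bursts that users still notice.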
2. Determine If It’s a One-Time Event
If your network monitoring solution indicates that a bandwidth threshold has been exceeded, the next thing to determine is how often such an event occurs. Is this the first time, or have similar incidents occurred in the recent past? If it is the first occurrence, and especially if it is still ongoing, dig in quickly to identify the top talkers. Though the event could be benign, such as too many users downloading video on the day of a popular event, it could also indicate something serious, such as a distributed denial-of-service (DDoS) attack or a worm infection. Regardless of the cause, a bandwidth spike that is unpredicted and outside the norms for your network should be taken seriously and addressed immediately.
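Identifying top talkers amounts to aggregating flow records by source and ranking them by bytes. The sketch below assumes a simplified, hypothetical `FlowRecord` holding only source, destination, and byte count (real NetFlow or sFlow exports carry many more fields). As a rough heuristic, also an assumption here, `fan_out` counts how many distinct destinations a source contacted: one host reaching an unusually large number of destinations can hint at worm-style scanning, while heavy traffic to one or a few destinations looks more like a large download or stream.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FlowRecord:
    src: str     # source IP address
    dst: str     # destination IP address
    octets: int  # bytes carried by this flow


def top_talkers(flows: List[FlowRecord], n: int = 5) -> List[Tuple[str, int, float]]:
    """Return up to n (source, bytes, percent_of_total) tuples, largest first."""
    bytes_by_src: Counter = Counter()
    for flow in flows:
        bytes_by_src[flow.src] += flow.octets
    total = sum(bytes_by_src.values())
    if total == 0:
        return []
    return [(src, b, 100.0 * b / total) for src, b in bytes_by_src.most_common(n)]


def fan_out(flows: List[FlowRecord], src: str) -> int:
    """Number of distinct destinations contacted by src (a rough scanning indicator)."""
    return len({flow.dst for flow in flows if flow.src == src})


# Hypothetical flows captured during the spike (documentation address ranges).
flows = [FlowRecord("10.0.0.5", "203.0.113.10", 700),
         FlowRecord("10.0.0.5", "203.0.113.11", 100),
         FlowRecord("10.0.0.9", "198.51.100.2", 200)]
for src, total_bytes, share in top_talkers(flows, n=2):
    print(src, total_bytes, f"{share:.0f}%", "fan-out:", fan_out(flows, src))
```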
If similar bandwidth spikes have occurred before, or if you’ve already determined that certain spikes are normal (e.g., order-processing spikes first thing in the morning when the operations department starts its day), make sure that the current spike conforms to one of these norms. The best way to determine this is to keep current baseline measurements of typical network usage, since network usage is highly unlikely to be constant throughout the day. Knowing the typical load at different times of day, and the highest usage ever recorded, will help you tremendously in judging the severity of the next bandwidth spike.
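A baseline can be as simple as per-hour-of-day statistics built from historical measurements. This sketch uses hypothetical helpers `build_baseline` and `classify` and assumes history arrives as `(datetime, mbps)` pairs. It records each hour’s mean, population standard deviation, and historical peak, then labels a new reading; the three-standard-deviation cutoff (`k=3.0`) is an assumed starting point, not a rule.

```python
import statistics
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

Baseline = Dict[int, Tuple[float, float, float]]  # hour -> (mean, stdev, peak)


def build_baseline(history: Iterable[Tuple[datetime, float]]) -> Baseline:
    """Group historical (timestamp, Mbps) readings by hour of day."""
    buckets = defaultdict(list)
    for ts, mbps in history:
        buckets[ts.hour].append(mbps)
    return {hour: (statistics.fmean(v), statistics.pstdev(v), max(v))
            for hour, v in buckets.items()}


def classify(baseline: Baseline, ts: datetime, mbps: float, k: float = 3.0) -> str:
    """Label a reading relative to the baseline for its hour of day."""
    if ts.hour not in baseline:
        return "no baseline for this hour"
    mean, stdev, peak = baseline[ts.hour]
    if mbps > peak:
        return "above historical peak"
    if mbps > mean + k * stdev:
        return "unusual for this hour"
    return "normal for this hour"


# Hypothetical 8 a.m. readings from four mornings (e.g. the order-processing spike).
history = [(datetime(2024, 3, day, 8, 15), mbps)
           for day, mbps in [(4, 400), (5, 420), (6, 380), (7, 410)]]
baseline = build_baseline(history)
print(classify(baseline, datetime(2024, 3, 8, 8, 30), 405))  # normal for this hour
print(classify(baseline, datetime(2024, 3, 8, 8, 30), 600))  # above historical peak
```

In practice you may want separate baselines for weekdays and weekends, since a morning spike driven by the operations department would not appear on days that department is closed.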
3. Remove Unnecessary Traffic
More often than not, bandwidth problems come from unnecessary traffic on the network, such as real-time video streaming during Game 7 of the World Series. By pruning your WLAN traffic and removing or reducing superfluous demands with bandwidth shaping technologies, you can better control overall bandwidth usage and ensure that adequate bandwidth is always available for your mission-critical applications.
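Bandwidth shaping is commonly built on a token bucket: tokens (here, bytes) accrue at a configured rate up to a burst limit, and a packet may be sent only when enough tokens are available. A shaper delays packets that exceed the rate rather than dropping them. The class below is a minimal, hypothetical sketch of that algorithm for a single traffic class, assuming packets are handled in arrival order and times are given in seconds; it is not the configuration interface of any specific shaping product.

```python
class TokenBucketShaper:
    """Minimal token-bucket shaper for one traffic class (hypothetical helper)."""

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate = rate_bps / 8.0          # token refill rate in bytes per second
        self.capacity = float(burst_bytes)  # maximum tokens (largest burst) in bytes
        self.tokens = float(burst_bytes)    # bucket starts full
        self.last = 0.0                     # time up to which tokens have been accounted

    def send_time(self, size: int, now: float) -> float:
        """Return the earliest time a packet of `size` bytes arriving at `now` may be sent."""
        if size > self.capacity:
            raise ValueError("packet larger than burst size can never conform")
        # A packet cannot leave before earlier queued packets (FIFO order).
        start = max(now, self.last)
        self.tokens = min(self.capacity, self.tokens + (start - self.last) * self.rate)
        self.last = start
        if self.tokens >= size:
            self.tokens -= size
            return start
        # Not enough tokens: delay until the deficit has been refilled.
        wait = (size - self.tokens) / self.rate
        self.tokens = 0.0
        self.last = start + wait
        return self.last


# Hypothetical 8 kbit/s (1,000 bytes/s) class with a 1,500-byte burst.
shaper = TokenBucketShaper(rate_bps=8_000, burst_bytes=1_500)
print(shaper.send_time(1_500, now=0.0))  # 0.0: the burst allowance covers it
print(shaper.send_time(1_000, now=0.0))  # 1.0: must wait for 1,000 bytes of tokens
```

To protect mission-critical applications, traffic would typically be classified first, with recreational traffic such as streaming video assigned its own low-rate bucket so that the remaining capacity stays available for business traffic.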
These three steps should help you determine when and where bandwidth issues occur and how to manage your network better to prevent the slow performance caused by bandwidth spikes.