There are many considerations and best practices for troubleshooting a flaky network connection. Here are three that, in most cases, will help you pinpoint the problem faster and shorten the time to getting the network humming once again.
Consideration #1: Can you record your network traffic and search through the data at the time the issue occurs?
This is also known as network forensics. Network forensics refers to the capture, storage and analysis of digital evidence that flows through your enterprise network. The most complete solutions record every single packet that is transmitted over your corporate networks. So, any emails, instant messages, FTP traffic or any other form of communication that takes place on the network can be reconstructed from the original transmissions. Forensics essentially allows you to reconstruct the history of your entire network.
With more businesses relying on the cloud for their IT infrastructure or to deliver their services and products to customers, it’s crucial to monitor both operations and the infrastructure. While the network has become more reliable, reliance on web-based and cloud-served applications or storage has led to more frequent outages of that infrastructure. With digital evidence collected by a network recorder, the once laborious, time-consuming searches (for top talkers, longest delays, application type, etc.) that involved multiple tools and large transfers of data can be reduced to a quick, convenient search.
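As a rough illustration of the kind of search a network recorder makes quick, here is a minimal Python sketch that finds the top talkers inside the time window when an issue occurred. The `PacketRecord` layout and the `top_talkers` function are hypothetical names invented for this example; a real recorder would have its own storage format and query interface.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class PacketRecord(NamedTuple):
    """Hypothetical record layout; real recorders store far more detail."""
    timestamp: float  # seconds since some reference point
    src: str          # source address
    dst: str          # destination address
    size: int         # packet size in bytes


def top_talkers(records: Iterable[PacketRecord], start: float, end: float,
                n: int = 3) -> List[Tuple[str, int]]:
    """Return the n sources that sent the most bytes between start and end."""
    totals: Counter = Counter()
    for record in records:
        if start <= record.timestamp <= end:
            totals[record.src] += record.size
    return totals.most_common(n)


if __name__ == "__main__":
    recorded = [
        PacketRecord(100.0, "10.0.0.5", "10.0.0.1", 1500),
        PacketRecord(101.0, "10.0.0.7", "10.0.0.1", 400),
        PacketRecord(102.0, "10.0.0.5", "10.0.0.9", 2500),
        PacketRecord(500.0, "10.0.0.7", "10.0.0.1", 9000),  # outside the window
    ]
    # Search only the window in which the problem was reported.
    print(top_talkers(recorded, start=99.0, end=110.0, n=2))
    # [('10.0.0.5', 4000), ('10.0.0.7', 400)]
```

The point of the sketch is the time-window filter: because the traffic was recorded, the search can be limited to the moment the issue occurred rather than to whatever happens to be on the wire now.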
Consideration #2: Is the problem stemming from one user or many users on the same switch or segment?
Determining the scope of the problem points the administrator toward where to start the network analysis and what information will be most useful in diagnosing and correcting the issue. There are several ways to determine whether a problem is stemming from one user or many users on the same switch. One of the more common, but least desirable, ways is to monitor the number of trouble tickets: if calls spike and most of the affected users are on the same subnet, that is a telltale sign of a possible hardware problem. A far more proactive approach is to use background analysis and monitor for conditions like a non-responsive client or server, or low client-to-server or server-to-client throughput. You will quickly see whether these issues are being reported for a single client or across many clients. If for a single client, isolate this client for analysis. Determine what other network activities this client is engaged in, and examine these network flows. This will quickly shed light on the issue. If the problem is stemming from many users, is it isolated to a single application, or is it broadly affecting overall connectivity? If confined to a single application, that’s the place to dig. If the issue is overall connectivity for many users, find the connectivity point common to these users and check it for hardware issues.
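The scoping questions above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Issue` record, its fields, and `classify_scope` are hypothetical names, and the sketch assumes each reported issue already carries the client, the switch it connects through, and the application involved.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple


class Issue(NamedTuple):
    """Hypothetical issue report, e.g. from background analysis or a ticket."""
    client: str
    switch: str
    application: str


def classify_scope(issues: List[Issue]) -> Tuple[str, Optional[str]]:
    """Return (scope, where to look first) for a batch of reported issues."""
    if not issues:
        return ("no issues", None)

    clients = {issue.client for issue in issues}
    if len(clients) == 1:
        # One client: isolate it and examine its other network flows.
        return ("single client", next(iter(clients)))

    applications = {issue.application for issue in issues}
    if len(applications) == 1:
        # Many clients, one application: that application is the place to dig.
        return ("single application", next(iter(applications)))

    # Many clients, many applications: find the connectivity point shared by
    # the most affected clients and check it for hardware issues.
    switch_by_client = {issue.client: issue.switch for issue in issues}
    common_switch = Counter(switch_by_client.values()).most_common(1)[0][0]
    return ("shared connectivity point", common_switch)
```

For example, reports from several clients on the same switch that span mail, web and file-sharing traffic would come back as `("shared connectivity point", <that switch>)`, which matches the advice to look for hardware issues at the point the users have in common.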
Consideration #3: Is the problem connectivity or utilization related?
Is the network traffic getting to the specified destination? Is a specific machine over-consuming its allocation of bandwidth and crippling other users’ connectivity while performing some action? On the utilization front, non-work-related, “bandwidth sucking” download activities (music, videos, games, etc.) are a common culprit. Utilization-related issues are typically intermittent in nature. One, or perhaps several, clients are over-utilizing a network segment, but that comes and goes. Even if the oversubscribing event is long-running (like streaming video), the remaining utilization still goes up and down with normal network usage, creating intermittent periods of over-utilization. This can easily be monitored by graphing the network utilization in real time. Connectivity-related issues are typically more binary: users either can or cannot connect to a particular network segment or a particular application. If the issue is utilization related, the next step is to determine whether it is client or application driven. This is fairly easy to determine by looking at the top talkers on the network. If the top talkers are clients, drill down and see what protocols each client is using. This typically reveals the source of the problem quite readily. If the issue is connectivity related, the next step is to determine whether connectivity is being affected by network congestion or by hardware problems. Network congestion is again easily seen by monitoring network utilization in real time. If the cause is not congestion, then the issue is likely to be with hardware within the users’ connectivity path.
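The utilization-versus-connectivity reasoning above can be sketched as follows. This is a simplified illustration, not a definitive diagnostic: the `diagnose` function is a hypothetical name, utilization samples are assumed to be fractions of link capacity between 0 and 1, and the 0.8 threshold for "over-utilized" is an arbitrary assumption chosen for the example.

```python
from typing import List


def diagnose(utilization: List[float], users_can_connect: bool,
             threshold: float = 0.8) -> str:
    """Suggest the next step from utilization samples and connectivity state.

    utilization: link utilization samples as fractions of capacity (0.0-1.0).
    users_can_connect: False if users cannot reach the segment or application.
    threshold: assumed level above which a sample counts as over-utilized.
    """
    congested = any(sample >= threshold for sample in utilization)

    if not users_can_connect:
        # Connectivity issue: decide between congestion and hardware.
        if congested:
            return "connectivity: congestion is the likely cause"
        return "connectivity: check hardware in the users' connectivity path"

    if congested:
        # Utilization issue: find out who or what is driving it.
        return "utilization: check top talkers and the protocols they use"
    return "no over-utilization observed in these samples"


if __name__ == "__main__":
    # Intermittent spikes while users can still connect.
    print(diagnose([0.35, 0.92, 0.40, 0.88], users_can_connect=True))
    # Quiet link, but users cannot connect.
    print(diagnose([0.10, 0.12, 0.09], users_can_connect=False))
```

In practice the samples would come from whatever tool graphs utilization in real time; the sketch only captures the order in which the questions are asked.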