There are many considerations and best practices for troubleshooting a flaky network connection. The three below will, in most cases, help you pinpoint the problem faster and shorten the time to getting the network humming once again.
Consideration #1: Can you record your network traffic and search through the data at the time the
issue occurs?
This is also known as network forensics. Network forensics
refers to the capture, storage and analysis of digital evidence that flows
through your enterprise network. The most complete solutions record every
single packet that is transmitted over your corporate networks. So, any emails,
instant messages, FTP traffic or any other form of communication that takes
place on the network can be reconstructed from the original transmissions.
Forensics essentially allows you to reconstruct the history of your entire
network.
With more businesses relying on the cloud for
their IT infrastructure or to deliver their service/products to customers, it's
crucial to be monitoring both operations and the infrastructure. While the
network has become more reliable, reliance on web-based and cloud-served
applications or storage has led to more frequent outages of that
infrastructure. By collecting digital evidence via a network recorder, the once
laborious, time-consuming searches (including top talkers, most delays,
application type, etc.) involving multiple tools and large transfers of data
can be reduced to a quick, convenient search.
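As a rough illustration of what such a search looks like, here is a minimal Python sketch that filters recorded packets to the window around an incident and ranks the top talkers by bytes sent. The `PacketRecord` layout and the sample data are assumptions made for this example; a real network recorder stores far more detail and has its own query interface.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class PacketRecord(NamedTuple):
    """Hypothetical, simplified record of one captured packet."""
    timestamp: float  # seconds since the epoch
    src: str          # source IP address
    dst: str          # destination IP address
    size: int         # bytes on the wire


def top_talkers(records: Iterable[PacketRecord], start: float, end: float,
                limit: int = 3) -> List[Tuple[str, int]]:
    """Return the `limit` sources that sent the most bytes in [start, end]."""
    sent = Counter()
    for rec in records:
        if start <= rec.timestamp <= end:
            sent[rec.src] += rec.size
    return sent.most_common(limit)


# Illustrative data only, not taken from a real capture.
capture = [
    PacketRecord(100.0, "10.0.0.5", "10.0.1.9", 1500),
    PacketRecord(101.0, "10.0.0.7", "10.0.1.9", 400),
    PacketRecord(102.0, "10.0.0.5", "10.0.1.9", 1500),
    PacketRecord(250.0, "10.0.0.8", "10.0.1.9", 9000),
]

# Only traffic inside the incident window (95 s to 110 s) is counted.
print(top_talkers(capture, start=95.0, end=110.0))
```

The same filter-then-aggregate pattern applies to the other searches mentioned above, such as grouping by application type instead of by source.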
Consideration #2: Is the problem stemming from one user or many
users on the same switch or segment?
Determining the scope of the problem can point
the administrator toward where to start the network analysis
and what information is most useful for diagnosing and correcting the
issue. There are several ways of determining whether or not a problem is
stemming from one user or many users on the same switch. One of the more
common, but least desirable, ways is by monitoring the number of trouble
tickets. If calls spike and most of the affected users are on the same subnet, that is a telltale
sign of a possible hardware problem. A far more proactive approach is to use
background analysis and monitor for conditions like a non-responsive client or
server, or low client-to-server or server-to-client throughput. You will
quickly see if these issues are being reported for a single client, or across
many clients. If for a single client, isolate this client for analysis.
Determine what other network activities this client is engaged in, and examine
these network flows. This will quickly shed light on the issue. If the problem
is stemming from many users, is the problem isolated to a single application,
or is the issue broadly affecting overall connectivity? If confined to a single
application, that's the place to dig. If the issue is overall connectivity for
many users, find the connectivity point common to these users and check for
hardware issues.
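The scoping questions above can be sketched as a small decision function. This is only an illustration: the `classify_scope` name, the `(client_ip, application)` report format, and the /24 grouping used to guess at a shared segment are all assumptions, and your own switch or VLAN boundaries should replace them.

```python
import ipaddress
from collections import Counter


def classify_scope(reports, prefix_len=24):
    """Classify problem reports given as (client_ip, application) pairs.

    Returns a (scope, detail) tuple. The /24 default is an assumption
    standing in for whatever segment boundaries your network really uses.
    """
    clients = {ip for ip, _ in reports}
    apps = {app for _, app in reports}
    if not clients:
        return ("none", None)
    if len(clients) == 1:
        # One client: isolate it and examine its other network flows.
        return ("single client", next(iter(clients)))
    if len(apps) == 1:
        # Many clients, one application: that application is the place to dig.
        return ("single application", next(iter(apps)))
    # Many clients, many applications: look for the segment they share.
    subnets = Counter(
        str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
        for ip in clients
    )
    subnet, _ = subnets.most_common(1)[0]
    return ("many clients", subnet)


# Illustrative reports only.
reports = [
    ("10.0.2.11", "mail"),
    ("10.0.2.12", "web"),
    ("10.0.2.40", "ftp"),
    ("10.0.3.5", "web"),
]
print(classify_scope(reports))
```

In the "many clients" case the returned subnet is only a starting hint for where the common connectivity point might be; the hardware check itself still has to happen on the switch or segment.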
Consideration #3: Is the problem connectivity or utilization related?
Is the network traffic getting to the
specified destination? Is a specific machine over-consuming its allocation
of bandwidth and crippling other users' connectivity while doing some action? On
the utilization front, non-work-related, "bandwidth-sucking" download
activities (music, videos, games, etc.) are a common culprit. Utilization-related issues are typically intermittent
in nature. One, or perhaps several, clients are over-utilizing a network
segment, but that comes and goes. Even if the oversubscribing event is long-running
(like streaming video), the remaining utilization still goes up and down
with normal network usage, creating intermittent periods of over-utilization.
This can easily be monitored by graphing the network utilization in real time.
Connectivity-related issues are typically more binary - users either can or
cannot connect to a particular network segment or a particular application. If
the issue is utilization related, the next step is to determine if it is client
or application driven. This is fairly easy to determine by looking at the top
talkers on the network. If the top talkers are clients, drill down and see what
protocols the client is using. This typically reveals the source of the problem
quite readily. If the issue is connectivity related, the next step is to
determine if connectivity is being affected by network congestion, or hardware
problems. Network congestion is again easily seen by monitoring network
utilization in real time. If not congestion, then the issue is likely to be
with hardware within the users' connectivity path.
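To make the utilization check concrete, here is a minimal Python sketch that turns two interface byte-counter samples into a utilization percentage and then applies the decision path described above. The function names, the 80% congestion threshold, and the sample numbers are assumptions for illustration, and the sketch ignores counter wrap-around.

```python
def utilization_percent(bytes_before, bytes_after, interval_s, link_bps):
    """Percent of link capacity used between two byte-counter samples."""
    bits = (bytes_after - bytes_before) * 8
    return 100.0 * bits / (interval_s * link_bps)


def next_step(can_connect, utilization, congested_at=80.0):
    """Suggest where to look next, following the triage in the text.

    The 80% default threshold is an illustrative assumption, not a standard.
    """
    congested = utilization >= congested_at
    if not can_connect:
        if congested:
            return "connectivity: caused by congestion"
        return "connectivity: check hardware in the path"
    if congested:
        return "utilization: inspect top talkers and their protocols"
    return "no obvious cause in this sample; keep monitoring"


# 112,500,000 bytes in 10 seconds on a 100 Mbit/s link.
util = utilization_percent(0, 112_500_000, 10, 100_000_000)
print(util)                                 # 90.0
print(next_step(can_connect=True, utilization=util))
```

Because utilization problems are intermittent, a single sample proves little; sampling at a fixed interval and graphing the results over time is what exposes the periodic spikes.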
