Tag Archives: Compass Dashboard

Q&A: How Can I Ensure the Best Application Performance?

The network is often blamed for poor application performance, even if the network is not the culprit. Network engineers therefore need to know how to determine the cause of the problem, even if it’s in the application itself. Below, you’ll find the most common questions we get from our clients on this topic, and how you should address them – whether you’re working with wireless or in a cloud environment.

Q: How do you know if it’s your network or your application?
A: This is the first question to answer when users complain about poor application response time. We’ve gone into great detail on this subject, but here are some initial indicators to help you prove that the network is not the culprit:

Packet-level monitoring will show you the conversation between a client and a poorly performing application. If a user request is followed by a quick network acknowledgement (ACK) but a delayed data response, the application is the cause. If the ACK is delayed or missing, the network is to blame. You can see this information clearly visualized in OmniPeek’s Compass dashboard, using the “2-Way Latency” setting, which displays the network and application latencies, with drill-down on a per-node or even per-flow basis.
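To make the ACK-versus-data rule concrete, here is a minimal Python sketch. It assumes you have already pulled three timestamps out of a capture for one request; the Exchange type, the classify function, and both thresholds are hypothetical and chosen only for illustration. They are not OmniPeek settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exchange:
    """Timestamps (in seconds) seen at the capture point for one request."""
    request_seen: float        # client request packet observed
    ack_seen: Optional[float]  # TCP ACK for the request; None if never seen
    first_data: float          # first packet carrying response payload


def classify(ex: Exchange, ack_limit: float = 0.1, app_limit: float = 0.5) -> str:
    """Apply the rule from the text. Both limits are illustrative assumptions."""
    if ex.ack_seen is None:
        return "network"                           # missing ACK
    network_latency = ex.ack_seen - ex.request_seen
    app_latency = ex.first_data - ex.ack_seen
    if network_latency > ack_limit:
        return "network"                           # delayed ACK
    if app_latency > app_limit:
        return "application"                       # quick ACK, slow data
    return "ok"
```

For example, classify(Exchange(0.0, 0.002, 1.5)) returns "application": the ACK came back in 2 ms, but the first data packet took about 1.5 seconds.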

If you’re streaming Expert events into a log management system, network issues show up as slow acknowledgements, TCP slow segment recovery, slow or frequent retransmissions, and low throughput. Application problems manifest as slow application response times.

Q: Does application monitoring change in a virtual environment?
A: While network monitoring may be more difficult in a virtual environment due to the introduction of overlay networks and virtual switches which often aren’t controlled by the network team, the fundamental analysis techniques are still valid. A capture only has to be in the packet path between the client and the server in order to get diagnostic info and answer the basic question: is the problem in the network or in the application?

Q: How should I address application problems if they are housed in a cloud environment?
A: Cloud is generally hostile to packet capture, since there’s no network visibility if you don’t control the network. In this environment, we recommend focusing on the end user experience. If there are complaints or concerns about the application performance, capture on a client machine to see what the traffic pattern reveals. We’ve found that many of our customers appreciate using the OmniPeek Remote Assistant capture agent, as it’s a lightweight capture tool with a simple user interface to capture packets from a Windows client. The encrypted capture file can then be sent back to the network team for analysis.

Q: How should I handle issues with real-time applications like VoIP?
A: Sadly, with VoIP it often is a network problem. Latency is often the root cause. Sometimes it’s a transitory problem, like routing reconvergence after a link goes down (or up). Sometimes, the packets are simply routed through inappropriate equipment, like a proxy which doesn’t do any VoIP analysis, but which still adds latency.

We recommend a pair of tools for VoIP. First, use the built-in VoIP analysis in OmniPeek Enterprise to measure MOS values and determine how widespread the problem is. Second, use Multi-Segment Analysis (MSA) to capture at multiple points simultaneously in the packet path, to determine whether there are any significant sources of latency in the network, and where they are.
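The idea behind capturing at multiple points can be sketched in a few lines of Python. This is not how MSA is implemented; it only illustrates the arithmetic. It assumes the same packet has been matched at each capture point and that the capture clocks are synchronized. The function names and the sample timestamps are hypothetical.

```python
def segment_delays(observations):
    """observations: [(capture_point, timestamp_seconds), ...] for ONE packet,
    listed in path order. Returns [(segment_name, delay_seconds), ...]."""
    delays = []
    for (a, t_a), (b, t_b) in zip(observations, observations[1:]):
        delays.append((f"{a}->{b}", t_b - t_a))
    return delays


def worst_segment(observations):
    """Return the (segment_name, delay_seconds) pair with the largest delay."""
    return max(segment_delays(observations), key=lambda d: d[1])


# Made-up sample: one RTP packet seen at four capture points.
sample = [("client", 10.000), ("core", 10.002), ("wan", 10.150), ("server", 10.152)]
name, delay = worst_segment(sample)
print(name, round(delay * 1000), "ms")
```

In this invented sample, almost all of the delay is added between the core and WAN capture points, which is where you would look next.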

Q: How do you know if application performance is sufficient?
A: Sufficient performance is subjective and depends on the end user. We usually suggest examining the application response time, which measures the time it takes an application to respond to a specific user request on a per-request or per-flow basis. The Expert dashboards in OmniPeek and OmniEngine Enterprise will give you these numbers very easily.

Our products also assign one of three basic levels of performance: satisfied, tolerating and frustrated. We cover how to measure and report these levels in more detail elsewhere.
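The three labels match the zones used by the Apdex convention, so the Python sketch below borrows its thresholds: responses up to a target time T are satisfied, up to 4T are tolerating, and anything slower is frustrated. Whether our products use these exact thresholds is an assumption here, and the 0.5-second target is only an example value.

```python
from collections import Counter


def rate(response_time: float, t: float = 0.5) -> str:
    """Label one response time using Apdex-style zones (T and 4T)."""
    if response_time <= t:
        return "satisfied"
    if response_time <= 4 * t:
        return "tolerating"
    return "frustrated"


def apdex(response_times, t: float = 0.5) -> float:
    """Apdex-style score: (satisfied + tolerating / 2) / total samples."""
    if not response_times:
        raise ValueError("need at least one response time")
    counts = Counter(rate(rt, t) for rt in response_times)
    return (counts["satisfied"] + counts["tolerating"] / 2) / len(response_times)
```

With response times of 0.3, 0.4, 1.2 and 2.5 seconds and T = 0.5, two requests are satisfied, one is tolerating and one is frustrated, giving a score of 0.625.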

Q: What if my application has a bug in it? How do I know and how should I solve the problem?
A: Once you’ve demonstrated that there is a problem in the application, the next steps may not be obvious to the sysadmins or developers. Most modern applications are highly modularized, split into multiple layers across many different servers, and if a back-end service is slow to respond, that delay will propagate all the way to the user. Packet capture can provide insight here as well: use the application performance analysis techniques on a capture taken on the front-end server to see whether there’s a dependency on other servers, and what the application response looks like from those remote systems. This works even if the connections are SSL or TLS encrypted, as it will be clear which packets are simple ACKs and which are application-layer responses. Repeat until you find which server in the distributed application is causing the major slowdown.

As always, please let us know if you have any additional questions!

Finding Bandwidth Hogs with the Compass Dashboard

Given the low cost and feature-rich networking equipment available today, it’s easy for bandwidth hogs to quietly operate in the background. Most corporate networks have plenty of bandwidth and lots of additional features, like traffic shaping, that significantly reduce the impact of bandwidth hogs. But they can certainly still be a problem, especially on slower, remote office networks, or wireless networks. And problems can be aggravated when “hogging events” occur, like video streaming of live, popular events, which bring even casual bandwidth hogs out of the closet. With the WildPackets Compass dashboard, you can easily navigate your way through the network to find exactly who the bandwidth hogs are and what they are doing to consume so much bandwidth.

Identifying Spikes in Network Usage
The Compass dashboard in OmniPeek allows you to get an overall view of network utilization, whether by bits, bytes, or packets. It is an excellent starting point for identifying overall spikes in network usage, the first step in identifying the culprit behind the spike. As we can see in the following screen shot of the Compass dashboard, our overall network utilization on our wireless network has been erratic, with several spikes over the last hour or so. We can now use the interactive nature of the Compass dashboard to determine which user(s) are responsible for the various spikes in network activity.
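If you want to reproduce the idea outside the dashboard, the following Python sketch totals bits per time bucket and flags buckets well above the average. It is only an illustration: the (timestamp, size) input format, the 60-second bucket, and the "twice the average" rule are all assumptions, not how Compass works internally.

```python
from collections import defaultdict
from statistics import mean


def utilization(packets, bucket_seconds=60):
    """packets: iterable of (timestamp_seconds, size_bytes).
    Returns {bucket_start_seconds: total_bits}."""
    buckets = defaultdict(int)
    for ts, size in packets:
        start = int(ts // bucket_seconds) * bucket_seconds
        buckets[start] += size * 8
    return dict(buckets)


def spikes(buckets, factor=2.0):
    """Return bucket start times whose bits exceed factor * average bits."""
    avg = mean(buckets.values())
    return sorted(start for start, bits in buckets.items() if bits > factor * avg)
```

Each flagged bucket is a candidate time range to select and drill into.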

Identifying Bandwidth Hogs
All that needs to be done is to isolate a spike, and the Compass dashboard will do the rest. Let’s choose the right-most spike in the above screen shot. Simply highlighting the area of the utilization graph directs the Compass dashboard to drill in on that area, focusing all of the Compass dashboard windows on only that period of time. This is illustrated in the following screen shot.

As you can see, not only have we focused on the utilization from just this time period, but the Top Protocols, Top Flows, and Top Nodes also reflect network utilization from just this time period. Looking first at Top Flows, we see that the conversation between 10.2.0.56 and 206.169.145.205 on port 80 is by far the largest flow, and we know that 10.2.0.56 is a user on our network. Both the listed port and the Top Protocols pie chart confirm for us that this is web traffic, and the Top Nodes histogram clearly shows that the web activity was YouTube traffic. So, a single step using the Compass dashboard provides us with all the data we need to know exactly who our bandwidth hog is, and what they’re doing on the network.
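The drill-down amounts to filtering packets to the selected time window and ranking flows by bytes. Here is a rough Python sketch of that step. The record layout and the top_flows function are hypothetical. The two addresses in the usage note come from the example above, while the timestamps and byte counts are made up.

```python
from collections import Counter


def top_flows(packets, start, end, n=3):
    """packets: iterable of (timestamp, src, dst, port, size_bytes).
    Returns the n largest flows seen in [start, end) as
    [((addr_a, addr_b, port), total_bytes), ...]."""
    totals = Counter()
    for ts, src, dst, port, size in packets:
        if start <= ts < end:
            # Sort the endpoints so both directions count as one flow.
            a, b = sorted((src, dst))
            totals[(a, b, port)] += size
    return totals.most_common(n)
```

Called on a window containing a 500-byte request from 10.2.0.56 and a 150,000-byte response from 206.169.145.205 on port 80, the function reports a single flow of 150,500 bytes at the top of the list.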

Further Analysis
Knowing that there was a spike in network traffic, and who caused it, is certainly valuable information. But spikes happen all the time. As network engineers, what we’re most interested in is whether or not this event created any adverse effects on our network, and one of the best metrics to determine this is network latency. The Compass dashboard continuously analyzes for network latency, and displays this information over time. Simply change the parameter in the graph from “Mbits” to “Worst 2-Way Latency,” and we can now see the latency for the period of time when the spike took place, as illustrated below.

As the graph shows, our worst 2-way latency continues to increase while the YouTube download is occurring, reaching a maximum value of almost 13 seconds. In our book this is certainly an adverse effect!
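A "worst latency over time" view like this one can be approximated by keeping the maximum latency sample in each time bucket. The Python sketch below assumes the 2-way latency samples have already been measured; it does not show how the dashboard derives them. The function name and the 60-second bucket are assumptions.

```python
def worst_latency(samples, bucket_seconds=60):
    """samples: iterable of (timestamp_seconds, latency_seconds).
    Returns {bucket_start_seconds: worst_latency_seconds}."""
    worst = {}
    for ts, latency in samples:
        start = int(ts // bucket_seconds) * bucket_seconds
        worst[start] = max(worst.get(start, 0.0), latency)
    return worst
```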

The Compass dashboard is a flexible, extremely versatile view into overall network activity. With its rich set of network metrics and the ability to instantly drill into specific time periods, it guides you to exactly where network problems are occurring. In this case, it identified a potential bandwidth hog, along with the negative impact this activity was having on the network.