Best Practices for Application Performance Monitoring and Troubleshooting

Network professionals do not have the luxury of simply responding to network failures. When problems arise, regardless of their nature, the network gets blamed. And as every network engineer knows, it’s typically not the network at fault; it’s usually the application.

As the expression goes, the best defense is a strong offense. That’s why it’s essential to continuously monitor both network and application performance so the root cause of “network” issues can be easily identified.

In this post, we’ll discuss best practices for ensuring application performance, whether your applications are virtual or hosted in a third-party data center, as with software-as-a-service (SaaS) or cloud computing.

Staging Application Monitoring
Most applications today are no longer centralized in a single data center. Instead, they are widely distributed, whether it’s due to a distributed data architecture within an enterprise, or usage of SaaS or cloud-based computing.

As you did with centralized applications, you still need to monitor key metrics like latency (both network and transaction), number of turns (request/response exchanges between client and server), overall network bandwidth, and payload sizes, which are particularly handy for application-level troubleshooting. However, with a distributed architecture, this information is no longer available from a single monitoring point. Data must be collected from multiple points along the data path in the network. To get the full picture of your network and applications, you need to monitor all key network links and hops.
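To make these metrics concrete, here is a minimal Python sketch that derives them from one captured conversation. The Packet record and the summarize function are hypothetical names for illustration, not part of any product. The sketch assumes the conversation has already been decoded into timestamp, direction, payload size, and a simplified flag string. It estimates network latency from the TCP handshake and treats each client request followed by a server response as one turn.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ts: float        # capture timestamp, in seconds
    direction: str   # "c2s" (client to server) or "s2c" (server to client)
    payload: int     # application payload bytes (0 for a bare SYN or ACK)
    flags: str = ""  # simplified TCP flags: "SYN", "SYN-ACK", "ACK", or ""


def summarize(packets):
    """Summarize one conversation (illustrative definitions, not a standard)."""
    # Network latency: time between the SYN and the SYN-ACK, if both were captured.
    syn = next((p for p in packets if p.flags == "SYN"), None)
    syn_ack = next((p for p in packets if p.flags == "SYN-ACK"), None)
    network_latency = syn_ack.ts - syn.ts if syn and syn_ack else None

    # Only payload-carrying packets count toward turns and transaction latency.
    data = [p for p in packets if p.payload > 0]
    turns = 0
    response_times = []
    for prev, cur in zip(data, data[1:]):
        # A turn: the last packet of a request followed by the first response packet.
        if prev.direction == "c2s" and cur.direction == "s2c":
            turns += 1
            response_times.append(cur.ts - prev.ts)

    avg_transaction = (
        sum(response_times) / len(response_times) if response_times else None
    )
    return {
        "network_latency": network_latency,
        "turns": turns,
        "avg_transaction_latency": avg_transaction,
        "payload_bytes": sum(p.payload for p in data),
    }
```

Note that the transaction latency measured this way at the client still includes one network round trip, so compare it against the handshake latency before blaming the application.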

Data collection from multiple points, though essential, does make analysis more complicated since you need to know which capture points are the key points for the problem you are analyzing. Multi-segment analysis alleviates this complication. Multi-segment analysis is a post-capture method that automates the process of analyzing network data from multiple network segments and/or multi-tier applications. It compiles and correlates just the data you need in a single view so you can easily pinpoint where anomalies are occurring along the data path, from the client to the server and back.
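The core idea behind correlating captures can be sketched in a few lines of Python. This is a simplified illustration, not how any particular product implements multi-segment analysis. It assumes each capture point has been reduced to a mapping from a packet key (for example, a tuple of IP ID and TCP sequence number) to the time that packet was seen, that the clocks at the capture points are synchronized, and that the packets all travel in the same direction along the path. The function names and the capture point names are hypothetical.

```python
def segment_delays(captures, path):
    """Average delay per segment between adjacent capture points.

    captures: {point_name: {packet_key: timestamp_seconds}}
    path: capture point names in the order packets traverse them
    """
    result = {}
    for a, b in zip(path, path[1:]):
        # Only packets seen at both ends of the segment can be compared.
        shared = captures[a].keys() & captures[b].keys()
        deltas = [captures[b][k] - captures[a][k] for k in shared]
        result[(a, b)] = sum(deltas) / len(deltas) if deltas else None
    return result


def slowest_segment(delays):
    """Return the segment with the largest average delay, or None."""
    measured = {seg: d for seg, d in delays.items() if d is not None}
    return max(measured, key=measured.get) if measured else None
```

Calling segment_delays with a path such as ["client", "wan_edge", "server"] gives one average per hop, and slowest_segment points at the hop worth examining first.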

The Virtual and Cloud Factor
Virtualization introduces new challenges in monitoring both the physical network and application performance. Even with this complexity, the fundamental analysis techniques are still valid: a capture only has to be in the packet path between the client and the server to diagnose problems, even if this path is virtual.

We covered this topic in detail in our last blog post.

If you are working with fully hosted cloud-based applications, your flexibility for monitoring data on the cloud side of the application is very limited and most likely nonexistent. The key here in terms of application performance is focusing on the end-user experience. If there are complaints about application performance, capture on a client machine and at the WAN ingress/egress point to see round-trip application performance as well as the performance of the specific client.

Reactive Analysis
If you are continually monitoring and assessing your network, you can quickly and easily spot issues before anyone complains. That said, most folks do not proactively monitor and instead wait for the complaint to happen – reactive analysis.

If this sounds like you, then the first step in discovering if it is a network or application problem is to turn to your favorite packet-based network analysis solution (we hope it is OmniPeek).

The next step is finding the best place to monitor the offending application. Remember, multiple analysis points along the path will make troubleshooting much easier with multi-segment analysis, but this may be impractical when in a reactive mode. It’s important to keep in mind where users are located, and whether it is a single user that is having problems or a broad range of users. If it is a single user, try isolating the network traffic for just that user to reduce the amount of data to be analyzed. If it is a broad range, monitor closer to the application server and isolate your analysis to just the users of that application.
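As an illustration of the two isolation strategies, the Python sketch below builds a capture filter string in the BPF syntax used by tcpdump and other libpcap-based tools. Your analyzer may use its own filter format, so treat the syntax as an assumption. The capture_filter helper is a hypothetical name, and the addresses in the usage note are placeholders.

```python
def capture_filter(user_ip=None, server_ip=None, server_port=None):
    """Build a BPF-style capture filter from whichever criteria are known."""
    parts = []
    if user_ip is not None:
        # Single-user complaint: keep only that user's traffic.
        parts.append(f"host {user_ip}")
    if server_ip is not None:
        # Broad complaint: keep traffic to and from the application server.
        parts.append(f"host {server_ip}")
    if server_port is not None:
        # Narrow further to the application's TCP port.
        parts.append(f"tcp port {server_port}")
    # An empty filter string means "capture everything".
    return " and ".join(parts)
```

For a single user, capture_filter(user_ip="10.0.0.5") narrows the capture to that machine. For a broad complaint, capture_filter(server_ip="192.0.2.10", server_port=443) keeps everyone who talks to the application.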

When the monitoring points are established, you can start collecting network data (packets). If you are sure that this is an isolated application issue, then filter as described above. But if you’re still not sure, widen the capture to make sure you’re not missing critical data.

If your network analysis system employs expert analysis, this will be an excellent guide for your problem search. Look at the specific types of expert events being logged and in what layers they are being reported. Problems arising in the application or server layer typically mean that the application is at fault. If the transport layer is implicated, then it may be your network.
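If your analyzer lets you export its expert events, a rough first-pass triage along these lines can be scripted. The Python sketch below assumes a hypothetical export format in which each event is a dictionary with a "layer" field. The layer names and the simple majority rule are illustrative assumptions, not the behavior of any particular product.

```python
from collections import Counter

# Layers that tend to implicate the application versus the network (assumed names).
APP_LAYERS = {"application", "server"}
NETWORK_LAYERS = {"transport"}


def triage(events):
    """Return "application", "network", or "inconclusive" from event layers."""
    counts = Counter(e["layer"].lower() for e in events)
    app = sum(counts[layer] for layer in APP_LAYERS)
    net = sum(counts[layer] for layer in NETWORK_LAYERS)
    if app > net:
        return "application"
    if net > app:
        return "network"
    # No events in either group, or an even split: look deeper by hand.
    return "inconclusive"
```

A result like this is only a starting point for where to look, since a single severe event can matter more than many minor ones.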

Expert analysis may not always reveal the issue. In that case, you need to go deeper and look at the overall packet bounce diagram for the network conversation in question. If the diagram indicates that user requests are followed by quick network acknowledgements (ACKs) but a delayed data response, then the problem is most probably with the application. Conversely, if the ACK is delayed or missing, then the network may be to blame.
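The same reasoning can be written down as a small Python rule of thumb. In this sketch, classify_exchange is a hypothetical helper, and the two thresholds (0.1 seconds for an ACK, 1.0 second for a data response) are arbitrary assumptions you would tune to your own network and application.

```python
def classify_exchange(request_ts, ack_ts, response_ts,
                      ack_limit=0.1, response_limit=1.0):
    """Classify one request/response exchange from its timestamps (seconds).

    ack_ts or response_ts may be None if that packet was never seen.
    """
    # A missing or slow ACK points toward the network.
    if ack_ts is None or (ack_ts - request_ts) > ack_limit:
        return "network"
    # A quick ACK followed by a missing or slow data response points
    # toward the application or server.
    if response_ts is None or (response_ts - ack_ts) > response_limit:
        return "application"
    return "normal"
```

For example, a request at 0.0 that is acknowledged at 0.02 but not answered with data until 2.5 would be classified as "application".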

Want to learn more about application troubleshooting? Let us know what you think we missed in this blog and what you would like us to cover next time for application performance monitoring. Also, if you already use our products, we have tons of videos on how to troubleshoot applications. Check out all our video tutorials on our YouTube channel.
