Amazon Web Services Goes Down – What You Can Do to Prevent Lengthy Outages

ComputerWorld’s JP Raphael recently wrote an article chronicling the 10 worst cloud outages and their effects on customers. The Amazon Web Services fiasco stands out. According to the article, “the error started during a network upgrade, when a misrouted traffic shift sent a cluster of Amazon EBS (Elastic Block Store) volumes into a re-mirroring storm, as they sought out available boxes into which they could insert backups of themselves.” Engineers took four days to reach this conclusion, a delay that was unacceptable to the many businesses that had trusted their time-sensitive data to Amazon. In incidents like this, having a root-cause analysis solution is key.

The business benefit of root-cause analysis is clear. Without it, you cannot fully identify and address the conditions causing a network problem, whether that problem is related to performance, security, compliance, or process. Glitches occur, and failures will happen. The network is the lifeline of your business, so network problems are business problems: every minute that the network isn’t operating as designed means higher costs, lost revenue, and angry customers. Just ask Amazon.

Today, flow-based technologies have quickly replaced SNMP as the core technology for network monitoring and reporting, because flow records contain more detailed information than SNMP and their data collection is far more efficient.

However, like SNMP, flow-based network monitoring does not do a complete job. Real-time, flow-based statistics provide visibility into how the network is operating, and can even allow some extrapolation about how it will continue to perform, but when conditions begin to degrade, the best they can do is raise a red flag. They provide very little information for determining the root cause of the issue, which makes it extremely difficult to isolate the problem and implement a permanent fix. This is the key difference between a Network Monitoring solution and a Network Monitoring for Analysis solution. Network Monitoring for Analysis offers the same flow-based statistics for real-time performance monitoring, but it also archives detailed network data for forensic, or post-incident, analysis. That archive eliminates the need to recreate the often ephemeral, anomalous condition that created the bottleneck, and it provides all of the detail needed to perform root-cause analysis and address the issue once and for all.
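To make the archiving half of Network Monitoring for Analysis more concrete, here is a minimal Python sketch. Records are stored in timestamp order so that, after an alert, an analyst can pull back exactly what crossed the network during the incident window instead of trying to reproduce the condition. The `Record` and `FlowArchive` names, their fields, and the in-memory storage are hypothetical illustrations, not any product's design; a real system would persist to disk and index far more fields.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One archived observation (hypothetical fields for illustration)."""
    ts: float       # capture time, seconds since epoch
    src_ip: str
    dst_ip: str
    dst_port: int
    nbytes: int


@dataclass
class FlowArchive:
    """Hypothetical in-memory archive kept sorted by timestamp."""
    _times: list = field(default_factory=list)
    _records: list = field(default_factory=list)

    def add(self, rec: Record) -> None:
        # Insert in timestamp order so window lookups can use binary search.
        i = bisect_right(self._times, rec.ts)
        self._times.insert(i, rec.ts)
        self._records.insert(i, rec)

    def window(self, start: float, end: float) -> list:
        """Return every record with start <= ts <= end, oldest first."""
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return self._records[lo:hi]

    def top_talkers(self, start: float, end: float, n: int = 3) -> list:
        """Sum bytes per (src_ip, dst_ip) pair in the window, largest first."""
        totals = {}
        for r in self.window(start, end):
            key = (r.src_ip, r.dst_ip)
            totals[key] = totals.get(key, 0) + r.nbytes
        return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    archive = FlowArchive()
    archive.add(Record(100.0, "10.0.0.5", "10.0.1.9", 443, 1_200))
    archive.add(Record(101.5, "10.0.0.7", "10.0.1.9", 3260, 90_000))
    archive.add(Record(102.0, "10.0.0.7", "10.0.1.9", 3260, 85_000))
    archive.add(Record(250.0, "10.0.0.5", "10.0.1.9", 443, 800))
    # Post-incident question: who was moving the most data between t=100 and t=110?
    print(archive.top_talkers(100.0, 110.0))
```

The point of the design is that the question is answered from stored data after the fact. A monitoring-only system that kept just running totals could show the spike but not which host pair caused it.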

With Network Monitoring, you know there’s a problem. With Network Monitoring for Analysis, you know there’s a problem and why it happened, so that you can avoid it altogether in the future.

Here are three approaches to monitoring your network. Of the three, packet-based monitoring is the one that lets you monitor for analysis.

• Simple Network Management Protocol (SNMP) is useful for identifying and describing system configurations. It monitors network-attached devices for basic, high-level conditions such as up/down status, total traffic (bytes, packets), and number of users. Unfortunately, it relies on polling, which has a heavy bandwidth impact because large volumes of polled information traverse the very network you’re attempting to monitor.

• Flow Records are the default elements used in centralized, flow-based network monitoring. A “flow” is a sequence of packets that share seven key fields: source IP address, destination IP address, source port, destination port, layer 3 protocol type, type of service (ToS) byte, and input logical interface. Flow record formats vary by standard, vendor, and configuration; the most common are NetFlow, IPFIX, sFlow, and JFlow. Unlike SNMP, flow-based data yield more detailed statistics and provide good information about the overall health of the network. However, flow-based analysis can degrade network performance, because it relies on the same equipment used to forward network traffic (the routers and switches themselves) and competes with that traffic for processing power and memory. Typically, when network conditions begin to tax a switch or router, the device reverts to its “prime directive” of routing packets, which reduces the reliability of the network monitoring data.

• Packet-Based monitoring captures each packet, using software and/or hardware, as traffic passes over a digital network or part of a network. Captured packets are then decoded and analyzed according to the appropriate Internet Engineering Task Force (IETF) RFC (Request for Comments) or other specifications. Unlike flow records, which often rely on statistical sampling, packet-based approaches generate 100% accurate information for each flow. Also, unlike both SNMP and flow records, packet capture has minimal network impact, because all analysis is done locally at the point of capture, on hardware that is not part of the network routing infrastructure.
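The contrast between sampled flow records and packet-based capture can be sketched in a few lines of Python. Given already-decoded packets, the sketch groups them by the seven flow key fields listed above to build exact per-flow packet and byte counts. It then builds a 1-in-N sampled estimate by keeping every Nth packet and scaling the counts back up by N. The `Packet` type is a hypothetical stand-in for a real decoder's output. The deterministic every-Nth sampler is a simplifying assumption; real samplers such as sFlow typically sample randomly.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical decoded packet; a real decoder would supply these fields."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int    # layer 3 protocol number, e.g. 6 = TCP, 17 = UDP
    tos: int         # type of service byte
    in_ifindex: int  # input logical interface
    length: int      # bytes on the wire


def flow_key(p: Packet) -> tuple:
    """The seven fields that must match for packets to belong to one flow."""
    return (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol, p.tos, p.in_ifindex)


def exact_flows(packets) -> dict:
    """Packet-based view: count every packet and byte of every flow."""
    flows = defaultdict(lambda: [0, 0])  # key -> [packets, bytes]
    for p in packets:
        stats = flows[flow_key(p)]
        stats[0] += 1
        stats[1] += p.length
    return {k: tuple(v) for k, v in flows.items()}


def sampled_flows(packets, n: int) -> dict:
    """Sampled view: keep every n-th packet, then scale its counts up by n."""
    flows = defaultdict(lambda: [0, 0])
    for i, p in enumerate(packets):
        if i % n == 0:
            stats = flows[flow_key(p)]
            stats[0] += n
            stats[1] += p.length * n
    return {k: tuple(v) for k, v in flows.items()}


if __name__ == "__main__":
    web = Packet("10.0.0.5", "10.0.1.9", 51000, 443, 6, 0, 1, 1500)
    dns = Packet("10.0.0.5", "10.0.2.2", 53000, 53, 17, 0, 1, 80)
    # Nine web packets with one short DNS packet in the middle.
    traffic = [web] * 5 + [dns] + [web] * 4
    print(exact_flows(traffic))
    print(sampled_flows(traffic, n=4))
```

With this input, the exact view reports the web flow as 9 packets (13,500 bytes) and the DNS flow as 1 packet (80 bytes). The 1-in-4 sample never sees the DNS packet and estimates the web flow at 12 packets (18,000 bytes). Short-lived flows are exactly the ones that sampling is most likely to miss.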
