RSA: The Rising Cyber Security Threats Attacking Your Network

The RSA Conference is one of the premier cyber security gatherings in the IT industry. Companies, analysts, and cyber security professionals flock to San Francisco every year to hear talks from the experts, see the latest products on the Expo floor, and socialize at a week of parties. The conference has grown over the years, just as the emphasis on cyber security has increased within IT, as Jon Oltsik of NetworkWorld points out:

RSA use to be an oasis from mainstream IT and a place to discuss DLP, web security and key management. It was an under-funded IT step child and the RSA Conference was still centered on bits and bytes. That was then, this is now and cyber security is everywhere!

But it makes sense. We live in a world where our bank accounts can be hacked by someone thousands of miles away, where companies have data about our personal lives that they can sell to advertisers, and where governments routinely perform cyber espionage. Security and privacy are no longer restricted to a small corner of the IT department: they affect everyone.

So, how can you as an IT or network admin help protect your network from being hacked? Here are a few ways to make sure that you are on top of your network security policy:

Password Attacks and Best Practices
There are two main ways that passwords cause breaches. The first is simple guessing, which is paradoxically becoming more sophisticated. Analysis of the numerous password breaches over the past year shows that most people use passwords that can be guessed easily, including Syrian President Assad’s use of “12345”. However, enforcing more complex passwords isn’t necessarily the answer, since it leads to the classic “sticky note under the keyboard,” or its equivalent in a mobile workforce.

Detecting password guessing is relatively straightforward: look for repeated login attempts, especially for login failures. While this is usually easiest via server logs, it can work on the wire too by looking for repeated access to the login URL for a web app.
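As a rough illustration of the log-based approach, the sketch below flags any source that racks up several failed logins inside a short sliding window. The event format, the function name `find_guessing_sources`, and the default threshold and window are all assumptions made for this example, not something taken from a particular server or product.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def find_guessing_sources(events, threshold=5, window=timedelta(minutes=1)):
    """Return the sources with at least `threshold` failed logins in any `window`.

    `events` is an iterable of (timestamp, source, succeeded) tuples,
    sorted by timestamp. The tuple layout is hypothetical; adapt it to
    whatever your server logs actually contain.
    """
    recent = defaultdict(deque)  # source -> timestamps of recent failures
    flagged = set()
    for ts, source, succeeded in events:
        if succeeded:
            continue
        failures = recent[source]
        failures.append(ts)
        # Drop failures that have aged out of the sliding window.
        while ts - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(source)
    return flagged
```

The same counting works on the wire if each event is a request to the login URL rather than a log line, although you may then have to infer success or failure from the response.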

However, an increasingly common cause of password-related breaches is so-called “spear phishing,” in which an attacker sends an email to a target pretending to be something innocuous, or even something official. A common technique among professional penetration testers is to send an email claiming to be from the company IT department, with a link to a site that requests the user’s username and password. Average success rates for this spear phishing technique are over 30%.

Stolen passwords can be difficult to detect, but Google recently shared one of their methods: look for logins that happen from different locations. It would be rather unusual for a GeoIP lookup of the login to come from two different continents within minutes of each other!
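One way to picture this check is as an "impossible travel" test: take two logins for the same account, compute the great-circle distance between their GeoIP coordinates, and ask whether anyone could have covered it in the elapsed time. This is only a sketch of the idea and not Google's actual method; the function names, the 1,000 km/h speed limit, and the 100 km tolerance for GeoIP imprecision are assumptions chosen for illustration.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(login_a, login_b, max_kmh=1000.0, min_km=100.0):
    """Each login is a (timestamp, latitude, longitude) tuple from a GeoIP lookup.

    Returns True when the implied travel speed exceeds `max_kmh`.
    Distances under `min_km` are ignored because GeoIP data is coarse.
    """
    (t1, lat1, lon1), (t2, lat2, lon2) = login_a, login_b
    distance = haversine_km(lat1, lon1, lat2, lon2)
    if distance < min_km:
        return False
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    if hours == 0:
        return True
    return distance / hours > max_kmh
```

Expect false positives from VPNs, proxies, and mobile carriers that route traffic through distant gateways, so treat a hit as a reason to look closer rather than as proof of compromise.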

Best practice for good passwords is still 2-factor authentication with a hardware token. If it’s cost-effective for Blizzard to use with World of Warcraft, then it can be cost-effective for your organization. There are even open source or dual license solutions available.

Monitoring IT That Is in Public Clouds
The idea of sharing a public utility in general can be scary, especially when IT personnel do not have control over every aspect of the company’s infrastructure. Beyond this concern there are other tactical security concerns that need to be addressed prior to moving to a public cloud, as well as while you are monitoring your cloud service.

One of the emerging challenges is the push for Single Sign-On (SSO) in cloud-hosted applications. This is a complicated issue, and it’s easy to get lost in the discussion of “if Facebook can do it, why can’t we?” versus “Let’s use OAuth like Twitter!”. Our recommendation is to start by understanding the scope of the problem, and an excellent resource is a recent series of articles on Securosis.

From a detection perspective, cloud security is about knowing where the dotted lines are that define what used to be your perimeter. Understand the traffic flows between your in-house services and your cloud instances, enforce them with firewalls if not VPNs, and audit them frequently.

Continued Network Monitoring and a Contingency Plan
Your best technique to combat evolving security threats is vigilance. That doesn’t mean sitting 24×7 watching the network. It means using the tools at your disposal to gain visibility. If you’re using a SIEM to correlate IDS and log data, configure your OmniEngine software probes to send the Expert event log to the SIEM as an additional data source. Not only will it give you an additional data collector (especially if you’re using custom filters), it will also tell you where in the capture to look when you investigate events.

This monitoring of your network 24/7 is a great tool for network forensics. Network forensics works as a contingency plan in case a security breach does occur. It can help you clean up your network to make sure that there are no lingering worms or other suspicious traffic, and it can also help to determine where the hacker breached your network so you can fix any security holes.

Keep in mind that you’re not just looking at the Top 10. If anything, you’re looking for a node in the long tail that’s relatively quiet, but which suddenly starts sending more traffic, or starts using different protocols than before. If you’ve got a desktop PC that suddenly starts sending probes to other parts of your network, that’s suspicious activity that you should investigate, and that you might not have noticed by relying purely on an IDS.
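To make the long-tail idea concrete, here is a minimal sketch that compares each node's current interval against its own history and reports a jump in traffic volume or protocols it has not used before. The data layout, the function name `unusual_nodes`, the three-sigma rule, and the `min_bytes` floor are all assumptions for illustration; a real deployment would feed this from whatever per-node statistics your monitoring tool exports.

```python
from statistics import mean, pstdev


def unusual_nodes(baseline, current, sigma=3.0, min_bytes=100_000):
    """Return {node: [reasons]} for nodes that deviate from their own baseline.

    baseline: node -> {"bytes": [bytes per past interval], "protocols": set of names}
    current:  node -> {"bytes": bytes in this interval, "protocols": set of names}
    """
    findings = {}
    for node, now in current.items():
        reasons = []
        past = baseline.get(node)
        if past is None:
            reasons.append("no baseline for this node")
        else:
            limit = mean(past["bytes"]) + sigma * pstdev(past["bytes"])
            # The min_bytes floor keeps tiny fluctuations on idle nodes from alerting.
            if now["bytes"] > limit and now["bytes"] >= min_bytes:
                reasons.append("traffic volume far above this node's baseline")
            new_protocols = now["protocols"] - past["protocols"]
            if new_protocols:
                reasons.append("new protocols: " + ", ".join(sorted(new_protocols)))
        if reasons:
            findings[node] = reasons
    return findings
```

Because each node is judged against its own history, a quiet desktop that jumps from kilobytes to megabytes stands out even though it would never appear in a Top 10 list.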

Cyber security threats are not going away and they will continue to become more sophisticated over time. It is important to be aware of trends affecting the security industry (both big and small), so you can be versed and prepared to protect your network against both nascent and lingering threats out there.

Q&A: How Can I Ensure the Best Application Performance?

The network is often blamed for poor application performance, even if the network is not the culprit. Network engineers therefore need to know how to determine the cause of the problem, even if it’s in the application itself. Below, you’ll find the most common questions we get from our clients on this topic, and how you should address them – whether you’re working with wireless or in a cloud environment.

Q: How do you know if it’s your network or your application?
A: This is the first question you should answer when users are complaining about poor application response time. We’ve gone into great detail on this subject, but here are some initial indicators to help you prove that the network is not the culprit:

Packet-level monitoring will show you the conversation between a client and a poorly performing application. If a user request is followed by a quick network acknowledgement (ACK) but a delayed data response, it means it is an application issue. If the ACK is delayed or missing, then the network is to blame. You can get this information clearly visualized in OmniPeek’s Compass dashboard, using the “2-Way Latency” setting, which displays the network and application latencies, with drill-down on a per-node or even per-flow basis.
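The ACK-versus-data rule can be written down in a few lines. The sketch below assumes you have already pulled three timestamps (in seconds) out of a capture for one request; the function name `classify_delay` and the 200 ms threshold are assumptions for illustration and are not how OmniPeek computes its latencies.

```python
def classify_delay(request_ts, ack_ts, data_ts, slow_threshold=0.2):
    """Classify one request/response exchange from capture timestamps in seconds.

    request_ts: when the client request was seen
    ack_ts:     when the server's TCP ACK for the request was seen (None if missing)
    data_ts:    when the first response data packet was seen (None if missing)
    """
    # A late or missing ACK points at the network path.
    if ack_ts is None or ack_ts - request_ts > slow_threshold:
        return "network"
    # A prompt ACK followed by late (or no) data points at the application.
    if data_ts is None or data_ts - ack_ts > slow_threshold:
        return "application"
    return "ok"
```

For example, `classify_delay(0.0, 0.002, 1.5)` returns `"application"`, because the ACK came back in 2 ms but the data took about 1.5 seconds. Keep in mind that TCP delayed-ACK timers and ACKs piggybacked on the response data can blur the picture, so pick the threshold with your own traffic in mind.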

If you’re streaming Expert events into a log management system, network issues are shown through slow acknowledgements, TCP slow segment recovery, slow and frequent re-transmission, and low throughput. Application problems manifest in slow application response times.

Q: Does application monitoring change in a virtual environment?
A: While network monitoring may be more difficult in a virtual environment due to the introduction of overlay networks and virtual switches which often aren’t controlled by the network team, the fundamental analysis techniques are still valid. A capture only has to be in the packet path between the client and the server in order to get diagnostic info and answer the basic question: is the problem in the network or in the application?

Q: How should I address application problems if they are housed in a cloud environment?
A: Cloud is generally hostile to packet capture, since there’s no network visibility if you don’t control the network. In this environment, we recommend focusing on the end user experience. If there are complaints or concerns about the application performance, capture on a client machine to see what the traffic pattern reveals. We’ve found that many of our customers appreciate using the OmniPeek Remote Assistant capture agent, as it’s a lightweight capture tool with a simple user interface to capture packets from a Windows client. The encrypted capture file can then be sent back to the network team for analysis.

Q: How should I handle issues with real-time applications like VoIP?
A: Sadly, with VoIP it often is a network problem. Latency is often the root cause. Sometimes it’s a transitory problem, like routing reconvergence after a link goes down (or up). Sometimes, the packets are simply routed through inappropriate equipment, like a proxy which doesn’t do any VoIP analysis, but which still adds latency.

There is a pair of tools we recommend for VoIP. First, use the built-in VoIP analysis in OmniPeek Enterprise to measure the MOS scores and determine how widespread the problem is. Second, use Multi-Segment Analysis (MSA) to capture at multiple points simultaneously in the packet path, to determine whether there are any significant sources of latency in the network, and where they are.

Q: How do you know if application performance is sufficient?
A: This is subjective to the end user. We usually suggest examining the application response time. This measures the time it takes an application to respond to a specific user request on a per-request or per-flow basis. The Expert dashboards in OmniPeek and OmniEngine Enterprise will give you these numbers very easily.

Our products also assign one of three basic levels of performance: satisfied, tolerating and frustrated. We dive into deeper detail about how you should measure and report this here.
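The three levels use the same vocabulary as the Apdex method, so an Apdex-style calculation is a reasonable way to illustrate how such levels can be turned into a single number. Whether the products use exactly these thresholds is an assumption here: the sketch follows the usual Apdex convention of a target time T, with responses up to T counted as satisfied, up to 4T as tolerating, and anything slower as frustrated.

```python
def apdex_level(response_time, target):
    """Bucket one response time (same unit as `target`) into an Apdex-style level."""
    if response_time <= target:
        return "satisfied"
    if response_time <= 4 * target:
        return "tolerating"
    return "frustrated"


def apdex_score(response_times, target):
    """Apdex-style score: (satisfied + tolerating / 2) / total, from 0.0 to 1.0."""
    if not response_times:
        raise ValueError("need at least one response time")
    levels = [apdex_level(t, target) for t in response_times]
    satisfied = levels.count("satisfied")
    tolerating = levels.count("tolerating")
    return (satisfied + tolerating / 2) / len(levels)
```

With a target of 0.5 seconds, the response times 0.2, 0.4, 1.0, 1.9, and 3.0 seconds give two satisfied, two tolerating, and one frustrated sample, for a score of 0.6.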

Q: What if my application has a bug in it? How do I know and how should I solve the problem?
A: Once you’ve demonstrated that there is a problem in the application, the next steps may not be obvious to the sysadmins or developers. Most modern applications are highly modularized, split into multiple layers across many different servers, and if a back-end service is slow to respond, that delay will propagate all the way to the user. Packet capture can provide insight here as well: use the application performance analysis techniques on a capture taken on the front-end server to see whether there’s a dependency on other servers, and what the application response looks like from those remote systems. This works even if the connections are SSL or TLS encrypted, as it will be clear which packets are simple ACKs and which are application-layer responses. Repeat until you find which server in the distributed application is causing the major slowdown.

As always, please let us know if you have any additional questions!

Q&A: How IT Trends Are Affecting the Network

We get a lot of questions from our customers about how they should prepare themselves for different technologies (cloud, 10G, etc.). For this blog we wanted to answer some of the common questions that we receive from our customers – mainly network engineers/administrators – regarding both specific networking trends as well as larger technology trends.

We have Jay Botelho, Director of Product Marketing at WildPackets, answering these questions. He’s been in the networking business for over 25 years. If you have any additional questions for our networking guru please be sure to let us know and we’ll address them in our next Q&A blog.

Q: What are some best practices for troubleshooting in a virtual environment?
A: Virtualization creates “blind spots” in your network, making it difficult to monitor with traditional techniques, i.e. spanning a switch port from a physical Ethernet switch to collect packet-based data for complete root-cause analysis.

Here’s a common scenario. A user is experiencing abnormally long delays while working with a specific application. You know from your network architecture that the app is running on a VM, but you’re not the application engineer so you’re not familiar with all of the nuances of the application’s operations (what data sources it accesses, under what conditions, etc.). You’re able to start a packet capture session on a switch just upstream from the virtual server running the app, and after filtering and watching the user connection to the app you can confirm long delays for some operations, but it is clear the delay is NOT between the user and your capture point. The delay is within the VM, in your blind spot, where communication between the application and the database is virtualized on the same VM.

To address this issue, you must collect data from the virtual switch(es) within the VM in order to get visibility between the application and the database. There are several techniques to achieve this. First, you can use OmniVirtual, a packet-capture software probe specifically designed to run on VMs. You will need to allocate space on the VM to run OmniVirtual, just as you would any other application. Once running, OmniVirtual will have access to all data crossing any of the virtual switches on the VM.

A second alternative is to use a virtual tap, available from several tap vendors. These virtual taps install at the VM layer, acting like a traditional tap, providing access to data crossing virtual switches on the VM to a host of solutions, including network analysis and troubleshooting solutions like the Omni Distributed Analysis Platform.

Q: How will migrating to a public Cloud affect my job?
A: There’s some very good data available from industry analysts predicting that you will be busier as your company migrates to the Cloud, so don’t worry about your job! But your role will expand from managing only your own infrastructure to managing overall service availability and performance in the Cloud as well. Cloud computing merely shifts your application servers from your facility to a third party. Issues like bottlenecks, bandwidth hogs, and unauthorized protocol usage will still adversely affect application traffic. Thus, more diligence must be applied in monitoring application performance and making sure that your service provider is living up to its promises.

Q: Why do issues with VoIP continue to persist?
A: This is a question that we consistently get from both existing customers and potential customers looking to deploy VoIP. The core issue is that networks are really not that friendly to real-time data (RTP, the Real-time Transport Protocol, which is used by VoIP and video). Most networks are optimized to carry TCP/IP data traffic, which is much more tolerant of latency, packet loss, and jitter than RTP is. To compensate for this, networks require additional configuration for VoIP, including Quality of Service (QoS) tagging at a minimum, or even dedicated VLAN or MPLS segments to segregate and give priority to RTP traffic. If you either have or are planning a transition to VoIP, be sure you are using a network analysis solution that treats VoIP like any other data type on the network, since that’s exactly what it is. VoIP problems often spike during times of heavy network usage, so you need a solution that can see everything at once and allow you to correlate the activity of all traffic on your network simultaneously.
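Since jitter is one of the things RTP tolerates poorly, it helps to see how it is usually measured. The sketch below applies the interarrival jitter estimator defined in the RTP specification (RFC 3550): for each packet, take the change in transit time relative to the previous packet and fold it into a running average with a gain of 1/16. The function name and the use of seconds for both timestamps are assumptions made for this example; in a real capture the send time comes from the RTP timestamp divided by the codec clock rate.

```python
def interarrival_jitter(packets):
    """Estimate RTP interarrival jitter using the RFC 3550 running average.

    `packets` is an iterable of (send_time, arrival_time) pairs in seconds,
    in arrival order. The two clocks do not need to be synchronized,
    because only changes in transit time matter.
    """
    jitter = 0.0
    previous_transit = None
    for send_time, arrival_time in packets:
        transit = arrival_time - send_time
        if previous_transit is not None:
            difference = abs(transit - previous_transit)
            # RFC 3550: J = J + (|D| - J) / 16
            jitter += (difference - jitter) / 16.0
        previous_transit = transit
    return jitter
```

A stream with constant delay yields a jitter of essentially zero, however large the delay is, which is why latency and jitter have to be tracked as separate measurements.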

Q: How will my network analysis needs change as we roll out 10G?
A: 10G is a game-changer for network analysis and troubleshooting. The still oft-used “break/fix” method of network troubleshooting – where packet-based network analysis is only performed after a problem is reported – is no longer effective at 10G. With more data consolidated through fewer resources, the number of problems per segment increases, and the increased network speeds make it far more challenging to try to reproduce problems, or wait for them to happen again. At 10G you need to monitor and record packet-level data on an ongoing basis, arming yourself with a recording of all activity on these highly-utilized segments. If real-time monitoring indicates a negative trend, or if problem reports are rolling in, you can simply “rewind” the network to the troublesome period of time and analyze exactly what was going on. No waiting for it to happen again, and no need for Herculean efforts to reproduce the problem. You have all the data you need to solve the problem – immediately.

This wraps up our first Q&A session. Please keep your questions coming. We’re always up for a challenge, and let’s face it, we picked the softballs this time around…