Agents in the Cloud

Simple Network Management Protocol (SNMP) has been the standard technology behind IT operations management for more than two decades. SNMP was originally designed for network routers and switches; RFC 2790, which defines the Host Resources MIB, paved the way for open source agents such as UCD-SNMP, and later Net-SNMP, to monitor operating system performance metrics and configuration data.

Management software such as HP OpenView NNM, CA Spectrum, and ScienceLogic EM7 actively polls SNMP data from each device over a network connection. SNMP was designed when most companies managed their own data centers and controlled their own networks, and management systems were frequently deployed close to the devices and servers they monitored. As a result, SNMP was designed to be a lightweight, fast protocol built on UDP (User Datagram Protocol). UDP is faster than TCP because it has no connection setup, flow control, or retransmission of lost packets. The trade-off is that SNMP requests or responses dropped or corrupted on the public Internet are not recovered by the transport, so they show up as timeouts and missing data. The good news is that Net-SNMP is one of the few agents that support both TCP and UDP configurations. The bad news is that many management systems don’t support SNMP over TCP, so check that yours does before relying on it.

SNMP security is another area of concern when managing resources in the cloud. It has long worried systems administrators, although most of the concerns stem from a limited understanding of agent configuration. When using SNMP over a public network, administrators should only use SNMP v3, because it supports authentication and encryption, including AES (Advanced Encryption Standard) and, in some implementations, 3DES (Triple Data Encryption Standard), which helps keep your data from being snooped on in transit. For Windows administrators, it’s worth noting that the SNMP agents embedded in Microsoft Windows Server (all versions) do not support SNMP v3; however, Net-SNMP for Windows does.
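As a rough illustration of the two points above (SNMP v3 with encryption, and TCP as an alternative transport), here is a minimal Python sketch that builds a command line for Net-SNMP's snmpget tool. The function name, host name, user name, and pass phrases are hypothetical; the flags are the standard Net-SNMP v3 options, and the choice of SHA for authentication is an assumption rather than a recommendation from this article.

```python
def build_snmpget_argv(host, oid, user, auth_pass, priv_pass,
                       transport="udp", port=161):
    """Build an argv list for Net-SNMP's snmpget using SNMP v3 with
    authentication and AES privacy (security level authPriv).

    Hypothetical helper for illustration only. Note that pass phrases
    given on a command line are visible in the process list; a real
    deployment would more likely keep them in snmp.conf.
    """
    if transport not in ("udp", "tcp"):
        raise ValueError("transport must be 'udp' or 'tcp'")
    return [
        "snmpget",
        "-v3",                 # SNMP version 3
        "-l", "authPriv",      # require authentication and encryption
        "-u", user,            # v3 security (user) name
        "-a", "SHA",           # authentication protocol (assumed choice)
        "-A", auth_pass,       # authentication pass phrase
        "-x", "AES",           # privacy (encryption) protocol
        "-X", priv_pass,       # privacy pass phrase
        f"{transport}:{host}:{port}",  # Net-SNMP transport specifier
        oid,
    ]


if __name__ == "__main__":
    # 1.3.6.1.2.1.1.3.0 is sysUpTime.0; the host and credentials are made up.
    argv = build_snmpget_argv("cloud-host.example.com", "1.3.6.1.2.1.1.3.0",
                              "monitor", "auth-pass-phrase", "priv-pass-phrase",
                              transport="tcp")
    print(" ".join(argv))
```

The list could be handed to subprocess.run on a machine with Net-SNMP installed. Whether the agent answers over TCP depends on how it has been configured to listen.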

Polling cloud-based SNMP devices from either your enterprise or a hosting provider essentially means that you are relying on the public Internet to maintain visibility of your assets. Moreover, when the network fails (and it will), mission-critical data will be lost, because most management systems poll in real time on a fixed schedule and cannot go back to collect samples they missed. For instance, if your management system polls every 5 minutes and your network goes offline for 1 hour, then 12 polling cycles are lost, even if your cloud systems were available the whole time. Loss of such data can distort overall service level calculations, or lead to false positives in your management system. To solve this problem, it may be worth looking into either store-and-forward collectors or agent hubs, depending on your management system vendor. The purpose of store-and-forward is to allow data to be collected locally (in the cloud) and then relayed back to the management system. If network connectivity is lost, data is stored until connectivity is restored.
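The store-and-forward idea can be sketched in a few lines of Python. This is a simplified illustration, not any vendor's implementation: the class name, the send callback, and the buffer limit are all hypothetical, and a real collector would also persist its buffer to disk.

```python
from collections import deque


def missed_polls(outage_seconds, interval_seconds):
    """Number of polling cycles that fall inside an outage."""
    return outage_seconds // interval_seconds


class StoreAndForwardCollector:
    """Buffer samples locally and relay them when the uplink works.

    `send` is a hypothetical callable that delivers one sample to the
    management system and raises ConnectionError when the link is down.
    """

    def __init__(self, send, max_buffered=10000):
        self._send = send
        # Oldest samples are discarded first if the buffer ever fills up.
        self._buffer = deque(maxlen=max_buffered)

    def record(self, timestamp, metric, value):
        """Store a locally collected sample, then try to relay the backlog."""
        self._buffer.append((timestamp, metric, value))
        return self.flush()

    def flush(self):
        """Send buffered samples oldest first; stop at the first failure."""
        sent = 0
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                break  # link still down; keep the sample for later
            self._buffer.popleft()
            sent += 1
        return sent

    @property
    def pending(self):
        return len(self._buffer)


if __name__ == "__main__":
    link = {"up": False}
    received = []

    def send(sample):
        if not link["up"]:
            raise ConnectionError("uplink down")
        received.append(sample)

    collector = StoreAndForwardCollector(send)
    # One-hour outage with 5-minute polling: samples pile up locally.
    for i in range(missed_polls(3600, 300)):
        collector.record(i * 300, "cpu_load", 0.5)
    print("buffered during outage:", collector.pending)
    link["up"] = True
    collector.record(3600, "cpu_load", 0.4)
    print("delivered after recovery:", len(received))
```

Because each sample carries its own timestamp, the management system can place the relayed data at the time it was collected, provided it accepts historical data.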

Alternatively, many of the leading cloud providers, such as Amazon Web Services (AWS) and Rackspace, provide built-in monitoring, allowing events and data to be forwarded to or retrieved by your management system. Most of these tools are still in their infancy, except for AWS’s CloudWatch, which offers most of the same basic performance information as SNMP.

Another option is the relatively new open source agent called Collectd, which is freely available for most Unix systems and commercially available for Windows servers. Collectd works by harvesting data from your system and then pushing that data to a second, receiving Collectd server. If the receiving server is located in the same cloud environment, it will cache data for all devices even during a network outage. Although most management systems don’t natively support Collectd yet, clever systems administrators could develop a few scripts to parse Collectd data and then pass that data over an XML web service. This would allow management systems that support XML to access the data. And if your management system also supports back-porting historical data, then Collectd might be the right option for you.
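A parsing script of the kind described above might look like the following Python sketch. It assumes Collectd has been configured to write CSV files whose first line is a header such as "epoch,value" followed by one row per sample; that layout, the function name, and the XML element names are assumptions for illustration, and the XML schema your management system expects will almost certainly differ.

```python
import csv
import io
import xml.etree.ElementTree as ET


def collectd_csv_to_xml(csv_text, host, metric):
    """Convert CSV text (header row, then epoch plus values) to XML.

    Hypothetical helper: the element and attribute names are made up
    and would need to match whatever your management system accepts.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)  # e.g. ["epoch", "value"]
    root = ET.Element("metrics", {"host": host, "name": metric})
    if header is None:
        return ET.tostring(root, encoding="unicode")
    for row in reader:
        if not row:
            continue  # skip blank lines
        sample = ET.SubElement(root, "sample", {"epoch": row[0]})
        # One <value> element per data column, named after the header.
        for name, value in zip(header[1:], row[1:]):
            ET.SubElement(sample, "value", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample data in the assumed layout.
    text = "epoch,value\n1700000000.000,0.25\n1700000300.000,0.50\n"
    print(collectd_csv_to_xml(text, "web01.example.com", "load"))
```

Keeping the original epoch on every sample matters here: it is what lets a management system that supports back-porting place cached data at the correct point in time after an outage.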

Managing cloud resources adds many new complexities to traditional monitoring and management practices. IT administrators should make sure monitoring data is secure over the public Internet and, where possible, reduce the likelihood of losing visibility during a network outage.





