Essential Data Center Environmental Monitoring

The power outages that raged across the Mid-Atlantic and Washington DC area last week reminded me of the importance of maintaining proper data center environmental monitoring. Although often viewed as an afterthought, environmental monitoring can actually help identify problems before they become serious issues.

Most modern data centers do provide redundant power and cooling; however, that does not guarantee there won’t be any problems. Keep in mind that data center infrastructure has a particularly high proportion of electromechanical components, which tend to fail at higher rates than solid-state equipment. Therefore, having continuous visibility into your data center, whether day to day or during a power outage, will help you and your team improve disaster readiness while reducing response times.

For instance, if the power grid goes offline and your backup generator kicks on, knowing when your generator started and how much diesel fuel remains will help you proactively plan refueling. The same goes for Uninterruptible Power Supplies (UPS): whether or not your data center has a backup generator, knowing the health of your UPS systems is vital to managing a crisis. UPS systems maintain power until the generator comes online. If the generator fails to engage, knowing how much runtime is left in your UPS batteries is essential so you can plan an orderly shutdown of your environment. Equally, when the UPS takes over because of power loss, its batteries take on a higher load and can quickly overheat. In the worst case this leads to “thermal runaway”, in which the heat a battery generates drives further heating, and the problem can be compounded if your data center cooling is impacted until backup generator or grid power is restored.
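
The refueling and shutdown decisions described above come down to simple arithmetic once the readings are available. The following is a minimal sketch in Python, not tied to any particular product: the names `GeneratorStatus`, `needs_refuel`, and `should_start_shutdown`, the constant burn rate, and every figure in it are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeneratorStatus:
    """Hypothetical snapshot of generator readings from a monitoring probe."""
    fuel_liters: float    # fuel currently in the tank
    burn_rate_lph: float  # consumption in liters per hour at the current load


def generator_hours_remaining(status: GeneratorStatus) -> float:
    """Estimate the run time left, assuming the burn rate stays constant."""
    if status.burn_rate_lph <= 0:
        raise ValueError("burn rate must be positive")
    return status.fuel_liters / status.burn_rate_lph


def needs_refuel(status: GeneratorStatus, delivery_lead_hours: float,
                 safety_margin_hours: float = 2.0) -> bool:
    """True when fuel would run out before a delivery plus a margin arrives."""
    deadline = delivery_lead_hours + safety_margin_hours
    return generator_hours_remaining(status) <= deadline


def should_start_shutdown(ups_runtime_minutes: float,
                          shutdown_minutes: float,
                          generator_running: bool) -> bool:
    """True when the UPS is the only power source and its remaining runtime
    no longer exceeds the time an orderly shutdown takes."""
    if generator_running:
        return False
    return ups_runtime_minutes <= shutdown_minutes


# Illustrative figures only.
status = GeneratorStatus(fuel_liters=400.0, burn_rate_lph=50.0)
print(generator_hours_remaining(status))                # hours of fuel left
print(needs_refuel(status, delivery_lead_hours=6.0))    # order fuel now?
print(should_start_shutdown(10.0, 15.0, generator_running=False))
```

In practice the burn rate changes with load, so an estimate like this is best recalculated every time a new reading arrives.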

Data center cooling is another area that should be monitored. Some CRAC systems may be on building power or a different generator. Alternatively, condensate pumps and drains may not operate as intended, which could cause your exchanger to freeze. Monitoring temperature, humidity, and water leaks, whether in a crisis or not, allows your IT team to respond before small problems become serious issues.
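
As a rough illustration of what responding early can look like, the sketch below checks one sensor reading against fixed limits. The `Reading` type, the sensor names, and the threshold values are all hypothetical; real limits should come from your equipment vendors and your own operating history.

```python
from typing import List, NamedTuple


class Reading(NamedTuple):
    """Hypothetical reading from one environmental probe."""
    sensor: str
    temperature_c: float
    humidity_pct: float
    leak_detected: bool


# Assumed limits for illustration; tune them to your own environment.
TEMP_MAX_C = 27.0
HUMIDITY_MIN_PCT = 20.0
HUMIDITY_MAX_PCT = 80.0


def check_reading(reading: Reading) -> List[str]:
    """Return one alert message for every limit the reading violates."""
    alerts = []
    if reading.temperature_c > TEMP_MAX_C:
        alerts.append(
            f"{reading.sensor}: temperature {reading.temperature_c:.1f} C "
            f"is above {TEMP_MAX_C:.1f} C"
        )
    if not HUMIDITY_MIN_PCT <= reading.humidity_pct <= HUMIDITY_MAX_PCT:
        alerts.append(
            f"{reading.sensor}: humidity {reading.humidity_pct:.0f}% is outside "
            f"{HUMIDITY_MIN_PCT:.0f}-{HUMIDITY_MAX_PCT:.0f}%"
        )
    if reading.leak_detected:
        alerts.append(f"{reading.sensor}: water leak detected")
    return alerts


# A healthy reading produces no alerts; a bad one produces one per problem.
for r in (Reading("rack-a1", 24.0, 45.0, False),
          Reading("crac-2", 31.5, 85.0, True)):
    for message in check_reading(r):
        print(message)
```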

The reality is that adding environmental monitoring to your data center is not all that expensive, and the upside is quite obvious. There are several products on the market that monitor everything from temperature and humidity to power levels, airflow, and fuel levels. Likewise, most UPS vendors offer optional management cards that report UPS status and health data. Over the years we’ve used a myriad of different products; however, AKCP’s SecurityProbe and NetBotz (now owned by APC) stand out as the most versatile products we’ve tested. Add-on probes or modules allow monitoring of just about any environmental concern.

Integrating these types of environmental products into your management system is best done using SNMP because of its tried-and-true reliability, ease of configuration, and wide support from management products. Where possible, IT managers should avoid proprietary-only element manager solutions because of the limitations they impose when integrating into your company’s IT workflow process. Collecting environmental data into your central monitoring and management system yields substantial benefits in data analysis, historical reporting, event handling, notification, and multi-tenant access for non-administrative users. For instance, with the ScienceLogic EM7 management system, administrators can easily set up different events and notification policies against the same data collection point (e.g. generator fuel level), which helps inform the individuals who need to know, when they need to know. The same can be applied to monitoring your UPS system, CRAC and cooling systems, and even your fire suppression system.
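
To make the idea of several notification policies against one data collection point concrete, here is a small product-neutral sketch in Python. It does not use or represent the EM7 API; the `Policy` class, the policy names, the recipient addresses, and the percentage thresholds are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Policy:
    """Hypothetical notification policy bound to a single collected value."""
    name: str
    recipients: List[str]
    condition: Callable[[float], bool]  # True when the policy should fire


# Two policies watch the same data point: generator fuel level in percent.
FUEL_POLICIES = [
    Policy("fuel-low", ["facilities@example.com"],
           lambda pct: pct < 50.0),
    Policy("fuel-critical", ["facilities@example.com", "oncall@example.com"],
           lambda pct: pct < 20.0),
]


def triggered(policies: List[Policy], value: float) -> List[Policy]:
    """Return every policy whose condition matches the collected value."""
    return [p for p in policies if p.condition(value)]


# One collection, evaluated against every policy attached to it.
for fuel_pct in (80.0, 35.0, 15.0):
    names = [p.name for p in triggered(FUEL_POLICIES, fuel_pct)]
    print(fuel_pct, names)
```

The value is collected once and each audience gets its own threshold, which is what keeps the routine refueling notice separate from the page that wakes the on-call manager.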

Maintaining visibility of your data center environment is absolutely vital, especially during a power outage or crisis. The key is making sure each system is monitored as well as possible, including UPS and batteries, generator and fuel levels, and CRAC units along with temperature and humidity. With proper integration into your management system, IT can gain clear visibility into each system and react as needed.

