Artificial intelligence is set to play a bigger role in data-center operations as enterprises begin to adopt machine-learning technologies that have been tried and tested by larger data-center operators and colocation providers.
Today’s hybrid computing environments often span on-premises data centers, cloud and colocation sites, and edge computing deployments, and enterprises are finding that a traditional approach to managing data centers isn’t optimal. Artificial intelligence, applied through machine learning, has enormous potential to streamline the management of these complex computing facilities.
AI in the data center, for now, revolves around using machine learning to monitor and automate the management of facility components such as power and power-distribution elements, cooling infrastructure, rack systems and physical security.
Inside data-center facilities, growing numbers of sensors are collecting data from devices including backup power (UPS) systems, power-distribution units, switchgear and chillers. Data about these devices and their environment is parsed by machine-learning algorithms, which draw insights about performance and capacity, for example, and determine appropriate responses, such as changing a setting or sending an alert. As conditions change, a machine-learning system learns from the changes; it’s essentially trained to self-adjust rather than rely on explicit programming instructions to perform its tasks.
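To make that loop concrete, here’s a minimal sketch of a monitor whose alert threshold adapts to the readings it sees rather than being hand-set. The sensor values, smoothing factor and alert band are illustrative assumptions, not taken from any vendor’s system.

```python
# Sketch of a self-adjusting monitor: the baseline is learned from the
# data stream, so thresholds track changing conditions instead of being
# hard-coded. All values here are hypothetical.
from dataclasses import dataclass

@dataclass
class AdaptiveMonitor:
    alpha: float = 0.05   # smoothing factor: how quickly the baseline adapts
    mean: float = 0.0     # learned average of recent readings
    var: float = 1.0      # learned variance of recent readings
    sigmas: float = 3.0   # alert when a reading is this many std devs out

    def observe(self, value: float) -> bool:
        deviation = value - self.mean
        alert = abs(deviation) > self.sigmas * self.var ** 0.5
        # Learn from the new reading: the envelope self-adjusts over time.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return alert

# Hypothetical chiller supply-temperature feed, in degrees C.
monitor = AdaptiveMonitor(mean=22.0, var=0.25)
for reading in [22.1, 21.9, 22.3, 22.0, 26.5]:
    if monitor.observe(reading):
        print(f"alert: {reading} C is outside the learned envelope")
```

A production system would feed in thousands of such streams and respond by changing setpoints rather than just printing alerts, but the shape of the loop – observe, compare, adapt – is the same.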
The goal is to enable data-center operators to increase the reliability and efficiency of the facilities and, potentially, run them more autonomously. However, getting the data isn’t a trivial task.
A baseline requirement is real-time data from major components, says Steve Carlini, senior director of data-center global solutions at Schneider Electric. That means chillers, cooling towers, air handlers, fans and more. On the IT equipment side, it means metrics such as server utilization rate, temperature and power consumption.
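As a rough illustration of what that baseline telemetry could look like in software, the record below bundles the metrics Carlini mentions into one structure. The field names and units are assumptions for illustration, not a Schneider Electric schema.

```python
# Hypothetical shape of a per-device, real-time telemetry record.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class DeviceReading:
    device_id: str        # e.g., "chiller-02" or "rack-14-pdu" (made-up IDs)
    device_type: str      # chiller, cooling tower, air handler, fan, UPS, server
    timestamp: datetime
    power_kw: float       # instantaneous power draw
    temperature_c: float  # inlet or ambient temperature
    utilization_pct: Optional[float] = None  # meaningful for IT gear only

reading = DeviceReading("chiller-02", "chiller",
                        datetime.now(timezone.utc),
                        power_kw=41.5, temperature_c=7.2)
print(reading)
```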
“Metering a data center is not an easy thing,” Carlini says. “There are tons of connection points for power and cooling in data centers that you need to get data from if you want to try to do AI.”
IT pros are accustomed to device monitoring and real-time alerting, but that’s not the case on the facilities side of the house. “The expectation of notification in IT equipment is immediate. On your power systems, it’s not immediate,” Carlini says. “It’s a different world.”
It’s only within the last decade or so that the first data centers were fully instrumented, with meters to monitor power and cooling. And where metering exists, standardization is elusive: Data-center operators rely on building-management systems that use multiple communication protocols – from Modbus and BACnet to LonWorks and Niagara – and have had to be content with devices that don’t share data or can’t be operated via remote control. “TCP/IP, Ethernet connections – those kinds of connections were unheard of on the powertrain side and cooling side,” Carlini says.
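For a sense of what pulling data off the powertrain involves, here’s a hedged sketch of polling one value from a Modbus/TCP power meter, assuming the open-source pymodbus 3.x client. The host address, register number and scaling are hypothetical; real values come from the specific device’s register map.

```python
# Sketch: read a 32-bit power value from a Modbus/TCP meter.
# Assumes pymodbus 3.x; the IP, register and scaling below are made up.
from pymodbus.client import ModbusTcpClient

client = ModbusTcpClient("10.0.40.17", port=502)  # hypothetical meter IP
if client.connect():
    rr = client.read_holding_registers(3059, count=2)  # hypothetical register
    if not rr.isError():
        raw = (rr.registers[0] << 16) | rr.registers[1]  # two 16-bit words
        print(f"power draw: {raw / 1000:.1f} kW")        # assumed scaling
    client.close()
```

Multiply that by every protocol in the building – BACnet, LonWorks, Niagara – and the integration burden Carlini describes becomes clear.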
The good news is that data-center monitoring is advancing toward the depth that’s required for advanced analytics and machine learning. “The service providers and colocation providers have always been pretty good at monitoring at the cage level or the rack level, and monitoring energy usage. Enterprises are starting to deploy it, depending on the size of the data center,” Carlini says.
Machine learning keeps data centers cool
A Delta Air Lines data-center outage, attributed to electrical-system failure, grounded about 2,000 flights over a three-day period in 2016 and cost the airline a reported $150 million. That’s exactly the sort of scenario machine learning-based automation could potentially avert. Thanks to advances in data-center metering and the advent of data pools in the cloud, smart systems have the potential to spot vulnerabilities and drive efficiencies in data-center operations in ways that manual processes can’t.
A simple example of machine learning-driven intelligence is condition-based maintenance applied to consumable items in a data center – cooling filters, for example. By monitoring the airflow through multiple filters, a smart system could sense if some filters are more clogged than others, and then direct the air to the less-clogged units until it’s time to change all the filters, Carlini says.
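In code, that logic can be surprisingly simple. The toy example below compares airflow across a bank of filters and flags the restricted ones; the flow figures and the 75 percent threshold are invented for illustration.

```python
# Toy condition-based maintenance check for a bank of cooling filters.
# Flow readings (CFM) and the clog threshold are hypothetical.
flows_cfm = {"filter-1": 980, "filter-2": 990, "filter-3": 610, "filter-4": 975}

nominal = max(flows_cfm.values())
clogged = {f for f, cfm in flows_cfm.items() if cfm < 0.75 * nominal}

for name in sorted(flows_cfm):
    status = "restricted - divert air" if name in clogged else "OK"
    print(f"{name}: {flows_cfm[name]} CFM ({status})")

# Replace the whole bank when enough filters degrade, not on a fixed date.
if len(clogged) >= len(flows_cfm) / 2:
    print("schedule replacement of the full filter bank")
```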
Another example is monitoring the temperature and discharge of the batteries in UPS systems. A smart system can identify a UPS system that’s been running in a hotter environment and might have been discharged more often than others, and then designate it as a backup UPS rather than a primary. “It does a little bit of thinking for you. It’s something that could be done manually, but the machines can also do it. That’s the basic stuff,” Carlini says.
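A rough sketch of that ranking logic follows; the stress score, its weights and the fleet records are invented for illustration.

```python
# Toy UPS triage: units with hotter histories and more discharge cycles
# have aged faster, so the most-stressed unit is demoted to backup duty.
ups_fleet = [
    {"id": "ups-a1", "avg_temp_c": 24.0, "discharge_cycles": 3},
    {"id": "ups-a2", "avg_temp_c": 31.5, "discharge_cycles": 11},
    {"id": "ups-b1", "avg_temp_c": 25.0, "discharge_cycles": 5},
]

def stress(unit: dict) -> float:
    # Hypothetical score: degrees above 25 C count double; each cycle adds 1.
    return max(0.0, unit["avg_temp_c"] - 25.0) * 2.0 + unit["discharge_cycles"]

ranked = sorted(ups_fleet, key=stress)
print(f"primary: {ranked[0]['id']}, demote to backup: {ranked[-1]['id']}")
```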
Taking things up a level is dynamic cooling optimization, which is one of the more common examples of machine learning in the data center today, particularly among larger data-center operators and colocation providers.
With dynamic cooling optimization, data center managers can monitor and control a facility’s cooling infrastructure based on environmental conditions. When equipment is moved or computing traffic spikes, heat loads in the building can change, too. Dynamically adjusting cooling output to shifting heat loads can help eliminate unnecessary cooling capacity and reduce operating costs.
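A stripped-down sketch of the idea: a feedback loop that nudges cooling-fan output up or down as the cold-aisle temperature drifts from its setpoint. The setpoint, gain and speed limits are illustrative, not drawn from any vendor’s product.

```python
# Toy dynamic-cooling loop: fan speed follows the measured heat load
# instead of running flat out. All constants here are hypothetical.
SETPOINT_C = 24.0               # target cold-aisle temperature
GAIN = 8.0                      # % fan-speed change per degree of error
MIN_SPEED, MAX_SPEED = 30.0, 100.0

def next_fan_speed(cold_aisle_temp_c: float, speed_pct: float) -> float:
    error = cold_aisle_temp_c - SETPOINT_C
    return min(MAX_SPEED, max(MIN_SPEED, speed_pct + GAIN * error))

speed = 70.0
for temp in [24.0, 24.6, 25.2, 23.8]:   # shifting heat load over time
    speed = next_fan_speed(temp, speed)
    print(f"cold aisle {temp:.1f} C -> fan {speed:.0f}%")
```

Production systems close this loop across hundreds of sensors and cooling units at once, with a learned model standing in for the fixed gain used here.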
Colocation providers are big adopters of dynamic cooling optimization, says Rhonda Ascierto, research director for the Datacenter Technologies and Eco-Efficient IT channel at 451 Research. “Machine learning isn’t new to the data center,” Ascierto says. “Folks for a long time have tried to better right-size cooling based on capacity and demand, and machine learning enables you to do that in real time.”
Vigilent is a leader in dynamic cooling optimization. Its technology optimizes the airflow in a data-center facility, automatically finding and eliminating hot spots.
Data center operators tend to run much more cooling equipment than they need to, says Cliff Federspiel, founder, president and CTO of Vigilent. “It usually produces a semi-acceptable temperature distribution, but at a really high cost.”
If there’s a hot spot, the typical reaction is to add more cooling capacity. In reality, higher air velocity can produce pressure differences, interfering