The Uptime Institute estimated as far back as 2015 that idle servers could be wasting around 30% of their consumed energy, and improvements fuelled by trends such as virtualisation have since largely plateaued.
According to Uptime, the proportion of power consumed by “functionally dead” servers in the datacentre looks to be creeping up again, which is not what operators want to hear as they struggle to contain costs and target sustainability.
Todd Traver, vice-president for digital resiliency at the Uptime Institute, confirms that the issue is worthy of attention. “The analysis of idle power consumption will drive focus on the IT planning and processes around application design, procurement and the business processes that enabled the server to be installed in the datacentre in the first place,” Traver tells ComputerWeekly.
Yet higher-performance multi-core servers, which can draw 20W or more of additional idle power compared with lower-power servers, can deliver performance improvements of over 200% versus those lower-powered machines, he notes. If a datacentre were myopically focused on reducing the power consumed by servers, that would drive the wrong buying behaviour.
“This could actually increase overall power consumption since it would significantly sub-optimise the amount of workload processed per watt consumed,” warns Traver.
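A rough worked example shows why the arithmetic can favour the higher-powered machine. The sketch below is illustrative only: every wattage, throughput figure and utilisation level is an assumption made for the example, not Uptime Institute data, and the linear power model is a simplification. Only the 20W idle gap and the idea of a threefold performance difference echo Traver's comments.

```python
from math import ceil


def fleet_power(total_work, per_server_work, idle_w, active_w, utilisation):
    """Return (servers needed, total watts) to process total_work units/s.

    Assumes each server runs at the given utilisation and that power
    rises linearly from idle_w (0% busy) to active_w (100% busy).
    """
    servers = ceil(total_work / (per_server_work * utilisation))
    watts_each = idle_w + (active_w - idle_w) * utilisation
    return servers, servers * watts_each


# Hypothetical low-power server: 100 work units/s, 40W idle, 200W busy.
low = fleet_power(3000, per_server_work=100, idle_w=40, active_w=200,
                  utilisation=0.5)

# Hypothetical multi-core server: three times the throughput,
# but 20W more idle power and a higher peak draw.
high = fleet_power(3000, per_server_work=300, idle_w=60, active_w=350,
                   utilisation=0.5)

print("low-power fleet (servers, watts):", low)
print("high-performance fleet (servers, watts):", high)
```

Under these assumed figures, the fleet of higher-powered servers needs fewer machines and draws less power in total, even though each machine idles at a higher wattage.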
So, what should be done?
Datacentre operators can play a role in helping to reduce idle power by, for instance, ensuring the hardware provides performance based on the service-level objectives (SLOs) required by the applications it must support. “Some IT shops tend to over-purchase server performance, ‘Just in case’,” adds Traver.
He notes that IT teams worried about application performance may resist, but careful planning should ensure many applications easily withstand properly implemented hardware power management, without affecting end users or SLO targets.
Start by sizing server components and capabilities for the workload, which means understanding the application and its requirements for throughput, response time, memory use, cache and so on. Then ensure hardware C-state power management functions are turned on and used, says Traver.
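On Linux hosts, one way to check whether C-states are available and enabled is to read the kernel's cpuidle entries under sysfs. The sketch below is a minimal example, assuming a Linux server that exposes `/sys/devices/system/cpu/cpu0/cpuidle`; the function name is hypothetical, and firmware settings can also restrict C-states in ways this check will not show.

```python
from pathlib import Path

# Standard Linux sysfs location for CPU 0's idle states (assumes Linux).
CPUIDLE_DIR = "/sys/devices/system/cpu/cpu0/cpuidle"


def _read(path):
    """Return the stripped contents of a sysfs file, or '' if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""


def read_cpuidle_states(cpuidle_dir=CPUIDLE_DIR):
    """List the idle states the kernel exposes and whether each is disabled."""
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []  # not Linux, or cpuidle is not available on this host
    state_dirs = [p for p in base.iterdir()
                  if p.name.startswith("state") and p.name[5:].isdigit()]
    state_dirs.sort(key=lambda p: int(p.name[5:]))
    return [
        {
            "state": d.name,
            "name": _read(d / "name"),
            "disabled": _read(d / "disable") == "1",
        }
        for d in state_dirs
    ]


for s in read_cpuidle_states():
    status = "disabled" if s["disabled"] else "enabled"
    print(f'{s["state"]}: {s["name"]} ({status})')
```

An empty result, or a list in which only the shallowest states are enabled, suggests that deeper idle states are not being used on that machine.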
The third stage is to monitor server utilisation continuously and work to increase it, with software available to help balance workloads across servers, he adds.
Sascha Giese, head geek at infrastructure management provider SolarWinds, agrees: “With orchestration software which is in use in bigger datacentres, we would actually be able to dynamically shut down machines that are no use right now. That can help quite a lot.”
Improving the machines themselves and changing mindsets remain important, particularly shifting away from an over-emphasis on high performance. Shutting things down might also extend hardware lifetimes.
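The kind of decision an orchestrator makes here can be sketched as a simple packing problem: place workloads on as few hosts as possible, then treat any host left empty as a candidate for shutdown. The example below is a hypothetical first-fit-decreasing sketch, not how SolarWinds or any particular product works; the host names, demand figures and the 80% headroom limit are all assumptions.

```python
def consolidate(workloads, hosts, capacity, headroom=0.8):
    """Pack workloads onto identical hosts, largest first (first-fit decreasing).

    workloads: dict of workload name -> demand (same units as capacity)
    hosts:     ordered list of host names
    Returns (placement, idle_hosts); idle hosts are shutdown candidates.
    """
    limit = capacity * headroom  # leave spare capacity for load spikes
    placement = {h: [] for h in hosts}
    used = {h: 0.0 for h in hosts}

    ordered = sorted(workloads.items(), key=lambda kv: kv[1], reverse=True)
    for name, demand in ordered:
        for h in hosts:
            if used[h] + demand <= limit:
                placement[h].append(name)
                used[h] += demand
                break
        else:
            raise ValueError(f"no host can fit workload {name!r}")

    idle_hosts = [h for h in hosts if not placement[h]]
    return placement, idle_hosts


# Hypothetical demands, expressed as a percentage of one host's capacity.
demands = {"web": 30, "db": 40, "batch": 5, "cache": 15}
placement, idle_hosts = consolidate(demands, ["h1", "h2", "h3", "h4"],
                                    capacity=100)
print("placement:", placement)
print("hosts that could be powered down:", idle_hosts)
```

A real orchestrator would also weigh migration cost, redundancy and how quickly a powered-down host can return to service.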
Giese says that even with technological improvements happening at server level and increased densities, broader considerations remain that go beyond agility. It’s all one part of a larger puzzle, which might not offer a perfect solution, he says.
New thinking might address how energy consumption and utilisation are measured and interpreted, which can differ from one organisation to another, as can the way they are budgeted for.
“Obviously, it is in the interest of administrators to provide a lot of resources. That’s a big problem because they might not consider the ongoing costs, which is basically what you’re after in the big picture,” says Giese.
Designing power-saving schemes
Simon Riggs, PostgreSQL fellow at managed database provider EDB, has worked frequently on power consumption code as a developer. When implementing power reduction techniques in software, including PostgreSQL, the team starts by analysing the software with the Linux tool PowerTOP to see which parts of the system wake up when idle. Then they look at the code to learn which wait loops are active.
A typical design pattern for normal operation might be waking when requests for work arrive or every two to five seconds to recheck status. After 50 idle loops, the pattern might be to move from normal to hibernate mode but move straight back to normal mode when woken for work.
The team reduces power consumption by extending wait loop timeouts to 60 seconds, which Riggs says gives a good balance between responsiveness and power consumption.
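The pattern Riggs describes can be sketched in a few lines. This is a minimal illustration, not PostgreSQL's or EDB's actual code: the class and method names are hypothetical, and the defaults simply reuse the figures quoted above (a wait of up to five seconds in normal mode, 50 idle loops before hibernating, and a 60-second wait while hibernating).

```python
import queue


class PowerAwareWorker:
    """Background worker that waits longer between wake-ups when idle."""

    def __init__(self, normal_timeout=5.0, hibernate_timeout=60.0,
                 idle_loops_before_hibernate=50):
        self.requests = queue.Queue()
        self.normal_timeout = normal_timeout
        self.hibernate_timeout = hibernate_timeout
        self.idle_loops_before_hibernate = idle_loops_before_hibernate
        self.mode = "normal"
        self.idle_loops = 0

    def recheck_status(self):
        """Placeholder for the periodic housekeeping done on each timeout."""

    def step(self, handle):
        """Run one pass of the wait loop; return True if work was handled."""
        if self.mode == "normal":
            timeout = self.normal_timeout
        else:
            timeout = self.hibernate_timeout
        try:
            # Blocks until work arrives or the timeout expires, so the
            # process sleeps instead of spinning and waking the CPU.
            item = self.requests.get(timeout=timeout)
        except queue.Empty:
            self.idle_loops += 1
            if (self.mode == "normal"
                    and self.idle_loops >= self.idle_loops_before_hibernate):
                self.mode = "hibernate"
            self.recheck_status()
            return False
        # Woken for work: go straight back to normal mode.
        self.mode = "normal"
        self.idle_loops = 0
        handle(item)
        return True
```

Because the worker blocks on the queue rather than polling, a request arriving during hibernation is still handled immediately; the longer timeout only reduces how often an idle process wakes up to recheck status.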
“This scheme is fairly easy to implement, and we encourage all software authors to follow these techniques to reduce server power consumption,” Riggs adds. “Although it seems obvious, adding a ‘low power mode’ isn’t high on the priority list for many businesses.”
Progress can and should be reviewed regularly, he points out – adding that he has spotted a few more areas that the EDB team can clean up when it comes to power consumption coding while maintaining responsiveness of the application.
“Probably everybody thinks that it’s somebody else’s job to tackle these things. Yet, perhaps 50-75% of servers out there are not used much,” he says. “In a business such as a bank with 5,000-10,000 databases, quite a lot of those don’t do that much. A lot of those databases are 1GB or less and might only have a few transactions per day.”
Jonathan Bridges is chief innovation officer at cloud provider Exponential-e, which has a presence in 34 UK datacentres. He says that cutting back on powering inactive servers is crucial to datacentres looking to become more sustainable and make savings, because so many workloads, including cloud environments, are idle for large chunks of time. Scale-out has often not been architected effectively either, he adds.
“We’re finding a lot of ghost VMs [virtual machines],” Bridges says. “We see people trying to put in software technology so cloud management platforms typically federate those multiple environments.”
Persistent monitoring may reveal underutilised workloads and other gaps, which can be targeted with automation and business process logic to enable switch-off, or at least a more strategic business choice around the IT spend.
However, what typically happens, especially with the prevalence of shadow IT, is that IT departments don’t actually know what’s happening. Also, these problems can become more prevalent as organisations grow, spread and disperse globally and manage multiple off-the-shelf systems that weren’t originally designed to work together, Bridges notes.
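As an illustration of what such monitoring might feed into, the sketch below flags virtual machines whose CPU use never rises above a threshold across the samples collected. It is a hypothetical example: the VM names, the 5% threshold and the minimum sample count are assumptions, and a real assessment would also look at memory, network and disk activity before anything is switched off.

```python
from statistics import mean


def find_idle_vms(samples, cpu_threshold=5.0, min_samples=3):
    """Return (vm, average CPU %) pairs for VMs that look unused.

    samples: dict of VM name -> list of CPU utilisation readings (percent)
    A VM is flagged only if it has enough readings and every one of them
    is below cpu_threshold. Results are sorted quietest first.
    """
    idle = []
    for vm, readings in samples.items():
        if len(readings) >= min_samples and max(readings) < cpu_threshold:
            idle.append((vm, mean(readings)))
    return sorted(idle, key=lambda pair: pair[1])


# Hypothetical monitoring data.
cpu_samples = {
    "vm-a": [1.0, 2.0, 3.0],
    "vm-b": [50.0, 60.0, 70.0],
    "vm-c": [0.5, 0.5, 0.5],
    "vm-d": [1.0],  # too few readings to judge
}
candidates = find_idle_vms(cpu_samples)
print("switch-off candidates:", candidates)
```

The output is a list for review rather than an instruction to power anything down, which fits the point that the result may be a business decision rather than an automatic action.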