Private cloud deployments: Beating the blind spots and bottlenecks
Private cloud can be challenging to manage while meeting the ramped-up demands of data-driven enterprises.
Organisations retain, or migrate to, private cloud on-premise environments for their own legacy, security or compliance reasons. They still want the benefits of cloud and cost-effective capacity to meet demand, without the risk of blind spots or bottlenecks.
But Tiago Fernandes, lead cloud solutions architect at IT distributor Tech Data, says this can be complicated, requiring detailed scrutiny of the environment and resources before right-sizing the central processing unit (CPU) and memory allocation for virtual machines (VMs).
“If a VM isn’t being used, shut it down – as long as there are no processes going behind to clean up, for example,” says Fernandes. “I don’t see why you can’t do snoozing in private cloud as well.”
Processes must be in place, and routinely reviewed, to pinpoint VMs whose allocation does not match their use. Examples include larger VMs that run end-of-month payments or batch processing screens and sit idle the rest of the time, and VMs that eat up 80% of the CPU because they are old or are dealing with multiple reports or scripts. Hypervisors that can manage over-committed memory can also help redistribute memory to more idle VMs.
“Keep a close eye on all that – sometimes storage latency comes from a CPU or memory bottleneck,” says Fernandes.
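As a rough illustration of the kind of routine review described above, the following Python sketch sorts VMs into buckets from their CPU utilisation samples. The class, the function, the bucket names and every threshold are hypothetical choices made for this example, not values given by Fernandes or taken from any hypervisor's API, and a real review would also look at memory and storage latency.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class VmUsage:
    """CPU utilisation samples for one VM (hypothetical structure)."""
    name: str
    allocated_vcpus: int
    cpu_percent_samples: list  # percentage of allocated CPU in use, 0-100


def review(vm, idle_below=5.0, oversized_below=30.0, hot_above=80.0):
    """Classify a VM for a capacity review.

    All thresholds are illustrative assumptions and should be tuned
    against the organisation's own metrics.
    """
    avg = mean(vm.cpu_percent_samples)
    peak = max(vm.cpu_percent_samples)

    if peak < idle_below:
        # Never busy in the sampled period: candidate for shutting down.
        return "snooze-candidate"
    if avg >= hot_above:
        # Sustained heavy use: possible CPU bottleneck worth investigating.
        return "investigate"
    if peak < oversized_below:
        # Even the busiest sample leaves most of the allocation unused.
        return "downsize-candidate"
    if avg < oversized_below and peak >= hot_above:
        # Mostly idle with occasional heavy runs, e.g. month-end batch work.
        return "bursty"
    return "ok"


fleet = [
    VmUsage("archive", 8, [1, 2, 1, 3]),
    VmUsage("reports", 4, [85, 90, 82, 88]),
    VmUsage("intranet", 16, [10, 12, 20, 15]),
    VmUsage("payments", 8, [2, 1, 2, 95, 1, 2]),
]
findings = {vm.name: review(vm) for vm in fleet}
```

A "bursty" result is a prompt to ask the business when the heavy runs happen, rather than an automatic instruction to shrink the VM.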
Scope out, monitor and manage dormant resources, he says. Ask about projects or plans that may affect demand, including marketing campaigns and new offerings, and find out what capacity will be required in future. This means IT working much more closely with other parts of the organisation than has been typical in many businesses.
Review all the metrics and look for hidden problems before investing in more capacity. Also, check and stress-test all recommendations for applications and workloads before they go live.
“Capacity management affects return on investment [ROI] and everybody in the business,” says Fernandes. “Everybody looks at other departments and says: you didn’t forecast. However, IT needs to be connected with the business to be able to predict future demand.”
Effective capacity management
Organisations may assume on-premise private cloud deployments will deliver optimised resources in ways that enhance ROI. Yet effective capacity management entails strong reporting and ongoing governance of all resources, says Fernandes. It also means holding regular business discussions with all stakeholders, including suppliers, to align on lead times and processes, including what happens in times of emergency.
“As per the Flexera 2021 State of the Cloud Report, everywhere there is a need for automatically scanning and optimising cloud costs,” he adds.
For Chris Royles, field CTO for Europe, the Middle East and Africa (EMEA) at Cloudera, right-sizing private-cloud capacity requires resource isolation, which means running a mix of workloads and carving out sets of resources specifically for certain tasks or types of problem.
Separating architecture tiers with different layers of storage and compute can add enough granularity to enable separate management and scaling, with a control plane and automation delivering resources to handle workloads, he says.
“It’s like having multiple data warehouses running against the same data collection,” says Royles. “You’ve got to have a network group to link these things together, and can then connect between your public cloud instances and your private cloud. Of course, that’s where hybrid really comes into play.”
In that separation of storage and compute within the technology stack, one tier manages the user-experience applications, such as the data warehouse tools or machine learning elements, designed to meet the requirements of data scientists, he says. This tier then orchestrates the elastic experiences on top, and below that is the storage tier. This enables independent scaling at every layer.
The user experience tier scales on factors such as user numbers – the storage will scale by data volumes, says Royles. The mid-tier captures the logging and telemetry – for example, monitoring workload behaviours over time. It is all about slicing and dicing the resources so that you can manage them, whether they are poorly utilised or otherwise, for specific business use cases, in small bites that “smooth the curve”.
“That telemetry informs the resourcing and plan,” he says. “If you go back through everything into the storage, that can be scaled out on VMs, perhaps for 2,000 data scientists. That’s not unusual, and now that is containered, we can actually scale in smaller increments.
“The journey of how to right-size that compute infrastructure sounds counter-intuitive, but it is about using smaller machines, more often – because we need parallel throughput to storage.”
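The arithmetic behind scaling each tier on its own driver, and in smaller increments, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the per-node capacities and the demand figures are all hypothetical and are not taken from Cloudera or from Royles.

```python
import math


def nodes_needed(demand, node_capacity):
    """Smallest number of equally sized nodes that covers the demand."""
    if node_capacity <= 0:
        raise ValueError("node_capacity must be positive")
    return math.ceil(demand / node_capacity)


def spare_capacity(demand, node_capacity):
    """Capacity provisioned beyond demand once whole nodes are rounded up."""
    return nodes_needed(demand, node_capacity) * node_capacity - demand


def plan_tiers(users, users_per_node, data_tb, tb_per_node):
    """Size each tier from its own driver: users for the user experience
    tier, data volume for the storage tier (hypothetical drivers/units)."""
    return {
        "user_experience": nodes_needed(users, users_per_node),
        "storage": nodes_needed(data_tb, tb_per_node),
    }


# Hypothetical demand of 130 concurrent users, served by large or small nodes.
large = spare_capacity(130, 64)   # 3 nodes of 64 users each
small = spare_capacity(130, 8)    # 17 nodes of 8 users each

plan = plan_tiers(users=130, users_per_node=8, data_tb=500, tb_per_node=40)
```

With these made-up numbers, the large nodes leave capacity for 62 users unused while the small nodes leave capacity for 6, which is one way to read "smaller machines, more often". The sketch ignores per-node overheads and the parallel storage throughput that Royles cites as the main reason for the approach.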