A Proactive Virtual Machine Consolidation Framework Based on Multi-Dimensional Workload Awareness and Deep Reinforcement Learning

Abstract

In contemporary cloud data centers, balancing energy consumption against service level agreement (SLA) compliance is a critical challenge for system sustainability. Virtual machine (VM) consolidation is widely used to improve resource utilization, but most existing methods either ignore multi-dimensional resource constraints such as disk I/O or rely on reactive mechanisms that respond only after problems occur. As a result, they often suffer from performance variability and resource imbalance under non-steady-state workloads and in heterogeneous environments. To address these issues, this study introduces DTCF (Dynamic Threshold Control and Three-Dimensional Resource Coordination Optimization Framework), a framework for VM consolidation. Its 3D-PADT (Three-Dimensional Predictive Adaptive Dynamic Threshold) mechanism uses a hybrid Wavelet-TCN-LSTM model to capture the spatiotemporal correlation features of workloads and to adjust overload thresholds dynamically, so that overloads are prevented before they occur. The DMRCIW (Dynamic Multi-Resource Coupling Impact Weight) policy accounts for historical volatility and the interdependence of resources among VMs to identify and reallocate high-risk workloads, thereby enhancing system stability. Finally, the NAP-DRL (Noise-Aware Physics-Constrained Deep Reinforcement Learning) placement algorithm optimizes resource scheduling through action masking and a physically-aware reward structure, fully exploiting heterogeneous hardware capabilities while adhering to strict resource constraints. Experiments on the Google Cluster Trace show that, compared with existing mainstream methods, DTCF significantly reduces energy consumption and lowers the SLA violation rate by 94.4%, achieving a synergistic optimization of energy efficiency and operational stability.
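The abstract does not specify how NAP-DRL implements action masking. The following is a minimal sketch of the generic technique under these assumptions: the action space is discrete with one action per candidate host, resources are three normalized dimensions (CPU, memory, disk I/O), and the policy produces one score per host (for example, Q-values). All function names and the data layout are hypothetical. The noise-aware and physically-aware reward components are not shown.

```python
import numpy as np

def feasibility_mask(host_used, host_cap, vm_demand):
    """Hypothetical helper: True where a host can accept the VM on every dimension.

    host_used, host_cap: arrays of shape (n_hosts, 3) for (CPU, memory, disk I/O).
    vm_demand: array of shape (3,), the VM's demand on the same dimensions.
    """
    # A host is feasible only if no single resource would exceed its capacity.
    return np.all(host_used + vm_demand <= host_cap, axis=1)

def masked_greedy_action(q_values, mask):
    """Hypothetical helper: pick the highest-scoring feasible host, or None."""
    if not mask.any():
        return None  # no feasible host; the caller must handle this (e.g. wake a host)
    # Infeasible hosts get -inf so argmax can never select them.
    masked = np.where(mask, q_values, -np.inf)
    return int(np.argmax(masked))

# Usage: host 0 would exceed CPU, host 1 would exceed disk I/O, so only host 2 is feasible.
host_cap = np.ones((3, 3))
host_used = np.array([[0.9, 0.2, 0.1],
                      [0.5, 0.5, 0.95],
                      [0.4, 0.3, 0.2]])
vm_demand = np.array([0.2, 0.1, 0.1])
q_values = np.array([5.0, 9.0, 1.0])

mask = feasibility_mask(host_used, host_cap, vm_demand)
action = masked_greedy_action(q_values, mask)
print(mask.tolist(), action)  # [False, False, True] 2
```

Masking removes infeasible placements before action selection instead of penalizing them after the fact. Even though host 1 has the highest score here, the agent cannot choose it, so hard resource constraints hold on every decision.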