Conference item
Opportunistic resource reclamation in Kubernetes: from aggressive resizing to flash jobs
- Abstract:
- Modern cloud data centers suffer from chronic resource underutilization. The gap between static resource allocations and dynamic workload demand creates systemic inefficiency that current orchestration platforms fail to address adequately. In this work, we explore resource reclamation strategies in production Kubernetes clusters using two emerging infrastructure-level primitives: in-place resource resizing and transparent checkpoint/restore (C/R). For CPU resources, we analyze a production workload trace, which we release publicly, and reveal significant allocation-utilization gaps. Through trace-driven simulation, we demonstrate that aggressive in-place resizing substantially increases both resource utilization and workload evictions. We find a balanced strategy for in-place resizing and identify C/R as the missing primitive that makes aggressive resizing safe by enabling graceful termination and resumable migrations instead of progress loss. For GPU resources, where dynamic resizing is infeasible, we propose a C/R-enabled sharing strategy that allocates reserved-but-idle GPU memory to secondary workloads (flash jobs) with safety guarantees for reclamation. Our work demonstrates how the same infrastructure primitives address resource reclamation across different resource types, each with distinct technical constraints, validated through real production cluster deployments.
- Publication status:
- Accepted
- Peer review status:
- Peer reviewed
Authors
- Host title:
- Job Scheduling Strategies for Parallel Processing 2026
- Acceptance date:
- 2026-03-17
- Event title:
- 29th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2026)
- Event location:
- New Orleans, USA
- Event website:
- https://jsspp.org/
- Event start date:
- 2026-05-25
- Event end date:
- 2026-05-29
- Language:
- English
- Keywords:
- Pubs id:
- 2415573
- Local pid:
- pubs:2415573
- Deposit date:
- 2026-05-06
- ARK identifier: