Conference item
Towards on-the-fly snapshot memory compression for low-latency elastic inference serving systems
- Abstract:
-
In-memory model caching and startup latency are key bottlenecks in large-scale AI serving systems, especially for GPU-accelerated large language model (LLM) inference in elastic, serverless environments. While container checkpointing enables hot starts, it introduces new challenges in memory footprint, storage bandwidth, and restore latency. Existing offline snapshot compression methods reduce snapshot size but add extra I/O, storage duplication, and decompression overhead. In this paper, we present CRIU-LZ4, a restore-optimized method for on-the-fly compression integrated directly into the CPU–GPU checkpoint and restore pipelines. Built atop CRIUgpu, CRIU-LZ4 performs page-level compression during memory transfer, eliminating intermediate artifacts and minimizing the latency on the restore critical path. Our evaluation results show that CRIU-LZ4 reduces cold-start latency by 46–59% and achieves up to 6× smaller snapshots compared to uncompressed GPU-aware checkpointing, while eliminating the decompression bottleneck of offline compression, significantly reducing both end-to-end restore time and peak disk usage.
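The idea of page-level, on-the-fly compression described in the abstract can be illustrated with a minimal sketch. This is not the CRIU-LZ4 implementation: the function names, the 4 KiB page size, and the length-prefixed framing are assumptions made for illustration, and zlib at its fastest level stands in for LZ4 because LZ4 is not in the Python standard library. The point shown is that each page is compressed as it is written and decompressed as it is read, so no uncompressed intermediate snapshot is ever stored.

```python
import io
import struct
import zlib

PAGE_SIZE = 4096  # assumed granularity; the abstract only says "page-level"


def checkpoint_stream(memory: bytes, out) -> int:
    """Compress each page as it is copied out and write it straight to `out`.

    Returns the number of bytes written to the snapshot stream.
    """
    written = 0
    for offset in range(0, len(memory), PAGE_SIZE):
        page = memory[offset:offset + PAGE_SIZE]
        # zlib level 1 is a stand-in for LZ4 (hypothetical substitution).
        blob = zlib.compress(page, 1)
        # Frame: 4-byte little-endian compressed length, then the payload.
        out.write(struct.pack("<I", len(blob)))
        out.write(blob)
        written += 4 + len(blob)
    return written


def restore_stream(inp) -> bytes:
    """Decompress pages one at a time while reading the snapshot stream."""
    restored = bytearray()
    while True:
        header = inp.read(4)
        if not header:
            break  # clean end of stream
        if len(header) != 4:
            raise ValueError("truncated page header")
        (length,) = struct.unpack("<I", header)
        blob = inp.read(length)
        if len(blob) != length:
            raise ValueError("truncated page payload")
        restored += zlib.decompress(blob)
    return bytes(restored)


if __name__ == "__main__":
    # Synthetic, highly compressible "process memory" for demonstration only.
    memory = (b"weights" * 1000) + bytes(10000)
    snapshot = io.BytesIO()
    size = checkpoint_stream(memory, snapshot)
    snapshot.seek(0)
    assert restore_stream(snapshot) == memory
    print(f"original={len(memory)} bytes, snapshot={size} bytes")
```

In a real pipeline the input would be pages arriving from device or host memory and the output a snapshot file or socket. The in-memory buffer here only keeps the sketch self-contained.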
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Publisher copy:
- 10.1145/3805621.3807612
Authors
- Funder identifier:
- https://ror.org/0439y7842
- Grant:
- 2595601
- Publisher:
- Association for Computing Machinery
- Host title:
- EuroMLSys '26: Proceedings of the Sixth European Workshop on Machine Learning and Systems
- Pages:
- 254-262
- Publication date:
- 2026-04-27
- Acceptance date:
- 2026-03-23
- Event title:
- 6th Workshop on Machine Learning and Systems (EuroMLSys 2026)
- Event location:
- Edinburgh, UK
- Event website:
- https://euromlsys.eu/
- Event start date:
- 2026-04-27
- Event end date:
- 2026-04-27
- ISBN:
- 9798400726057
- Language:
-
English
- Pubs id:
-
2407772
- Local pid:
-
pubs:2407772
- Deposit date:
-
2026-04-17
Terms of use
- Copyright holder:
- Stoyanov et al.
- Copyright date:
- 2026
- Rights statement:
- Copyright © 2026 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution 4.0 International License.
- Licence:
- CC Attribution (CC BY)