Conference item

Engine-agnostic model hot-swapping for cost-effective LLM inference

Abstract:

The widespread adoption of Large Language Models (LLMs) has increased demand for large-scale inference services, presenting a unique set of challenges for the HPC community. These services are characterized by moderate-scale models that require dedicating expensive GPUs to handle bursty inference requests, leading to high costs and resource underutilization. In this paper, we propose SwapServeLLM, a novel engine-agnostic hot-swapping method for cost-effective inference. This model hot-swapping approach is enabled by recent driver capabilities for transparent GPU checkpointing. SwapServeLLM optimizes resource utilization by dynamically allocating GPU resources with two key mechanisms: (1) demand-aware preemption that leverages information about concurrent requests, and (2) efficient request routing with memory reservation that minimizes inference latency. Our evaluation demonstrates that SwapServeLLM optimizes model loading for state-of-the-art inference engines by 31× compared to vLLM and up to 29% compared to Ollama, enabling cost-effective inference.

Publication status:
Accepted
Peer review status:
Peer reviewed

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Oxford college:
Somerville College
Role:
Author
ORCID:
0000-0001-9688-2615
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author


Publisher:
Association for Computing Machinery
Acceptance date:
2025-09-05
Event title:
7th International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC
Event location:
St. Louis, Missouri, USA
Event website:
https://sc25.supercomputing.org/
Event start date:
2025-11-16
Event end date:
2025-11-21


Language:
English
Keywords:
Pubs id:
2292837
Local pid:
pubs:2292837
Deposit date:
2025-09-25
