Training efficient agents for long-term decision making

Gupta, G

Thesis

Abstract:: Reinforcement learning has ventured from tabletop simulators to real robots and open-world games, but today’s agents still learn with prohibitively low sample efficiency, ignore the priors encoded in foundation models, and forget most of what they have seen after a few hundred steps. This thesis pursues a unifying agenda—efficiently training efficient decision-making agents—through three successive contributions.

Chapter 1 demonstrates that sample efficiency can be substantially improved by re-weighting experience toward the transitions that are most informative. An ensemble-based uncertainty criterion selectively upsamples those rare interactions that clarify causal structure, enabling offline reinforcement learning to achieve safe, performant policies with far fewer gradient updates than uniform replay.
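The abstract does not spell out the exact uncertainty criterion, so the following is only a minimal sketch of the general idea. It assumes that disagreement is measured as the variance of scalar predictions (for example Q-values) across ensemble members, and that replay probabilities are proportional to that variance raised to a temperature `alpha`. The function names, `alpha`, `eps`, and the toy numbers are hypothetical illustrations, not the thesis's implementation.

```python
import numpy as np


def ensemble_uncertainty(predictions: np.ndarray) -> np.ndarray:
    """Disagreement across ensemble members for each transition.

    predictions has shape (K, N): K ensemble members, N transitions.
    Returns the per-transition variance across members, shape (N,).
    """
    return predictions.var(axis=0)


def sampling_probs(uncertainty: np.ndarray, alpha: float = 1.0, eps: float = 1e-6) -> np.ndarray:
    """Replay probabilities p_i proportional to (u_i + eps) ** alpha.

    alpha = 0 recovers uniform replay; larger alpha concentrates sampling
    on the transitions the ensemble disagrees about most. eps keeps every
    transition reachable.
    """
    scores = (uncertainty + eps) ** alpha
    return scores / scores.sum()


def sample_batch(probs: np.ndarray, batch_size: int, rng: np.random.Generator) -> np.ndarray:
    """Draw minibatch indices from the offline dataset according to probs."""
    return rng.choice(len(probs), size=batch_size, replace=True, p=probs)


# Hypothetical Q-value predictions from 3 ensemble members on 4 transitions;
# the members only disagree strongly about transition 3.
preds = np.array([
    [1.0, 2.0, 0.5, 3.0],
    [1.0, 2.1, 0.5, -1.0],
    [1.0, 1.9, 0.5, 1.0],
])
probs = sampling_probs(ensemble_uncertainty(preds))
batch = sample_batch(probs, batch_size=8, rng=np.random.default_rng(0))
```

Here almost all of the sampling mass lands on transition 3, whereas `alpha=0.0` would give each transition probability 0.25, i.e. uniform replay. Standard prioritized replay usually pairs non-uniform sampling with importance-sampling corrections; they are omitted here for brevity.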

Stronger supervision is possible even when no new interaction data are collected, provided we can import structure learned elsewhere. Chapter 2 investigates this idea by tapping the internal representations of large generative vision models. Text-to-image diffusion backbones, although trained for synthesis rather than control, accumulate multi-scale spatial and semantic cues that are difficult to rediscover from scratch in a robotics dataset. By freezing these backbones and projecting their multi-layer activations into a control-friendly embedding—what we term Stable Control Representations (SCRs)—an agent starts with a rich inductive prior over object geometry and language grounding. In manipulation and open-vocabulary navigation tasks, SCRs cut the number of gradient steps needed to reach a given return by up to an order of magnitude and consistently outperform contrastively trained encoders, all without generating a single additional pixel. This result shows that re-using pretrained knowledge can convert computationally expensive exploration into cheap representation reuse, markedly improving sample efficiency.
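A rough sketch of how multi-layer activations from a frozen backbone might be fused into a single control-friendly embedding is shown below. The abstract does not say how SCRs resize or combine layers, so nearest-neighbour resizing, channel concatenation, and a single linear (1x1-convolution-style) projection are assumptions made for illustration. Random arrays stand in for real diffusion-model activations, and the shapes, `resize_nearest`, `aggregate_layers`, and `proj` are hypothetical names.

```python
import numpy as np


def resize_nearest(fmap: np.ndarray, out_hw: tuple[int, int]) -> np.ndarray:
    """Nearest-neighbour resize of a (C, H, W) feature map to (C, out_h, out_w)."""
    _, h, w = fmap.shape
    out_h, out_w = out_hw
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[:, rows[:, None], cols[None, :]]


def aggregate_layers(feature_maps: list[np.ndarray], out_hw: tuple[int, int], proj: np.ndarray) -> np.ndarray:
    """Fuse multi-layer activations into one embedding map.

    feature_maps: activations from a frozen backbone, each of shape (C_l, H_l, W_l).
    proj: trainable (D, sum_l C_l) matrix, i.e. a 1x1 convolution without bias.
    Returns a (D, out_h, out_w) map; in training only proj would get gradients.
    """
    resized = [resize_nearest(f, out_hw) for f in feature_maps]
    stacked = np.concatenate(resized, axis=0)  # (sum_l C_l, out_h, out_w)
    c, h, w = stacked.shape
    fused = proj @ stacked.reshape(c, h * w)  # mix channels at every spatial location
    return fused.reshape(proj.shape[0], h, w)


# Random arrays stand in for three backbone layers at decreasing
# spatial resolution (shapes are illustrative only).
rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 16, 16)),
          rng.standard_normal((16, 8, 8)),
          rng.standard_normal((32, 4, 4))]
proj = rng.standard_normal((4, 8 + 16 + 32)) * 0.1
embedding = aggregate_layers(layers, out_hw=(8, 8), proj=proj)
policy_input = embedding.reshape(-1)  # flattened vector fed to a policy head
```

Keeping the backbone frozen means the only new parameters in this sketch are the entries of `proj`, which is one way to read the claim that an agent "starts with" the backbone's prior rather than relearning it.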

While these chapters focus on learning efficiently, deployed agents must also act efficiently by leveraging context that spans hours or days. Chapter 3 introduces Memo, a transformer policy that interleaves periodic summary tokens with streaming observations so that its memory footprint grows only slowly with task length. To measure such long-term reasoning, Chapter 4 contributes FindingDory, a procedurally extendable benchmark family whose 60 tasks probe how well embodied agents store and retrieve experience.
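The abstract describes Memo only at a high level, so the following is a hypothetical sketch of the token bookkeeping, not Memo's architecture. It assumes that after every `segment_len` observations the policy appends `num_summary` summary slots, and that later steps attend only to summary slots plus the raw observations of the current segment. `SUMMARY`, `segment_len`, `num_summary`, and the helper names are illustrative; in a real transformer the summary slots would be learned embeddings whose outputs are computed from the preceding segment, not strings.

```python
SUMMARY = "<sum>"  # placeholder for a learned summary-token slot


def interleave(observations: list[str], segment_len: int, num_summary: int) -> list[str]:
    """Insert num_summary summary slots after every segment_len observations."""
    seq: list[str] = []
    for step, obs in enumerate(observations, start=1):
        seq.append(obs)
        if step % segment_len == 0:
            seq.extend([SUMMARY] * num_summary)
    return seq


def visible_positions(seq: list[str]) -> list[int]:
    """Positions the next step may attend to under the assumed scheme:
    every summary slot, plus raw observations after the latest summary block."""
    last = max((i for i, tok in enumerate(seq) if tok == SUMMARY), default=-1)
    return [i for i, tok in enumerate(seq) if tok == SUMMARY or i > last]


def retained_tokens(t: int, segment_len: int, num_summary: int) -> int:
    """Closed-form size of the visible context after t observations."""
    closed, current = divmod(t, segment_len)
    return closed * num_summary + current


seq = interleave([f"o{i}" for i in range(5)], segment_len=2, num_summary=1)
# seq == ['o0', 'o1', '<sum>', 'o2', 'o3', '<sum>', 'o4']
```

Under these assumptions, with `segment_len=50` and `num_summary=4`, the visible context after 1,000 steps is 20 × 4 = 80 summary slots rather than 1,000 raw observations, which is the sense in which memory grows slowly with task length.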

Together, these works chart a coherent path toward agents that learn quickly, inherit rich priors, and remember what matters, moving a step closer to truly lifelong, self-improving intelligence.

Cite this record

APA Style

Gupta, G. (2025). Training efficient agents for long-term decision making [PhD thesis]. University of Oxford.

MLA Style

Gupta, G. Training Efficient Agents for Long-Term Decision Making. 2025. University of Oxford, PhD thesis.

Chicago Style

Gupta, G. 2025. “Training Efficient Agents for Long-Term Decision Making.” PhD thesis, University of Oxford.

Access Document

Files:: Gupta_2025_Training_efficient_agents.pdf

(Dissemination version, PDF, 35.8 MB)

Authors

Gupta, G

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Author

Contributors

Gal, Y

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Supervisor
ORCID:: 0000-0002-2733-2078

DOI:: 10.5287/ora-1rxpq8d7m
Type of award:: DPhil
Level of award:: Doctoral
Awarding institution:: University of Oxford

Language:: English
Keywords:: vision language models; active sampling; robot learning; reinforcement learning; visual representation learning; in-context learning
Subjects:: Machine learning; Robot vision
Deposit date:: 2026-03-07
ARK identifier:: ark:/29072/ora_a011c058ef094907a2e1c2535149a2e1

Terms of use

Copyright holder:: Gunshi Gupta

Licence:: Terms and Conditions of Use for Oxford University Research Archive
