Thesis

Towards data-efficient deployment of reinforcement learning systems

Abstract:

A fundamental concern in deploying artificial agents in real life is their capacity to adapt quickly to their surroundings. Traditional reinforcement learning (RL) struggles with this requirement in two ways. First, iterative exploration of unconstrained environment dynamics yields numerous uninformative updates and consequently slow adaptation. Second, final policies have no capacity to adapt to future observations and must either keep learning slowly indefinitely or be retrained entirely as new observations occur.

This thesis explores two formulations aimed at addressing these issues. Meta-RL considers entire task distributions and thereby yields policies that adapt quickly to specific task instances on their own. Active RL forces agents to explicitly request feedback, which enforces selective observations and updates. Both formulations reduce to a Bayes-Adaptive setting in which a probabilistic belief over possible environments is maintained. Many existing solutions provide only asymptotic guarantees, which are of limited use in practical contexts. We develop a variational approach to approximate belief management and support its validity empirically through a broad range of ablations. We then consider recently successful planning approaches, but uncover and discuss obstacles to their application in the discussed settings.

An important factor influencing the data requirements and stability of RL systems is the choice of appropriate hyperparameters. We develop a Bayesian optimisation approach that exploits the iterative structure of training processes and whose empirical performance exceeds that of existing baselines.

A final contribution of this thesis concerns increasing the scalability and expressiveness of Gaussian Processes (GPs). While we make no direct use of the presented framework, GPs have been used to model probabilistic beliefs in closely related settings.

Authors


Division:
MPLS
Department:
Engineering Science
Role:
Author

Contributors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Supervisor
ORCID:
0000-0003-1959-012X
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Supervisor
Role:
Examiner
Role:
Examiner
ORCID:
0000-0002-9003-6642


Funder identifier:
http://dx.doi.org/10.13039/501100000266
Funding agency for:
Schulze, S
Grant:
1802029
Programme:
ICASE studentship


DOI:
Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford


Language:
English
Subjects:
Deposit date:
2022-10-08
