Thesis
Towards data-efficient deployment of reinforcement learning systems
- Abstract:
A fundamental concern in deploying artificial agents in real life is their capacity to adapt quickly to their surroundings. Traditional reinforcement learning (RL) struggles with this requirement in two ways. First, iterative exploration of unconstrained environment dynamics yields numerous uninformative updates and, consequently, slow adaptation. Second, final policies have no capacity to adapt to future observations and must either keep learning slowly indefinitely or be retrained entirely as new observations occur.
This thesis explores two formulations aimed at addressing these issues. Meta-RL considers entire task distributions and thereby yields policies that quickly adapt to specific task instances on their own. Active RL forces agents to explicitly request feedback, which makes observations and updates selective. Both formulations reduce to a Bayes-Adaptive setting in which a probabilistic belief over possible environments is maintained. Many existing solutions provide only asymptotic guarantees, which are of limited use in practical contexts. We develop a variational approach to approximate belief management and support its validity empirically through a broad range of ablations. We then consider recently successful planning approaches, but uncover and discuss obstacles to their application in these settings.
An important factor influencing the data requirements and stability of RL systems is the choice of appropriate hyperparameters. We develop a Bayesian optimisation approach that exploits the iterative structure of training processes and whose empirical performance exceeds that of existing baselines.
A final contribution of this thesis concerns increasing the scalability and expressiveness of Gaussian Processes (GPs). While we make no direct use of the presented framework, GPs have been used to model probabilistic beliefs in closely related settings.
 
Authors
Contributors
- Institution:
 - University of Oxford
 - Division:
 - MPLS
 - Department:
 - Engineering Science
 - Role:
 - Supervisor
 - ORCID:
 - 0000-0003-1959-012X
 
- Institution:
 - University of Oxford
 - Division:
 - MPLS
 - Department:
 - Computer Science
 - Role:
 - Supervisor
 
- Role:
 - Examiner
 
- Role:
 - Examiner
 - ORCID:
 - 0000-0002-9003-6642
 
- Funder identifier:
 - http://dx.doi.org/10.13039/501100000266
 - Funding agency for:
 - Schulze, S
 - Grant:
 - 1802029
 - Programme:
 - ICASE studentship
 
- Funding agency for:
 - Schulze, S
 - Grant:
 - 1802029
 - Programme:
 - ICASE studentship
 
- DOI:
 - Type of award:
 - DPhil
 - Level of award:
 - Doctoral
 - Awarding institution:
 - University of Oxford
 
- Language:
 - English
 - Subjects:
 - Deposit date:
 - 2022-10-08
 
Terms of use
- Copyright holder:
 - Schulze, S
 - Copyright date:
 - 2021
 