Thesis

Safe model based policy search

Abstract:

In this dissertation we focus on safe model-based policy search, a subfield of reinforcement learning with two main objectives: data efficiency and safety. To achieve data-efficient learning, we use Gaussian process (GP) regression to model the dynamics of unknown non-linear systems. The flexibility and probabilistic nature of GPs, along with mathematical properties that often allow for closed-form calculations, make it possible to build accurate models efficiently and to use these models to optimise control policies for the underlying systems. Our safety objective, also probabilistic in nature, is formalised as predefined state-space constraints. The model's predictions are used to certify the safety of a candidate policy before it is deployed on the system, and we thus manage to avoid constraint violations while training. We present an open-source software tool implementing our proposed algorithm for safe and data-efficient policy search. Furthermore, we propose a novel method for planning over multiple time steps with Gaussian processes, and provide formal guarantees bounding the predictive uncertainty. We consider safety and data efficiency critical challenges for the wider adoption of reinforcement learning algorithms, and we hope that our contributions will be useful in this effort.

Authors


Contributors

Role:
Supervisor
Role:
Supervisor
Role:
Examiner
Role:
Examiner


Funder identifier:
http://dx.doi.org/10.13039/501100000266


DOI:
Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford


Language:
English
Keywords:
Subjects:
Deposit date:
2021-06-27
