
Thesis

Efficient and scalable methods for deep reinforcement learning

Abstract:

This thesis proposes some new answers to an old question: how can artificially intelligent agents efficiently learn from their experiences to make optimal decisions? We adopt the framework of reinforcement learning (RL), in which agents are trained to maximise their expected long-term cumulative rewards, and build on the recent successes of deep RL by using deep neural network function approximators for policies, value functions, and other model components. Deep RL often learns very inefficiently from experience, and can struggle to scale to very complex problems with large action spaces or sparse feedback. We address these challenges in several ways.
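For concreteness, the objective described above is standardly written as the expected (discounted) return; the policy parameters \theta, discount factor \gamma, and trajectory \tau below are textbook notation introduced for this sketch rather than taken from the thesis:

\[ J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \right], \]

where \pi_\theta is the agent's policy and r_t the reward received at time t; deep RL approximates \pi_\theta, value functions, and other model components with deep neural networks.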

In Part I, we dive deep into a subfield of RL concerning multiple agents that must cooperate to achieve a common goal. These multi-agent systems test the limits of our algorithms due to their complex dynamics, large joint action spaces, and decentralisation constraints. We develop methods to address partial observability and multi-agent credit assignment (Chapter 3), nonstationarity induced by co-learning agents (Chapter 4), and efficient representation and learning of joint action values (Chapter 5).
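To illustrate why joint action values are difficult to represent, and one common remedy (a standard factorisation used in cooperative multi-agent RL, not necessarily the construction developed in Chapter 5): with n agents each choosing from an action set U, a joint action value function must cover |U|^n joint actions, so scalable methods often decompose it into per-agent utilities combined by a mixing function,

\[ Q_{tot}(s, \mathbf{u}) \approx f\big( Q_1(o_1, u_1), \ldots, Q_n(o_n, u_n); s \big), \]

where each Q_a depends only on agent a's local observation o_a and action u_a, so that decentralised agents can act greedily on their own utilities while Q_tot is trained centrally. All symbols here are notation introduced for this illustration.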

In Part II, we leave the specific setting of multi-agent RL to build more general inductive biases into algorithms and architectures. In Chapter 6, we accelerate learning by leveraging the inductive bias that tree-search planning is an effective representation of value functions or policies. In Chapter 7, we use a curriculum of progressively growing action spaces to enable efficient exploration without compromising long-term optimality.
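As a rough sketch of the tree-search inductive bias (a generic depth-limited backup, not the specific architecture of Chapter 6), a learned transition model T and reward model r can be unrolled inside the network so that its value output has the structure of a small planning computation:

\[ Q^{(d)}(s, a) = r(s, a) + \gamma \max_{a'} Q^{(d-1)}\big( T(s, a), a' \big), \qquad Q^{(0)}(s, a) = \hat{q}_\phi(s, a), \]

with \hat{q}_\phi a learned leaf evaluation; all symbols are introduced here for illustration only.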

In Part III, we focus on estimators of higher-order derivatives in the context of RL. Among other applications, these estimators can be used in meta-learning, where we attempt to learn algorithms or inductive biases from data rather than hand-designing them as in Parts I and II. In Chapter 8, we propose an objective that may be (automatically) differentiated any number of times to produce unbiased estimates of higher-order derivatives. In Chapter 9, we extend this objective to reduce its variance and to allow a flexible trade-off between bias and variance in estimators of any-order derivatives for RL.
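To make concrete what an objective that can be differentiated any number of times might look like, here is a minimal PyTorch sketch in the style of "magic box" surrogate objectives; the exact construction in Chapters 8 and 9 may differ, and the names magic_box, logp, and rewards are introduced only for this illustration.

    import torch

    def magic_box(x):
        # Evaluates to exactly 1 in the forward pass, but each differentiation
        # multiplies in another copy of dx/dtheta, which is what makes repeated
        # (automatic) differentiation yield unbiased higher-order derivative
        # estimates.
        return torch.exp(x - x.detach())

    def surrogate_objective(logp, rewards):
        # logp[t]: log-probability of the action taken at step t (a function of theta)
        # rewards[t]: reward received at step t (treated as a constant here)
        total = 0.0
        for t in range(len(rewards)):
            # Weight each reward by the magic-box of the log-probabilities of all
            # actions that could have influenced it (those taken at steps <= t).
            deps = torch.stack(logp[: t + 1]).sum()
            total = total + magic_box(deps) * rewards[t]
        return total

Differentiating this surrogate once with respect to the policy parameters recovers a standard score-function (policy-gradient) estimator; differentiating it again gives estimates of second-order derivatives without deriving a new estimator by hand, which is what makes such objectives convenient for meta-learning.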

Together, these contributions make valuable strides towards realising efficient and scalable solutions to challenging RL problems, as well as opening up exciting directions for future work building on the algorithms, architectures, and estimators proposed here.

Authors


Division: MPLS
Department: Computer Science
Role: Author

Contributors

Role: Supervisor
Role: Examiner


Type of award: DPhil
Level of award: Doctoral
Awarding institution: University of Oxford

