
Thesis

Deep Reinforcement Learning in complex environments

Abstract:
Deep Reinforcement Learning (DRL) is becoming a popular and mature framework for learning to solve sequential decision-making problems. The application of Deep Neural Networks, flexible and powerful function approximators, to learning policies has enabled RL to tackle applications that were previously thought too difficult: from beating professional human players at hard games such as Go, to becoming the foundation for flexible embodied control. We explore what happens when one attempts to learn policies in environments that present complex dynamics and hard, structured tasks. Because these environments pose challenges at the forefront of what most state-of-the-art Reinforcement Learning methods can tackle, they offer a general view of existing weaknesses, while also providing opportunities for improving both the general framework and particular algorithms.

Firstly, we study and develop methods for Deep Multi-Agent Reinforcement Learning, a setting in which multiple agents interact with an (often complex) environment and with each other. The presence of multiple agents breaks some of the key assumptions that give standard learning methods their stability, creating unique and interesting problems. We test these methods on a multi-agent formulation of the StarCraft micromanagement problem, an extremely complex real-time control and planning problem based on one of the hardest environments currently available in the literature.

Secondly, in a single-agent version of the same problem, we investigate how DRL can be used to develop a set of parameter-efficient differentiable planning modules that solve path-planning tasks with complex environment dynamics and variable map sizes. We show that these modules enable learning to plan even when the environment includes stochastic elements, providing a cost-efficient learning system for building low-level, size-invariant planners for a variety of interactive, hard navigation problems.

Thirdly, and lastly, we present a novel RL benchmark based on one of the oldest and most complex video games ever developed: the NetHack Learning Environment (NLE). NLE provides an environment that is scalable, rich, and challenging for state-of-the-art RL, while maintaining familiarity with standard grid-worlds and dramatically decreasing the computational requirements compared to existing environments of similar complexity and scope. We believe that this particular intersection of properties will enable the community to employ a single environment both as a debugging tool for increasingly complicated RL agents and as a target for the next decade of RL research.
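The NetHack Learning Environment mentioned in the abstract is distributed with a standard Gym interface. As a rough illustration, not taken from the thesis itself and assuming the nle pip package, its NetHackScore-v0 task, and the original Gym step/reset API, a minimal interaction loop looks roughly like this:

    # Minimal sketch (assumptions: the "nle" package is installed and the
    # original Gym API is in use, where step() returns obs, reward, done, info).
    import gym
    import nle  # importing nle registers tasks such as NetHackScore-v0

    env = gym.make("NetHackScore-v0")
    obs = env.reset()  # dict observation: dungeon glyphs, agent stats, in-game message, ...
    done = False
    while not done:
        action = env.action_space.sample()  # random policy, purely illustrative
        obs, reward, done, info = env.step(action)
    env.close()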

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Sub department:
Engineering Science
Research group:
Torr Vision Group
Oxford college:
St Catherine's College
Role:
Author
ORCID:
0000-0001-8491-8166

Contributors

Role:
Supervisor


DOI:
Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford
