Thesis
Intelligent interaction at scale
- Abstract:
-
Multi-Agent Reinforcement Learning (MARL) aims to develop intelligent agents capable of complex interaction and coordination, but two obstacles stand in the way. First, standard single-agent reinforcement learning (RL) algorithms cannot achieve stable, pro-social behaviour in general-sum games, where individual and collective interests may conflict. Second, scaling learning agents is difficult, both in the number of interacting external agents they must handle and in the internal complexity (parameter count) of each individual deep RL agent; unlike in supervised learning, larger deep RL models often fail to yield improved performance.
This thesis addresses these challenges through complementary algorithmic and architectural advances. First, focusing on inter-agent dynamics, we investigate Opponent Shaping (OS). We introduce Consistent Learning with Opponent-Learning Awareness (COLA) to formally resolve inconsistencies in prior OS methods, which inaccurately model co-player adaptation. We further develop Shaper, which scales OS to high-dimensional games with long horizons and demonstrates the benefits of shaping in more complex settings. We also present the open-source JaxMARL library as a tool to accelerate MARL research.
Second, we explore Mixture-of-Experts (MoE) architectures within deep RL to address the challenge of internal agent complexity and scalability. We demonstrate that MoEs, particularly Soft MoEs, unlock parameter scaling in value-based deep RL: performance increases with model size, and issues such as the representational collapse observed in standard architectures are mitigated. Evaluating MoEs under the amplified non-stationarity of multi-task and continual RL further highlights their capacity for robust learning and specialisation. To motivate future research and maintain the multi-agent theme of this thesis, we draw parallels between managing internal expert coordination and external multi-agent challenges.
These contributions advance scalable MARL by providing methods to manage inter-agent influence (OS) and intra-agent complexity (MoE), improving consistency, scalability, and architectural effectiveness.
Access Document
- Files:
- (Supplementary materials, zip, 31.8MB, Terms of use)
- (Preview, Dissemination version, pdf, 32.1MB, Terms of use)
Authors
Contributors
+ Foerster, J
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Engineering Science
- Role:
- Supervisor
+ Steel, H
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Engineering Science
- Sub department:
- Engineering Science
- Role:
- Examiner
+ Raileanu, R
- Role:
- Examiner
+ University of Oxford – Department of Engineering Science
- Programme:
- Departmental Research Studentship (DPhil)
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
-
English
- Keywords:
- Subjects:
- Deposit date:
-
2026-03-08
- ARK identifier:
Terms of use
- Copyright holder:
- Timon Willi
- Copyright date:
- 2026
- Notes:
- "JaxMARL: multi-agent RL environments and algorithms in JAX" is derived from this thesis.