Intelligent interaction at scale

Willi, T

Abstract: Multi-Agent Reinforcement Learning (MARL) aims to develop intelligent agents capable of complex interaction and coordination, but faces significant challenges. Standard single-agent reinforcement learning (RL) algorithms cannot achieve stable, pro-social behaviour in general-sum games, where individual and collective interests may conflict. At the same time, scaling learning agents is difficult in two directions: externally, to handle many interacting agents, and internally, in the complexity (parameter count) of individual deep RL agents. Unlike in supervised learning, larger deep RL models often fail to yield improved performance.

This thesis addresses these challenges through complementary algorithmic and architectural advances. First, focusing on inter-agent dynamics, we investigate Opponent Shaping (OS). We introduce Consistent Learning with Opponent-Learning Awareness (COLA) to formally resolve inconsistencies in prior OS methods, which inaccurately model how co-players adapt. We further develop Shaper, which scales OS effectively to high-dimensional games with long horizons and demonstrates the benefits of shaping in more complex settings. We also present the open-source JaxMARL library as a tool to accelerate MARL research.
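The inconsistency can be illustrated on a toy problem. The sketch below assumes that consistency means each player's update is a gradient step taken against the co-player's actual update, h1 = -alpha * d/dx L1(x, y + h2) and h2 = -alpha * d/dy L2(x + h1, y), with each derivative passing through the co-player's update function. The bilinear game L1 = x*y, L2 = -x*y, the restriction to linear update functions h(x, y) = a*x + b*y, the helper names, and the solver (repeating higher-order shaping steps until they stop changing) are all assumptions chosen for illustration, not the method developed in the thesis.

```python
def hola_step(coeffs, alpha):
    """One higher-order shaping step on linear update functions (toy sketch).

    coeffs = (a, b, c, d) encodes h1(x, y) = a*x + b*y and h2(x, y) = c*x + d*y.
    Toy game (assumption): L1(x, y) = x*y for player 1, L2(x, y) = -x*y for player 2.
    New h1 = -alpha * d/dx L1(x, y + h2(x, y)), differentiating through h2;
    new h2 = -alpha * d/dy L2(x + h1(x, y), y), differentiating through h1.
    """
    a, b, c, d = coeffs
    # L1(x, y + h2) = x*y + c*x**2 + d*x*y, so d/dx = 2c*x + (1 + d)*y.
    new_a, new_b = -alpha * 2 * c, -alpha * (1 + d)
    # L2(x + h1, y) = -(1 + a)*x*y - b*y**2, so d/dy = -(1 + a)*x - 2b*y.
    new_c, new_d = alpha * (1 + a), alpha * 2 * b
    return new_a, new_b, new_c, new_d


def consistent_linear_updates(alpha, iters=200):
    """Repeat hola_step from naive gradient updates until it stops changing.

    A fixed point satisfies both consistency equations; in this game the
    iteration contracts whenever 2 * alpha**2 < 1.
    """
    coeffs = (0.0, -alpha, alpha, 0.0)  # naive learners: h1 = -alpha*y, h2 = alpha*x
    for _ in range(iters):
        coeffs = hola_step(coeffs, alpha)
    return coeffs


alpha = 0.5
naive = (0.0, -alpha, alpha, 0.0)
# Each player shapes a co-player it assumes to be a naive learner.
shape_naive = hola_step(naive, alpha)
consistent = consistent_linear_updates(alpha)
print([round(v, 4) for v in shape_naive])  # [-0.5, -0.5, 0.5, -0.5]
print([round(v, 4) for v in consistent])   # [-0.3333, -0.3333, 0.3333, -0.3333]
```

In this toy setting the gap between the two printed lines is the kind of mismatch at issue: a player that shapes a co-player it assumes to be naive optimises against an update the co-player does not actually make.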

Second, we explore Mixture-of-Experts (MoE) architectures within deep RL to address the challenge of internal agent complexity and scalability. We demonstrate that MoEs, particularly Soft MoEs, unlock parameter scaling in value-based deep RL: performance increases with model size, and issues such as the representational collapse observed in standard architectures are mitigated. Evaluating MoEs under the amplified non-stationarity of multi-task and continual RL further highlights their capacity for robust learning and specialisation. To motivate future research, and in keeping with the multi-agent theme of this thesis, we draw parallels between coordinating experts within a single agent and the challenges of coordinating multiple external agents.
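To make the Soft MoE mechanism concrete, the following is a minimal plain-Python sketch of a Soft MoE layer as we understand the general design: every slot is a softmax-weighted average over all input tokens (dispatch), each expert processes only its own slots, and every output token is a softmax-weighted average over all slot outputs (combine). The linear experts, the sizes, the random parameters, the helper names and the omission of any input normalisation are illustrative assumptions, and nothing here reflects how the thesis places such a layer inside a value network.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product a @ b for matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in a:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def soft_moe(x, phi, experts, slots_per_expert):
    """Soft MoE forward pass (sketch; no input normalisation).

    x:       m x d token matrix
    phi:     d x (n * p) slot parameters, n = len(experts), p = slots_per_expert
    experts: n callables, each mapping a p x d block of slots to a p x d output
    """
    p = slots_per_expert
    logits = matmul(x, phi)  # m x (n * p)
    # Dispatch: softmax over tokens for each slot, so every slot is a
    # convex combination of all input tokens.
    dispatch = transpose(softmax_rows(transpose(logits)))  # m x (n * p)
    slots = matmul(transpose(dispatch), x)  # (n * p) x d
    # Each expert processes only its own contiguous block of p slots.
    slot_outputs = []
    for i, expert in enumerate(experts):
        slot_outputs.extend(expert(slots[i * p:(i + 1) * p]))
    # Combine: softmax over slots for each token, so every output token is a
    # convex combination of all slot outputs.
    combine = softmax_rows(logits)  # m x (n * p)
    return matmul(combine, slot_outputs)  # m x d


def linear_expert(w):
    """Hypothetical expert: a single d x d linear map."""
    return lambda s: matmul(s, w)


random.seed(0)
m, d, n, p = 4, 3, 2, 2  # tokens, width, experts, slots per expert
x = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
phi = [[random.gauss(0, 1) for _ in range(n * p)] for _ in range(d)]
experts = [
    linear_expert([[random.gauss(0, 1) for _ in range(d)] for _ in range(d)])
    for _ in range(n)
]
y = soft_moe(x, phi, experts, p)
print(len(y), len(y[0]))  # 4 3
```

Because dispatch and combine are dense softmaxes rather than hard top-k routing, every token contributes to every slot, no tokens are dropped, and the layer is differentiable end to end; whether this property is what drives the scaling behaviour described above is not something the sketch can show.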

These contributions advance scalable MARL by providing methods to manage inter-agent influence (OS) and intra-agent complexity (MoE), improving consistency, scalability, and architectural effectiveness.

APA Style

Willi, T. (2026). Intelligent interaction at scale [PhD thesis]. University of Oxford.

MLA Style

Willi, T. Intelligent Interaction at Scale. 2026. University of Oxford, PhD thesis.

Chicago Style

Willi, T. 2026. “Intelligent Interaction at Scale.” PhD thesis, University of Oxford.