Conference item

Conditionally optimistic exploration for cooperative deep multi-agent reinforcement learning

Abstract:
Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL). In this work, we propose an exploration method that encourages cooperative exploration based on a sequential action-computation scheme. The high-level intuition is that, under optimism-based exploration, agents explore cooperative strategies if each agent's optimism estimate captures a structured dependency on the other agents. Assuming agents compute actions in a sequential order at each environment timestep, we view MARL as a sequence of tree search iterations in which agents are nodes at different depths of the search tree. Inspired by the theoretically justified tree search algorithm UCT (Upper Confidence bounds applied to Trees), we develop a method called Conditionally Optimistic Exploration (COE). COE augments each agent's state-action value estimate with an action-conditioned optimistic bonus derived from the visitation count of the global state and the joint actions of preceding agents. COE is applied during training and disabled at deployment, making it compatible with any value decomposition method for centralized training with decentralized execution. Experiments across various cooperative MARL benchmarks show that COE outperforms current state-of-the-art exploration methods on hard-exploration tasks.
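
The bonus described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the class name ConditionalCountBonus, the function select_actions, the constant c, and the exact UCT-style form c * sqrt(log(N_parent + 1) / (N_child + 1)) are all assumptions made for illustration, and the paper works with deep value estimates rather than lookup tables. The sketch only shows the idea that each agent's bonus is conditioned on the global state and on the actions already chosen by preceding agents.

```python
import math
from collections import Counter


class ConditionalCountBonus:
    """Hypothetical tabular visit counts over (state, action-prefix) pairs."""

    def __init__(self, c=1.0):
        self.c = c                # exploration constant (assumed)
        self.counts = Counter()   # missing keys read as 0 without being inserted

    def update(self, state, joint_action):
        # Count every prefix of the joint action, mirroring a visit to each
        # depth of the search tree along the path the agents followed.
        joint_action = tuple(joint_action)
        for depth in range(len(joint_action) + 1):
            self.counts[(state, joint_action[:depth])] += 1

    def bonus(self, state, preceding, action):
        # UCT-style bonus for one agent's action, conditioned on the state and
        # on the preceding agents' actions. The +1 terms are an assumption
        # that keeps the expression finite for unvisited nodes.
        preceding = tuple(preceding)
        parent = self.counts[(state, preceding)]
        child = self.counts[(state, preceding + (action,))]
        return self.c * math.sqrt(math.log(parent + 1) / (child + 1))


def select_actions(q_values, state, counter, explore=True):
    # Agents act in a fixed order; each agent is greedy with respect to its
    # own value estimate plus the conditional bonus. With explore=False the
    # bonus is dropped, as at deployment.
    chosen = []
    for agent_q in q_values:
        def score(a):
            b = counter.bonus(state, chosen, a) if explore else 0.0
            return agent_q[a] + b
        chosen.append(max(range(len(agent_q)), key=score))
    return chosen
```

Here state must be hashable and q_values is a list holding one list of per-action values for each agent. Calling select_actions with explore=False recovers plain greedy selection per agent, which reflects the abstract's point that the bonus is used only during training.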
Publication status:
Published
Peer review status:
Peer reviewed

Publication website:
https://proceedings.mlr.press/v216/zhao23b.html

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
ORCID:
0009-0000-8297-9045


Publisher:
Journal of Machine Learning Research
Host title:
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
Volume:
216
Pages:
2529-2540
Article number:
236
Publication date:
2023-01-01
Event title:
39th Conference on Uncertainty in Artificial Intelligence (UAI 2023)
Event location:
Pittsburgh, Pennsylvania, USA
Event website:
https://www.auai.org/uai2023/
Event start date:
2023-07-31
Event end date:
2023-08-04
EISSN:
2640-3498


Language:
English
Pubs id:
1536300
Local pid:
pubs:1536300
Deposit date:
2026-06-16