Conditionally optimistic exploration for cooperative deep multi-agent reinforcement learning

Zhao, X; Pan, Y; Xiao, C; Chandar, S; Rajendran, J

AI Collection

Conference item

Abstract:: Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL). In this work, we propose an exploration method that effectively encourages cooperative exploration, based on the idea of a sequential action-computation scheme. The high-level intuition is that, when performing optimism-based exploration, agents will explore cooperative strategies if each agent's optimism estimate captures a structured dependency relationship with other agents. Assuming agents compute actions following a sequential order at each environment timestep, we provide a perspective that views MARL as tree search iterations by considering agents as nodes at different depths of the search tree. Inspired by the theoretically justified tree search algorithm UCT (Upper Confidence bounds applied to Trees), we develop a method called Conditionally Optimistic Exploration (COE). COE augments each agent's state-action value estimate with an action-conditioned optimistic bonus derived from the visitation count of the global state and joint actions of preceding agents. COE is performed during training and disabled at deployment, making it compatible with any value decomposition method for centralized training with decentralized execution. Experiments across various cooperative MARL benchmarks show that COE outperforms current state-of-the-art exploration methods on hard-exploration tasks.
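
The sequential, count-conditioned bonus described in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: the class name `ConditionalCountBonus`, the function `select_joint_action`, the `scale` coefficient, and the bonus form `scale / sqrt(n + 1)` are all assumptions chosen for illustration, and a deep MARL setting would need an approximate count over states rather than a dictionary.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


class ConditionalCountBonus:
    """Tabular visit counts keyed by (state, preceding agents' actions, own action).

    Hypothetical helper for illustration only; the bonus formula is an assumption.
    """

    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale  # assumed bonus coefficient, not taken from the paper
        self.counts: Dict[Tuple, int] = defaultdict(int)

    def bonus(self, state: Hashable, preceding: Sequence[int], action: int) -> float:
        # Rarely visited (state, preceding actions, action) triples get a larger bonus.
        n = self.counts.get((state, tuple(preceding), action), 0)
        return self.scale / math.sqrt(n + 1)

    def update(self, state: Hashable, joint_action: Sequence[int]) -> None:
        # Each agent's count is conditioned on the actions of the agents before it.
        for i, action in enumerate(joint_action):
            self.counts[(state, tuple(joint_action[:i]), action)] += 1


def select_joint_action(
    q_values: Sequence[Sequence[float]],
    state: Hashable,
    explorer: ConditionalCountBonus,
    training: bool = True,
) -> Tuple[int, ...]:
    """Agents act in a fixed order; q_values[i][a] is agent i's estimate for action a."""
    chosen = []
    for agent_q in q_values:
        scores = [
            q + (explorer.bonus(state, chosen, a) if training else 0.0)
            for a, q in enumerate(agent_q)
        ]
        # Greedy choice over the (optionally) bonus-augmented values; ties pick the lowest index.
        chosen.append(max(range(len(scores)), key=scores.__getitem__))
    return tuple(chosen)
```

During training, one would call `select_joint_action(...)` and then `explorer.update(state, joint_action)` after each step; passing `training=False` drops the bonus so that each agent acts greedily on its own value estimate, mirroring the abstract's statement that the bonus is disabled at deployment.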

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Zhao, X., Pan, Y., Xiao, C., Chandar, S., & Rajendran, J. (2023). Conditionally optimistic exploration for cooperative deep multi-agent reinforcement learning. 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023), 216, 2529–2540.

MLA Style

Zhao, X, et al. “Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning.” 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023), vol. 216, 2023, pp. 2529–40.

Chicago Style

Zhao, X, Y Pan, C Xiao, S Chandar, and J Rajendran. 2023. “Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning.” In 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023), 216:2529–40. Journal of Machine Learning Research.
Access Document

Files:: Zhao_et_al_2023_Conditionally_optimistic_exploration.pdf

(Version of record, pdf, 473.1KB)

Publication website:: https://proceedings.mlr.press/v216/zhao23b.html

Authors

Zhao, X

Role:: Author

Pan, Y

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author
ORCID:: 0009-0000-8297-9045

Xiao, C

Role:: Author

Chandar, S

Role:: Author

Rajendran, J

Role:: Author

Publisher:: Journal of Machine Learning Research
Host title:: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
Volume:: 216
Pages:: 2529-2540
Article number:: 236
Publication date:: 2023-01-01
Event title:: 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023)
Event location:: Pittsburgh, Pennsylvania, USA
Event website:: https://www.auai.org/uai2023/
Event start date:: 2023-07-31
Event end date:: 2023-08-04
EISSN:: 2640-3498

Language:: English
Pubs id:: 1536300
Local pid:: pubs:1536300
Deposit date:: 2026-06-16
ARK identifier:: ark:/29072/ora_2b50e2e7f75642a99eda86dad6c58787

Terms of use

Copyright holder:: Zhao et al.

Licence:: CC Attribution (CC BY)
