Conference item icon

Conference item

Regularized Softmax Deep Multi−Agent Q−Learning

Abstract:
Tackling overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers from a more severe overestimation in practice than previously acknowledged, and is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values that deviate from a baseline and demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent Q-Learning, is general and can be applied to any Q-learning based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results in a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.
Publication status:
Published
Peer review status:
Reviewed (other)

Actions


Access Document


Authors


More by this author
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Oxford college:
St Catherine's College
Role:
Author


Publisher:
NeurIPS
Journal:
NeurIPS Proceedings 2021 More from this journal
Volume:
34
Pages:
1365-1377
Publication date:
2022-04-01
Acceptance date:
2021-11-01
Event title:
35th Annual Conference on Neural Information Processing Systems (NeurIPS 2021)


Language:
English
Keywords:
Pubs id:
1211842
Local pid:
pubs:1211842
Deposit date:
2021-11-23

Terms of use



Views and Downloads






If you are the owner of this record, you can report an update to it here: Report update to this record

TO TOP