Conference item
Regularized Softmax Deep Multi-Agent Q-Learning
- Abstract:
- Tackling overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers from more severe overestimation in practice than previously acknowledged, which existing approaches do not mitigate. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values that deviate from a baseline, and demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent Q-Learning, is general and can be applied to any Q-learning based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results in a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.
- Publication status:
- Published
- Peer review status:
- Reviewed (other)
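The softmax operator mentioned in the abstract replaces the hard max in the Q-learning target with a softmax-weighted average of action values, which is never larger than the max and therefore tempers overestimation. The following is a minimal single-agent sketch of the standard Boltzmann softmax operator, assuming a small discrete action set and an illustrative inverse-temperature parameter `beta`; the function names are hypothetical, and the sketch does not reproduce the paper's multi-agent approximation or its regularization term.

```python
import numpy as np


def softmax_operator(q_values, beta):
    """Boltzmann softmax operator: softmax(beta * Q)-weighted average of Q.

    beta = 0 gives the plain mean of the action values; as beta grows the
    result approaches max(Q), and it never exceeds max(Q).
    """
    q = np.asarray(q_values, dtype=float)
    # Shift by the max before exponentiating for numerical stability;
    # the shift cancels once the weights are normalized.
    weights = np.exp(beta * (q - q.max()))
    weights /= weights.sum()
    return float(np.dot(weights, q))


def softmax_target(reward, gamma, next_q_values, beta, done=False):
    """Hypothetical one-step Q-learning target using softmax instead of max."""
    bootstrap = 0.0 if done else softmax_operator(next_q_values, beta)
    return reward + gamma * bootstrap


q_next = [1.0, 2.0, 3.0]
print(softmax_operator(q_next, 0.0))    # approximately 2.0, the mean
print(softmax_operator(q_next, 100.0))  # approximately 3.0, close to the max
print(softmax_target(1.0, 0.9, q_next, beta=1.0))
```

Choosing `beta` trades off between the mean (low bias toward large values, but less greedy) and the max (greedy, but prone to overestimation); the value used here is arbitrary.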
- Publisher:
- NeurIPS
- Journal:
- NeurIPS Proceedings 2021
- Volume:
- 34
- Pages:
- 1365-1377
- Publication date:
- 2022-04-01
- Acceptance date:
- 2021-11-01
- Event title:
- 35th Annual Conference on Neural Information Processing Systems (NeurIPS 2021)
- Language:
- English
- Pubs id:
- 1211842
- Local pid:
- pubs:1211842
- Deposit date:
- 2021-11-23
Terms of use
- Copyright holder:
- Pan et al.
- Copyright date:
- 2022
- Rights statement:
- Copyright © 2022 The Author(s).
- Notes:
- This is the accepted manuscript version of the article. The final version is available from NeurIPS Proceedings at https://proceedings.neurips.cc/paper/2021/hash/0a113ef6b61820daa5611c870ed8d5ee-Abstract.html