Conference item
Counterfactual multi-agent policy gradients
- Abstract:
- Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents’ policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent’s action, while keeping the other agents’ actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
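The counterfactual baseline described in the abstract can be illustrated with a minimal Python sketch. It uses the standard COMA form of the advantage for agent a: A^a(s, u) = Q(s, u) - sum over u'^a of pi^a(u'^a | tau^a) * Q(s, (u^-a, u'^a)). Here only agent a's action is swapped out, while the other agents' actions u^-a stay fixed. Several parts of this sketch are assumptions made for illustration and do not come from the record. The function names `counterfactual_q_row`, `counterfactual_advantage` and `coma_advantages` are hypothetical. The toy Q-function is made up. The critic's single forward pass is replaced by plain enumeration over one agent's actions. Critic training and the actor update are omitted.

```python
from typing import Callable, List, Sequence, Tuple

JointAction = Tuple[int, ...]


def counterfactual_q_row(
    q_fn: Callable[[JointAction], float],
    joint_action: JointAction,
    agent: int,
    n_actions: int,
) -> List[float]:
    """Q(s, (u^-a, u'^a)) for every alternative action u'^a of one agent.

    Only `agent`'s action is varied; the other agents' actions stay fixed.
    COMA's critic is described as producing this row in a single forward
    pass; this sketch simply enumerates a (hypothetical) Q-function instead.
    """
    row = []
    for alt in range(n_actions):
        alt_joint = list(joint_action)
        alt_joint[agent] = alt
        row.append(q_fn(tuple(alt_joint)))
    return row


def counterfactual_advantage(
    q_row: Sequence[float], policy: Sequence[float], taken_action: int
) -> float:
    """Q of the taken action minus the policy-weighted counterfactual baseline."""
    if len(q_row) != len(policy):
        raise ValueError("q_row and policy must cover the same actions")
    # Marginalise out this agent's action under its own policy.
    baseline = sum(p * q for p, q in zip(policy, q_row))
    return q_row[taken_action] - baseline


def coma_advantages(
    q_fn: Callable[[JointAction], float],
    joint_action: JointAction,
    policies: Sequence[Sequence[float]],
) -> List[float]:
    """One counterfactual advantage per agent, all sharing the same centralised Q."""
    return [
        counterfactual_advantage(
            counterfactual_q_row(q_fn, joint_action, a, len(pi)), pi, joint_action[a]
        )
        for a, pi in enumerate(policies)
    ]


if __name__ == "__main__":
    # Hypothetical toy critic for two agents with two actions each (state fixed).
    def toy_q(u: JointAction) -> float:
        return 2.0 * u[0] + 1.0 * u[1]

    print(coma_advantages(toy_q, (1, 0), [[0.5, 0.5], [0.75, 0.25]]))  # [1.0, -0.25]
```

The design choice the sketch is meant to highlight is that every agent is scored against the same centralised Q. Each agent's baseline removes only that agent's own contribution, and it does so in expectation under its own policy. Because the baseline does not depend on the agent's chosen action, the policy-weighted average of that agent's advantages is zero.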
- Funder:
- Engineering and Physical Sciences Research Council
- Grant:
- CDT in Autonomous Intelligent Machines and Systems
- Publisher:
- AAAI Press
- Host title:
- 32nd AAAI Conference on Artificial Intelligence (AAAI'18)
- Journal:
- 32nd AAAI Conference on Artificial Intelligence (AAAI'18)
- Pages:
- 2974-2982
- Publication date:
- 2018-04-29
- Acceptance date:
- 2017-11-09
- ISSN:
- 2159-5399
- Pubs id:
- pubs:745007
- UUID:
- uuid:37e732fe-a876-4699-8ee3-d556bfd235b3
- Local pid:
- pubs:745007
- Source identifiers:
- 745007
- Deposit date:
- 2017-11-11
Terms of use
- Copyright holder:
- Association for the Advancement of Artificial Intelligence
- Copyright date:
- 2018
- Notes:
- Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). This is the accepted manuscript version of the paper. The final version is available online from AAAI Press at: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17193