Counterfactual multi-agent policy gradients

Foerster, J; Farquhar, G; Afouras, T; Nardelli, N; Whiteson, S

Conference item

Abstract:: Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
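The counterfactual baseline described in the abstract can be illustrated with a small sketch. For agent a, the advantage compares the value of the joint action actually taken with a baseline that marginalises out a's own action under its policy while the other agents' actions stay fixed: A^a(s, u) = Q(s, u) - sum over u' of pi^a(u' | tau^a) * Q(s, (u^-a, u')). The sketch assumes the centralised critic has already produced, for agent a, a vector of Q-values over a's actions with the other agents' actions held fixed (the kind of output that lets the baseline be computed from a single forward pass). The function name `counterfactual_advantage` and its argument layout are hypothetical and are not taken from the authors' code.

```python
from typing import Sequence


def counterfactual_advantage(
    q_values: Sequence[float],
    policy_probs: Sequence[float],
    taken_action: int,
) -> float:
    """Counterfactual advantage for a single agent (illustrative sketch).

    Hypothetical layout, assumed for illustration:
      q_values[k]     -- critic estimate Q(s, (u^-a, k)): the joint action with
                         agent a's action replaced by k, other agents' actions fixed.
      policy_probs[k] -- agent a's policy probability pi^a(k | tau^a).
      taken_action    -- index of the action agent a actually took.
    """
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must have the same length")
    # Q(s, u): value of the joint action that was actually executed.
    q_taken = q_values[taken_action]
    # Counterfactual baseline: expectation of Q over agent a's own actions under
    # its policy; the other agents' actions are already fixed inside q_values.
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_taken - baseline


# Example: agent a has three actions and took action 2.
q = [1.0, 0.5, 2.0]
pi = [0.2, 0.3, 0.5]
adv = counterfactual_advantage(q, pi, taken_action=2)
print(round(adv, 2))  # baseline = 0.2*1.0 + 0.3*0.5 + 0.5*2.0 = 1.35, so 0.65
```

Because the baseline does not depend on the action agent a actually took, subtracting it does not change the expected policy gradient. A quick sanity check is that the policy-weighted sum of the advantages over all of a's actions is zero.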

Publication status:: Published

Peer review status:: Peer reviewed


Cite this record

APA Style

Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., & Whiteson, S. (2018). Counterfactual multi-agent policy gradients. 32nd AAAI Conference on Artificial Intelligence (AAAI'18), 2974–2982.

MLA Style

Foerster, J., et al. “Counterfactual Multi-Agent Policy Gradients.” 32nd AAAI Conference on Artificial Intelligence (AAAI'18), AAAI Press, 2018, pp. 2974–82.

Chicago Style

Foerster, J, G Farquhar, T Afouras, N Nardelli, and S Whiteson. 2018. “Counterfactual Multi-Agent Policy Gradients.” In 32nd AAAI Conference on Artificial Intelligence (AAAI'18), 2974–82. AAAI Press.

Access Document

Files:: foersteraaai18.pdf (accepted manuscript, PDF, 622.1 KB)

Authors

Foerster, J

Role:: Author

Farquhar, G

Role:: Author

Afouras, T

Role:: Author

Nardelli, N

Role:: Author

Whiteson, S

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Oxford college:: St Catherine's College
Role:: Author

Funding

Engineering and Physical Sciences Research Council

Grant:: CDT in Autonomous Intelligent Machines and Systems

European Research Council

Grant:: 637713

Microsoft

Oxford-Google DeepMind Graduate Scholarship

Publisher:: AAAI Press
Host title:: 32nd AAAI Conference on Artificial Intelligence (AAAI'18)
Journal:: 32nd AAAI Conference on Artificial Intelligence (AAAI'18)
Pages:: 2974-2982
Publication date:: 2018-04-29
Acceptance date:: 2017-11-09
ISSN:: 2159-5399

Keywords:: actor-critic; deep reinforcement learning; multi-agent learning
Pubs id:: pubs:745007
UUID:: uuid:37e732fe-a876-4699-8ee3-d556bfd235b3
Local pid:: pubs:745007
Source identifiers:: 745007
Deposit date:: 2017-11-11

Terms of use

Copyright holder:: Association for the Advancement of Artificial Intelligence
Notes:: Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). This is the accepted manuscript version of the paper. The final version is available online from AAAI Press at: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17193

Licence:: Terms and Conditions of Use for Oxford University Research Archive
