Conference item

Expected policy gradients

Abstract:

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates across the action when estimating the gradient, instead of relying only on the action in the sampled trajectory. We establish a new general policy gradient theorem, of which the stochastic and deterministic policy gradient theorems are special cases. We also prove that EPG reduces the variance ...
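The abstract's key idea, integrating over the action distribution rather than using only the single sampled action, can be illustrated with a minimal numerical sketch. This is not the paper's algorithm (the paper derives analytic forms and a general theorem); the quadratic critic `q`, the Gaussian policy, and all parameter values below are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical critic Q(s, a) = -(a - s)^2 (illustrative only).
def q(s, a):
    return -(a - s) ** 2

mu, sigma, s = 0.5, 1.0, 2.0  # Gaussian policy pi(a|s) = N(mu, sigma^2)

# SPG-style: score-function estimate from one sampled action.
def spg_grad_mu():
    a = rng.normal(mu, sigma)
    score = (a - mu) / sigma ** 2      # d/dmu of log N(a; mu, sigma^2)
    return score * q(s, a)

# EPG-style: integrate over the action distribution (here by dense
# Monte Carlo averaging; EPG itself uses analytic or quadrature forms).
def epg_grad_mu(n=100_000):
    a = rng.normal(mu, sigma, size=n)
    score = (a - mu) / sigma ** 2
    return np.mean(score * q(s, a))

spg_samples = [spg_grad_mu() for _ in range(1000)]
print("SPG single-sample estimator variance:", np.var(spg_samples))
print("EPG-style gradient estimate:", epg_grad_mu())
```

For this toy setup the true gradient is d/dmu E[Q] = -2(mu - s) = 3.0; the single-sample SPG estimator has the same mean but much higher variance, which is the variance-reduction point the abstract makes.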

Publication status:
Published
Peer review status:
Peer reviewed

Authors


Institution:
University of Oxford
Division:
MPLS Division
Department:
Computer Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS Division
Department:
Computer Science
Oxford college:
St Catherine's College
Role:
Author
Publisher:
Association for the Advancement of Artificial Intelligence
Journal:
AAAI 2018: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence
Pages:
2868-2875
Host title:
AAAI 2018: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, February 2–7, 2018 at the Hilton New Orleans Riverside, New Orleans, Louisiana, USA
Publication date:
2018-04-29
Acceptance date:
2017-11-09
ISSN:
2374-3468
Source identifiers:
744767
ISBN:
9781577358008
Pubs id:
pubs:744767
UUID:
uuid:360b3ba7-3ec6-4f37-a662-9ded24fa099f
Local pid:
pubs:744767
Deposit date:
2017-11-23
