
Conference item

Expected policy gradients

Abstract:

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates across the action when estimating the gradient, instead of relying only on the action in the sampled trajectory. We establish a new general policy gradient theorem, of which the stochastic and deterministic policy gradient theorems are special cases. We also prove that EPG reduces the variance ...
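To illustrate the core idea described in the abstract, the sketch below contrasts a single-sample stochastic policy gradient estimate with an EPG-style estimate that integrates the same integrand over the whole action space. This is an illustrative toy, not the paper's implementation: the one-dimensional Gaussian policy, the quadratic critic `Q`, and the numerical Riemann-sum integration are all hypothetical choices for a single fixed state.

```python
import numpy as np

# Illustrative sketch only, not the paper's implementation: it contrasts
# a single-sample stochastic policy gradient (SPG) estimate with an
# expected-policy-gradient (EPG) style estimate that integrates the same
# integrand over the whole action space. The Gaussian policy and the
# quadratic critic Q below are hypothetical choices for one fixed state.

rng = np.random.default_rng(0)

mu, sigma = 0.5, 1.0             # policy: a ~ N(mu, sigma^2)
Q = lambda a: -(a - 2.0) ** 2    # hypothetical critic at this state

def spg_estimate():
    """Score-function gradient w.r.t. mu from one sampled action."""
    a = rng.normal(mu, sigma)
    return (a - mu) / sigma**2 * Q(a)

def epg_estimate(n=2001):
    """Integrate the same integrand over actions (Riemann sum)."""
    a = np.linspace(mu - 6 * sigma, mu + 6 * sigma, n)
    pdf = np.exp(-0.5 * ((a - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(np.sum(pdf * (a - mu) / sigma**2 * Q(a)) * (a[1] - a[0]))

spg = np.array([spg_estimate() for _ in range(5000)])
print("SPG: mean %.3f, variance %.1f" % (spg.mean(), spg.var()))
print("EPG: %.3f (zero variance given the state)" % epg_estimate())
# The analytic gradient here is 4 - 2*mu = 3: both estimators are
# unbiased, but the EPG-style estimate needs no action sample at all.
```

Both estimators target the same gradient, but integrating over actions removes the sampling noise entirely in this one-state example, which is the variance-reduction effect the abstract refers to.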

Publication status:
Published
Peer review status:
Peer reviewed
Version:
Accepted manuscript

Authors

Institution:
University of Oxford
Division:
MPLS Division
Department:
Computer Science
Role:
Author

Institution:
University of Oxford
Division:
MPLS Division
Department:
Computer Science
Oxford college:
St Catherine's College
Role:
Author
Publisher:
Association for the Advancement of Artificial Intelligence
Pages:
2868-2875
Publication date:
2018-04-29
Acceptance date:
2017-11-09
ISSN:
2374-3468
Pubs id:
pubs:744767
URN:
uri:360b3ba7-3ec6-4f37-a662-9ded24fa099f
UUID:
uuid:360b3ba7-3ec6-4f37-a662-9ded24fa099f
Local pid:
pubs:744767
ISBN:
978-1-57735-800-8
