Conference item

Approximate policy iteration for Markov decision processes via quantitative adaptive aggregations

Abstract:: We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale as the dimension of the state space increases, a number of approximate methods have been developed. These are typically based on value or policy iteration, and achieve further speedups through lumped and distributed updates or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem that is particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, while providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, which provides evidence that the state-space reduction considerably accelerates the policy iteration scheme while still meeting the required level of precision.
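
For orientation, the exact policy iteration scheme that the abstract treats as the non-scalable baseline can be sketched as follows. This is a minimal sketch under stated assumptions, not the aggregation-based method of the paper: the two-state MDP, the action names, the discount factor and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy MDP (not from the paper): P[s][a] is a list of
# (next_state, probability) pairs and R[s][a] is the immediate reward.
P = {
    0: {"stay": [(0, 1.0)], "go": [(1, 0.9), (0, 0.1)]},
    1: {"stay": [(1, 1.0)], "go": [(0, 1.0)]},
}
R = {
    0: {"stay": 0.0, "go": 0.0},
    1: {"stay": 1.0, "go": 0.0},
}
GAMMA = 0.9  # discount factor, assumed for illustration


def q_value(s, a, v):
    """Expected discounted return of taking action a in state s, then following v."""
    return R[s][a] + GAMMA * sum(p * v[t] for t, p in P[s][a])


def evaluate(policy, tol=1e-10):
    """Policy evaluation: iterate the Bellman backup of a fixed policy until it settles."""
    v = {s: 0.0 for s in P}
    while True:
        new_v = {s: q_value(s, policy[s], v) for s in P}
        if max(abs(new_v[s] - v[s]) for s in P) < tol:
            return new_v
        v = new_v


def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: sorted(P[s])[0] for s in P}  # arbitrary initial policy
    while True:
        v = evaluate(policy)
        improved = {}
        for s in P:
            best = policy[s]
            for a in P[s]:
                # Switch only on a strict improvement, so the loop terminates.
                if q_value(s, a, v) > q_value(s, best, v) + 1e-9:
                    best = a
            improved[s] = best
        if improved == policy:
            return policy, v
        policy = improved
```

Calling `policy_iteration()` returns the stable policy together with its value function. Every evaluation and improvement step sweeps over all states, which is the cost that state-space aggregation aims to reduce.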

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Abate, A., Češka, M., & Kwiatkowska, M. (2016). Approximate policy iteration for Markov decision processes via quantitative adaptive aggregations. Lecture Notes in Computer Science, 9938, 13–31.

MLA Style

Abate, A., et al. “Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations.” Lecture Notes in Computer Science, vol. 9938, Springer Verlag, 2016, pp. 13–31.

Chicago Style

Abate, A, M Češka, and M Kwiatkowska. 2016. “Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations.” In Lecture Notes in Computer Science, 9938:13–31. Automated Technology for Verification and Analysis. ATVA 2016. Springer Verlag.

Access Document

Publisher copy:: 10.1007/978-3-319-46520-3_2

Authors

Abate, A

Institution:: University of Oxford
Division:: MPLS Division
Department:: Computer Science
Oxford college:: Linacre College
Role:: Author

Češka, M

Role:: Author

Kwiatkowska, M

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Oxford college:: Trinity College
Role:: Author

Engineering & Physical Sciences Research Council

Grant:: EP/M019918/1

Publisher:: Springer Verlag
Host title:: Lecture Notes in Computer Science
Journal:: Lecture Notes in Computer Science
Volume:: 9938
Pages:: 13-31
Series:: Automated Technology for Verification and Analysis. ATVA 2016
Publication date:: 2016-09-22
DOI:: 10.1007/978-3-319-46520-3_2
ISSN:: 0302-9743 and 1611-3349
ISBN:: 9783319465197

Keywords:: FFR
Pubs id:: pubs:657801
UUID:: uuid:767bea47-8fc9-4a82-9926-e4530511c3ae
Local pid:: pubs:657801
Source identifiers:: 657801
Deposit date:: 2019-11-12

Terms of use

Copyright date:: 2016

Licence:: Terms and Conditions of Use for Oxford University Research Archive
