Thesis
Safe and certified reinforcement learning with logical constraints
- Abstract:
- Reinforcement Learning (RL) is a widely employed machine learning paradigm that has been applied to a variety of decision-making problems, from resource management to robot locomotion, from recommendation systems to systems biology, and from traffic control to superhuman performance in video games. However, RL has seen limited success beyond rigidly controlled or constrained applications, and its successful deployment in safety-critical scenarios is yet to be achieved. A principal reason for this limitation is the lack of a systematic formal approach to specifying requirements as tasks (or learning constraints), and to providing guarantees with respect to these requirements and constraints, both during and after learning. This work addresses these issues by proposing a general method that leverages the success of RL in learning high-performance controllers, while developing the assumptions and theory required to satisfy formal requirements and to guide the learning process within safe configurations. Given the expressive power of Linear Temporal Logic (LTL) in formulating control specifications, we propose the first model-free RL scheme to synthesise policies for unknown, black-box Markov Decision Processes (MDPs), where the reward function is formally shaped by an LTL specification. We convert the given property into a Limit-Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Further, by exploiting the structure of the LDBA, we shape a synchronous reward function on the fly, and we discuss the assumptions under which RL is guaranteed to synthesise control policies whose traces satisfy the LTL specification with the maximum possible probability. When the state space of the MDP is finite, this probability (a certificate) can be computed in parallel with policy learning: the RL algorithm thus produces a policy that is certified with respect to the property. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. The performance of the proposed method is evaluated on a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for policy synthesis, compared to existing approaches where available.
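The core construction sketched in the abstract (learning on the product of an MDP and an LDBA, with a reward issued synchronously when the automaton's accepting condition is met) can be illustrated with a small toy example. The sketch below is a hedged illustration, not the thesis's actual implementation: the automaton `ToyLDBA` is hard-coded for a single property, and the names `labeller`, `r_accept`, and the one-dimensional corridor dynamics are hypothetical stand-ins.

```python
# Minimal sketch of LDBA-shaped Q-learning, under the assumptions noted above.
import random
from collections import defaultdict

class ToyLDBA:
    """One-state Büchi automaton for the LTL property "G F goal":
    the sole state self-loops on every label, and the transition is
    accepting exactly when the label contains 'goal'."""
    initial = 0
    def step(self, q, label):
        return 0  # single state, self-loop on all labels
    def is_accepting(self, q, label):
        return "goal" in label

def q_learning_product(n_cells=6, episodes=3000, horizon=50,
                       gamma=0.99, alpha=0.1, eps=0.1, r_accept=1.0):
    """Q-learning on the product state (MDP cell, LDBA state) for a
    1-D corridor whose last cell is labelled 'goal'. The agent is
    rewarded only through the automaton's accepting condition, a
    simplified stand-in for the on-the-fly shaped reward."""
    actions = (-1, +1)
    ldba = ToyLDBA()
    labeller = lambda s: {"goal"} if s == n_cells - 1 else set()
    Q = defaultdict(float)
    for _ in range(episodes):
        s, q = 0, ldba.initial              # initial product state
        for _ in range(horizon):
            if random.random() < eps:       # epsilon-greedy exploration
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, q, b)])
            s2 = min(max(s + a, 0), n_cells - 1)  # deterministic corridor move
            label = labeller(s2)
            r = r_accept if ldba.is_accepting(q, label) else 0.0
            q2 = ldba.step(q, label)
            best = max(Q[(s2, q2, b)] for b in actions)
            Q[(s, q, a)] += alpha * (r + gamma * best - Q[(s, q, a)])
            s, q = s2, q2
    return Q

if __name__ == "__main__":
    Q = q_learning_product()
    # The greedy action in the initial product state should point right,
    # towards the 'goal'-labelled cell.
    print(max((-1, +1), key=lambda a: Q[(0, 0, a)]))
```

In the full method described in the abstract, the automaton is derived automatically from an arbitrary LTL specification and the reward is shaped over its accepting components; the toy automaton here fixes a single property purely for brevity.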
Authors
+ Hasanbeig, MH
Contributors
+ Abate, A
- Division:
- MPLS
- Department:
- Computer Science
- Sub department:
- Computer Science
- Research group:
- OxCAV
- Oxford college:
- St Hugh's College
- Role:
- Supervisor
+ Kroening, D
- Division:
- MPLS
- Department:
- Computer Science
- Sub department:
- Computer Science
- Oxford college:
- Magdalen College
- Role:
- Supervisor
- ORCID:
- 0000-0002-6681-5283
+ HICLASS
- Funding agency for:
- Kroening, D
- Grant:
- 113213
- Programme:
- Aerospace Technology Institute (ATI), Department for Business, Energy & Industrial Strategy (BEIS) and Innovate UK
+ UK National Cyber Security Centre
- Funding agency for:
- Kroening, D
- Programme:
- UK NCSC
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2021-07-12
Terms of use
- Copyright holder:
- Hasanbeig, MH
- Copyright date:
- 2020