Walking the values in Bayesian inverse reinforcement learning

Bajgar, O; Abate, A; Gatsis, K; Osborne, MA

AI Collection

Conference item


Abstract:: The goal of Bayesian inverse reinforcement learning (IRL) is to recover a posterior distribution over reward functions from a set of demonstrations by an expert optimizing a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood, which is often defined in terms of Q-values: vanilla Bayesian IRL must solve the costly forward planning problem, going from rewards to Q-values, at every step of the algorithm, and this may need to be done thousands of times. We propose to address this with a simple change: instead of sampling primarily in the space of rewards, we work primarily in the space of Q-values, since the computation required to go from Q-values to rewards is radically cheaper. Furthermore, this reversal of the computation makes the gradient easy to compute, allowing efficient sampling with Hamiltonian Monte Carlo. We propose ValueWalk, a new Markov chain Monte Carlo method based on this insight, and illustrate its advantages on several tasks.
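
To illustrate why the direction from Q-values to rewards is cheap, the following is a minimal sketch, not the authors' implementation. It assumes a small tabular MDP with a known transition model and the standard Bellman optimality equation, R(s, a) = Q(s, a) - gamma * sum_s' P(s' | s, a) * max_a' Q(s', a'). The function names `reward_from_q` and `q_from_reward` and the nested-list data layout are hypothetical choices made for this example; the paper's exact formulation may differ.

```python
def reward_from_q(q, transition, gamma):
    """Cheap direction: recover R(s, a) from Q in a single pass.

    Assumes the Bellman optimality equation
        R(s, a) = Q(s, a) - gamma * sum_s' P(s' | s, a) * max_a' Q(s', a')
    q[s][a] is a Q-value; transition[s][a][s2] is P(s2 | s, a).
    """
    n_states = len(q)
    # State values implied by acting greedily with respect to Q.
    v = [max(q_s) for q_s in q]
    return [
        [
            q[s][a]
            - gamma * sum(transition[s][a][s2] * v[s2] for s2 in range(n_states))
            for a in range(len(q[s]))
        ]
        for s in range(n_states)
    ]


def q_from_reward(reward, transition, gamma, n_iters=500):
    """Costly direction: forward planning by value iteration.

    Each of the n_iters sweeps costs about as much as one call to
    reward_from_q, which is why repeating it at every sampler step is slow.
    """
    n_states = len(reward)
    q = [[0.0] * len(reward[s]) for s in range(n_states)]
    for _ in range(n_iters):
        v = [max(q_s) for q_s in q]
        q = [
            [
                reward[s][a]
                + gamma * sum(transition[s][a][s2] * v[s2] for s2 in range(n_states))
                for a in range(len(reward[s]))
            ]
            for s in range(n_states)
        ]
    return q
```

In this sketch, recovering the reward takes one sweep over state-action pairs, whereas planning repeats a comparable sweep until convergence. A sampler that proposes Q-values directly therefore only needs the single-sweep direction to evaluate a prior defined over rewards.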

Publication status:: Published

Peer review status:: Peer reviewed


Cite this record

APA Style

Bajgar, O., Abate, A., Gatsis, K., & Osborne, M. A. (2024). Walking the values in Bayesian inverse reinforcement learning. 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024), 244, 273–287.

MLA Style

Bajgar, O, et al. “Walking the Values in Bayesian Inverse Reinforcement Learning.” 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024), Proceedings of Machine Learning Research, vol. 244, 2024, pp. 273–87.

Chicago Style

Bajgar, O, A Abate, K Gatsis, and MA Osborne. 2024. “Walking the Values in Bayesian Inverse Reinforcement Learning.” In 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024), 244:273–87. Proceedings of Machine Learning Research. PMLR.

Access Document

Files:: Bajgar_et_al_2024_Walking_the_values.pdf

(Preview, Version of record, pdf, 3.0MB, Terms of use)

Publication website:: https://proceedings.mlr.press/v244/bajgar24a.html

Authors

Bajgar, O

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Oxford college:: Lady Margaret Hall
Role:: Author

Abate, A

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Oxford college:: Linacre College
Role:: Author
ORCID:: 0000-0002-5627-9093

Gatsis, K

Role:: Author

Osborne, MA

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Oxford college:: Exeter College
Role:: Author
ORCID:: 0000-0003-1959-012X

Publisher:: PMLR
Host title:: Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
Volume:: 244
Pages:: 273-287
Series:: Proceedings of Machine Learning Research
Publication date:: 2024-07-15
Acceptance date:: 2024-04-15
Event title:: 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)
Event location:: Barcelona, Spain
Event website:: https://www.auai.org/uai2024/
Event start date:: 2024-07-15
Event end date:: 2024-07-19
EISSN:: 2640-3498
ISSN:: 1525-3384

Language:: English
Pubs id:: 2073248
Local pid:: pubs:2073248
Deposit date:: 2025-02-25
ARK identifier:: ark:/29072/ora_ef6e9c2d25fe49e5b238937d15749c30

Terms of use

Copyright holder:: Bajgar et al
Notes:: This paper was presented at the 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024), 14th-19th July 2024, Barcelona, Spain.

Licence:: CC Attribution (CC BY)
