Partial identifiability and misspecification in inverse reinforcement learning

Skalse, J

Thesis

Abstract: The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function R from a policy π. This problem is difficult for several reasons. First, there are typically multiple reward functions that are compatible with a given policy; this means that the reward function is only partially identifiable, and that IRL contains a certain fundamental degree of ambiguity. Second, in order to infer R from π, an IRL algorithm must have a behavioural model that describes how π relates to R. However, the true relationship between human preferences and human behaviour is very complex, and practically impossible to capture fully with a simple model. This means that the behavioural model will in practice be misspecified, which raises the worry that it might lead to unsound inferences if applied to real-world data. In this thesis, we provide a comprehensive mathematical analysis of partial identifiability and misspecification in IRL. Specifically, we fully characterise and quantify the ambiguity of the reward function under all of the behavioural models that are most common in the current IRL literature. We also provide necessary and sufficient conditions that describe precisely how the observed demonstrator policy may differ from each of the standard behavioural models before that model leads to faulty inferences about the reward function R. In addition, we introduce a cohesive framework for reasoning about partial identifiability and misspecification in IRL, together with several formal tools that can be used to easily derive the partial identifiability and misspecification robustness of new IRL models, or to analyse other kinds of reward learning algorithms.
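
To make the notion of partial identifiability concrete, the following is a minimal sketch that is not taken from the thesis. It assumes a hypothetical two-state, two-action MDP with deterministic transitions and a demonstrator that acts optimally, which is only one of the behavioural models the abstract refers to. The names `optimal_policy`, `NEXT`, `PHI`, and the reward values are illustrative assumptions. The sketch shows that three different reward functions (an original, a positively scaled copy, and a potential-shaped copy) produce the same optimal policy, so the policy alone cannot tell them apart.

```python
# Illustrative sketch only: a tiny hypothetical MDP, not an example from the thesis.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]
# Deterministic transitions: NEXT[s][a] is the successor of state s under action a.
NEXT = [[0, 1], [0, 1]]


def optimal_policy(reward, sweeps=500):
    """Run value iteration and return the greedy action for each state.

    reward[s][a] is the reward for taking action a in state s.
    """
    v = [0.0 for _ in STATES]
    for _ in range(sweeps):
        v = [
            max(reward[s][a] + GAMMA * v[NEXT[s][a]] for a in ACTIONS)
            for s in STATES
        ]
    return [
        max(ACTIONS, key=lambda a: reward[s][a] + GAMMA * v[NEXT[s][a]])
        for s in STATES
    ]


# A hypothetical "true" reward function.
R1 = [[0.0, 1.0], [0.5, 2.0]]

# Positive scaling: R2(s, a) = 3 * R1(s, a).
R2 = [[3.0 * r for r in row] for row in R1]

# Potential shaping: R3(s, a) = R1(s, a) + GAMMA * PHI(s') - PHI(s),
# where PHI is an arbitrary (assumed) potential function over states.
PHI = [5.0, -2.0]
R3 = [
    [R1[s][a] + GAMMA * PHI[NEXT[s][a]] - PHI[s] for a in ACTIONS]
    for s in STATES
]

policies = [optimal_policy(r) for r in (R1, R2, R3)]
print(policies)
```

All three entries of `policies` are identical here, which is the ambiguity in miniature: an IRL algorithm that observes only this policy has no basis for preferring `R1` over `R2` or `R3`.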

Cite this record

APA Style

Skalse, J. (2025). Partial identifiability and misspecification in inverse reinforcement learning [PhD thesis]. University of Oxford.

MLA Style

Skalse, J. Partial Identifiability and Misspecification in Inverse Reinforcement Learning. University of Oxford, 2025.

Chicago Style

Skalse, J. 2025. “Partial Identifiability and Misspecification in Inverse Reinforcement Learning.” PhD thesis, University of Oxford.