Safely interruptible agents

Conference item

Abstract:: Reinforcement learning agents interacting with a complex environment like the real world are unlikely to behave optimally all the time. If such an agent is operating in real time under human supervision, now and then it may be necessary for a human operator to press the big red button to prevent the agent from continuing a harmful sequence of actions (harmful either for the agent or for the environment) and lead the agent into a safer situation. However, if the learning agent expects to receive rewards from this sequence, it may learn in the long run to avoid such interruptions, for example by disabling the red button, which is an undesirable outcome. This paper explores a way to make sure a learning agent will not learn to prevent (or seek!) being interrupted by the environment or a human operator. We provide a formal definition of safe interruptibility and exploit the off-policy learning property to prove that either some agents are already safely interruptible, like Q-learning, or can easily be made so, like Sarsa. We show that even ideal, uncomputable reinforcement learning agents for (deterministic) general computable environments can be made safely interruptible.

Files:: Orseau_and_Armstrong_2016_Safely_interruptible_agents.pdf

(Accepted manuscript, pdf, 286.9KB)

Publisher:: AUAI Press
Host title:: Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Second Conference (2016)
Pages:: 557-566
Publication date:: 2016-06-25
Acceptance date:: 2016-05-06
Event title:: 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016)
Event location:: Jersey City, New Jersey, USA
Event website:: http://auai.org/uai2016/index.php
Event start date:: 2016-06-25
Event end date:: 2016-06-29
ISBN:: 978-0-9966431-1-5

Licence:: Terms and Conditions of Use for Oxford University Research Archive
