About time: model-free reinforcement learning with timed reward machines

Roy, R; Majumdar, A; Raha, R; Parker, D; Kwiatkowska, M

AI Collection

Conference item

Abstract:: Reward specification plays a central role in reinforcement learning (RL), guiding the agent’s behavior. To express non-Markovian rewards, formalisms such as reward machines have been introduced to capture dependencies on histories. However, traditional reward machines cannot model precise timing constraints, which limits their use in time-sensitive applications. In this paper, we propose timed reward machines (TRMs), an extension of reward machines that incorporates timing constraints into the reward structure. TRMs enable more expressive specifications with tunable reward logic, for example, imposing costs for delays and granting rewards for timely actions. We study model-free RL frameworks (i.e., tabular Q-learning) for learning optimal policies with TRMs under digital and real-time semantics. Our algorithms integrate the TRM into learning via abstractions of timed automata and employ counterfactual-imagining heuristics that exploit the TRM’s structure to improve search. Experimentally, we demonstrate on popular RL benchmarks that our algorithm learns policies that achieve high rewards while satisfying the timing constraints specified by the TRM.

Publication status:: Accepted

Peer review status:: Peer reviewed

Cite this record

APA Style

Roy, R., Majumdar, A., Raha, R., Parker, D., & Kwiatkowska, M. (2026). About time: model-free reinforcement learning with timed reward machines. 35th International Joint Conference on Artificial Intelligence (IJCAI 2026).

MLA Style

Roy, R, et al. “About Time: Model-Free Reinforcement Learning with Timed Reward Machines.” 35th International Joint Conference on Artificial Intelligence (IJCAI 2026), 2026.

Chicago Style

Roy, R, A Majumdar, R Raha, D Parker, and M Kwiatkowska. 2026. “About Time: Model-Free Reinforcement Learning with Timed Reward Machines.” In 35th International Joint Conference on Artificial Intelligence (IJCAI 2026).

Authors

Roy, R

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Author

Majumdar, A

Role:: Author

Raha, R

Role:: Author

Parker, D

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Oxford college:: Trinity College
Role:: Author
ORCID:: 0000-0003-4137-8862

Kwiatkowska, M

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Author

Acceptance date:: 2026-04-30
Event title:: 35th International Joint Conference on Artificial Intelligence (IJCAI 2026)
Event location:: Bremen, Germany
Event website:: https://2026.ijcai.org/
Event start date:: 2026-08-15
Event end date:: 2026-08-21

Language:: English
Pubs id:: 2421110
Local pid:: pubs:2421110
Deposit date:: 2026-05-18
ARK identifier:: ark:/29072/ora_f60ca157d85e420883c95f15c56af888

Terms of use

Notes:: This conference paper has been accepted for presentation at the 35th International Joint Conference on Artificial Intelligence.

Licence:: Terms and Conditions of Use for Oxford University Research Archive
