Conference item

Discounting and drug seeking in biological hierarchical reinforcement learning

Abstract:

Despite a strong desire to quit, individuals with long-term substance use disorder (SUD) often struggle to resist drug use, even when aware of its harmful consequences. This disconnect between explicit knowledge and compulsive behavior reflects a fundamental cognitive-behavioral conflict in addiction. Neurobiologically, differential cue-induced activity within striatal subregions, along with dopamine-mediated connectivity from the ventral to the dorsal striatum, is a key factor in driving compulsive drug-seeking. However, the functional mechanism linking these neuropharmacological findings to the cognitive-behavioral conflict remains unclear.

Another key aspect of addiction is temporal discounting: studies show that individuals with drug dependence exhibit steeper discount rates than non-users. Assuming the ventral-dorsal striatal organization reflects a gradient from cognitive to motor-action representations, addiction can be modeled within a hierarchical reinforcement learning (HRL) framework. However, incorporating discounting into the biological HRL framework is challenging and remains an open problem.
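As a rough illustration of what a steeper discount rate means for choice, the sketch below compares a small immediate reward with a larger delayed one. It assumes exponential discounting with a per-step factor gamma, which is the usual reinforcement learning convention and not necessarily the form used in the paper; the function names and numbers are hypothetical.

```python
def discounted_value(reward, delay, gamma):
    """Present value of a reward received after `delay` steps.

    Assumes exponential discounting: value = reward * gamma ** delay.
    A smaller gamma means steeper discounting.
    """
    return reward * gamma ** delay


def prefers_immediate(small_now, large_later, delay, gamma):
    """True if the small immediate reward is valued above the delayed one."""
    now = discounted_value(small_now, 0, gamma)
    later = discounted_value(large_later, delay, gamma)
    return now > later


# Hypothetical choice: 1 unit now versus 2 units after 5 steps.
shallow_choice = prefers_immediate(1.0, 2.0, delay=5, gamma=0.95)
steep_choice = prefers_immediate(1.0, 2.0, delay=5, gamma=0.80)
```

Under these assumed numbers, the agent with the shallow discount factor waits for the larger reward, while the agent with the steep discount factor takes the immediate one.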

In this work, we build upon an algorithmic model that captures how action choices reinforced with drug rewards become insensitive to the negative consequences that often follow them. We address the challenge of incorporating discounting by ensuring that the values of natural rewards converge across all levels of the HRL hierarchy. In contrast, we show that the pharmacological effects of drugs on the dopamine system cause drug reward values to diverge.
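The contrast between converging natural reward values and diverging drug reward values can be sketched with a single-state temporal-difference update. This is not the paper's hierarchical model: it is a minimal, non-hierarchical illustration that assumes the drug adds a dopamine surge the prediction error can never cancel, a common simplification in temporal-difference accounts of addiction. The names learn_value and drug_surge and all parameter values are hypothetical.

```python
def learn_value(reward, drug_surge=0.0, alpha=0.1, steps=500):
    """Learn the value of one rewarded action with a TD-style update.

    Assumption (not from the paper): a drug contributes a surge
    `drug_surge` that keeps the prediction error at or above that
    surge, so learning can never fully cancel it.
    """
    value = 0.0
    history = []
    for _ in range(steps):
        # Prediction error for a terminal reward (no successor state).
        delta = reward - value
        if drug_surge > 0.0:
            # The error is floored at the surge and never reaches zero.
            delta = max(delta + drug_surge, drug_surge)
        value += alpha * delta
        history.append(value)
    return history


natural = learn_value(reward=1.0)
drug = learn_value(reward=1.0, drug_surge=0.2)
```

With these assumed settings, the natural reward value settles at the reward magnitude, whereas the drug value keeps growing by roughly alpha * drug_surge per step once it exceeds the reward.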

Our results demonstrate that steep discounting amplifies drug-seeking behavior across all levels of the hierarchy, suggesting that steeper discounting is associated with increased addiction severity and impulsivity. We show how these results align with the evidence supporting temporal discounting as a behavioral marker. Additionally, our model offers testable predictions and establishes a framework that conceptualizes addiction as a disorder of hierarchical decision-making processes.

Publication status:
Published
Peer review status:
Peer reviewed

Publication website:
https://openreview.net/forum?id=NKfEkpsIxa

Authors


Institution:
University of Oxford
Division:
MSD
Department:
Clinical Neurosciences
Role:
Author
ORCID:
0009-0001-2507-5450


Publisher:
OpenReview
Host title:
CCN 2025 Proceedings
Article number:
25
Publication date:
2025-05-14
Acceptance date:
2025-05-06
Event title:
8th Annual Conference on Cognitive Computational Neuroscience (CCN 2025)
Event location:
Amsterdam, The Netherlands
Event website:
https://2025.ccneuro.org/
Event start date:
2025-08-12
Event end date:
2025-08-15


Language:
English
Pubs id:
2298781
Local pid:
pubs:2298781
Deposit date:
2025-10-08
