Preprint
Composing the value signal for dopamine-mediated learning
- Abstract:
- The seminal reward prediction error theory of dopamine function faces several key challenges. Most notable are the difficulty of learning about multiple rewards simultaneously, the inefficiency of on-policy learning, and the difficulty of accounting for heterogeneous responses in the tail of the striatum. We propose a normative framework, based on linear reinforcement learning, that redefines dopamine’s computational objective. Under this framework, dopamine optimises not just cumulative reward but a reward value function augmented by a penalty for deviating from a default behavioural policy, which effectively confers value on controllability. Our simulations show that this single modification enables optimal value composition, fast and robust adaptation to changing priorities, safer exploration in the context of threats, and stable learning amid uncertainty. Critically, this account unifies disparate striatal observations, parsimoniously reconciling threat and action prediction error signals within the striatal tail. Our framework refines the core principle governing striatal dopamine, bridging theory with neural data and offering testable predictions.
- Publication status:
- Published
- Peer review status:
- Not peer reviewed
Access Document
- Preprint server copy:
- 10.1101/2025.10.10.681616
Authors
+ Wellcome Trust
- Funder identifier:
- https://ror.org/029chgv08
- Grant:
- 203139/A/16/Z
- 214251/Z/18/Z
- 203139/Z/16/Z
+ Japan Society for the Promotion of Science
- Funder identifier:
- https://ror.org/00hhkn466
- Grant:
- 22H04998
+ Institute of Information & Communications Technology Planning & Evaluation
- Funder identifier:
- https://ror.org/01g0hqq23
- Grant:
- MSIT 2019-0-01371
- Preprint server:
- bioRxiv
- Publication date:
- 2025-11-22
- DOI:
- Language:
- English
- Pubs id:
- 2303799
- UUID:
- uuid_fe2b180f-a8c6-4289-b5a5-e6e98a059568
- Local pid:
- pubs:2303799
- Source identifiers:
- W4415055624
- Deposit date:
- 2026-01-28
- ARK identifier:
Terms of use
- Copyright holder:
- Mahajan and Seymour
- Copyright date:
- 2025
- Rights statement:
- The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
- Notes:
- This work is related to the thesis Safe learning in humans and machines.
- Licence:
- CC Attribution (CC BY)