Journal article

Asymptotic randomised control with applications to bandits

Abstract:

We consider a general multi-armed bandit problem with correlated (and simple contextual and restless) elements, as a relaxed control problem. By introducing an entropy premium, we obtain a smooth asymptotic approximation to the value function. This yields a novel semi-index approximation of the optimal decision process. This semi-index can be interpreted as explicitly balancing an exploration–exploitation trade-off as in the UCB (Upper Confidence Bound) principle where the learning premium explicitly describes asymmetry of information available in the environment and non-linearity in the reward function.

Performance of the resulting Asymptotic Randomised Control (ARC) algorithm compares favourably with other approaches to correlated multi-armed bandits.

Publication status:
Published
Peer review status:
Peer reviewed

Publisher copy:
10.3934/naco.2026016

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Oxford college:
New College
Role:
Author
ORCID:
0000-0003-0539-6414
Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Role:
Author

Funder identifier:
https://ror.org/0439y7842
Grant:
EP/V056883/1


Publisher:
American Institute of Mathematical Sciences
Journal:
Numerical Algebra, Control and Optimization
Publication date:
2026-05-07
Acceptance date:
2026-02-20
DOI:
10.3934/naco.2026016
EISSN:
2155-3297
ISSN:
2155-3289


Language:
English
Pubs id:
2416432
Local pid:
pubs:2416432
Deposit date:
2026-05-08