Journal article

Asymptotic randomised control with applications to bandits

Abstract:: We consider a general multi-armed bandit problem with correlated (and simple contextual and restless) elements, as a relaxed control problem. By introducing an entropy premium, we obtain a smooth asymptotic approximation to the value function. This yields a novel semi-index approximation of the optimal decision process. This semi-index can be interpreted as explicitly balancing an exploration–exploitation trade-off as in the UCB (Upper Confidence Bound) principle where the learning premium explicitly describes asymmetry of information available in the environment and non-linearity in the reward function.

Performance of the resulting Asymptotic Randomised Control (ARC) algorithm compares favourably with other approaches to correlated multi-armed bandits.
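The abstract's idea of an entropy premium producing a smooth approximation can be illustrated with a generic, textbook identity. This is not the paper's ARC construction, which this record does not reproduce. Adding a Shannon-entropy bonus with weight `lam` to the choice among fixed arm values turns the hard maximum into a log-sum-exp, attained by a softmax (randomised) policy. The names `smooth_max`, `softmax_policy`, `lam` and the example values are assumptions made purely for illustration.

```python
import math

def smooth_max(values, lam):
    """Entropy-regularised maximum: lam * log(sum(exp(v / lam))).

    Computed with the usual max-shift for numerical stability.
    """
    m = max(values)
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))

def softmax_policy(values, lam):
    """Randomised policy attaining the entropy-regularised maximum."""
    m = max(values)
    weights = [math.exp((v - m) / lam) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical arm values, chosen only for this illustration.
values = [1.0, 0.5, 0.2]
lam = 0.1

probs = softmax_policy(values, lam)
# Expected reward under the randomised policy plus the entropy premium.
regularised = sum(p * v for p, v in zip(probs, values)) + lam * entropy(probs)
smoothed = smooth_max(values, lam)
```

Here `regularised` and `smoothed` agree up to floating-point error, and `smoothed` approaches `max(values)` as `lam` shrinks, which is the sense in which an entropy term gives a smooth surrogate for a hard maximum over actions.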

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Cohen, S. N., & Treetanthiploet, T. (2026). Asymptotic randomised control with applications to bandits. Numerical Algebra, Control and Optimization.

MLA Style

Cohen, SN, and T Treetanthiploet. “Asymptotic Randomised Control with Applications to Bandits.” Numerical Algebra, Control and Optimization, 2026.

Chicago Style

Cohen, SN, and T Treetanthiploet. 2026. “Asymptotic Randomised Control with Applications to Bandits.” Numerical Algebra, Control and Optimization.

Access Document

Files:: Cohen_and_Treetanthiploet_2026_Asymptotic_randomised_control.pdf

(Accepted manuscript, pdf, 4.1MB)

Publisher copy:: 10.3934/naco.2026016

Authors

Cohen, SN

Institution:: University of Oxford
Division:: MPLS
Department:: Mathematical Institute
Oxford college:: New College
Role:: Author
ORCID:: 0000-0003-0539-6414

Treetanthiploet, T

Institution:: University of Oxford
Division:: MPLS
Department:: Mathematical Institute
Role:: Author

Engineering and Physical Sciences Research Council

Funder identifier:: https://ror.org/0439y7842
Grant:: EP/V056883/1

Publisher:: American Institute of Mathematical Sciences
Journal:: Numerical Algebra, Control and Optimization
Publication date:: 2026-05-07
Acceptance date:: 2026-02-20
DOI:: 10.3934/naco.2026016
EISSN:: 2155-3297
ISSN:: 2155-3289

Language:: English
Keywords:: multi-armed bandit, stochastic control, asymptotic approximation
Pubs id:: 2416432
Local pid:: pubs:2416432
Deposit date:: 2026-05-08
ARK identifier:: ark:/29072/ora_3f59c54b6fb7411cae472eb597d19ebc

Terms of use

Copyright holder:: American Institute of Mathematical Sciences
Copyright date:: 2026
Rights statement:: Copyright © 2026 American Institute of Mathematical Sciences
Notes:: The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.

Licence:: CC Attribution (CC BY)
