Thesis

Stochastic control approach to the multi-armed bandit problems

Abstract:

The multi-armed bandit is the simplest problem in which to study learning under uncertainty when decisions affect information. The standard approach to the multi-armed bandit is to construct an algorithm heuristically and then prove a regret bound for it. For algorithms constructed in this way, it is often possible to find a scenario in which the heuristic leads to a poor decision.

In this thesis, we solve the multi-armed bandit problem from first principles, in terms of stochastic control. We propose two novel approaches. The first applies a relaxed control analogy to obtain a semi-closed-form approximation to the optimal solution. The proposed model covers a wide range of bandit problems, and the proposed strategy can be computed with low computational complexity and shows strong empirical performance. The second approach focuses on bandits with independent arms and considers the interaction between two aspects of uncertainty: uncertainty aversion and learning. These aspects are in some sense opposite: one is pessimistic, while the other is optimistic. To study this interaction, we consider a class of strategies that allows marginal projection onto each arm, and we prove a Gittins theorem under nonlinear expectation.

Overall, our proposed approaches provide an understanding of how to make decisions under uncertainty when those decisions determine future information. These results should be helpful as a foundation for combining stochastic control with more modern AI theories.

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Sub department:
Mathematical Institute
Research group:
Mathematical finance
Oxford college:
Mansfield College
Role:
Author
ORCID:
0000-0001-7611-4328

Contributors

Role:
Supervisor


Programme:
The Development and Promotion of Science and Technology talented project


Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford


Language:
English
Deposit date:
2021-07-01
