Thesis
Stochastic control approach to the multi-armed bandit problem
- Abstract:
The multi-armed bandit is the simplest problem in which to study learning under uncertainty when decisions affect the information available. The standard approach to the multi-armed bandit is to construct an algorithm heuristically and then prove a regret bound for it. However, one can often construct scenarios in which such heuristic approaches lead to poor decisions.
In this thesis, we consider solving the multi-armed bandit problem from first principles, viewing it as a stochastic control problem, and we propose two novel approaches. The first approach applies a relaxed-control analogy to obtain a semi-closed-form approximation of the optimal solution. The proposed model covers a wide range of bandit problems, and the resulting strategy can be computed with low computational complexity while showing strong empirical performance. The second approach focuses on bandits with independent arms and considers the interaction between two aspects of uncertainty: uncertainty aversion and learning. These aspects are in some sense opposed; one is pessimistic, while the other is optimistic. To study this interaction, we consider a class of strategies that allows marginal projection onto each arm and prove Gittins' theorem under nonlinear expectation.
Overall, our proposed approaches provide an understanding of how to make decisions under uncertainty when those decisions determine future information. These results should serve as a foundation for combining stochastic control with modern AI theories.
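As a concrete illustration of the standard heuristic approach mentioned in the abstract (construct an index rule, then prove a regret bound), the sketch below simulates a Bernoulli bandit and runs the classical UCB1 rule. It is not one of the strategies developed in the thesis; the arm means and horizon are arbitrary illustrative values.

```python
import numpy as np

def ucb1(means, horizon, seed=0):
    """Run the UCB1 heuristic on a Bernoulli bandit with the given arm means."""
    rng = np.random.default_rng(seed)
    k = len(means)
    counts = np.zeros(k)      # number of pulls per arm
    rewards = np.zeros(k)     # cumulative reward per arm
    total = 0.0

    for t in range(horizon):
        if t < k:
            arm = t           # pull each arm once to initialise
        else:
            # heuristic index: empirical mean plus an exploration bonus
            index = rewards / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(index))
        r = float(rng.random() < means[arm])   # Bernoulli reward
        counts[arm] += 1
        rewards[arm] += r
        total += r

    regret = horizon * max(means) - total      # regret against the best fixed arm
    return total, regret

if __name__ == "__main__":
    reward, regret = ucb1(means=[0.3, 0.5, 0.7], horizon=5000)
    print(f"cumulative reward: {reward:.0f}, regret vs. best arm: {regret:.0f}")
```

The exploration bonus shrinks as an arm is pulled more often, so the rule trades off learning about uncertain arms against exploiting the currently best one; the thesis argues that such heuristics can be outperformed by strategies derived from stochastic control.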
- Programme:
- The Development and Promotion of Science and Technology Talents Project
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2021-07-01
Terms of use
- Copyright holder:
- Treetanthiploet, T
- Copyright date:
- 2021