Stochastic control approach to the multi-armed bandit problems

Treetanthiploet, T

Thesis


Abstract: The multi-armed bandit is the simplest problem in which to study learning under uncertainty when decisions affect information. Standard approaches to the multi-armed bandit often construct an algorithm heuristically and then prove a bound on its regret. Because such algorithms are built constructively, it is often possible to find scenarios in which following the heuristic leads to poor decisions.
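As a rough illustration of how a heuristic can fail, the sketch below uses the simplest heuristic of all, a purely greedy rule that always plays the arm with the highest empirical mean. This is our own toy example, not a scenario taken from the thesis. The means, the forced first rewards, and the helper names `greedy_pull_counts` and `pseudo_regret` are hypothetical. If the better arm happens to return 0 on its first pull while the worse arm returns 1, the greedy rule never revisits the better arm, so the regret grows linearly with the horizon.

```python
import random


def greedy_pull_counts(means, first_rewards, horizon, seed=0):
    """Hypothetical helper: pull each arm once (with the supplied first rewards),
    then always play the arm with the highest empirical mean.
    Returns the number of times each arm was pulled."""
    rng = random.Random(seed)
    k = len(means)
    counts = [1] * k
    sums = list(first_rewards)  # rewards from the initial one-pull-per-arm round
    for _ in range(horizon - k):
        # Greedy choice; max() keeps the first maximiser, so ties go to the lowest index.
        arm = max(range(k), key=lambda a: sums[a] / counts[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
    return counts


def pseudo_regret(means, counts):
    """Expected reward lost relative to always playing the best arm."""
    best = max(means)
    return sum((best - m) * n for m, n in zip(means, counts))


means = [0.6, 0.9]  # assumed means: arm 1 is the better arm
counts = greedy_pull_counts(means, first_rewards=[1.0, 0.0], horizon=1000)
print(counts, pseudo_regret(means, counts))
```

Here arm 0's empirical mean stays strictly positive (it already holds one success), while arm 1's stays at 0, so the greedy rule plays arm 0 on every remaining round.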

In this thesis, we approach the multi-armed bandit problem from first principles, as a problem of stochastic control. We propose two novel approaches. The first applies an analogy with relaxed control to obtain a semi-closed-form approximation to the optimal solution. The proposed model covers a wide range of bandit problems, and the resulting strategy can be computed with low computational complexity while performing strongly in empirical tests. The second approach focuses on bandits with independent arms and considers the interaction between two aspects of uncertainty: uncertainty aversion and learning. In some sense these aspects pull in opposite directions: one is pessimistic, while the other is optimistic. To study this interaction, we consider a class of strategies that admit a marginal projection onto each arm, and we prove a Gittins theorem under nonlinear expectation.
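For orientation, the sketch below computes the classical Gittins index under ordinary (linear) expectation. The thesis proves an analogue under nonlinear expectation, which this code does not attempt. Assumptions, all ours: Bernoulli rewards, a Beta(a, b) posterior, discount factor `beta`, and truncation of the belief tree at a fixed depth, beyond which the posterior mean is treated as fixed. The index is found by the standard retirement (calibration) argument: bisect on a per-period retirement reward λ until continuing to play optimally and retiring immediately are equally valuable. The name `gittins_index_bernoulli` is hypothetical.

```python
from functools import lru_cache


def gittins_index_bernoulli(a, b, beta=0.9, depth=60, tol=1e-6):
    """Hypothetical sketch: approximate classical Gittins index of a Bernoulli arm
    with Beta(a, b) posterior, via bisection on a retirement reward lam."""

    def continue_minus_retire(lam):
        retire = lam / (1.0 - beta)  # value of retiring forever at rate lam

        @lru_cache(maxsize=None)
        def value(s, f):
            # s, f: successes and failures observed on top of the Beta(a, b) prior.
            p = (a + s) / (a + b + s + f)  # posterior mean
            if s + f == depth:
                # Truncation assumption: stop learning, value the arm at its mean.
                return max(retire, p / (1.0 - beta))
            cont = p * (1.0 + beta * value(s + 1, f)) + (1.0 - p) * beta * value(s, f + 1)
            return max(retire, cont)  # optimal stopping: retire or keep playing

        # At the root we must play once; compare that with retiring immediately.
        p0 = a / (a + b)
        cont0 = p0 * (1.0 + beta * value(1, 0)) + (1.0 - p0) * beta * value(0, 1)
        return cont0 - retire

    lo, hi = 0.0, 1.0  # Bernoulli rewards: the index lies in [0, 1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if continue_minus_retire(mid) > 0:
            lo = mid  # playing still beats retiring, so the index exceeds mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(gittins_index_bernoulli(1, 1))  # uniform prior, beta = 0.9
```

The index exceeds the posterior mean because playing carries an option value from learning. This is the optimistic effect that, in the thesis, interacts with pessimistic uncertainty aversion.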

Overall, our proposed approaches shed light on how to make decisions under uncertainty when those decisions determine future information. These results should be helpful as a foundation for combining stochastic control with more modern AI theories.

Cite this record

APA Style

Treetanthiploet, T. (2021). Stochastic control approach to the multi-armed bandit problems [PhD thesis]. University of Oxford.

MLA Style

Treetanthiploet, T. Stochastic Control Approach to the Multi-Armed Bandit Problems. University of Oxford, 2021.

Chicago Style

Treetanthiploet, T. 2021. “Stochastic Control Approach to the Multi-Armed Bandit Problems.” PhD thesis, University of Oxford.