Conference item
Does stochastic gradient really succeed for bandits?
- Abstract:
- Recent works of Mei et al. (2023, 2024) have deepened the theoretical understanding of the Stochastic Gradient Bandit (SGB) policy, showing that using a constant learning rate guarantees asymptotic convergence to the optimal policy, and that sufficiently small learning rates can yield logarithmic regret. However, whether logarithmic regret holds beyond small learning rates remains unclear. In this work, we take a step towards characterizing the regret regimes of SGB as a function of its learning rate. For two-armed bandits, we identify a sharp threshold, scaling with the sub-optimality gap ∆, below which SGB achieves logarithmic regret on all instances, and above which it can incur polynomial regret on some instances. This result highlights the necessity of knowing (or estimating) ∆ to ensure logarithmic regret with a constant learning rate. For general K-armed bandits, we further show the learning rate must scale inversely with K to avoid polynomial regret. We introduce novel techniques to derive regret upper bounds for SGB, laying the groundwork for future advances in the theory of gradient-based bandit algorithms.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Actions
Access Document
- Files:
-
-
(Preview, Accepted manuscript, pdf, 1.3MB, Terms of use)
-
- Publication website:
- https://neurips.cc/virtual/2025/loc/san-diego/poster/116753
Authors
+ UK Research and Innovation
More from this funder
- Funder identifier:
- https://ror.org/001aqnf71
- Grant:
- EP/Y028333/1
+ Engineering and Physical Sciences Research Council
More from this funder
- Funder identifier:
- https://ror.org/0439y7842
- Grant:
- EP/S023151/1
- Publisher:
- NeurIPS
- Article number:
- 116753
- Publication date:
- 2025-12-03
- Acceptance date:
- 2025-09-18
- Event title:
- 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
- Event location:
- San Diego, CA, USA
- Event website:
- https://neurips.cc/Conferences/2025
- Event start date:
- 2025-12-02
- Event end date:
- 2025-12-07
- Language:
-
English
- Pubs id:
-
2356178
- Local pid:
-
pubs:2356178
- Deposit date:
-
2026-01-05
- ARK identifier:
Terms of use
- Copyright holder:
- Baudry et al.
- Copyright date:
- 2025
- Notes:
- The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)
If you are the owner of this record, you can report an update to it here: Report update to this record