Does stochastic gradient really succeed for bandits?

Baudry, D; Johnson, E; Váry, S; Pike-Burke, C; Rebeschini, P

Conference item

Abstract:: Recent works of Mei et al. (2023, 2024) have deepened the theoretical understanding of the Stochastic Gradient Bandit (SGB) policy, showing that using a constant learning rate guarantees asymptotic convergence to the optimal policy, and that sufficiently small learning rates can yield logarithmic regret. However, whether logarithmic regret holds beyond small learning rates remains unclear. In this work, we take a step towards characterizing the regret regimes of SGB as a function of its learning rate. For two-armed bandits, we identify a sharp threshold, scaling with the sub-optimality gap ∆, below which SGB achieves logarithmic regret on all instances, and above which it can incur polynomial regret on some instances. This result highlights the necessity of knowing (or estimating) ∆ to ensure logarithmic regret with a constant learning rate. For general K-armed bandits, we further show the learning rate must scale inversely with K to avoid polynomial regret. We introduce novel techniques to derive regret upper bounds for SGB, laying the groundwork for future advances in the theory of gradient-based bandit algorithms.
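The abstract does not spell out the SGB update itself. As a rough illustration only, the sketch below assumes the standard softmax gradient bandit update with no baseline, a constant learning rate, and Bernoulli rewards; the names `run_sgb`, `means`, `eta` and `horizon` are hypothetical and are not taken from the paper. It is not the authors' code and makes no claim about where the learning-rate threshold lies.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def run_sgb(means, eta, horizon, seed=0):
    """Run a softmax gradient bandit on Bernoulli arms.

    Returns the final policy and the accumulated pseudo-regret.
    Assumed update (no baseline): theta[a] += eta * r * (1{a == arm} - pi[a]).
    """
    rng = random.Random(seed)
    k = len(means)
    theta = [0.0] * k  # zero logits give a uniform initial policy
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        pi = softmax(theta)
        arm = rng.choices(range(k), weights=pi)[0]
        reward = 1.0 if rng.random() < means[arm] else 0.0
        regret += best - means[arm]  # pseudo-regret uses the true means
        for a in range(k):
            indicator = 1.0 if a == arm else 0.0
            theta[a] += eta * reward * (indicator - pi[a])
    return softmax(theta), regret


if __name__ == "__main__":
    # Two arms with gap 0.2; eta is an arbitrary illustrative value.
    policy, regret = run_sgb(means=[0.7, 0.5], eta=0.1, horizon=5000)
    print(policy, regret)
```

Running the function for several values of `eta` and plotting the regret against the horizon is one way to explore the learning-rate dependence the abstract describes, under the assumptions stated above.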

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Baudry, D., Johnson, E., Váry, S., Pike-Burke, C., & Rebeschini, P. (2025). Does stochastic gradient really succeed for bandits? 39th Conference on Neural Information Processing Systems (NeurIPS 2025).

MLA Style

Baudry, D, et al. “Does Stochastic Gradient Really Succeed for Bandits?” 39th Conference on Neural Information Processing Systems (NeurIPS 2025), 2025.

Chicago Style

Baudry, D, E Johnson, S Váry, C Pike-Burke, and P Rebeschini. 2025. “Does Stochastic Gradient Really Succeed for Bandits?” In 39th Conference on Neural Information Processing Systems (NeurIPS 2025). NeurIPS.

Access Document

Files:: Baudry_et_al_2025_Does_stochastic_gradient.pdf

(Accepted manuscript, pdf, 1.3MB)

Publication website:: https://neurips.cc/virtual/2025/loc/san-diego/poster/116753

Authors

Baudry, D

Institution:: University of Oxford
Division:: MPLS
Department:: Statistics
Sub department:: Statistics
Role:: Author

Johnson, E

Role:: Author

Váry, S

Institution:: University of Oxford
Division:: MPLS
Department:: Statistics
Role:: Author

Pike-Burke, C

Role:: Author

Rebeschini, P

Institution:: University of Oxford
Division:: MPLS
Department:: Statistics
Sub department:: Statistics
Role:: Author
ORCID:: 0000-0001-7772-4160

Funding

UK Research and Innovation

Funder identifier:: https://ror.org/001aqnf71
Grant:: EP/Y028333/1

Engineering and Physical Sciences Research Council

Funder identifier:: https://ror.org/0439y7842
Grant:: EP/S023151/1

Publisher:: NeurIPS
Article number:: 116753
Publication date:: 2025-12-03
Acceptance date:: 2025-09-18
Event title:: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Event location:: San Diego, CA, USA
Event website:: https://neurips.cc/Conferences/2025
Event start date:: 2025-12-02
Event end date:: 2025-12-07

Language:: English
Pubs id:: 2356178
Local pid:: pubs:2356178
Deposit date:: 2026-01-05
ARK identifier:: ark:/29072/ora_370d51d540c347229ce0be09b3d049b0

Terms of use

Copyright holder:: Baudry et al.
Notes:: The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.

Licence:: CC Attribution (CC BY)
