Journal article icon

Journal article

ON ERGODIC TWO-ARMED BANDITS

Abstract:

A device has two arms with unknown deterministic payoffs and the aim is to asymptotically identify the best one without spending too much time on the other. The Narendra algorithm offers a stochastic procedure to this end. We show under weak ergodic assumptions on these deterministic payoffs that the procedure eventually chooses the best arm (i.e., with greatest Cesaro limit) with probability one for appropriate step sequences of the algorithm. In the case of i.i.d. payoffs, this implies a "q...

Expand abstract
Publication status:
Published

Actions


Access Document


Publisher copy:
10.1214/10-AAP751

Authors


More by this author
Institution:
University of Oxford
Department:
Oxford, MPLS, Mathematical Inst
Role:
Author
Journal:
ANNALS OF APPLIED PROBABILITY
Volume:
22
Issue:
2
Pages:
457-476
Publication date:
2012-04-05
DOI:
ISSN:
1050-5164
URN:
uuid:9605d865-239d-46a2-bf33-d5c8c182acd2
Source identifiers:
213451
Local pid:
pubs:213451

Terms of use


Metrics


Views and Downloads






If you are the owner of this record, you can report an update to it here: Report update to this record

TO TOP