Journal article icon

Journal article

ON ERGODIC TWO-ARMED BANDITS

Abstract:

A device has two arms with unknown deterministic payoffs and the aim is to asymptotically identify the best one without spending too much time on the other. The Narendra algorithm offers a stochastic procedure to this end. We show under weak ergodic assumptions on these deterministic payoffs that the procedure eventually chooses the best arm (i.e., with greatest Cesaro limit) with probability one for appropriate step sequences of the algorithm. In the case of i.i.d. payoffs, this implies a "q...

Expand abstract
Publication status:
Published

Actions


Access Document


Publisher copy:
10.1214/10-AAP751

Authors


More by this author
Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Role:
Author
Journal:
ANNALS OF APPLIED PROBABILITY
Volume:
22
Issue:
2
Pages:
457-476
Publication date:
2012-04-01
DOI:
ISSN:
1050-5164
Source identifiers:
213451
Language:
English
Keywords:
Pubs id:
pubs:213451
UUID:
uuid:9605d865-239d-46a2-bf33-d5c8c182acd2
Local pid:
pubs:213451
Deposit date:
2012-12-19

Terms of use


Views and Downloads






If you are the owner of this record, you can report an update to it here: Report update to this record

TO TOP