
The Multi-Armed Bandit Problem with Covariates

Vianney Perchet (Université Paris Diderot)

March 29, 2013, 13:45–15:00

Toulouse

Room MF 323

Decision Mathematics Seminar

Abstract

We consider the optimization framework known as the "multi-armed bandit," where a finite number of actions (or arms/decisions) yield random rewards whose expectations depend on the action. The objective of the decision maker is to select actions sequentially so as to identify the optimal action as quickly as possible (or, equivalently, to minimize his regret). We introduce and analyze a natural policy, called Successive Elimination, that achieves optimal bounds. We then consider the case where the noisy reward realizations depend on an observable covariate; this setting allows for dynamically changing rewards that better describe applications where side information is available. To maximize the expected cumulative reward, we introduce a policy called Adaptively Binned Successive Elimination that adaptively decomposes the global problem into suitably "localized" static bandit problems. (Joint work with Philippe Rigollet, Princeton University)
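For readers unfamiliar with the Successive Elimination policy mentioned above, the following is a minimal illustrative sketch (not the speakers' exact formulation): each active arm is sampled once per round, and any arm whose empirical mean falls provably below the best empirical mean, by a Hoeffding-style confidence radius, is eliminated. The function names, the choice of confidence radius, and the Bernoulli test arms are assumptions made for illustration.

```python
import math
import random

def successive_elimination(arms, horizon, delta=0.05):
    """Sketch of Successive Elimination: sample every active arm once per
    round, then eliminate arms that are provably suboptimal.
    `arms` is a list of callables returning noisy rewards in [0, 1].
    The confidence radius below is one standard (illustrative) choice."""
    active = list(range(len(arms)))
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    pulls = 0
    while pulls < horizon and len(active) > 1:
        for a in active:
            if pulls >= horizon:
                break
            reward = arms[a]()
            counts[a] += 1
            means[a] += (reward - means[a]) / counts[a]  # running mean
            pulls += 1
        n = counts[active[0]]  # active arms share (roughly) the same count
        # Hoeffding-style radius with a union bound over arms and rounds
        radius = math.sqrt(math.log(4 * len(arms) * n * n / delta) / (2 * n))
        best = max(means[a] for a in active)
        active = [a for a in active if means[a] >= best - 2 * radius]
    return active  # surviving candidate(s) for the optimal arm

# Illustrative use: two Bernoulli arms with means 0.5 and 0.7
arms = [lambda: float(random.random() < 0.5),
        lambda: float(random.random() < 0.7)]
print(successive_elimination(arms, horizon=20000))
```

The Adaptively Binned variant discussed in the talk applies this elimination idea locally, on a data-driven partition of the covariate space, rather than globally as sketched here.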