Thursday, February 7, 2013

Lecture 7

In today's lecture I used a portion of my slides from a talk on the Gittins Index. You may enjoy looking through the entire talk. It will look better if you download the .pdf file to your computer and read the presentation with a .pdf viewer such as Acrobat, rather than trying to read it within your browser.

The proof of the Gittins index theorem is actually easy, but at the same time deep. It is non-examinable. I would expect you only to know what we mean by a SFABP, the statement of the Gittins index theorem, and how to calculate the indices in simple examples. However, I thought you would enjoy seeing this beautiful result and how it can be proved. Lectures 1-6 have covered everything you need to know in order to understand the Gittins index. Today's lecture has also been an opportunity to revise ideas that we already met in problems on job scheduling and pharmaceutical trials. 

Weitzman's Pandora's boxes problem in 7.5 is cute and something I am talking about for the first time. I may have rushed over it a bit, so refer to the notes. There I give an example in which the prize in box $i$ is $0$ or $r_i$, with probabilities $1-p_i$ and $p_i$. Here's another example. Suppose the prize is uniformly distributed over $[0,r_i]$ and $0<c_i<r_i/2$. Then the Gittins index in the undiscounted case is the solution to

$g_i= -c_i +\int_0^{r_i}\max(g_i,u)(1/r_i)\,du= -c_i +\left(\frac{g_i^2}{2r_i}+\frac{r_i}{2}\right)$.

The idea here is that Pandora is indifferent between taking home $g_i$, and opening box $i$ at cost $c_i$ and then taking home the better of $g_i$ and the prize she finds in the box.

Completing the square gives $(r_i-g_i)^2=2c_ir_i$, and so $g_i= r_i-\sqrt{2c_ir_i}$, with $0<g_i<r_i$.
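If you would like to check the arithmetic for yourself, here is a small Python sketch (not part of the lecture) that solves the indifference equation by bisection and compares the answer with the closed form above. The function names and the example values of $r_i$, $c_i$ and $p_i$ are purely illustrative, and the formula $r_i-c_i/p_i$ for the two-point prize is simply what the same indifference condition gives in that case.

import math

def index_uniform(r, c, tol=1e-10):
    # Prize uniform on [0, r], cost c with 0 < c < r/2.
    # Solve g = -c + E[max(g, U)] by bisection, where for 0 <= g <= r
    # we have E[max(g, U)] = g^2/(2r) + r/2.
    def excess(g):
        return -c + g * g / (2 * r) + r / 2 - g   # zero exactly at the index
    lo, hi = 0.0, r                                # excess(0) > 0 > excess(r)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid    # opening the box still beats taking mid home
        else:
            hi = mid
    return (lo + hi) / 2

def index_two_point(r, c, p):
    # Prize is r with probability p, else 0; the indifference condition
    # g = -c + (1 - p) g + p max(g, r) gives g = r - c/p when 0 < c < p r.
    return r - c / p

# Illustrative numbers only.
r, c, p = 10.0, 2.0, 0.5
print(index_uniform(r, c))         # by bisection
print(r - math.sqrt(2 * c * r))    # closed form r - sqrt(2 c r): same value
print(index_two_point(r, c, p))    # two-point prize: r - c/p

Bisection works here because the left-hand side minus the right-hand side of the indifference equation is monotone in $g_i$ on $[0,r_i]$, so there is exactly one crossing.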

The Gittins index theorem is one of the most beautiful results in the field of Markov decision processes. Its discovery and proof in 1974 are due to John Gittins. The proof I have given in today's lecture is very different to Gittins's original proof. It is the simplest way to prove the theorem and was first presented in "On the Gittins index for multiarmed bandits", Ann. Appl. Prob. 2, 1024-33, 1992.