Tuesday, January 19, 2016

Lecture 2

The highlight of today's lecture was the Secretary Problem. This is the most famous of all problems in the field of optimal stopping. It is credited to Merrill M. Flood in 1949, who called it the fiancée problem. It gained wider publicity when it appeared in Martin Gardner's column of Scientific American in 1960. There is an interesting Wikipedia article about it. One of the interesting things said in this article is that in behavioural studies people tend to stop too soon (i.e. marry too soon, make a purchase too soon). See The devil is in the details: Incorrect intuitions in optimal search.

The story about Kepler's search for a wife is taken from the paper, Who Solved the Secretary Problem, by Thomas S. Ferguson. He also discusses the related game of Googol.

A variation of the problem that has never been completely solved is the so-called Robbins' Problem. In this problem we do observe values of candidates, say $X_1,\dotsc, X_h$, and these are assumed to be independent, identically distributed uniform$[0,1]$ random variables. The objective is to minimize the expected rank of the candidate that is selected (best = rank 1, second-best = rank 2, etc). It is known only that, as $h$ goes to infinity, the expected rank that can be achieved under an optimal policy lies between 1.908 and 2.329. This problem is much more difficult than the usual secretary problem because the decision as to whether or not to hire candidate $t$ must depend upon all the values of $X_1,\dotsc, X_t$, not just upon how $X_t$ ranks amongst them.

Following this lecture you can do questions 1–4 and 10 on Example Sheet 1. Question 2 is quite like the secretary problem (and also has a surprising answer). The tricks that have been explained in today's lecture are useful in solving these questions (working in terms of time to go, backwards induction, that a bang-bang control arises when the objective is linear in $u_t$, looking at the cross-over between increasing and decreasing terms within a $\max\{ , \}$, as we did in the secretary problem with $\max\{t/h, F(t)\}$).
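The cross-over in $\max\{t/h, F(t)\}$ can be checked numerically. The following is a minimal sketch, not the lecture's own code. It assumes that $F(t)$ denotes the maximal probability of ending up with the best candidate, given that $t$ candidates have been seen and none accepted, and that it satisfies one standard form of the recursion, $F(t-1) = \frac{t-1}{t}F(t) + \frac{1}{t}\max\{t/h, F(t)\}$ with $F(h)=0$. The function names are made up for illustration.

```python
def secretary_values(h):
    """Return [F(0), ..., F(h)] by backwards induction.

    Assumed meaning: F(t) is the best achievable probability of selecting
    the overall best of h candidates, given that t have been seen and
    none has been accepted.
    """
    F = [0.0] * (h + 1)  # F(h) = 0: nobody left to choose
    for t in range(h, 0, -1):
        # Candidate t is best-so-far with probability 1/t. If so, stopping
        # wins with probability t/h and continuing wins with probability F(t).
        F[t - 1] = (t - 1) / t * F[t] + max(t / h, F[t]) / t
    return F


def first_acceptable(h):
    """Smallest t at which a best-so-far candidate should be accepted."""
    F = secretary_values(h)
    # t/h increases in t and F(t) is non-increasing, so they cross once.
    return next(t for t in range(1, h + 1) if t / h >= F[t])


if __name__ == "__main__":
    h = 100
    print(first_acceptable(h), secretary_values(h)[0])
```

For $h=100$ this gives a first acceptable candidate of $t=38$ (so the first 37 are passed over) and a success probability of about 0.371, close to $1/e$.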

Thursday, January 14, 2016

Lecture 1

Today we covered definitions and notation for state, control, history, value function, etc., and developed dynamic programming equations for a very general case and a state-structured case. Please be patient with the notation. It is not as complex as it may first appear. Things like $a(x,u,t)$, $F(x_t,t)$, $u_t$, and $U_t$ will begin to seem like old friends once you have used them a few times.

The terminology "plant equation" for $x_{t+1}=a(x_t,u_t,t)$ derives from the fact that early optimal control theory was developed with applications to industrial processes in mind, especially chemical plants. We also call it the dynamics equation.
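As a reminder of how the plant equation enters the optimality equation, here is a sketch of the state-structured, deterministic case with horizon $h$. The symbols $c(x,u,t)$ for the cost incurred at stage $t$ and $C_h(x)$ for the terminal cost are assumptions made for this illustration; they are not defined in this post.

$$F(x,t) = \inf_{u}\bigl[\,c(x,u,t) + F\bigl(a(x,u,t),\,t+1\bigr)\bigr], \quad t < h, \qquad F(x,h) = C_h(x).$$

The value at time $t$ is found from the value at time $t+1$, which is why the equation is solved backwards from $t=h$.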

From this first lecture you should be taking away the key idea of dynamic programming, and the fact that problems in stages (with separable cost and a finite time horizon) can often be solved by working backwards from the final stage. The minimum length path (stage coach) problem is trivial, but should have made these ideas very intuitive. You might like to read the Wikipedia entry for Richard Bellman, who is credited with the invention of dynamic programming in 1953.
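Working backwards from the final stage can be sketched in a few lines. The network below is a hypothetical example, not the one used in the lecture, and the function names are made up for illustration.

```python
def solve_stagecoach(stages, cost):
    """Backward induction on a staged network.

    stages: list of lists of node names, first stage to last stage.
    cost:   dict mapping (u, v) to the cost of the leg from u to v,
            where v is in the stage immediately after u.
    Returns (F, best_next): F[u] is the minimal cost-to-go from u and
    best_next[u] is an optimal next node. Assumes every non-final node
    has at least one outgoing leg.
    """
    F = {node: 0 for node in stages[-1]}  # nothing left to pay at the end
    best_next = {}
    for k in range(len(stages) - 2, -1, -1):  # work backwards over stages
        for u in stages[k]:
            successors = [v for v in stages[k + 1] if (u, v) in cost]
            v_star = min(successors, key=lambda v: cost[(u, v)] + F[v])
            F[u] = cost[(u, v_star)] + F[v_star]
            best_next[u] = v_star
    return F, best_next


def best_path(start, best_next):
    """Follow the optimal decisions forward from start."""
    path = [start]
    while path[-1] in best_next:
        path.append(best_next[path[-1]])
    return path


# Hypothetical four-stage network.
stages = [["A"], ["B", "C"], ["D", "E"], ["Z"]]
cost = {
    ("A", "B"): 2, ("A", "C"): 4,
    ("B", "D"): 7, ("B", "E"): 4,
    ("C", "D"): 1, ("C", "E"): 5,
    ("D", "Z"): 2, ("E", "Z"): 6,
}

if __name__ == "__main__":
    F, best_next = solve_stagecoach(stages, cost)
    print(F["A"], best_path("A", best_next))
```

In this example the cheapest first leg is A to B, yet the optimal route is A, C, D, Z with total cost 7. A greedy choice at the first stage would miss it, whereas the backward recursion does not.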

The course page gives some hyperlinks to the recommended booklist. In particular, Dimitri Bertsekas has a web page for his book and slides from his lectures. You might find these interesting to browse through at some later stage.

I mentioned that I had once appeared on ITV's Who Wants to be a Millionaire (October 2003) and that playing it has aspects of dynamic programming. There is a nice Part II exam question on this, including the model solution. You might like to look at this now – simply to see the sort of mathematical problem that this course will enable you to solve. You can also view the overhead slides for a little presentation called A Mathematician Plays "Who Wants to Be a Millionaire?" which I once gave to a Royal Institution workshop for school children. You might like to see how I made best use of my "Ask the audience" lifeline by employing an idea from statistics. Basically, I asked members of the audience not to vote if they felt at all unsure of the right answer.

Examples Sheet 1. Today's lecture has provided all you need to know to do question #1 on Examples Sheet 1. In doing this rather strange question you should grasp the idea that dynamic programming applies to problems in which cost is incurred in stages. In many problems the stages are time points ($t=0,1,\dotsc$), but in others the stages can be different.

The remaining questions are on Markov decision problems, which we will be addressing in Lectures 2-6.

Monday, January 4, 2016

Course starts January 14, 2016

The 2016 course will start at 11am on Thursday January 14 in MR5. Blog postings for previous years of the course can be found below. However, new entries will be written this year, appropriate to the lectures as they proceed. Examples sheets are available from the link at the right.

Preliminary course notes are in place. My aim in these notes is to tread a Goldilocks path, by being neither too brief nor too verbose. I try to make each lecture a sort of self-contained seminar, with about 4 pages of notes. I will slightly amend and change these notes as the course proceeds. In particular, I may be doing some things differently in the later lectures. Some students like to print notes in advance of the lecture and then write things in the margins when hearing me talk about things that are not in the notes.

I will use this space to talk about some extra things. Sometimes leaving a lecture I think, "I wish I had said ...". This blog gives me a place to say it. Or I can use this space to talk about a question that a student has asked. Or I might comment on an examples sheet question.