Thursday, January 17, 2013

Lecture 1

In the first lecture I set up definitions and notation for state, control, history, value function, etc., and develop dynamic programming equations for a very general case, a state-structured case, and the Markov decision process case. Please be patient with the notation. It is not as complex as it may first appear. Things like $a(x,u,t)$, $F(x_t,t)$, $u_t$, and $U_t$ will begin to seem like old friends once you have used them a few times. By the way, in case it was not obvious, $h$ is used for the terminal time because it is the time horizon.

One person asked me about the terminology "plant equation" for $x_{t+1}=a(x_t,u_t,t)$. This derives from the fact that early optimal control theory was developed with applications to industrial processes in mind, especially chemical plants. You could also just call it the dynamics equation.

From this first lecture you should be taking away the key idea of dynamic programming, and the fact that problems in stages (with separable cost and a finite time horizon) can often be solved by working backwards from the final stage. The minimum length path (stagecoach) problem is trivial, but should have made these ideas very intuitive, as an example of a problem in which the optimal policy is determined by working backwards from the end. You might like to read the Wikipedia entry for Richard Bellman, who is credited with the invention of dynamic programming.
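To make the backwards-working idea concrete, here is a minimal sketch of backward induction on a small stagecoach-style problem. The stage graph, state names, and arc costs are hypothetical, invented purely for illustration; the recursion computing $F(x,t) = \min_u \{c(x,u) + F(x',t+1)\}$ is the point.

```python
# A tiny stagecoach-style shortest-path problem, solved by backward induction.
# stages[t] maps each state at stage t to a dict {next_state: arc cost}.
# All names and costs here are made up for illustration.
stages = [
    {"A": {"B": 2, "C": 4}},                          # stage 0
    {"B": {"D": 7, "E": 3}, "C": {"D": 1, "E": 8}},   # stage 1
    {"D": {"F": 5}, "E": {"F": 2}},                   # stage 2
]

def solve(stages, terminal_state="F"):
    # Value function at the horizon: zero cost-to-go at the terminal state.
    F = {terminal_state: 0}
    policy = []
    # Work backwards from the final stage to the first.
    for arcs in reversed(stages):
        F_prev, decisions = {}, {}
        for state, moves in arcs.items():
            # Choose the control minimising (arc cost) + (cost-to-go).
            best = min(moves, key=lambda nxt: moves[nxt] + F[nxt])
            F_prev[state] = moves[best] + F[best]
            decisions[state] = best
        F = F_prev
        policy.insert(0, decisions)
    return F, policy

F, policy = solve(stages)
print(F["A"])        # minimum total cost from A, here 7 (path A -> B -> E -> F)
```

Note that each state's value is computed once, whatever the number of paths through it; this is exactly the saving that dynamic programming offers over enumerating all routes.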

I have just now added some hyperlinks to the recommended booklist. In particular, Dimitri Bertsekas has a web page for his book and slides from his lectures. You might find these interesting to browse through at some later stage.

I mentioned that I had once appeared on ITV's Who Wants to be a Millionaire (October 2003). There is a nice Part II exam question on this, including the model solution. You might like to look at this now – simply to see the sort of mathematical problem that this course will enable you to solve. You can also view the overhead slides for a little presentation called A Mathematician Plays "Who Wants to Be a Millionaire?" which I once gave to a Royal Institution workshop for school children. You might like to see how I made best use of my "Ask the audience" lifeline by employing an idea from statistics. Basically, I asked members of the audience not to vote if they felt at all unsure of the right answer.

There is space below for you to write comments or ask questions about things that I said in this lecture. Don't be shy; if you are puzzled about something you are unlikely to be the only one, and we can all learn from some discussion. You can post with your name, or anonymously. If you find a mistake in the notes or have a suggestion for their improvement, please write to me at rrw1@cam.ac.uk