I have decided to rewrite some parts of the course. The notes will from this point start to differ to those from last year. Please look for new notes appearing on the web site. Examples Sheets 1 and 2 are now as I intend, but I will be revising Sheet 3.
Let me emphasize that in Lecture 4, we used Fs(x) to mean the value function of a MDP over s steps, with F0(x)=0. In the case of c(x,u)≥0, Fs(x) is clearly nondecreasing in s. By contrast, in this lecture, Fs(x) was used to mean the minimum cost in a stopping problem in which we must stop within s steps, and F0(x)=k(x). Fs(x) is now clearly nonincreasing in s. That is because as s increases we are being given more flexibility as to when we must stop. In both cases Fs(x) satisfies an optimality equation and Fs(x)→ a limit, say F∞(x), but from a different direction.
I started today by telling you about a famous problem concerned with searching for a randomly moving hidden object. This is an example of a partially observed Markov decision process. This problem has not yet been solved in full generality. However, the problem has been solved in continuous time. See R. R. Weber. Optimal search for a randomly moving object. J. Appl. Prob. 23:708-717, 1986. The reason the problem is easier in continuous time is that the state x1 (the probability that the object is in location 1) changes continuously, rather than in jumps.
Having completed these lectures you can now attempt all of Examples Sheet 1. I will be making some changes to sheets 2 and 3.
Questions 7 and 8 use the idea of a one-step look-ahead (OSLA) rule that we have met in today's lecture. Question 8 is subtle, because although it is easy to guess the answer by thinking about a OSLA rule, how do you prove this answer is correct? Hint: use an appropriate theorem about positive programming. Be careful when writing down the dynamic programming equation for Question 8 that you put the expected value (or integral and density function for an exponential random variable) in the right place relative to where you place your max{ , }).
Questions 9-11 are deeper, but that is mostly as regards their lengthy descriptions. The solutions are not actually hard. In Q9, you'll want to use the idea of substituting an equation into itself. One of the reasons I include these questions is to show you what classifies as a research level problem in this field. I have recently written a paper about Question 11, ABCs of the bomber problem and its relatives Annals of Operations Research, 2011.