I have decided to rewrite some parts of the course. The notes will from this point start to differ to those from last year. Please look for new notes appearing on the web site. Examples Sheets 1 and 2 are now as I intend, but I will be revising Sheet 3.
Let me emphasize that in Lecture 4, we used $F_s(x)$ to mean the value function of a MDP over $s$ steps, with $F_0(x)=0$. In the case of $c(x,u)\geq 0$, $F_s(x)$ is clearly nondecreasing in $s$. By contrast, in this lecture, $F_s(x)$ was used to mean the minimum cost in a stopping problem in which we must stop within $s$ steps, and $F_0(x)=k(x)$. $F_s(x)$ is now clearly nonincreasing in $s$. That is because as $s$ increases we are being given more flexibility as to when we must stop. In both cases $F_s(x)$ satisfies an optimality equation and $F_s(x)\to$ a limit, say $F_\infty(x)$, but from a different direction.
I started today by telling you about a famous problem concerned with searching for a randomly moving hidden object. This is an example of a partially observed Markov decision process. This problem has not yet been solved in full generality. However, the problem has been solved in continuous time. See R. R. Weber. Optimal search for a randomly moving object. J. Appl. Prob. 23:708-717, 1986. The reason the problem is easier in continuous time is that the state $x_1$ (the probability that the object is in location 1) changes continuously, rather than in jumps.
Having completed these lectures you can now attempt all of Examples Sheet 1. I will be making some changes to sheets 2 and 3.
Questions 7 and 8 use the idea of a one-step look-ahead (OSLA) rule that we have met in today's lecture. Question 8 is subtle, because although it is easy to guess the answer by thinking about a OSLA rule, how do you prove this answer is correct? Hint: use an appropriate theorem about positive programming. Be careful when writing down the dynamic programming equation for Question 8 that you put the expected value (or integral and density function for an exponential random variable) in the right place relative to where you place your max{ , }).
Questions 9-11 are deeper, but that is mostly as regards their lengthy descriptions. The solutions are not actually hard. In Q9, you'll want to use the idea of substituting an equation into itself. One of the reasons I include these questions is to show you what classifies as a research level problem in this field. I have recently written a paper about Question 11, ABCs of the bomber problem and its relatives Annals of Operations Research, 2011.