Thursday, March 14, 2013

Give your feedback

This year's course finished on 12 March 2013. 

Your initial state of knowledge was $W_0$, and has been brought to $W_{16}$. It is now your job to control the trajectory onwards. Information about certain unobservables will be revealed by variables $y_1,y_2,y_3$ in the Easter Term.

As you study and revise from the notes I would appreciate an email from you if you see any typos or grammatical mistakes, or find something that you think could be explained better. I try to tread a Goldilocks path in these notes of being neither too verbose nor too succinct.

You can give the standard feedback on-line here. This is identical to the Faculty's paper form that I will hand out on Thursday. It will be sent to my email anonymously. After reading the responses, I will forward them to the Faculty Office. The online form is easy to use and is particularly useful for anyone who is following the course from my notes and is not attending lectures.

Proof by Genie, continued

I am interested in collecting more examples of Proof by Genie, as mentioned in a previous post. If you come across an example of this, please let me know.

I've now remembered another example of proof by genie, actually from one of my own recent papers. Two people are trying to find each other on the vertices of a triangle. Initially they start at two different vertices and at each step $t=1,2,\dotsc$ each player stays or moves to another vertex, trying to minimize the expected number of steps until they meet. In working out the best symmetric rendezvous strategy it helps to assume that the players are given a common notion of what is "clockwise" on the triangle. However, once the optimal strategy is found it turns out that a common notion of clockwise is not needed to implement the optimal strategy. If you would like to learn more about this problem see


R. R. Weber, Optimal symmetric rendezvous search on three locations, Math Oper Res., 37(1): 111-122, 2012.

or seminar slides. One of the slides begins with the words "Suppose the problem is made a bit easier: the players are provided with the same notion of clockwise. (We will see later that this does not actually help the players.)"

Tuesday, March 12, 2013

Broom balancing in the 21st century

Have a look at this link to see what optimal control can do when it is coupled to today's fast microprocessors and feedback controllers:

Quadrocopter pole acrobatics

Thanks to Felix Horns for finding this link.

Risk-sensitive optimal control

I ended the course by saying a little about risk-sensitive optimal control. This is a concept introduced by Peter Whittle and an area in which there are still new ideas to be researched. The objective function is now
\[
\gamma_\pi(C)=-\frac{1}{\theta}\log E_\pi e^{-\theta C} = E_\pi C -\tfrac{1}{2}\theta\,\text{var}_\pi(C)+\cdots
\] So when $\theta>0$ we are risk-seeking, or optimistic. We like variance in $C$. This is perhaps the attitude of those who gamble or who are high-frequency traders of bonds. Many of the ideas we have met in the course go through, such as a certainty-equivalence principle in the LQG case. One can even obtain a risk-sensitive stochastic maximum principle.

I talked about the problem of maximizing consumption over a lifetime, but now with uncertain lifetime. I had to be brief, so here are more careful details if you are interested. The capital $x$ evolves as $dx/dt=ax-u$. We wish to maximize
\[ \int_0^T \log u(t) dt + \kappa \log x(T).
\] If $T$ is certain, then $u(t)=x(t)/(\kappa+s)$, where $s=T-t$. We saw this in Lecture 14.

But now suppose the lifetime is uncertain, given by $\dot y = -1 +\epsilon$, where $\epsilon dt=dB$. The problem takes place over $[0,\tau]$, where $\tau$ is the time at which $y(\tau)=0$. The cute answer is that the solution is $u(t)=x(t)/(\kappa+ s)$, where $s$ is now the "effective remaining life", which is related to $x,y$ by
\[
y^2-s^2=2\theta N s^2\left(1-a(\kappa+s)+\log\left(\frac{\kappa+s}{x}\right)\right).
\] If $\theta=0$ then $s=y$, which gives us a certainty-equivalence result. But now think about $\theta>0$ (the optimistic case). For a given $y$, this exhibits the interesting property that if $x$ is large then $s > y$. In this case the individual is optimistic of a long life, and has sufficient capital to believe he can build it up, while consuming more slowly than if he were not optimistic. But if $x$ is small then $s < y$. Having only little capital, there is not much time to build it up, and so he optimistically takes the view that his life will be short (!), and that it is best to consume what little he has at a faster rate than he would if he was certain that his remaining life were to be $y$.

For $\theta < 0$ the policy is reversed. The pessimist is risk averse to running out of money because of living longer than expected, or not spending fast enough when life is shorter than expected.
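
If you would like to play with this, here is a small numerical sketch in Python (entirely my own, not from the notes; the values of $a$, $\kappa$, $\theta$ and the constant $N$ appearing in the equation above are just illustrative assumptions). It solves the implicit equation for the effective remaining life $s$, and reproduces the $s>y$ versus $s<y$ behaviour for large and small capital in the optimistic case $\theta>0$.

    import numpy as np
    from scipy.optimize import brentq

    # Solve  y^2 - s^2 = 2*theta*N*s^2*(1 - a*(kappa+s) + log((kappa+s)/x))  for s.
    def effective_life(x, y, theta, N=1.0, a=0.1, kappa=1.0):
        f = lambda s: y**2 - s**2 - 2*theta*N*s**2*(1 - a*(kappa + s) + np.log((kappa + s)/x))
        grid = np.linspace(1e-6, 5*y, 2000)            # bracket the first sign change
        vals = np.array([f(s) for s in grid])
        i = np.flatnonzero(np.diff(np.sign(vals)))[0]
        return brentq(f, grid[i], grid[i + 1])

    y, theta = 10.0, 0.05
    print(effective_life(x=100.0, y=y, theta=theta))   # large capital: s > y
    print(effective_life(x=0.1,   y=y, theta=theta))   # small capital: s < y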

Lecture 16

Brownian motion and the Wiener process are covered in the Part II course Stochastic Financial Models. The Wikipedia articles referenced in the previous sentence are good and contain some nice video demonstrations of things like self-similar scaling. In this lecture I reviewed only the very basics, but enough for us to develop a simple stochastic differential equation that corresponds to the optimality equation of dynamic programming in the case of continuous time and a continuous state space. Although Brownian motion is a subtle thing, one can infer (or at least guess) many of its properties by thinking about the simple discrete time symmetric random walk, of which it is a limit.
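
If you want to see this numerically, here is a minimal simulation (my own illustration, not part of the notes): a simple symmetric random walk of $n$ steps, rescaled by $\sqrt{n}$, already behaves like Brownian motion; in particular its variance at time $t$ is approximately $t$.

    import numpy as np

    # Scaled random walk W_n(t) = S_{floor(nt)} / sqrt(n); as n grows this
    # converges in distribution to standard Brownian motion (Donsker).
    rng = np.random.default_rng(1)
    n, n_paths = 1_000, 2_000
    steps = rng.choice([-1.0, 1.0], size=(n_paths, n))
    W = np.cumsum(steps, axis=1) / np.sqrt(n)     # n_paths independent sample paths

    for k in [n // 4, n // 2, n]:                 # Var W(t) should be close to t
        print(k / n, W[:, k - 1].var())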

There are many interesting things to be said about Brownian motion. Paul Lévy's Forgery Theorem for Brownian motion implies that in two dimensions almost surely every arc of the path contains the complete works of Shakespeare in the handwriting of Bacon.

I mentioned in the lecture that it is initially surprising to see that in Example 16.4 the cost is greatest when there is no noise. This is a little bit counterintuitive and so we should try to understand why this is. I guess it is because $C_0$ and $C_1$ are almost the same, and $L$ is small compared to $Q$, and so we can let the noise take us to the endpoints, only expending minimal effort (which is costed at $Qu^2$). I guess that if $C_0$ were a lot less than $C_1$, or if $L$ were bigger compared to $Q$, then noise would not be helpful and we would expect Figure 4 to look different.

Monday, March 11, 2013

Solving problems using PMP

In Questions 6-10 of Sheet 3 the strategy is to use the necessary conditions of PMP to synthesize the optimal solution. What I mean by "synthesize" is that you write down all the information that PMP says is necessary (in terms of $u(t)$ maximizing $H(x,u,\lambda)$, the adjoint equations $\dot \lambda_i=-\partial H/\partial x_i$, and any boundary and transversality conditions) until you convince yourself that the optimal policy can only be one thing. Question 7 is a bit like proving the shortest distance between two points is a straight line. The monster's optimal strategy is fairly obvious, but we can prove it is optimal by applying the necessary conditions of PMP to rule out the possibility of any other solution.

Thursday, March 7, 2013

Feldbaum-Bushaw problem

I think I have finally tracked down the answer to the question in the blog for Lecture 14, when I wondered about the naming of the rocket car problem. I am now guessing that the answer is none of the above. According to J. T. Olympio, A continuous implementation of a second-variation optimal control method for space trajectory problems. J Optim Theory Appl, 2013

"The double integrator problem (also called the Feldbaum–Bushaw problem) is a time minimization problem, where a frictionless particle moving along a line with an initial velocity and position should be put to rest."

This is exactly our rocket car problem. Apparently, it was first solved by D.W. Bushaw, Differential Equations with a  Discontinuous Forcing Term, Ph.D. Thesis, Princeton, 1952.

In the obituary of Donald W. Bushaw (1926-2012) it is stated that "Don’s Ph.D. thesis is recognized as the beginning of modern optimal control theory."

The name of A. A. Feldbaum (a Russian) is also mentioned in connection with this problem which he solved at about the same time. Pontryagin came up with his maximum principle a few years later, 1956.

Lecture 15

Today's lecture was about solving some problems using Pontryagin's Maximum Principle. Some personal experience of the power of PMP came in solving the problem in the paper

R. R. Weber. Optimal search for a randomly moving object. J. Appl. Prob. 23:708-717, 1986.

The problem that is solved in this paper in continuous time is still an open problem in discrete time. That shows the power of PMP, in that with it we can solve problems in continuous time that cannot be solved in discrete time.

I mentioned today in connection with section 15.5 (insects as optimizers) that zoologists have observed that in real-world choices of $t_{\text{switch}}$ some insect populations seem to do a good job of predicting $T$ (the time of the serious frost that ends the lives of the workers).

What makes for a beautiful problem in science?

In his paper with the title, What makes for a beautiful problem in Science? (Journal of Political Economy, Vol. 78, No. 6, 1970),  Paul Samuelson writes:

What is the shortest distance between two points? A straight line. Can you prove it?

What is the longest distance between two points? Surely, no answer is possible. But consider two points a mile apart on a horizontal line in the plane. Suppose that the admissible paths cannot have an absolute slope exceeding one million. Then it is not hard to envisage that the longest distance is not (sic) a bit over a million miles. But it takes the methods of Pontryagin and not those of the classical theory to handle such a problem in which the control variables are constrained within a closed set.

You might like to see if you can use Pontryagin's maximum principle to prove what Samuelson claims. You must start by deciding what he means by an admissible path. I expect he is thinking about a curve between $(0,0)$ and $(1,0)$ that is a graph $y(x)$ determined by $y'(x)=u(x)$, $0\leq x \leq 1$. I have added (sic) (for sic erat scriptum) because I think the word "not" should be deleted. He is trying to say that the maximum distance is a bit over a million miles, in fact $\sqrt{10^{12}+1}$. Samuelson goes on:
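
Here is a sketch of the calculation I have in mind (my own reading of Samuelson's claim). Take
\[
\text{maximize}\ \int_0^1\sqrt{1+u(x)^2}\,dx \quad\text{subject to}\quad y'(x)=u(x),\ |u(x)|\le M=10^6,\ y(0)=y(1)=0.
\]
The Hamiltonian is $H=\sqrt{1+u^2}+\lambda u$, the adjoint equation gives $\dot\lambda=-\partial H/\partial y=0$, so $\lambda$ is constant, and since $H$ is convex in $u$ its maximum over the closed set $|u|\le M$ is attained at an endpoint $u=\pm M$. The boundary conditions force the path to go up and come back down, so $|u(x)|=M$ almost everywhere and the length is $\sqrt{1+M^2}=\sqrt{10^{12}+1}$ miles, just a bit over a million miles.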

A scientific problem gains in aesthetic interest if you can explain it to your neighbor at a dinner party. 

That no map requires more than four colors is such a problem. Can the turnpikes of Professor Chakravarty pass this test? In a system in which everything produces everything, a critical set of proportions at which balanced growth is most rapid seems reasonable. So the general notion of the von Neumann turnpike is conveyed. But suppose we begin away from this highway, and also are to end up away from it. Still, if the journey is to be a sufficiently long one, as seen we shall do well to move in the beginning toward that fast highway; catch a fast ride along it, staying so long there as to leave behind in all directions any rival who sticks to any slower path; then, toward the end of our journey, we can swing off the turnpike to our final rendezvous.

We saw this in today's lecture (the turnpike of economic growth, Section 15.6).

I cannot resist one more excerpt from Samuelson's paper, in which he muses on the way in which an optimal path may sometimes seem strange.

Begin with durable hammers and robots that can make new hammers or robots with the same symmetrical input requirements for the one as for the other. Each input is subject to familiar diminishing returns; but increasing both inputs proportionally leads to constant returns to scale. I begin with 10 robots and 5 hammers and must end up in minimum time with (at least) 1,000 robots and 5,000 hammers. How should I proceed? Surely, I begin by producing only hammers, since the redundant robots are subject to diminishing returns. When I reach 10 of each input, I shift gears to produce equal amounts of both, proceeding along what is known as a von Neumann-DOSSO turnpike. You might think I go along it until 1,000 of each are attained, thereafter producing only the 4,000 more hammers. And you might be right. But you might be wrong. Depending upon the exact technology, it might pay to produce more of both along the turnpike, ceasing to produce more robots only after they are somewhere between 1,000 and 5,000 in number. We end up with more than we needed of robots. Irrational, since we don't want them and they cost something? Not necessarily, since they may have been worthwhile for their help in producing the extra hammers we do want.

Tuesday, March 5, 2013

Lecture 14

The maximum principle is due to Lev Pontryagin. It is remarkable that despite being blind he was one of the greatest mathematicians of his generation. The key thing to grasp is that the PMP provides necessary conditions. We use the fact that an adjoint trajectory $\lambda$ exists to deduce properties of, or completely determine, the optimal control and optimally controlled trajectory. To my thinking, the PMP is notoriously badly explained in most of the books in which it appears. I hope I have been able to make it seem more intuitive. Lagrange multipliers give a helpful interpretation, as does differentiation of the infinitesimal version of the optimality equation.

The rocket car example is a celebrated problem that is nicely solved using the PMP. Whittle calls it the "Bush problem", but does not explain why.  There are two plausible answers. This may be a reference to the famous engineer Vannevar Bush, but I have not been able to track down any definitive reference. Whittle talks about bringing the rollers of a rolling mill to rest in a standard position in minimum time. So perhaps he is thinking about a roller chain or "bush roller chain", which (according to Wikipedia) is the type of chain drive most commonly used for transmission of mechanical power on many kinds of domestic, industrial and agricultural machinery, including conveyors, wire- and tube-drawing machines, printing presses, cars, motorcycles, and bicycles. I will ask Peter Whittle if he remembers.

I think I have finally guessed the answer to the question posed above. According to J. T. Olympio, A Continuous Implementation of a Second-Variation Optimal Control Method for Space Trajectory
Problems. J Optim Theory Appl, 2013,

"The double integrator problem (also called the Feldbaum–Bushaw problem) is a time minimization problem, where a frictionless particle moving along a line with an initial velocity and position should be put to rest." 

This is exactly our rocket car problem. Apparently, it was first solved by D.W. Bushaw, Differential Equations with a  Discontinuous Forcing Term, PhD Thesis, Princeton, 1952.

In the obituary of Donald W. Bushaw (1926-2012) it is stated that "Don’s PhD thesis is recognized as the beginning of modern optimal control theory."

There is a nice interactive demo of the solution to the rocket car parking problem that you can try.

Thursday, February 28, 2013

Sustainable fishing

At the end of Lecture 13 I introduced the Hamilton-Jacobi-Bellman equation, which is a continuous time version of the optimality equation. In Section 13.3 (which I did not discuss in lectures) there is a presentation of the HJB equation for the continuous-time version of our well-studied LQ regulation problem. This is essentially Examples Sheet 3, Question 3. So you can review that section when you do the examples sheet. It is straightforward.

I have just now added to the notes a Section 13.4. This section is not examinable, but I thought you might find it interesting. The HJB equation is used in this example to deduce whether fishermen will fish a potentially sustainable fish population to extinction, or not. The population follows the dynamics $\dot x=a(x)-u$, when $x>0$, where $u$ is the rate at which fish are extracted. Fishermen are trying to maximize
$$\int_0^\infty u(t)  e^{-\alpha t}dt.$$ Whether the fish population is completely wiped out, or sustained, depends on whether the discounting rate $\alpha$ is greater or less than the rate at which the fish population can grow when it is small, i.e. $a'(0)$. If $\alpha< a'(0)$ then the population will converge to a sustainable positive level $\bar x$, at which the optimal fishing rate is $\bar u$ and $\dot x=a(\bar x)-\bar u=0$.
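
To see the dichotomy numerically, here is a minimal simulation (my own sketch, not from Section 13.4). I assume a logistic growth function $a(x)=x(1-x)$, so that $a'(0)=1$, and a most-rapid-approach policy toward the level $\bar x$ satisfying $a'(\bar x)=\alpha$ (my guess at the form of the optimal policy in this sort of problem); when $\alpha>a'(0)$ there is no such level and the policy fishes flat out.

    import numpy as np

    def simulate(alpha, u_max=2.0, x0=0.4, T=40.0, dt=0.001):
        a = lambda x: x * (1 - x)                      # assumed logistic growth, a'(0) = 1
        x_bar = (1 - alpha) / 2 if alpha < 1 else 0.0  # level at which a'(x_bar) = alpha
        x = x0
        for _ in range(int(T / dt)):
            u = u_max if x > x_bar else 0.0            # most-rapid-approach to x_bar
            if x_bar > 0 and abs(x - x_bar) < u_max * dt:
                u = a(x_bar)                           # hold at the sustainable level
            x = max(x + (a(x) - u) * dt, 0.0)
        return x

    print(simulate(alpha=0.5))   # alpha < a'(0): settles near x_bar = 0.25
    print(simulate(alpha=1.5))   # alpha > a'(0): the population is fished to extinction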

We are not going to spend much time thinking about how to solve HJB equations directly, because the theory of Pontryagin's Maximum Principle that we will meet in Lecture 14 is more powerful. However, Questions 4 and 5 are about finding a solution to the HJB equation, and you begin these questions by writing down the infinitesimal form of the optimality equation.

Lecture 13

The name "Kalman filter" refers to the estimation equation (13.1) and takes its name from Rudolf Kalman (1930 –), who developed it in the years 1958-64. He also coined the terms controllable and observable, and gave the criteria that we have seen in previous lectures. The fact that a system is controllable iff the matrix $[B\ AB\ \cdots\ A^{n-1}B]$ is of full rank is sometimes called Kalman's criteria. In the IEEE biography of Kalman it is stated
The Kalman filter, and its later extensions to nonlinear problems, represents perhaps the most widely applied by-product of modern control theory. It has been used in space vehicle navigation and control (e.g. the Apollo vehicle), radar tracking algorithms for ABM applications, process control, and socioeconomic systems.
The theory in this lecture is admittedly quite tricky, partly because of the notation. As a test of memory, can you say what roles in the theory are taken by each of these?

 $x_t$,  $u_t$, $A$, $B$, $\epsilon_t$, $y_t$, $C$, $\eta_t$, $\hat x_t$, $\Delta_t$, $\xi_t$, $\zeta_t$, $R$, $S$, $Q$, $K_t$, $\Pi_t$, $N$, $L$, $M$, $H_t$,  $V_t$. 

 You will understand the ideas better once you have worked through the details of a scalar example (in which $n=m=p=1$). You do this in Example Sheet 3 Question 2. I also think that Question 1 is helpful in gaining an appreciation of the duality between control and estimation problems.

You will not be asked to reproduce the statement or proofs of Theorem 13.1 or 13.2 in examinations. You should simply know that $\hat{x}_t$ is computed from $\hat{x}_{t-1}$ and $y_t$ in the linear manner specified by (13.1), and that the covariance matrix $V_t$ satisfies a Riccati equation. You are not expected to memorize Riccati equations.

Notice that the Riccati equation for $V_t$, i.e. $V_t = g\, V_{t-1}$, runs in the opposite time direction to the one we had for $\Pi_t$ in Lecture 10, where $\Pi_{t-1} = f\, \Pi_t$. We are given $V_0$ and $\Pi_h$.

Coda on the puzzle

Here is an amusing sporting fact that resonates with our puzzle in which a decision that is the same under conditions of A true and A false, may not be the right decision when the truth status of A is unknown.

John Howard (London School of Economics) tells me that in cricket it can happen that a batsman attempts to drive the ball but appears to miss (or he may have snicked it). The ball then hits his pad, bounces off, and is caught by a slip fielder. There is an appeal for lbw (leg before wicket). The umpire would have given him out "lbw" if he was sure he had not hit the ball. He would be out "caught" if he had hit the ball. But a batsman cannot be given out unless a definite reason is given, and if the umpire is not sure which of these two it was, then the batsman is not out.

I am told that a similar incident happened in the recently concluded India-Australia Test, where the bowler appealed for lbw or caught, the umpire upheld the appeal for lbw, but the batsman was upset thinking he was given caught out as he had not edged the ball. 

You can read more about this in the article, Ricky Ponting and the Judges, by Ian Rumfitt, who writes:

"In the first innings of the final Ashes Test of 2009, Ricky Ponting faced a ball which was deflected, off something, into the wicket-keeper’s hands. The English XI appealed, and in the agonizingly long time that it took Asad Rauf to decide, Jonathan Agnew (commentating on Test Match Special) reasoned as follows: ‘Either the ball hit Ponting’s bat or it hit his pads. If it hit his bat, he is out caught behind. If it hit his pads, he is out lbw. So, either way, he is out’. Rauf, however, appeared to disagree with Agnew’s reasoning, and Ponting stayed at the wicket."

Tuesday, February 26, 2013

Proof by Genie

Sometimes arguments like that which appears in the puzzle can be useful in proofs. In their paper, An optimal strategy for a conflict resolution problem (Venkat Anantharam and Pravin Varaiya, Systems and Control Letters, 1986, pp. 329-332), the authors consider a problem in which each of $n$ people has a number in $[0,1]$. The aim is to find out who has the greatest number, using the minimum expected number of yes/no questions, which are to be asked in sequence to everyone simultaneously.

Questions can be of the form "Whose number is in subset $A$ of $[0,1]$?", for any subset $A$. They prove that it is optimal to restrict questions to the simplest form of "Whose number is bigger than $x$?" To prove this they use an argument like this:

We  postulate  the  following  genie: 
  
The  genie  tells  us which  of  the  sets $A$ and $A^c$ ...

I omit the detail (see their paper if you like). I focus here on the idea that the genie is providing extra information that can make the optimal policy easier to figure out. The authors continue:

By postulating a genie we mean that we permit ourselves to use different strategies on events on which the genie gives us different answers. Clearly we can do no better without the genie than we can with it.

They then show that given the information provided by the genie, the optimal policy is one that could have been implemented without having that information, and which takes a special form. So they can conclude the optimal policy (without the genie) must also take this special form (i.e. that it suffices to ask questions of the simple form "Whose number is bigger than $x$?").

We see, however, that one has to be careful with this type of proof. In the puzzle we are drawn into thinking about a genie who tells us whether A is true or false. At issue is the fact that the policy which is optimal, given what the genie tells us about A, may have identical $u_0$, whatever the genie says, but this is only because we can act differently for $u_1, u_2, \dotsc$.

Answer to the puzzle

There have been 10 votes for "yes" and 5 votes for "no". The intuitive answer to the puzzle is yes ($u_0=1$ is optimal).  That might make you suspicious that I am going to argue for the non-intuitive answer, no ($u_0=1$ is not necessarily optimal). I wonder if you will like my answer, or think I am cheating?

I said nothing about $u_0$ being the only decision. I make no apology, because this lecture course has been all about taking a sequence of decisions (or controls) $u_0,u_1,u_2,\dotsc$ over time.  That possibility should have been in your mind, and I was giving you a hint by denoting the decision as $u_0$ (the first of a sequence of decisions).

Suppose we are faced with a stopping problem in which $u_0=1$ means "continue" and $u_0=0$ means "stop". We should consider the possibility that by knowing that A is true, or by knowing A is false, it is optimal to continue. But if we do not know whether A is true or false then it is optimal to stop.

It is not completely obvious that this can happen. So let's see how to construct an example in which it does. Consider a version of the secretary problem in which we will see 3 candidates. They are "best", "middle" and "worst" (B,M,W). Of course we can only compare them with one another, so if they are presented in the order M,B,W, we would see this as $(x_1,x_2,x_3)=(1, 1, 0)$, where $x_i=1$ if the $i$th candidate is the best so far. Suppose that the candidates will be presented in one of the following four orders, with consequent sequence of 0s and 1s, and with the probabilities given:
  1. B,M,W (1,0,0) with probability 0.2
  2. M,B,W (1,1,0) with probability 0.3
  3. B,W,M (1,0,0) with probability 0.2
  4. W,M,B (1,1,1) with probability 0.3
It is a bit artificial that all 6 possible orders are not equally likely, but for the purpose of this discussion we are not trying to think about a really practical problem.

We wish to maximize the probability of stopping on the last 1.

Now suppose the unknown "A" is whether "the worst candidate will be seen last".

If "A" is true it will be 1 or 2, but if A is false it will be 3 or 4.

You can now verify that given A is true it is optimal not to stop on the first 1 ($u_0=1=$ "continue"). The same decision is optimal if A is false. In both cases we win with probability 0.6.

However, if we do not know whether A is true or false, then it is optimal to stop on the first 1 ($u_0=0$, and we win with probability 0.4). If we do not stop then, with probability 0.6, we reach 1, 1, and from there can win with probability of only 0.5. So our win probability would be only 0.3.
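
Here is a small brute-force check of these numbers (my own verification, in Python). It enumerates every policy that is allowed to stop only on a 1, under each information scenario.

    from itertools import product

    # (indicator sequence, probability, A = "worst candidate seen last")
    scenarios = [((1,0,0), 0.2, True),  ((1,1,0), 0.3, True),
                 ((1,0,0), 0.2, False), ((1,1,1), 0.3, False)]
    prefixes = [(1,), (1,1), (1,1,1)]    # prefixes ending in a 1, where one may stop

    def win_prob(policy, subset):
        total = sum(p for _, p, _ in subset)
        win = 0.0
        for seq, p, _ in subset:
            last1 = max(i for i, v in enumerate(seq) if v)
            stop = next((t for t in range(3) if seq[t] and policy.get(seq[:t+1], False)), None)
            if stop == last1:
                win += p
        return win / total

    for label, subset in [("A true  ", [s for s in scenarios if s[2]]),
                          ("A false ", [s for s in scenarios if not s[2]]),
                          ("unknown ", scenarios)]:
        best = max(win_prob(dict(zip(prefixes, d)), subset)
                   for d in product([True, False], repeat=3))
        stop_first = win_prob({(1,): True}, subset)
        print(label, "best:", round(best, 2), " stop at first 1:", round(stop_first, 2))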

Do you think this example is a cheat (in that it does not really match up with the way I posed the original puzzle), or have you learned something interesting?

I was thinking about this recently in connection with a similar question. I wonder if any of you can invent a neater example to illustrate this point, or one with a better "back-story".

The question I have been thinking about concerns an application of Bruss's odds theorem in circumstances in which we do not know the odds. Recall that if $X_1,\dotsc,X_n$ are independent, with $X_i=1$ or $0$ with probabilities $p_i$ and $q_i=1-p_i$, then we maximize the probability of stopping on the last 1 if we stop at the first 1 that we find amongst $X_s,X_{s+1},\dotsc,X_n$, where $s$ is the greatest integer $i$ such that

$\frac{p_i}{q_i}+\cdots+\frac{p_n}{q_n}\geq 1$.

Now suppose we do not know the actual values of $p_1,\dotsc,p_n$. But we have seen $X_{i-1}=1$ and somehow know that $p_i/q_i+\cdots+p_n/q_n=1$. Can we say that it is optimal to stop? Yes. Since sum-of-odds $=1$ is the borderline case in the odds theorem, is it also optimal not to stop? Yes.

But suppose we know only that $E[p_i/q_i+\cdots+p_n/q_n] > 1$. Now this is insufficient information on which to know whether or not it is optimal to stop.
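
For completeness, here is the odds algorithm in a few lines of Python (my own sketch, with made-up $p_i$), computing the threshold $s$, the success probability $q_s\cdots q_n(r_s+\cdots+r_n)$ quoted in the Lecture 6 post, and a Monte Carlo check.

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([0.1, 0.2, 0.3, 0.25, 0.4, 0.15])   # made-up success probabilities
    q, r = 1 - p, p / (1 - p)

    tail = np.cumsum(r[::-1])[::-1]                  # tail[i] = r_i + ... + r_n
    s = np.flatnonzero(tail >= 1)[-1]                # greatest i with tail sum >= 1 (0-indexed)
    success = np.prod(q[s:]) * np.sum(r[s:])         # q_s ... q_n (r_s + ... + r_n)
    print(s, success)

    # Monte Carlo check: stop at the first 1 among X_s, ..., X_n; win if it is the last 1.
    X = rng.random((100_000, len(p))) < p
    wins = 0
    for x in X:
        ones = np.flatnonzero(x)
        later = ones[ones >= s]
        wins += len(later) > 0 and later[0] == ones[-1]
    print(wins / len(X))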

Lecture 12

I wish only to say that controllability and observability stand in a dual relationship to one another. This is clear in the necessary and sufficient conditions: that $[B\ AB\ \cdots\ A^{n-1}B]$ must be rank $n$ (for controllability), and 

$\begin{bmatrix}C\\ CA\\ \vdots\\ CA^{n-1}\end{bmatrix}$ must be of rank $n$ (for observability). 

This duality is also nicely exhibited in Example Sheet 3 Question 1. You should now be able to do this question.

Notice that a system can be stabilizable without being controllable. E.g. the scalar system with $x_{t+1}=(1/2)x_t$ is trivially stabilizable, but it is not controllable.

Monday, February 25, 2013

Who solved the Secretary Problem?

I thought you might enjoy this.

In his paper (Who Solved the Secretary Problem? Statist. Sci., Volume 4, Number 3 (1989), 282-289) Thomas Ferguson tells the following funny story.

"When the celebrated German astronomer, Johannes Kepler (1571-1630), lost his first wife to cholera in 1611, he set about finding a new wife using the same methodical thoroughness and careful consideration of the data that he used in finding the orbit of Mars to be an ellipse. His first, not altogether happy, marriage had been arranged for him, and this time he was determined to make his own decision. In a long letter to a Baron Strahlendorf on October 23, 1613, written after he had made his selection, he describes in great detail the problems he faced and the reasons behind each of the decisions he made. He arranged to interview and to choose from among no fewer than eleven candidates for his hand. The process consumed much of his attention and energy for nearly 2 years, what with the investigations into the virtues and drawbacks of each candidate, her dowry, negotiations with her parents, natural hesitations, the advice of friends, etc. The book of Arthur Koestler (1960) contains an entertaining and insightful exposition of the process. The book of Carola Baumgardt (1951) contains much supplementary information.

Suffice it to say that of the eleven candidates interviewed, Kepler eventually decided on the fifth. It may be noted that when $n = 11$, the function $\phi_n(r)$ of (2.1) takes on its maximum value when $r = 5$. Perhaps, if Kepler had been aware of the theory of the secretary problem, he could have saved himself a lot of time and trouble." Fortunately,
"His new wife, whose education, as he says in his letter, must take the place of a dowry, bore him seven children, ran his household efficiently, and seems to have provided the necessary tranquil homelife for his erratic genius to flourish."
Ferguson has many other interesting things in his paper. It created some controversy and amusement when published because his answer to the question "Who Solved the Secretary Problem?" is "Nobody"! He then goes on to "solve" it for the first time! (at least a particular version he likes). He also says:
"As historians, we should take as the secretary problem, the problem as it first appeared in print, in Martin Gardner's February 1960 column in Scientific American, where it was called the game of googol and described as follows.
Ask someone to take as many slips of paper as he pleases, and on each slip write a different positive number. The numbers may range from small fractions of 1 to a number the size of a googol (1 followed by a hundred 0's) or even larger. These slips are turned face down and shuffled over the top of a table. One at a time you turn the slips face up. The aim is to stop turning when you come to the number that you guess to be the largest of the series. You cannot go back and pick up a previously turned slip. If you turn over all slips, then of course you must pick the last one turned.
The astute reader may notice that this is not the simple form of the secretary problem described in Section 2. The actual values of the numbers are revealed to the decision maker in violation of condition 4. Also, there is this "someone" who chooses the numbers, presumably to make your problem of selecting the largest as difficult as possible. The game of googol is really a two-person game."

Friday, February 22, 2013

A Puzzle

Here is a simple puzzle to develop your understanding and intuition.

You are faced with a problem in which you need to make a decision. For example, you might be choosing $u_0$. You can choose $u_0=0$ or $u_0=1$. One of these is optimal.

There is something "A" which you do not know. It could be either true or false.

If A is true then $u_0=1$ is definitely optimal (i.e. better than $u_0=0$).
If A is false then $u_0=1$ is also definitely optimal.

Can you therefore conclude that $u_0=1$ is optimal?


Thursday, February 21, 2013

Lecture 11

The idea of controllability is straightforward and we can understand it using ideas of linear algebra.
Consider again the broom balancing problem. It is not possible to stabilize two brooms at equilibrium simultaneously if they are of the same length, but it is possible if the lengths are different. This is because of the forms:
$A=\begin{pmatrix}0 & 1 & 0 & 0\\ \alpha & 0 & 0& 0\\ 0 & 0 & 0& 1\\ 0 & 0& \beta & 0\end{pmatrix}\quad B=\begin{pmatrix}0 \\-\alpha\\0\\-\beta\end{pmatrix}\quad M_4=\begin{pmatrix}0 & -\alpha & 0 & -\alpha^2\\ -\alpha & 0 & -\alpha^2& 0\\ 0 & -\beta & 0& -\beta^2\\ -\beta & 0& -\beta^2 & 0\end{pmatrix}$

where $\alpha = g/L_1$, $\beta=g/L_2$.

So $M_4$ is of rank 4 iff $\alpha$ and $\beta$ are different. However, this assumes that as you move your hand you must keep it at constant height from the ground. What do you think might happen if you can move your hand up and down also?
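
A quick numerical check of this rank claim (my own; the values of $\alpha$ and $\beta$ are arbitrary):

    import numpy as np

    # Controllability matrix M4 = [B, AB, A^2 B, A^3 B] for the two-broom system.
    def controllability_rank(alpha, beta):
        A = np.array([[0.0, 1, 0, 0],
                      [alpha, 0, 0, 0],
                      [0, 0, 0, 1],
                      [0, 0, beta, 0]])
        B = np.array([0.0, -alpha, 0, -beta])
        M4 = np.column_stack([np.linalg.matrix_power(A, k) @ B for k in range(4)])
        return np.linalg.matrix_rank(M4)

    print(controllability_rank(2.0, 2.0))   # equal lengths: rank 2, not controllable
    print(controllability_rank(2.0, 3.0))   # different lengths: rank 4, controllable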

In 1996, paper 2, question 10 there was this cute tripos question:
A fantasy kingdom has populations of x vampires and y humans. At the start of each of the four yearly seasons the king takes a census and then intervenes to admit or expel some humans. Suppose that the population dynamics of the kingdom are governed by the plant equations:
$x_{t+1} = x_t + (y_t - Y)$,
$y_{t+1} = y_t - (x_t - X) + u_t$,
where $x_t$ and $y_t$ are integers representing the respective populations of vampires and humans in the $t$-th season, $t=0,1,2,3,\dotsc$; $X$ and $Y$ represent equilibrium values in the population without regal intervention; ...
[ some more description]
Can the king regain equilibrium by only expelling humans?
Suppose, the census is taken during the day, so the king can only count humans, and thus $u_t$ is allowed only to depend on $y_t$. Show that ...

The last part of the question is about observability (see Lecture 12). The full question is in this file of past questions.

Tuesday, February 19, 2013

Lecture 10

The linear-quadratic regulator (LQR) that we met today is one part of the solution to the linear-quadratic-Gaussian control problem (LQG). This problem is the subject of Lectures 10–13. The LQG problem is perhaps the most important problem in control theory. It has a full solution: given in terms of the Kalman Filter (a linear-quadratic estimator) and the linear-quadratic regulator.

Following today's lecture you can do Questions 6–10 on Example Sheet 2. Here is an important hint. Do not try to solve these problems by plugging values into the general solution of the Riccati equation. It is always better to figure out the solution from scratch, by the method of backwards induction. Conjecture for yourself that the optimal value function, $F(x,t)$, is of a form that is quadratic in the state variable, plus some $\gamma_t$ (if it is a problem with white noise). Then find the explicit form of the recurrences, working backwards inductively from a terminal time $h$. Notice how this problem-solving strategy works in Question 9, even when the noise is rather different.

In general, solutions to a Riccati equation can be computed numerically, but not algebraically. However, one can find a full solution in the special case that $x_t$ and $u_t$ are both one-dimensional (Question 6). There is also a fully-solvable special case in which $x_t$ is two-dimensional and $u_t$ is one-dimensional (Question 10). A useful trick in some problems is to observe that $\Pi_t^{-1}$ satisfies a recurrence relation (Questions 8 and 10). 
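
As an illustration of the backwards-induction strategy just described, here is a minimal scalar example in Python (my own sketch; the parameter values are arbitrary). Conjecturing $F(x,t)=\Pi_t x^2+\gamma_t$ and minimizing over $u$ in the optimality equation gives the recursion coded below.

    import numpy as np

    # Scalar LQ regulation: x_{t+1} = A x_t + B u_t + eps_t, stage cost R x^2 + Q u^2,
    # terminal cost Pi_h x_h^2, Var(eps_t) = v.  Backward induction from t = h.
    A, B, R, Q, v = 1.2, 1.0, 1.0, 1.0, 0.5
    h, Pi_h = 20, 0.0

    Pi = np.zeros(h + 1); gamma = np.zeros(h + 1); K = np.zeros(h)
    Pi[h] = Pi_h
    for t in range(h - 1, -1, -1):
        K[t] = -Pi[t + 1] * A * B / (Q + Pi[t + 1] * B**2)      # optimal u_t = K_t x_t
        Pi[t] = R + Pi[t + 1] * A**2 - (Pi[t + 1] * A * B)**2 / (Q + Pi[t + 1] * B**2)
        gamma[t] = gamma[t + 1] + Pi[t + 1] * v                 # white-noise contribution

    print(Pi[0], K[0])   # Pi_t and K_t converge as h - t grows (infinite-horizon values)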

I mentioned in the lecture that I would not expect you to memorize the Riccati equation. The important thing is to remember enough of what it is for, how it looks, and how it is derived, so that you could reconstruct its derivation in the context of a particular problem, as you do in Question 6.

Thursday, February 14, 2013

Lecture 9

Since the lecture today I have made some small changes to the proof of Theorem 9.1, as I realized I could give it an even cleaner exposition. (It has been some years since I have thought hard about this.)

I have a personal fondness for some things I talked about today. Theorem 9.1 (and extensions of it to other distributions beyond exponential) was one of the things that I proved in my Ph.D. dissertation in 1980. The Lady's Nylon Stocking Problem was a fairly famous unsolved problem at that time. It gave me something to talk about with non-mathematically inclined colleagues when I first became a research fellow at Queens' College. The paper is Scheduling jobs with stochastic processing requirements on parallel machines to minimize makespan or flow time, J. Appl. Prob. 19, 167-182, 1982. Looking at it again today, I think that I could write it better.

The method of uniformization that I described in this lecture seems to have been introduced by Jensen (1953), but it was first used extensively in solving queueing problems in the late 1970s (by Grassman (1977) and Keilson (1979)). But it was not well-known. I recall discovering it for myself. Nowadays it is standard. Here is a quote from a recent 2011 paper by Rhonda Righter
Stochastic sequential assignment problem with arrivals, Probability in the Engineering and Informational Sciences, 25, 2011, 477–485:

Proof. We consider the equivalent discrete-time problem by uniformizing with uniformization rate $\lambda + \gamma + \alpha = 1$, and we use value iteration (i.e., induction on a finite time horizon, $n$).

The Wikipedia article has more about it and summarises the method in these words:
The method involves the constructions of an analogous discrete time Markov chain, where transitions occur according to an exponential distribution with the same parameter in every state.
I have used this idea many dozens of times in my own research. It is usually the best first step to make when  tackling a continuous-time Markov decision problem.
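
Here is a toy numerical illustration of the idea (my own example, not from the lecture): take a small CTMC generator $Q$, uniformize at a rate $\Lambda \ge \max_i |Q_{ii}|$ to obtain a DTMC with transition matrix $P = I + Q/\Lambda$, and check that $P$ has the same stationary distribution as the original chain.

    import numpy as np

    Q = np.array([[-2.0,  1.0,  1.0],      # a made-up CTMC generator (rows sum to 0)
                  [ 3.0, -5.0,  2.0],
                  [ 0.0,  4.0, -4.0]])
    Lam = np.max(-np.diag(Q))              # uniformization rate
    P = np.eye(3) + Q / Lam                # transition matrix of the uniformized DTMC

    vals, vecs = np.linalg.eig(P.T)        # stationary distribution of P
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi /= pi.sum()

    print(pi, pi @ Q)                      # pi @ Q is ~ 0: same stationary distribution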

Wednesday, February 13, 2013

ABCs of the Bomber Problem

I briefly spoke about the infamous Bomber Problem. If you would like to read more, see ABCs of the Bomber Problem and its Relatives, Weber, 2011. There is also a talk about this here. This problem has been open for over 35 years, and has been said by researchers who know it to be highly addictive. It is very infuriating that for a problem with such a very simple dynamic programming equation an "obvious" fact cannot be proved or disproved.

$F(n,t)$ is the probability that a bomber plane, which is potentially attacked by enemy fighters $t$ times, can successfully defend itself against all such attacks, when starting with a supply of $n$ air-to-air missiles. The probability of surviving an attack by using $k$ missiles is $(1-\theta^k)$, where $0<\theta<1$.

$F(n,t) = q F(n,t-1) + p \max_{k\in\{1,\ldots,n\}}(1-\theta^k)F(n-k,t-1)$

$F(n,0)=1$

Conjecture B is that the maximizing $k$ is nondecreasing in $n$.
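
You can compute $F(n,t)$ and the maximizing $k$ in a few lines of Python (my own sketch; $\theta$ and the attack probability $p$ below are just illustrative) and check that Conjecture B holds in every instance you try, which is part of what makes the problem so infuriating.

    import numpy as np

    theta, p, N, T = 0.5, 0.8, 30, 30
    q = 1 - p
    F = np.ones((N + 1, T + 1))                     # F(n, 0) = 1
    best_k = np.zeros((N + 1, T + 1), dtype=int)
    for t in range(1, T + 1):
        F[0, t] = q * F[0, t - 1]                   # no missiles: survive only if not attacked
        for n in range(1, N + 1):
            vals = [(1 - theta**k) * F[n - k, t - 1] for k in range(1, n + 1)]
            best_k[n, t] = int(np.argmax(vals)) + 1
            F[n, t] = q * F[n, t - 1] + p * max(vals)

    # Conjecture B: for each t, the maximizing k is nondecreasing in n.
    print(all(np.all(np.diff(best_k[1:, t]) >= 0) for t in range(1, T + 1)))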

Many problems very similar to this are easy to solve.

Klinger and Brown's 1968 paper is here.

Tuesday, February 12, 2013

Lecture 8

You will find that the policy improvement algorithm is useful in answering Questions 3 and 4 on Examples Sheet 2. In Question 3, you will find $\lambda$ and $\phi$ for a certain policy ("On seeing a filling station, stop and fill the tank"), and then look for a condition under which the policy improvement algorithm will not improve it (or, equivalently, that this $\lambda$ and $\phi$ satisfy the optimality equation).

In Question 4 you will use the policy improvement idea in the context of a discounted-cost problem. You find $F(x)$ for a simple policy (in which the engineer allocates his time randomly), and then improve it by a step of the policy improvement algorithm. This leads to a policy in which the engineer puts all his maintenance effort into the machine with greatest value of $c_i(x_i+q_i)/(1-\beta b_i)$. This policy is better, but may not yet be optimal.

In Question 1 you will want to mimic the calculation done in the proof of Bruss's odds algorithm. You cannot solve this by simply applying the algorithm directly.

There is another interesting way to motivate the optimality equation in the average-cost case. This can be made rigorous and helps us understand the relative value function $\phi(x)$.

Let $F(x)$ be the infinite-horizon value function when the discount factor is $\beta$. Then we know this satisfies the optimality equation
$F(x) = \min_u \{ c(x,u) +\beta E[ F(x_1) \mid x_0=x,u_0=u ] \}$.
Pick some state, say state 0. By subtracting $\beta F(0)$ from both sides of the above, we obtain
$F(x) - F(0) + (1-\beta)F(0) = \min_u \{ c(x,u) + \beta E[ F(x_1) - F(0) \mid x_0=x,u_0=u ] \}$.
One can show that as $\beta\to 1$ we have $F(x) - F(0) \to \phi(x)$ and $(1-\beta)F(0) \to\lambda$ (the average-cost). Thus we obtain
$\phi(x) + \lambda = \min_u \{ c(x,u) + E[ \phi(x_1) \mid x_0=x,u_0=u ] \}$ 
and this is our average-cost optimality equation. If you would like to understand why $(1-\beta)F(0) \to\lambda$, see this small note about the connection between the average-cost problem and the discounted-cost problem with $\beta$ near 1.


It is also interesting to think about the following (which I mentioned briefly in lectures today). Suppose we have a deterministic stationary Markov policy, say $\pi$, with $u=f(x)$. Suppose we have $\lambda$ and $\phi$ such that
$\phi(x) + \lambda = c(x,f(x)) + \sum_y \phi(y) P(y \mid x)$     (7.6)
where $P(y \mid x)$ is the transition matrix under $\pi$.
Suppose $\pi$ induces an ergodic Markov chain (i.e. a Markov chain that is irreducible and positive recurrent) and this has invariant distribution $\mu$. We know that
$\mu_y = \sum_x \mu_x P(y \mid x)$     (7.7)
Multiplying (7.6) through by $\mu_x$ and then summing on $x$, we get
$\sum_x \mu_x \phi(x) + \lambda \sum_x \mu_x = \sum_x \mu_x c(x,f(x)) + \sum_y \sum_x \mu_x \phi(y) P(y \mid x)$
which, using (7.7), gives
$\sum_x \mu_x \phi(x) + \lambda \sum_x \mu_x = \sum_x \mu_x c(x,f(x)) + \sum_y \mu_y \phi(y)$.
Then using $\sum_x \mu_x = 1$, and cancelling the terms that are equal on both sides, we have
$\lambda = \sum_x \mu_x c(x,f(x))$
and so we see that $\lambda$ is the average cost of policy $\pi$.
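
Here is a tiny numerical confirmation of this calculation (my own toy example, in Python): fix a policy via a transition matrix $P$ and cost vector $c$, solve (7.6) with the normalization $\phi(0)=0$, and compare $\lambda$ with $\sum_x \mu_x c(x,f(x))$.

    import numpy as np

    P = np.array([[0.5, 0.5, 0.0],          # transition matrix of the fixed policy
                  [0.1, 0.6, 0.3],
                  [0.2, 0.3, 0.5]])
    c = np.array([1.0, 4.0, 2.0])           # cost c(x, f(x)) in each state

    n = len(c)                              # solve phi(x) + lambda - sum_y P[x,y] phi(y) = c(x), phi(0) = 0
    A = np.zeros((n, n))
    A[:, :n-1] = np.eye(n)[:, 1:] - P[:, 1:]
    A[:, n-1] = 1.0
    phi_and_lam = np.linalg.solve(A, c)
    lam = phi_and_lam[-1]

    vals, vecs = np.linalg.eig(P.T)         # invariant distribution mu of P
    mu = np.real(vecs[:, np.argmin(np.abs(vals - 1))]); mu /= mu.sum()

    print(lam, mu @ c)                      # these agree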

Thursday, February 7, 2013

Lecture 7

In today's lecture I used a portion of my slides from a talk on the Gittins Index. You may enjoy looking through the entire talk. It will look better if you download the .pdf file to your computer and read the presentation with a .pdf viewer such as Acrobat, rather than trying to read it within your browser.

The proof of the Gittins index theorem is actually easy, but at the same time deep. It is non-examinable. I would expect you only to know what we mean by a SFABP, the statement of the Gittins index theorem, and how to calculate the indices in simple examples. However, I thought you would enjoy seeing this beautiful result and how it can be proved. Lectures 1-6 have covered everything you need to know in order to understand the Gittins index. Today's lecture has also been an opportunity to revise ideas that we already met in problems on job scheduling and pharmaceutical trials. 

Weitzman's Pandora's boxes problem in 7.5 is cute and something I am talking about for the first time. I may have rushed over it a bit, so refer to the notes. I have there an example in which the prize in box $i$ is $0$ or $r_i$, with probabilities $1-p_i$ and $p_i$. Here's another example. Suppose the prize is uniformly distributed over $[0,r_i]$ and $0<c_i<r_i/2$. Then the Gittins index in the undiscounted case is the solution to

$g_i= -c_i +\int_0^{r_i}\max(g_i,u)(1/r_i)\,du= -c_i +(g_i^2/(2r_i)+r_i/2)$.

The idea here is that Pandora is indifferent between taking home $g_i$, or opening box $i$, at cost $c_i$, and then taking home the best of either $g_i$ or the prize she finds in the box.

This gives $g_i= r_i-\sqrt{2c_ir_i}$, with $0<g_i<r_i$.
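
Here is a quick numerical check of the indifference equation (my own, in Python; the values of $c_i$ and $r_i$ are arbitrary): find the fixed point of $g = -c + E\max(g, X)$ with $X\sim U[0,r]$ by root-finding and compare with the closed form.

    import numpy as np
    from scipy.optimize import brentq

    def pandora_index(c, r, n=200_000):
        u = (np.arange(n) + 0.5) * r / n                # quadrature points for U[0, r]
        f = lambda g: -c + np.mean(np.maximum(g, u)) - g
        return brentq(f, 0.0, r)                        # f(0) > 0 > f(r) when 0 < c < r/2

    c, r = 0.1, 1.0
    print(pandora_index(c, r), r - np.sqrt(2 * c * r))  # both are about 0.5528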

The Gittins index theorem is one of the most beautiful results in the field of Markov decision processes. Its discovery and proof in 1974 are due to John Gittins. The proof I have given in today's lecture is very different to Gittins's original proof. It is the simplest way to prove the theorem and was first presented in On the Gittins index for multiarmed bandits, Ann. Appl. Prob. 2, 1024-33, 1992.

Tuesday, February 5, 2013

Lecture 6

Today we looked at some stopping problems. They crop up all over the place. They are present in financial mathematics in the theory of option pricing (Black-Scholes). The holder of an American option is allowed to exercise the right to buy (or sell) the underlying asset at a predetermined price at any time before or at the expiry date. The valuation of American options is essentially an optimal stopping problem.

You can read more about Bruss's odds algorithm in his paper: "Sum the odds to one and stop" Annals of Probability, Vol. 28, 1384–1391, (2000). I showed you  how to use this to address a Secretary Problem in which the candidates arrive in groups, of sizes $n_1,n_2,\dotsc,n_h$. I wonder what is the best order to interview the groups? Should we see large groups first, or small groups? I forgot to mention that the optimal success probability when using the odds algorithm is $q_{s^*}\cdots q_n(r_{s^*}+\cdots+r_n)$ and that this is always at least $1/e$, provided $r_1+\cdots+r_n\geq 1$.

Let me finish with some hints about questions on Examples Sheets 1 and 2. Question 6 on Sheet 1 can be solved by realizing that there are only two stationary Markov policies. You simply need to decide which is best. Question 7 is a simple stopping problem that can be solved using an OSLA rule. Question 8 is tricky. You should use Theorem 4.2. 

Question 9 on Sheet 1 is important because it should convince you that it is possible to solve stochastic dynamic programming equations using linear programming (at least if both state space and action space are finite). If there are $n$ states and $m$ actions (controls) available in each state then we will have an LP in $n$ variables, with $nm$ linear inequality constraints. 

When you do Question 2 on Sheet 2 you might think about using Bruss's Odds Algorithm. Although you cannot apply the algorithm directly, you should be able to solve this problem using the same scheme of proof as we used to prove the optimality of the odds algorithm in this lecture.

Thursday, January 31, 2013

Lecture 5

I have decided to rewrite some parts of the course. The notes will from this point start to differ from those of last year. Please look for new notes appearing on the web site. Examples Sheets 1 and 2 are now as I intend, but I will be revising Sheet 3.

I started today by telling you about a famous problem concerned with searching for a randomly moving hidden object. This is an example of a partially observed Markov decision process. This problem has not yet been solved in full generality. However, the problem has been solved in continuous time. See R. R. Weber. Optimal search for a randomly moving object. J. Appl. Prob. 23:708-717, 1986. The reason the problem is easier in continuous time is that the state $x_1$ (the probability that the object is in location 1) changes continuously, rather than in jumps.

Let me emphasize that in Lecture 4, we used $F_s(x)$ to mean the value function of an MDP over $s$ steps, with $F_0(x)=0$. In the case of $c(x,u)\geq 0$, $F_s(x)$ is clearly nondecreasing in $s$. By contrast, in this lecture, $F_s(x)$ was used to mean the minimum cost in a stopping problem in which we must stop within $s$ steps, and $F_0(x)=k(x)$. $F_s(x)$ is now clearly nonincreasing in $s$. That is because as $s$ increases we are being given more flexibility as to when we must stop. In both cases $F_s(x)$ satisfies an optimality equation and $F_s(x)\to$ a limit, say $F_\infty(x)$, but from a different direction.

Having completed these lectures you can now attempt all of Examples Sheet 1. I will be making some changes to sheets 2 and 3.

Questions 7 and 8 use the idea of a one-step look-ahead (OSLA) rule that we have met in today's lecture. Question 8 is subtle, because although it is easy to guess the answer by thinking about an OSLA rule, how do you prove this answer is correct? Hint: use an appropriate theorem about positive programming. Be careful when writing down the dynamic programming equation for Question 8 that you put the expected value (or integral and density function for an exponential random variable) in the right place relative to where you place your $\max\{\ ,\ \}$.

Questions 9-11 are deeper, but that is mostly as regards their lengthy descriptions. The solutions are not actually hard. In Q9, you'll want to use the idea of substituting an equation into itself. One of the reasons I include these questions is to show you what classifies as a research level problem in this field. I have recently written a paper about Question 11, ABCs of the bomber problem and its relatives, Annals of Operations Research, 2011.

Tuesday, January 29, 2013

Lecture 4

One of the CATAM projects this year is Policy Improvement for a Markov Decision Process. The next few lectures (especially 7) are relevant to answering the theoretical parts of this project. Doing this project will help you better understand this course. So I recommend this CATAM project to you.

In Section 4.3 (optimal gambling) we saw that timid play is optimal in the gambling problem when the game is favorable to the gambler ($p \geq 0.5$).

Perhaps you remember the question on the IB Markov Chains examples sheet 1 that begins "A gambler has £2 and needs to increase it to £10 in a hurry. The gambler decides to use a bold strategy in which he stakes all his money if he has £5 or less, and otherwise stakes just enough to increase his capital, if he wins, to £10."

In the case $p \leq 0.5$ bold play maximizes the probability that the gambler reaches N pounds. However, bold play may not be uniquely optimal. An example that exhibits this non-uniqueness of optimal strategy is contained in question 13 of the IB Markov Chains example sheet 1. 

When it was first discovered that bold play is not uniquely optimal, it was contrary to the accepted wisdom at that time (1955). It impressed Leonard Savage when it was presented to him by a Ph.D. student named Lester Dubins. They developed a collaboration culminating in the famous monograph How to Gamble if You Must (Inequalities for Stochastic Processes). See the fourth paragraph of this obituary of Lester Dubins.

If $p=0.5$ all strategies are optimal. How could we prove that? Easy. Simply show that given any policy $\pi$ the value function is $F(\pi,i)=i/N$ and that this satisfies the DP equation. Then apply Theorem 4.1. You can read more about these so-called red and black games at the Virtual Laboratories in Probability and Statistics.
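
If you would like to see the comparison of timid and bold play numerically, here is a small value-iteration check in Python (my own sketch; $N$ and $p$ are arbitrary).

    import numpy as np

    # Probability of reaching N before 0, starting from i, for timid play (stake 1)
    # and bold play (stake min(i, N - i)), computed by value iteration.
    def success_prob(p, N, bold, iters=5000):
        F = np.zeros(N + 1); F[N] = 1.0
        for _ in range(iters):
            G = F.copy()
            for i in range(1, N):
                stake = min(i, N - i) if bold else 1
                G[i] = p * F[i + stake] + (1 - p) * F[i - stake]
            F = G
        return F

    N = 10
    print(success_prob(0.4, N, bold=False)[2], success_prob(0.4, N, bold=True)[2])  # bold is better when p < 0.5
    print(success_prob(0.5, N, bold=False)[2], success_prob(0.5, N, bold=True)[2])  # both equal 2/10 when p = 0.5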

In Section 4.5 (pharmaceutical trials) I introduced an important class of very practical problems. One obvious generalization is to a problem with $k$ drugs, each of which has a different unknown probability of success, and about which we will learn as we make trials of the drugs. This is called a multi-armed bandit problem. The name comes from thinking about a gaming machine (or fruit machine), having $k$ arms that you might pull, one at a time. The arms have different unknown probabilities of generating payouts. In today's lecture we considered a special case of the two-armed bandit problem in which one of the arms has a known success probability, $p$, and the second arm has an unknown success probability, $\theta$. I will say more about these problems in Lecture 6. The table on page 16 was computed by value iteration. This table is from the book Multi-armed Bandit Allocation Indices (of which I am one of the authors). The amazon.co.uk page will let you look inside this book.

Thursday, January 24, 2013

Lecture 3

In this lecture we had Theorem 3.1. It had an easy but nontrivial proof. It is important because it states that $F(x)$ satisfies a DP equation (3.7). It holds under the assumption of D (discounted), N (negative) or P (positive) programming. I explained what these three assumptions mean.

I gave an example, $c(x,u) = x$, $a(x,u) = -x$, $\beta = 1$, for which $F(x)$ does not exist. Note that a problem with this data fails to satisfy D, N or P.

Notice that the asset selling example in Section 3.5 (selling a tulip bulb collection) is very much like the secretary problem in Section 2.3. The difference is that now  (i) we observe values (not just relative ranks), (ii) wish to maximize the expected value of the selected candidate (rather than probability of choosing the best), and (iii) the number of offers is infinite, but with a discount factor $\beta$.

The tulip bulb example illustrates that one way in which discounting by a factor $\beta$ can naturally arise is when a catastrophe can occur, with probability $1-\beta$, bringing the problem to an end.

Can you figure out the answer to the asset selling problem if past offers for the tulip bulbs remain open (so long as the market has not collapsed)? The state $x_t$ would now be the best of the first $t$ offers.

Monday, January 21, 2013

Lecture 2

The highlight of today's lecture was the Secretary Problem. This is the most famous problem in the field of optimal stopping. It is credited to Merrill M. Flood in 1949, who called it the fiancée problem. It gained wider publicity when it appeared in Martin Gardner's column of Scientific American in 1960. There is an interesting Wikipedia article about it. One of the interesting things said in this article is that in behavioural studies people tend to stop too soon (i.e. marry too soon, make a purchase too soon). See The devil is in the details: Incorrect intuitions in optimal search.

The optimal solution to the secretary problem can also be solved by applying Bruss's odds-algorithm, due to Thomas Bruss. There is a nice tripos question based on Bruss's odds algorithm in 2011, Paper 2, 29K. The solution to the secretary problem is contained in the last line of this question. You should be able to do it once we have completed Lecture 5.

RW with J. Michael Steele and Thomas Bruss at the 9th German Open Conference on Probability and Statistics, 2010, in Leipzig.

Variations of the secretary problem can be created by changing the assumptions about whether or not, in addition to relative ranks, values $X_1,\dotsc, X_h$ can be observed, and whether the objective is to maximize the probability of choosing the best candidate, to maximize the expected rank, or to maximize the expected value of the candidate selected.

A variation that has never been completely solved is the so-called Robbins' Problem. In this problem we do observe values of candidates, say $X_1,\dotsc, X_h$, and these are assumed to be independent, identically distributed uniform$[0,1]$ random variables. The objective is to maximize the expected rank of the candidate that is selected (best = rank 1, second-best = rank 2, etc). It is known only that, as $h$ goes to infinity, the expected rank that can be achieved under an optimal policy lies between 1.908 and 2.329. This problem is much more difficult than the usual secretary problem because the decision as to whether or not to hire candidate $t$ must depend upon all the values of $X_1,\dotsc, X_t$, not just upon how $X_t$ ranks amongst them.

Following this lecture you can do questions 1–4 and 10 on Example Sheet 1. Question 2 is quite like the secretary problem (and also has a surprising answer). The tricks that have been explained in today's lecture are useful in solving these questions (working in terms of time to go, backwards induction, that a bang-bang control arises when the objective is linear in $u_t$, looking at the cross-over between increasing and decreasing terms within a $\max\{ , \}$, as we did in the secretary problem with $\max\{t/h, F(t)\}$).

Thursday, January 17, 2013

Lecture 1

In the first lecture I set up definitions and notation for state, control, history, value function, etc, and developed dynamic programming equations for a very general case, a state-structured case, and the Markov decision process case. Please be patient with the notation. It is not as complex as it may first appear. Things like $a(x,u,t)$, $F(x_t,t)$, $u_t$, and $U_t$ will begin to seem like old friends once you have used them a few times. By the way, in case it was not obvious, $h$ is used for the terminal time because it is the time horizon.

One person asked me about the terminology "plant equation" for $x_{t+1}=a(x_t,u_t,t)$. This derives from the fact that early optimal control theory was developed with applications to industrial processes in mind, especially chemical plants. You could also just call it the dynamics equation.

From this first lecture you should be taking away the key idea of dynamic programming, and the fact that problems in stages (with separable cost and a finite time horizon) can often be solved by working backwards from the final stage. The minimum length path (stage coach) problem is trivial, but should have made these ideas very intuitive, as an example of a problem in which the optimal policy is best determined by working backwards from the end. You might like to read the Wikipedia entry for Richard Bellman, who is credited with the invention of dynamic programming.

I have just now added some hyperlinks to the recommended booklist. In particular, Dimitri Bertsekas has a web page for his book and slides from his lectures. You might find these interesting to browse through at some later stage.

I mentioned that I had once appeared on ITV's Who Wants to be a Millionaire (October 2003). There is a nice Part II exam question on this, including the model solution. You might like to look at this now – simply to see the sort of mathematical problem that this course will enable you to solve. You can also view the overhead slides for a little presentation called A Mathematician Plays "Who Wants to Be a Millionaire?" which I once gave to a Royal Institution workshop for school children. You might like to see how I made best use of my "Ask the audience" lifeline by employing an idea from statistics. Basically, I asked members of the audience not to vote if they felt at all unsure of the right answer.

There is space below for you to write comments or ask questions about things that I said in this lecture. Don't be shy, if you are puzzled about something you are unlikely to be the only one, and we can all learn from some discussion. You can post with your name, or anonymously. If you find a mistake in the notes or have a suggestion for their improvement, please write to me at rrw1@cam.ac.uk

Thursday, January 10, 2013

Course starts January 17, 2013

This is the blog page for the course of 16 lectures starting 17 January 2013 (Tue/Thu @ 9am in CMS meeting room 3).

I will use this space to talk about some extra things. Sometimes upon leaving a lecture I think, "I wish I had said ...". This blog gives me a place to say it. Or I can use this space to talk about a question that a student has asked during a lecture. Or I might comment on an examples sheet question.

There is a space after each post for you to write comments or ask questions using the IntenseDebate commenting system. Please do not be shy to do so. Other students will probably benefit.