Thursday, November 27, 2014

Lecture 15

The maximum principle is due to Lev Pontryagin. He was blind and yet one of the greatest mathematicians of his generation.

A key thing to grasp is that the PMP provides necessary conditions. We use the fact that an adjoint trajectory $\lambda$ must exist to deduce properties of, or completely determine, the optimal control and optimally controlled trajectory. To my thinking, the PMP is notoriously badly explained in most books. I hope I have been able to make it seem more intuitive. Lagrange multipliers give a helpful interpretation, as does differentiation of the infinitesimal version of the optimality equation. We may also compare the PMP to the Lagrangian necessity theorem, which says that if $\phi(b)=f(x^*(b))=\max\{f(x):g(x)=b,x\in X\}$ is concave in $b$, then there exists a Lagrange multiplier $\lambda$ such that $f(x)+\lambda^T(b-g(x))$ is maximized with respect to $x\in X$ by $x^*(b)$.

For a crash course in Lagrangian optimization (a reminder of IB) you might like to look at pages 2-3 of these notes by Mike Tehranchi.

The rocket car example is a celebrated problem that was first solved by D.W. Bushaw, Differential Equations with a Discontinuous Forcing Term, PhD Thesis, Princeton, 1952.

In the obituary of Donald W. Bushaw (1926-2012) it is stated that "Don’s PhD thesis is recognized as the beginning of modern optimal control theory."

There is a nice interactive demo of the solution to the rocket car parking problem that you can try.

Some personal experience of the power of PMP came in solving the problem in the paper

R. R. Weber. Optimal search for a randomly moving object. J. Appl. Prob. 23:708-717, 1986.

Here I used the PMP to solve, in continuous time, the problem of searching for a moving object (Section 5.1). This is still an open problem in discrete time.

Wednesday, November 26, 2014

Give your feedback

This year's course finishes on Tuesday, December 2, 2014.

Your initial state of knowledge was $W_0$, and will have been brought to $W_{16}$. It is now your job to control the trajectory onwards. Information about certain unobservables will be revealed by variables $y_1,y_2,y_3$ in the Easter Term. These observations are subject to noise.

As you study and revise from the notes I would appreciate an email from you if you see any typos or grammatical mistakes, or find something that you think could be explained better. I try to tread a Goldilocks path in these notes of being neither too verbose nor too succinct.

You can give the standard feedback on-line here. This is identical to the Faculty's paper form that I will hand out on Thursday. It will be sent to my email anonymously. After reading the responses, I will forward them to the Faculty Office. The online form is easy to use and is particularly useful for anyone who is following the course from my notes and is not attending lectures.

Tuesday, November 25, 2014

Lecture 14

On the basis of today's lecture you should be able to do Examples Sheet 3, #3, 4, 5.

I started the lecture with examples of two methods applied to the solution of the continuous-time LQR problem.

Method 1 is the solution of this problem using the Hamilton-Jacobi-Bellman equation.

Method 2 foreshadows the use of Pontryagin's Maximum Principle, which we shall discuss in lectures 15, 16. You will need this for #6, 7, 8, 9, which should all be possible after lecture 15.

Question #10 I will probably do in lecture 16.

In the fishing example we were essentially using Theorem 14.1 to show that an optimal policy is one for which,
$$
u=\begin{cases}
0, & x<\bar x,\\
a(\bar x), & x=\bar x,\\
u_{\max}, & x>\bar x,
\end{cases}
$$
and that a necessary condition is that $a'(\bar x)=\alpha$. At the end of page 59 of the notes, $F(x)$ is computed as a function of $T$, the time at which $x$ first becomes equal to $\bar x$. A simple calculation, which I did not do in lecture, shows that $F_{xx}<0$, so $F$ is concave. All this information confirms that $F$ satisfies the HJB equation (14.2).

Notice that Figure 2 on page 59 is showing $u_{\max}\geq a(x)$ for all $x$, so fishing at maximum rate does cause the population to decrease (unless initially $a(x)=u_{\max}$).
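
If you would like to see the threshold policy in action, here is a minimal simulation sketch in Python. The logistic growth function $a(x)=x(1-x)$, the interest rate $\alpha$ and the value of $u_{\max}$ are illustrative choices of mine, not the values used in the notes; for this $a$ the threshold solving $a'(\bar x)=\alpha$ is $\bar x=(1-\alpha)/2$.

```python
import numpy as np

alpha, u_max = 0.2, 2.0        # illustrative values only (not from the notes)
a = lambda x: x * (1.0 - x)    # assumed logistic growth a(x)
xbar = (1.0 - alpha) / 2.0     # solves a'(xbar) = alpha for this a

def u_opt(x):
    """Threshold policy: fish at full rate above xbar, not at all below."""
    if x > xbar:
        return u_max
    if x < xbar:
        return 0.0
    return a(xbar)             # on the threshold, hold the population steady

x, dt, reward = 0.9, 0.01, 0.0
for k in range(5000):
    u = u_opt(x)
    reward += np.exp(-alpha * k * dt) * u * dt   # discounted catch
    x_next = x + (a(x) - u) * dt                 # population dynamics dx/dt = a(x) - u
    if (x - xbar) * (x_next - xbar) < 0:         # don't let the Euler step overshoot the threshold
        x_next = xbar
    x = x_next

print(f"xbar = {xbar:.3f}, discounted reward up to T = 50: {reward:.3f}")
```

Starting instead from $x<\bar x$, the policy waits with $u=0$ until the population has grown up to $\bar x$, and only then begins to fish.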

Monday, November 24, 2014

Kalman filter in economics

The Kalman filter was an important tool in space exploration, and is often mentioned in connection with the Apollo XI guidance system. Let me make some remarks about where Kalman filtering ideas are used, in areas adjacent to operations research, such as economics. 

I once asked Lord Eatwell, President of Queens', and a famous economist, "Is the Kalman filter much used in economics?" He immediately replied, "Yes, all the time".

Eatwell is one of the compilers of the Palgrave Dictionary of Economics. It is a good place to go if ever you need to find a short article on any economics topic. I searched in Palgrave for the Kalman filter, and read:

The Kalman filter

The Kalman filter deals with state-space representations where the transition and measurement equations are linear and where the shocks to the system are Gaussian. The procedure was developed by Kalman (1960) to transform (‘filter’) some original observables $y_t$ into Wold innovations $a_t$ and estimates of the state $x_t$. With the innovations, we can build the likelihood function of the dynamic model. With the estimates of the states, we can forecast and smooth the stochastic process.

The use of unobserved components opens up a new range of possibilities for economic modelling. Furthermore, it provides insights and a unified approach to many other problems. The examples below give a flavour.

The local linear trend model generalizes (1) by the introduction of a stochastic slope, $\beta_t$, which itself follows a random walk. Thus ...

Econometricians are often interested in building linear models in which some variables are explained by other variables (in some sort of regression model). 

As the values of variables become known over time, one wants to update estimates of other variables. The machinery for doing this is provided by the Kalman filter. Notice that the Kalman filter does not have anything to do with the Q (quadratic cost) assumption of our LQG model. It is only the Linear and Gaussian parts that are relevant.

You might like to try a Google search for "Kalman Filter and finance". Many things turn up. For example, here is a talk, Kalman Filtering in Mathematical Finance.

Thursday, November 20, 2014

Lecture 13

The name "Kalman filter" refers to the estimation equation (13.1) and takes its name from Rudolf Kalman (1930 –), who developed it in the years 1958-64. He also coined the terms controllable and observable, and gave the criteria that we have seen in previous lectures. The fact that a system is controllable iff the matrix $[B\ AB\ \cdots\ A^{n-1}B]$ is of full rank is sometimes called Kalman's criteria. In the IEEE biography of Kalman it is stated
The Kalman filter, and its later extensions to nonlinear problems, represents perhaps the most widely applied by-product of modern control theory. It has been used in space vehicle navigation and control (e.g. the Apollo vehicle), radar tracking algorithms for ABM applications, process control, and socioeconomic systems.
The theory in this lecture is admittedly quite tricky, partly because of the notation. As a test of memory, can you say what roles in the theory are taken by each of these?

 $x_t$,  $u_t$, $A$, $B$, $\epsilon_t$, $y_t$, $C$, $\eta_t$, $\hat x_t$, $\Delta_t$, $\xi_t$, $\zeta_t$, $R$, $S$, $Q$, $K_t$, $\Pi_t$, $N$, $L$, $M$, $H_t$,  $V_t$. 

You will understand the ideas better once you have worked through the details of a scalar example (in which $n=m=p=1$). You do this in Example Sheet 3, Question 2. When you do this question, start by supposing that $\hat x_t=\hat x_{t-1}+u_{t-1}+h_t(y_t-\hat x_{t-1})$, and then find the value of $h_t$ that minimizes the variance of the error $\Delta_t=\hat x_t-x_t$. Subtract $x_t=x_{t-1}+u_{t-1}+3\epsilon_t$ and use $y_t=x_{t-1}+2\eta_t$. You get

$\hat{x}_t-x_t=\Delta_t=\Delta_{t-1}-3\epsilon_t-h_t\Delta_{t-1}+2h_t\eta_t.$

Then square, take the expected value, and minimize with respect to $h_t$ to find a formula for $V_t$ in terms of $V_{t-1}$.
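
To see where this leads, here is a minimal numerical sketch, under my assumption that $\epsilon_t$ and $\eta_t$ have unit variance (check this against the wording of the question): then $V_t(h)=(1-h)^2V_{t-1}+9+4h^2$, which is minimized by $h_t=V_{t-1}/(V_{t-1}+4)$.

```python
def kalman_scalar(V0, steps=10):
    """Iterate the scalar variance recursion, choosing the optimal gain h_t at each step."""
    V = V0
    for t in range(1, steps + 1):
        h = V / (V + 4.0)                         # minimizes (1-h)^2 V + 9 + 4 h^2
        V = (1 - h) ** 2 * V + 9.0 + 4.0 * h**2   # equals 4V/(V+4) + 9
        print(f"t={t}: h_t = {h:.4f}, V_t = {V:.4f}")
    return V

kalman_scalar(V0=1.0)
```

Under these assumptions you can check that the recursion $V_t=4V_{t-1}/(V_{t-1}+4)+9$ has fixed point $V=12$, to which $V_t$ converges quickly.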

You will not be asked to reproduce the statement or proofs of Theorem 13.1 or 13.2 in examinations. You should simply know that $\hat{x}_t$ is computed from $\hat{x}_{t-1}$ and $y_t$ in the linear manner specified by (13.1), and that the covariance matrix $V_t$ satisfies a Riccati equation. You are not expected to memorize Riccati equations.

Notice that the Riccati equation for $V_t$, i.e. $V_t = g(V_{t-1})$, runs in the opposite time direction to the one we had for $\Pi_t$ in lecture 10, where $\Pi_{t-1} = f(\Pi_t)$. We are given $V_0$ and $\Pi_h$.

Tuesday, November 18, 2014

Lecture 12

I wish only to say that controllability and observability stand in a dual relationship to one another. This is clear in the necessary and sufficient conditions: $[B\ AB\ \cdots\ A^{n-1}B]$ must be of rank $n$ (for controllability), and

$\begin{bmatrix}C\\ CA\\ \vdots\\ CA^{n-1}\end{bmatrix}$ must be of rank $n$ (for observability).

This duality is also nicely exhibited in the starred question on Example Sheet 3, Question 11. You should now be able to do this question.
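
To see the duality computationally: the observability matrix of $(A,C)$ is, up to transposition, the controllability matrix of $(A^T,C^T)$, so the two rank tests are really the same test. Here is a minimal numpy sketch, with an arbitrary illustrative system of my own (not one from the notes):

```python
import numpy as np

def ctrb(A, B):
    """Controllability matrix [B, AB, ..., A^{n-1} B]."""
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

def obsv(A, C):
    """Observability matrix [C; CA; ...; C A^{n-1}]."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

# Arbitrary illustrative system
A = np.array([[0.0, 1.0], [-1.0, 0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

print(np.linalg.matrix_rank(ctrb(A, B)))           # controllable iff this equals n
print(np.linalg.matrix_rank(obsv(A, C)))           # observable iff this equals n
print(np.allclose(obsv(A, C), ctrb(A.T, C.T).T))   # the duality: same matrix, transposed
```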

Notice that a system can be stabilizable without being controllable. E.g. the scalar system with $x_{t+1}=(1/2)x_t$ is trivially stabilizable, but it is not controllable.

Thursday, November 13, 2014

Lecture 11

I showed you a video of what optimal control can do when it is coupled to today's fast microprocessors and feedback controllers. The video is here.

Quadrocopter pole acrobatics

The idea of controllability is straightforward and we can understand it using ideas of linear algebra.

Consider again the broom balancing problem. I mentioned in lectures that it is not possible to stabilize two brooms at equilibrium simultaneously if they are of the same length, but it is possible if the lengths are different. This is because of the forms:

$A=\begin{pmatrix}0 & 1 & 0 & 0\\ \alpha & 0 & 0& 0\\ 0 & 0 & 0& 1\\ 0 & 0& \beta & 0\end{pmatrix}\quad B=\begin{pmatrix}0 \\-\alpha\\0\\-\beta\end{pmatrix}\quad M_4=\begin{pmatrix}0 & -\alpha & 0 & -\alpha^2\\ -\alpha & 0 & -\alpha^2& 0\\ 0 & -\beta & 0& -\beta^2\\ -\beta & 0& -\beta^2 & 0\end{pmatrix}$

where $\alpha = g/L_1$, $\beta=g/L_2$.

So $M_4$ is of rank 4 iff $\alpha$ and $\beta$ are different. However, this assumes that as you move your hand you must keep it at a constant height above the ground. Things might be different if you can move your hand up and down as well.
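
Here is a quick numerical check of that claim, using the $A$ and $B$ above (with $g=9.8$ and broom lengths of my own choosing):

```python
import numpy as np

def M4(alpha, beta):
    """Controllability matrix [B, AB, A^2 B, A^3 B] for the two-broom system."""
    A = np.array([[0, 1, 0, 0],
                  [alpha, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, beta, 0]], dtype=float)
    B = np.array([[0], [-alpha], [0], [-beta]], dtype=float)
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(4)])

g = 9.8
print(np.linalg.matrix_rank(M4(g / 1.0, g / 2.0)))   # different lengths: rank 4, controllable
print(np.linalg.matrix_rank(M4(g / 1.0, g / 1.0)))   # equal lengths: rank 2, not controllable
```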

In 1996, paper 2, question 10 there was this tripos question about controllability:
A fantasy kingdom has populations of $x$ vampires and $y$ humans. At the start of each of the four yearly seasons the king takes a census and then intervenes to admit or expel some humans. Suppose that the population dynamics of the kingdom are governed by the plant equations:
$x_{t+1} = x_t + (y_t - Y)$,
$y_{t+1} = y_t - (x_t - X) + u_t$,
where $x_t$ and $y_t$ are integers representing the respective populations of vampires and humans in the $t$-th season, $t=0,1,2,3,\dotsc$; $X$ and $Y$ represent equilibrium values in the population without regal intervention; ...
[ some more description]
Can the king regain equilibrium by only expelling humans?
Suppose, the census is taken during the day, so the king can only count humans, and thus $u_t$ is allowed only to depend on $y_t$. Show that ...

The last part of the question is about observability (see Lecture 12). The full question is in this file of past questions.

Tuesday, November 11, 2014

Lecture 10

The linear-quadratic regulator (LQR) that we met today is one part of the solution to the linear-quadratic-Gaussian (LQG) control problem. This problem is the subject of Lectures 10–13. The LQG problem is perhaps the most important problem in control theory. It has a full solution, given in terms of the Kalman filter (a linear-quadratic estimator) and the linear-quadratic regulator.

Following today's lecture you can do Questions 6–10 on Examples Sheet 2. I mostly did Question 10 in today's lecture. Here is an important hint. Do not try to solve these problems by plugging values into the general form of the Riccati equation. It is always better to figure out the solution from scratch, by the method of backwards induction. Conjecture for yourself that the optimal value function, $F(x,t)$, is of a form that is quadratic in the state variable, plus some $\gamma_t$ (if it is a problem with white noise). Then find the explicit form of the recurrences, working backwards inductively from the terminal time $h$.

In general, solutions to a Riccati equation can be computed numerically, but not algebraically. However, one can find a full solution in the special case that $x_t$ and $u_t$ are both one-dimensional (Question 6). There is also a fully-solvable special case in which $x_t$ is two-dimensional and $u_t$ is one-dimensional (Question 10). A useful trick in some problems is to observe that $\Pi_t^{-1}$ satisfies a recurrence relation (Questions 8 and 10).
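
As an illustration of what "from scratch by backwards induction" produces in the scalar case, here is a minimal sketch. Conjecture $F(x,t)=\Pi_t x^2$, minimize $Rx^2+Qu^2+\Pi_t(ax+bu)^2$ over $u$ by hand, and iterate backwards. The numbers $a$, $b$, $R$, $Q$, $h$ and the terminal condition $\Pi_h=0$ are illustrative choices of mine, not those of the examples sheet.

```python
# Scalar LQR by backwards induction: conjecture F(x, t) = Pi_t * x^2.
a, b, R, Q, h = 1.2, 1.0, 1.0, 1.0, 10   # dynamics x_{t+1} = a x_t + b u_t, cost R x^2 + Q u^2
Pi = 0.0                                  # assumed terminal condition Pi_h = 0
for t in range(h, 0, -1):
    # minimizing R x^2 + Q u^2 + Pi (a x + b u)^2 over u gives u = -K x with:
    K = a * b * Pi / (Q + b * b * Pi)
    # substituting back gives the Riccati recursion for Pi_{t-1}:
    Pi = R + a * a * Pi - a * b * Pi * K
    print(f"t = {t-1}: Pi = {Pi:.4f}, K = {K:.4f}")
```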

I do not expect you to memorize the general form of the Riccati equation for the exams. The important thing is to remember enough of what it is for, how it looks, and how it is derived, so that you could reconstruct its derivation in the context of a particular problem, as you do in Question 6.

Monday, November 10, 2014

Examples sheet 2

Having completed a first supervision on this examples sheet, I have now made some corrections to the questions, and made some changes of wording that I think will help.

I have also taken a hard question from sheet 3 and moved it to the end, as question #11. It is certainly "starred", but might be interesting to discuss, as it points up the duality of the questions of controllability and observability.

Thursday, November 6, 2014

Lecture 9

The slides for the short talk on stochastic scheduling today are here.

The method of uniformization that I described in this lecture seems to have been introduced by Jensen (1953), but it was first used extensively in solving queueing problems in the late 1970s (by Grassmann (1977) and Keilson (1979)). The Wikipedia article has more about it and summarises the method in these words:
The method involves the construction of an analogous discrete time Markov chain, where transitions occur according to an exponential distribution with the same parameter in every state.
I have used this idea many dozens of times in my own research. It is usually the best first step to make when tackling a continuous-time Markov decision problem.
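
Concretely, if the continuous-time chain has generator $Q$ and the exit rates are bounded by $B$, uniformization observes the chain at the jump times of a Poisson process of rate $B$ and works with the one-step transition matrix $P = I + Q/B$. A minimal sketch, with an arbitrary illustrative generator of my own:

```python
import numpy as np

# Arbitrary illustrative generator (rows sum to zero); not from the lecture.
Q = np.array([[-2.0, 2.0, 0.0],
              [1.0, -3.0, 2.0],
              [0.0, 1.0, -1.0]])

B = max(-np.diag(Q))                 # uniformization rate: a bound on the exit rates
P = np.eye(Q.shape[0]) + Q / B       # transition matrix of the uniformized discrete-time chain

print(P)
print(P.sum(axis=1))                 # each row sums to 1: P is a stochastic matrix
# The two chains have the same stationary distribution, since pi Q = 0 iff pi P = pi.
```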

Tuesday, November 4, 2014

Lecture 8

You will find that the policy improvement algorithm is useful in answering Questions 3 and 4 on Examples Sheet 2. In Question 3, you will find $\lambda$ and $\phi$ for a certain policy ("On seeing a filling station, stop and fill the tank"), and then look for a condition under which the policy improvement algorithm will not improve it (or, equivalently, under which this $\lambda$ and $\phi$ satisfy the optimality equation).

In Question 4 you will use the policy improvement idea in the context of a discounted-cost problem. You find $F(x)$ for a simple policy (in which the engineer allocates his time randomly), and then improve it by a step of the policy improvement algorithm. This leads to a policy in which the engineer puts all his maintenance effort into the machine with the greatest value of $c_i(x_i+q_i)/(1-\beta b_i)$. This policy is better, but may not yet be optimal.
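
In case it helps to see the mechanics in general, here is a minimal sketch of policy improvement for a discounted-cost problem. The tiny two-state, two-action example is entirely mine (it is not the machine-maintenance model of Question 4): evaluate the current policy by solving $(I-\beta P_f)F = c_f$, then at each state pick the action minimizing $c(x,u)+\beta\sum_y P(y\mid x,u)F(y)$.

```python
import numpy as np

beta = 0.9
# Illustrative 2-state, 2-action MDP: costs c[x, u] and transition matrices P[u].
c = np.array([[1.0, 2.0],
              [4.0, 1.5]])
P = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.5, 0.5], [0.9, 0.1]])}

def improve(policy):
    # Policy evaluation: solve F = c_f + beta * P_f * F for the current policy.
    Pf = np.array([P[policy[x]][x] for x in range(2)])
    cf = np.array([c[x, policy[x]] for x in range(2)])
    F = np.linalg.solve(np.eye(2) - beta * Pf, cf)
    # Policy improvement: one-step lookahead at each state.
    new_policy = [int(np.argmin([c[x, u] + beta * P[u][x] @ F for u in (0, 1)]))
                  for x in range(2)]
    return new_policy, F

policy = [0, 0]
while True:
    new_policy, F = improve(policy)
    if new_policy == policy:      # no further improvement: policy is optimal
        break
    policy = new_policy
print("optimal policy:", policy, "value:", F)
```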

In Question 1 you will want to mimic the calculation done in the proof of Bruss's odds algorithm. You cannot solve this by simply applying the algorithm directly.


There is another interesting way to motivate the optimality equation in the average-cost case. This can be made rigorous and helps us understand the relative value function $\phi(x)$.



Let $F(x)$ be the infinite-horizon value function when the discount factor is $\beta$. Then we know this satisfies the optimality equation
$F(x) = \min_u \{ c(x,u) +\beta E[ F(x_1) \mid x_0=x,u_0=u ] \}$ .
Pick some state, say state 0. By subtracting $\beta F(0)$ from both sides of the above, we obtain
$F(x) - F(0) + (1-\beta)F(0) = \min_u \{ c(x,u) + \beta E[ F(x_1) - F(0) \mid x_0=x,u_0=u ] \}.$
One can show that as $\beta\to 1$ we have $F(x) - F(0) \to \phi(x)$ and $(1-\beta)F(0) \to\lambda$ (the average cost). Thus we obtain
$\phi(x) + \lambda = \min_u \{ c(x,u) + E[ \phi(x_1) \mid x_0=x,u_0=u ] \}$ 
and this is our average-cost optimality equation. If you would like to understand why $(1-\beta)F(0) \to\lambda$, see this small note about the connection between the average-cost problem and the discounted-cost problem with $\beta$ near 1.

It is also interesting to think about the following (which I mentioned briefly in lectures today). Suppose we have a deterministic stationary Markov policy, say $\pi$, with $u=f(x)$. Suppose we have $\lambda$ and $\phi$ such that
$\phi(x) + \lambda = c(x,f(x)) + \sum_y \phi(y) P(y \mid x) \qquad (1)$
where $P(y \mid x)$ is the transition matrix under $\pi$.
Suppose $\pi$ induces an ergodic Markov chain (i.e. a Markov chain that is irreducible and positive recurrent) and this has invariant distribution $\mu$. We know that
$\mu_y = \sum_x \mu_x P(y \mid x). \qquad (2)$
Multiplying (1) through by $\mu_x$ and then summing on $x$, we get
$\sum_x \mu_x \phi(x) + \lambda \sum_x \mu_x = \sum_x \mu_x c(x,f(x)) + \sum_y \sum_x \mu_x \phi(y) P(y \mid x),$
which, using (2), gives
$\sum_x \mu_x \phi(x) + \lambda \sum_x \mu_x = \sum_x \mu_x c(x,f(x)) + \sum_y \mu_y \phi(y).$
Then using $\sum_x \mu_x = 1$, and cancelling the terms that are equal on both sides, we have
$\lambda = \sum_x \mu_x c(x,f(x)),$
and so we see that $\lambda$ is the average cost of policy $\pi$.
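
A quick numerical check of this argument, with an arbitrary two-state policy of my own: solve (1) for $\lambda$ and $\phi$ (fixing $\phi(0)=0$), compute $\mu$ from (2), and confirm that $\lambda=\sum_x\mu_x c(x,f(x))$.

```python
import numpy as np

# Arbitrary illustrative policy pi: transition matrix P and costs c under pi.
P = np.array([[0.6, 0.4],
              [0.2, 0.8]])
c = np.array([3.0, 1.0])

# Solve phi(x) + lambda = c(x) + sum_y P(x, y) phi(y), with phi(0) = 0.
# Unknowns are (phi(1), lambda).
A = np.array([[-P[0, 1], 1.0],
              [1.0 - P[1, 1], 1.0]])
phi1, lam = np.linalg.solve(A, c)

# Invariant distribution mu: mu P = mu, normalized to sum to 1.
evals, evecs = np.linalg.eig(P.T)
mu = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
mu /= mu.sum()

print(lam, mu @ c)   # both equal the average cost of the policy (5/3 for these numbers)
```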