The maximum principle is due to Lev Pontryagin. He was blind and yet one of the greatest mathematicians of his generation.
A key thing to grasp is that the PMP provides necessary conditions. We use the fact that an adjoint trajectory $\lambda$ must exist to deduce properties of, or completely determine, the optimal control and the optimally controlled trajectory. To my thinking, the PMP is notoriously badly explained in most books; I hope I have been able to make it seem more intuitive. Lagrange multipliers give a helpful interpretation, as does differentiation of the infinitesimal version of the optimality equation. We may also compare the PMP to the Lagrangian necessity theorem, which says that if $\phi(b)=f(x^*(b))=\max\{f(x):g(x)=b,x\in X\}$ is concave in $b$, then there exists a Lagrange multiplier $\lambda$ such that $f(x)+\lambda^T(b-g(x))$ is maximized with respect to $x\in X$ by $x^*(b)$.
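As a reminder of what these necessary conditions look like, here is a sketch of one common formulation. The symbols $a$, $c$ and $H$ (dynamics, instantaneous cost and Hamiltonian) are assumed notation, not fixed by this post. The statement is for the time-homogeneous case and leaves out the boundary (transversality) conditions. Suppose $\dot x=a(x,u)$ and we wish to minimize $\int_0^T c(x,u)\,dt$. Define

$$H(x,u,\lambda)=\lambda^T a(x,u)-c(x,u).$$

Then along an optimal trajectory there exists an adjoint trajectory $\lambda(t)$ such that

$$u(t)\in\arg\max_{u} H(x(t),u,\lambda(t)),\qquad \dot\lambda=-\frac{\partial H}{\partial x},\qquad \dot x=\frac{\partial H}{\partial \lambda}=a(x,u).$$

The first condition plays the same role as the maximization of $f(x)+\lambda^T(b-g(x))$ in the Lagrangian necessity theorem: $\lambda$ prices the constraint imposed by the dynamics.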
For a crash course in Lagrangian optimization (a reminder of IB) you might like to look at pages 2-3 of these notes by Mike Tehranchi.
The rocket car example is a celebrated problem that was first solved by D.W. Bushaw, Differential Equations with a Discontinuous Forcing Term, PhD Thesis, Princeton, 1952.
In the obituary of Donald W. Bushaw (1926-2012) it is stated that "Don’s PhD thesis is recognized as the beginning of modern optimal control theory."
There is a nice interactive demo of the solution to the rocket car parking problem that you can try.
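For readers who prefer to experiment offline, here is a minimal sketch of the solution in Python. It assumes the standard formulation of the problem, which is not spelled out in this post: position $x_1$, velocity $x_2$, dynamics $\dot x_1=x_2$, $\dot x_2=u$ with $|u|\le 1$, and the aim of bringing the car to rest at the origin in minimum time. Under those assumptions the optimal control is bang-bang, with a single switch on the curve $x_1+\tfrac12 x_2|x_2|=0$. The function names, the tolerance `tol` and the step size `dt` are illustrative choices.

```python
import math


def switching_function(x1, x2):
    """s(x) = x1 + x2|x2|/2; the switching curve is the set where s = 0."""
    return x1 + 0.5 * x2 * abs(x2)


def bang_bang_control(x1, x2, tol=1e-9):
    """Time-optimal feedback law for the assumed dynamics x1' = x2, x2' = u, |u| <= 1."""
    s = switching_function(x1, x2)
    if s > tol:
        return -1.0  # to the right of / above the curve: full thrust in the negative direction
    if s < -tol:
        return 1.0   # to the left of / below the curve: full thrust in the positive direction
    # On the switching curve: brake along it towards the origin.
    if x2 > 0:
        return -1.0
    if x2 < 0:
        return 1.0
    return 0.0       # already at rest at the origin


def min_time(x1, x2):
    """Closed-form minimum time to reach the origin from (x1, x2)."""
    if switching_function(x1, x2) >= 0:
        return x2 + 2.0 * math.sqrt(max(0.5 * x2 * x2 + x1, 0.0))
    return -x2 + 2.0 * math.sqrt(max(0.5 * x2 * x2 - x1, 0.0))


def simulate(x1, x2, duration, dt=1e-3):
    """Apply the feedback law for `duration` time units, holding u fixed over each step."""
    for _ in range(int(round(duration / dt))):
        u = bang_bang_control(x1, x2)
        x1 += x2 * dt + 0.5 * u * dt * dt  # exact update for piecewise-constant u
        x2 += u * dt
    return x1, x2


if __name__ == "__main__":
    T = min_time(1.0, 0.0)
    print(T, simulate(1.0, 0.0, T))
```

Starting at rest one unit from the origin, the formula gives a minimum time of 2: full thrust towards the origin for one time unit, then full braking for one time unit. Because the simulation switches only at step boundaries, the final state is close to the origin rather than exactly on it.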
Some personal experience of the power of PMP came in solving the problem in the paper
R. R. Weber. Optimal search for a randomly moving object. J. Appl. Prob. 23:708-717, 1986.
Here I used PMP to solve in continuous time the problem of searching for a moving object (Section 5.1). This is still an open problem in discrete time.