The basic idea is dirt simple. Beyond that, apparently the OP has an overview. Past that, the math gets quite pure.
E.g., the Markov assumption is that in the underlying process, the past and future are conditionally independent given the present. Fairly quickly, treatments start discussing this assumption in terms of sigma algebras from measure theory instead of just random variables. The sigma algebras are pretty: (a) If we have the sigma algebra generated by a random variable, then the random variable is determined (from memory from 20 years ago; my source was
I. I. Gihman and A. V. Skorohod, The Theory of Stochastic Processes I, ISBN 0-387-06573-3, Springer-Verlag, New York, 1974.
and I lost my copy in a move and have yet to order another from Springer). (b) If we have infinitely many random variables, discussing conditional independence directly is at least clumsy, but the sigma algebra they generate makes talking about conditional independence easy. Where do we get infinitely many random variables? In the continuous time case, the past one second will do. In the discrete time case, the full past of the process, if we assume the process goes back all the way in time.
But, sure, in practice we can't do anything very direct with sigma algebras; that is, they are pretty theory.
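In symbols, for a discrete time process X_1, X_2, ..., the Markov assumption is commonly written

    P(X_{n+1} in A | X_1, ..., X_n) = P(X_{n+1} in A | X_n)

for events A, and the sigma algebra version just replaces the conditioning list X_1, ..., X_n with the sigma algebra those variables generate.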
Here is the whole idea, in a reasonably general case, in brief:
Do some financial planning for the next five years. Make time discrete, say, one point in time for each month. Have a point at the beginning and also at the end, for a total of 61 points.
Build a spreadsheet with one column for each of the 61 points in time and one row for each of the relevant variables. To make this stochastic optimal control, Markov decision theory, or other equivalent names, have some cells in the spreadsheet get their values from the built-in random number generator.
Also, at each of the first 60 points in time, have some variables that are the decisions we get to make.
In the last column, have the value of the business then.
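To make the recipe concrete, here is a minimal sketch in Python of one such toy setup. The state (cash on hand), the decision (fraction of the cash put at risk each month), and all the numbers are made up for illustration, not from any real planning problem:

    import random

    T = 60                                   # decision columns 1..60; column 61 holds the value
    STATES = [0.5 * i for i in range(21)]    # toy state grid: cash from 0.0 to 10.0
    DECISIONS = [0.0, 0.25, 0.5, 0.75, 1.0]  # fraction of cash put at risk each month
    N_SAMPLES = 200                          # Monte Carlo samples per (state, decision)

    def step(cash, u, rng):
        # One month: the fraction u of the cash earns a random return;
        # the rest earns a small safe return.  This is the "random cell".
        shock = rng.gauss(0.01, 0.10)        # hypothetical monthly risky return
        return cash * (1 - u) * 1.002 + cash * u * (1 + shock)

    def terminal_value(cash):
        return cash                          # value of the business in column 61

    def snap(cash):
        # Round a simulated state to the nearest grid point so it can
        # index the tables built below.
        return min(STATES, key=lambda s: abs(s - cash))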
Our mission, and we have to accept it, is to determine the decisions (values in the decision cells) that maximize the value of the business at the end. Ah, since we have the random cells, typically we want to maximize the expected (average) value at the end.
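In symbols, writing V for the value in the last column:

    choose decisions u_1, ..., u_60, each a function of the state at its column, to maximize E[V]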
So, we start at column 60 and consider all the possible data we could have in column 60. Sure, that data fills maybe a few exabytes. Then we pick values for the decisions in column 60 and reevaluate the spreadsheet many times, letting the random cells generate random numbers, average, and see what average value we get in column 61. We do this for all possible decisions. Sure, now we are keeping busy a few million square feet of computing. But the work is highly parallel. That's about what we would have to do using a general purpose program, but in practice we would look at the specific problem, note special structure (properties, assumptions, simplifications), and get some speedups.
To be more clear, at column 60 set aside the random cells and the decision cells -- the rest hold the state at column 60. So, for each possible state, we try all the possible decisions, and for each decision we reevaluate many times, get the expected value of the business in column 61, and pick the best decision. So, there in column 60, for each of the possible states, we get the best decision and the corresponding expected business value in column 61. We store this -- that is, we store the best decision and the expected value for each state. Here we fill some exabytes of storage.
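Continuing the toy Python sketch from above, the column 60 step might look like this (V61, V60, and best_u60 are names made up here for the stored tables):

    # Column 60: for each possible state, try every decision, estimate the
    # expected column-61 value by Monte Carlo, and keep the best decision.
    rng = random.Random(0)
    V61 = {s: terminal_value(s) for s in STATES}
    V60, best_u60 = {}, {}
    for s in STATES:
        best = (float("-inf"), None)
        for u in DECISIONS:
            ev = sum(V61[snap(step(s, u, rng))]
                     for _ in range(N_SAMPLES)) / N_SAMPLES
            best = max(best, (ev, u))
        V60[s], best_u60[s] = best           # tiny here, exabytes in general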
Now we back up to column 59. For each state we try all the possible decisions. For each of those decisions we recalculate, that is, exercise the random number generators, see what state we get in column 60, and look up the expected column-61 business value we stored for that state. So, for each state in column 59 we find the best decision, and again we store the best decision and its expected value for each state.
Now we continue in this way back to columns 58, 57, ..., 1. In column 1 we know the state we are in because that is our present. So our calculation for column 1 tells us what decision to make there in column 1.
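In the toy sketch the whole backward pass is just that step in a loop, reusing rng from the column 60 snippet; policy[t][s] is the tabulated best decision at column t when the state is s:

    # Backward induction over columns 60, 59, ..., 1, starting from the
    # known values in column 61.
    V = {s: terminal_value(s) for s in STATES}
    policy = {}
    for t in range(T, 0, -1):
        Vt, Pt = {}, {}
        for s in STATES:
            evs = [(sum(V[snap(step(s, u, rng))]
                        for _ in range(N_SAMPLES)) / N_SAMPLES, u)
                   for u in DECISIONS]
            Vt[s], Pt[s] = max(evs)
        V, policy[t] = Vt, Pt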
Then, in reality, soon we get to column 2, see what state we are in, and use the decision we tabulated for that state. We continue with columns 3, 4, ..., 60, make our decisions in column 60, and see what business value we get in column 61.
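And the forward pass through reality, again in the toy sketch (the starting cash of 5.0 is arbitrary):

    # Forward: at each column, observe the state, look up the tabulated
    # decision, and let reality's random numbers take us to the next column.
    cash = 5.0
    live = random.Random(12345)              # "reality's" randomness
    for t in range(1, T + 1):
        cash = step(cash, policy[t][snap(cash)], live)
    print("business value in column 61:", terminal_value(cash))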
The whole thing is the same, except much faster, if we have no random cells.
There are also continuous time treatments, essentially the same intuitively.
For some books:
Dimitri P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models.
George L. Nemhauser, Dynamic Programming.
Stuart E. Dreyfus and Averill M. Law, The Art and Theory of Dynamic Programming.
E. B. Dynkin and A. A. Yushkevich, Controlled Markov Processes.
Wendell H. Fleming and Raymond W. Rishel, Deterministic and Stochastic Optimal Control.
There is more from R. T. Rockafellar and R. J.-B. Wets.
And of course we should mention R. Bellman, the father of this subject.
Thanks for the explanation! I'm actually working on a tool right now that uses a similar process, but haven't had much of a formal background. Looks like I'll be adding quite a bit to my reading list!