class: middle, center, title-slide
Lecture 6: Reasoning over time
Prof. Gilles Louppe
[email protected]
Maintain a belief state about the world, and update it as time passes and evidence is collected.
.grid[ .kol-1-2[
- Markov models
- Markov processes
- Inference tasks
- Hidden Markov models
- Filters
.alert[Do not overlook this lecture!]
.footnote[Credits: CS188, UC Berkeley.]
class: middle, black-slide
.center[
.bold[Pacman revenge]: How to make good use of the sonar readings? ]
.footnote[Credits: CS188, UC Berkeley.]
???
python3 run.py --nghosts 2 --layout maze_small
python3 run.py --nghosts 3 --layout maze_medium
python3 run.py --nghosts 4 --layout maze_huge
class: middle
class: middle
We will consider the world as a discrete series of time slices, each of which contains a set of random variables:
- $\mathbf{X}_t$ denotes the set of unobservable state variables at time $t$.
- $\mathbf{E}_t$ denotes the set of observable evidence variables at time $t$.
class: middle
We specify
- a prior ${\bf P}(\mathbf{X}_0)$ that defines our initial belief state over hidden state variables,
- a transition model ${\bf P}(\mathbf{X}_t | \mathbf{X}_{0:t-1})$ (for $t > 0$) that defines the probability distribution over the latest state variables, given the previous (unobserved) values,
- a sensor model ${\bf P}(\mathbf{E}_t | \mathbf{X}_{0:t}, \mathbf{E}_{0:t-1})$ (for $t > 0$) that defines the probability distribution over the latest evidence variables, given all previous (observed and unobserved) values.
The current state of the world depends only on its immediate previous state(s), i.e., $\mathbf{X}_t$ depends on only a bounded subset of $\mathbf{X}_{0:t-1}$.
Random processes that satisfy this assumption are called Markov processes or Markov chains.
Markov processes such that $${\bf P}(\mathbf{X}_t | \mathbf{X}_{0:t-1}) = {\bf P}(\mathbf{X}_t | \mathbf{X}_{t-1})$$ are called first-order Markov processes.
.center.width-100[]
class: middle
We make a (first-order) sensor Markov assumption $${\bf P}(\mathbf{E}_t | \mathbf{X}_{0:t}, \mathbf{E}_{0:t-1}) = {\bf P}(\mathbf{E}_t | \mathbf{X}_t).$$
The transition and the sensor models are assumed to be the same for all $t$ (stationarity assumption).
.center.width-100[]
A Markov chain coupled with a sensor model can be represented as a growable Bayesian network, unrolled infinitely through time.
The joint distribution of all its variables up to time $t$ is given by $$P(\mathbf{X}_{0:t}, \mathbf{E}_{1:t}) = P(\mathbf{X}_0) \prod_{i=1}^{t} P(\mathbf{X}_i | \mathbf{X}_{i-1}) P(\mathbf{E}_i | \mathbf{X}_i).$$
class: middle
.grid[
.kol-1-2[
.center.width-100[]
]
.kol-1-2[
- ${\bf P}(\text{Umbrella}_t | \text{Rain}_t)$?
- ${\bf P}(\text{Rain}_t | \text{Umbrella}_{0:t-1})$?
- ${\bf P}(\text{Rain}_{t+2} | \text{Rain}_{t})$? ]]
.footnote[Credits: CS188, UC Berkeley.]
class: middle
The transition model ${\bf P}(\mathbf{X}_t | \mathbf{X}_{t-1})$ and the sensor model ${\bf P}(\mathbf{E}_t | \mathbf{X}_t)$ support several inference tasks:
- Prediction: ${\bf P}(\mathbf{X}_{t+k}| \mathbf{e}_{1:t})$ for $k>0$.
    - Computing the posterior distribution over future states.
    - Used for the evaluation of possible action sequences.
- Filtering: ${\bf P}(\mathbf{X}_{t}| \mathbf{e}_{1:t})$.
    - Filtering is what a rational agent does to keep track of the current hidden state $\mathbf{X}_t$, its belief state, so that rational decisions can be made.
- Smoothing: ${\bf P}(\mathbf{X}_{k}| \mathbf{e}_{1:t})$ for $0 \leq k < t$.
    - Computing the posterior distribution over past states.
    - Used for building better estimates, since it incorporates more evidence.
    - Essential for learning.
- Most likely explanation: $\arg \max_{\mathbf{x}_{1:t}} P(\mathbf{x}_{1:t}| \mathbf{e}_{1:t})$.
    - Decoding with a noisy channel, speech recognition, etc.
.grid[
.kol-1-2.center[
.width-80[]
$\begin{aligned} {\bf P}(\mathbf{X}_2) &= \sum_{\mathbf{x}_1} {\bf P}(\mathbf{X}_2, \mathbf{x}_1) \\ &= \sum_{\mathbf{x}_1} P(\mathbf{x}_1) {\bf P}(\mathbf{X}_2 | \mathbf{x}_1) \end{aligned}$
(Predict) ]
.kol-1-2.center[
$\begin{aligned} {\bf P}(\mathbf{X}_1 | \mathbf{e}_1) &=\frac{ {\bf P}(\mathbf{e}_1 | \mathbf{X}_1) {\bf P}(\mathbf{X}_1)}{P(\mathbf{e}_1)} \\ &\propto {\bf P}(\mathbf{e}_1 | \mathbf{X}_1) {\bf P}(\mathbf{X}_1) \end{aligned}$
(Update) ]
]
.footnote[Credits: CS188, UC Berkeley.]
To predict the future:
- Push the prior belief state ${\bf P}(\mathbf{X}_{t} | \mathbf{e}_{1:t})$ through the transition model: $${\bf P}(\mathbf{X}_{t+1}| \mathbf{e}_{1:t}) = \sum_{\mathbf{x}_{t}} {\bf P}(\mathbf{X}_{t+1} | \mathbf{x}_{t}) P(\mathbf{x}_{t} | \mathbf{e}_{1:t})$$
- Repeat up to $t+k$, using ${\bf P}(\mathbf{X}_{t+k-1}| \mathbf{e}_{1:t})$ to compute ${\bf P}(\mathbf{X}_{t+k}| \mathbf{e}_{1:t})$.
.footnote[Credits: CS188, UC Berkeley.]
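As a concrete illustration, here is a minimal numpy sketch of this prediction recursion for a discrete state space; it uses the sun/rain transition probabilities from the stationary-distribution example later in this lecture, and the function and variable names are ours, not part of the course code.

```python
import numpy as np

def predict(belief, T, k):
    """Compute P(X_{t+k} | e_{1:t}) by pushing the belief through the transition model.

    belief: belief[i] = P(X_t = i | e_{1:t})
    T: transition matrix with T[i, j] = P(X_{t+1} = j | X_t = i)
    """
    for _ in range(k):
        # P(X_{t+1} = j | e_{1:t}) = sum_i P(X_{t+1} = j | X_t = i) P(X_t = i | e_{1:t})
        belief = T.T @ belief
    return belief

# Sun/rain chain: states (sun, rain).
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])
print(predict(np.array([1.0, 0.0]), T, k=5))  # belief 5 steps ahead, starting from sun
```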
class: middle, black-slide
.center[
.center[Random dynamics]
.footnote[Credits: CS188, UC Berkeley.]
class: middle, black-slide
.center[
.center[Circular dynamics]
.footnote[Credits: CS188, UC Berkeley.]
class: middle, black-slide
.center[
.center[Whirlpool dynamics]
.footnote[Credits: CS188, UC Berkeley.]
class: middle
As time passes, uncertainty (usually) increases in the absence of new evidence.
.footnote[Credits: CS188, UC Berkeley.]
What if $t \to \infty$?
- For most chains, the influence of the initial distribution fades over time.
- Eventually, the distribution converges to a fixed point, called a stationary distribution.
- This distribution is such that
$${\bf P}(\mathbf{X}_\infty) = {\bf P}(\mathbf{X}_{\infty+1}) = \sum_{\mathbf{x}_\infty} {\bf P}(\mathbf{X}_{\infty+1} | \mathbf{x}_\infty) P(\mathbf{x}_\infty).$$
class: middle
| $X_{t-1}$ | $P(X_t = \text{sun} \mid X_{t-1})$ | $P(X_t = \text{rain} \mid X_{t-1})$ |
| --- | --- | --- |
| sun | 0.9 | 0.1 |
| rain | 0.3 | 0.7 |
$$\begin{aligned} P(\mathbf{X}_\infty = \text{sun}) &= P(\mathbf{X}_{\infty+1} = \text{sun}) \\ &= P(\mathbf{X}_{\infty+1}=\text{sun} | \mathbf{X}_{\infty}=\text{sun}) P(\mathbf{X}_{\infty}=\text{sun})\\ &\quad + P(\mathbf{X}_{\infty+1}=\text{sun} | \mathbf{X}_{\infty}=\text{rain}) P(\mathbf{X}_{\infty}=\text{rain})\\ &= 0.9\, P(\mathbf{X}_{\infty}=\text{sun}) + 0.3\, P(\mathbf{X}_{\infty}=\text{rain}) \end{aligned}$$
Therefore, $P(\mathbf{X}_{\infty}=\text{sun}) = 3\, P(\mathbf{X}_{\infty}=\text{rain})$.
Since probabilities must sum to $1$, this implies that $P(\mathbf{X}_{\infty}=\text{sun}) = 0.75$ and $P(\mathbf{X}_{\infty}=\text{rain}) = 0.25$.
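As a quick numerical check (a sketch, with variable names of our choosing): the fixed-point equation above says that the stationary distribution is an eigenvector of $\mathbf{T}^T$ with eigenvalue $1$.

```python
import numpy as np

# Sun/rain transition matrix: T[i, j] = P(X_{t+1} = j | X_t = i).
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# The stationary distribution satisfies P(X_inf) = T^T P(X_inf),
# i.e., it is a (normalized) eigenvector of T^T with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(T.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()
print(pi)  # -> [0.75, 0.25]
```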
.center[With new evidence, uncertainty decreases. Beliefs get reweighted. But how?]
???
class: middle
An agent maintains a belief state estimate ${\bf P}(\mathbf{X}_t | \mathbf{e}_{1:t})$ and updates it as new evidence $\mathbf{e}_{t+1}$ is collected.
This process can be implemented as a recursive Bayesian estimation procedure:
- (Predict step) Project the current belief state forward from $t$ to $t+1$ through the transition model.
- (Update step) Update this new state using the evidence $\mathbf{e}_{t+1}$.
class: middle
Formally, the Bayes filter is defined as $$ \begin{aligned} {\bf P}(\mathbf{X}_{t+1}| \mathbf{e}_{1:t+1}) &= {\bf P}(\mathbf{X}_{t+1}| \mathbf{e}_{1:t}, \mathbf{e}_{t+1}) \\ &\propto {\bf P}(\mathbf{e}_{t+1}| \mathbf{X}_{t+1}, \mathbf{e}_{1:t}) {\bf P}(\mathbf{X}_{t+1}| \mathbf{e}_{1:t}) \\ &\propto {\bf P}(\mathbf{e}_{t+1}| \mathbf{X}_{t+1}) {\bf P}(\mathbf{X}_{t+1}| \mathbf{e}_{1:t}) \\ &\propto {\bf P}(\mathbf{e}_{t+1}| \mathbf{X}_{t+1}) \sum_{\mathbf{x}_t} {\bf P}(\mathbf{X}_{t+1}|\mathbf{x}_t, \mathbf{e}_{1:t}) P(\mathbf{x}_t | \mathbf{e}_{1:t}) \\ &\propto {\bf P}(\mathbf{e}_{t+1}| \mathbf{X}_{t+1}) \sum_{\mathbf{x}_t} {\bf P}(\mathbf{X}_{t+1}|\mathbf{x}_t) P(\mathbf{x}_t | \mathbf{e}_{1:t}) \end{aligned} $$ where
- the normalization constant $$Z = P(\mathbf{e}_{t+1} | \mathbf{e}_{1:t}) = \sum_{\mathbf{x}_{t+1}} P(\mathbf{e}_{t+1} | \mathbf{x}_{t+1}) P(\mathbf{x}_{t+1} | \mathbf{e}_{1:t})$$ is used to make probabilities sum to 1;
- in the last expression, the first and second terms are given by the model, while the third is obtained recursively.
class: middle
We can think of ${\bf P}(\mathbf{X}_t | \mathbf{e}_{1:t})$ as a forward message $\mathbf{f}_{1:t}$ that is propagated along the sequence, modified by each transition and updated by each new observation.
Thus, the process can be implemented as $$\mathbf{f}_{1:t+1} = \alpha\, \text{FORWARD}(\mathbf{f}_{1:t}, \mathbf{e}_{t+1}),$$ where $\text{FORWARD}$ implements the update described above and $\alpha$ is a normalization constant.
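The recursion translates directly into code. Below is a minimal sketch for a discrete state space; the umbrella-world sensor values used in the example ($P(\text{umbrella} | \text{rain}) = 0.9$, $P(\text{umbrella} | \text{sun}) = 0.2$) are illustrative assumptions, not given on these slides.

```python
import numpy as np

def bayes_filter_step(belief, T, likelihood):
    """One predict-update cycle of the discrete Bayes filter.

    belief: belief[i] = P(X_t = i | e_{1:t})
    T: T[i, j] = P(X_{t+1} = j | X_t = i)
    likelihood: likelihood[i] = P(e_{t+1} | X_{t+1} = i)
    """
    predicted = T.T @ belief          # (predict) project through the transition model
    updated = likelihood * predicted  # (update) reweight by the evidence likelihood
    return updated / updated.sum()    # divide by Z so probabilities sum to 1

# States (sun, rain); assumed sensor values, for illustration only.
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])
p_umbrella = np.array([0.2, 0.9])  # P(umbrella=true | sun), P(umbrella=true | rain)

belief = np.array([0.5, 0.5])
for umbrella in [True, True, False]:
    belief = bayes_filter_step(belief, T, p_umbrella if umbrella else 1 - p_umbrella)
    print(belief)
```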
class: middle
.grid[ .kol-1-4[] .kol-1-4.center[
] .kol-1-4.center[
] ]
???
Solve on blackboard.
class: middle, black-slide
.center[
Ghostbusters with a Bayes filter ]
.footnote[Credits: CS188, UC Berkeley.]
???
python3 run.py --nghosts 2 --layout maze_small --agentfile sherlockpacman.py --bsagentfile bayesfilter.py --show True
python3 run.py --nghosts 3 --layout maze_medium --agentfile sherlockpacman.py --bsagentfile bayesfilter.py --show True
python3 run.py --nghosts 4 --layout maze_huge --agentfile sherlockpacman.py --bsagentfile bayesfilter.py --show True
We want to compute the distribution ${\bf P}(\mathbf{X}_k | \mathbf{e}_{1:t})$ over a past state $\mathbf{X}_k$ (for $0 \leq k < t$), given all evidence up to $t$.
Dividing the evidence $\mathbf{e}_{1:t}$ into $\mathbf{e}_{1:k}$ and $\mathbf{e}_{k+1:t}$, we have $$\begin{aligned} {\bf P}(\mathbf{X}_k | \mathbf{e}_{1:t}) &= {\bf P}(\mathbf{X}_k | \mathbf{e}_{1:k}, \mathbf{e}_{k+1:t}) \\ &\propto {\bf P}(\mathbf{X}_k | \mathbf{e}_{1:k}) {\bf P}(\mathbf{e}_{k+1:t} | \mathbf{X}_k, \mathbf{e}_{1:k}) \\ &= {\bf P}(\mathbf{X}_k | \mathbf{e}_{1:k}) {\bf P}(\mathbf{e}_{k+1:t} | \mathbf{X}_k) \\ &= \mathbf{f}_{1:k} \times \mathbf{b}_{k+1:t}. \end{aligned}$$
class: middle
Let the backward message $\mathbf{b}_{k+1:t}$ denote ${\bf P}(\mathbf{e}_{k+1:t} | \mathbf{X}_k)$.
This backward message can be computed using a backward recursion: $$\begin{aligned} {\bf P}(\mathbf{e}_{k+1:t} | \mathbf{X}_k) &= \sum_{\mathbf{x}_{k+1}} P(\mathbf{e}_{k+1:t} | \mathbf{x}_{k+1}) {\bf P}(\mathbf{x}_{k+1} | \mathbf{X}_k) \\ &= \sum_{\mathbf{x}_{k+1}} P(\mathbf{e}_{k+1} | \mathbf{x}_{k+1}) P(\mathbf{e}_{k+2:t} | \mathbf{x}_{k+1}) {\bf P}(\mathbf{x}_{k+1} | \mathbf{X}_k) \end{aligned}$$
The first and last factors are given by the model. The second factor is obtained recursively. Therefore, $$\mathbf{b}_{k+1:t} = \text{BACKWARD}(\mathbf{b}_{k+2:t}, \mathbf{e}_{k+1}),$$ where $\text{BACKWARD}$ implements the recursion above.
class: middle
Complexity:
- Smoothing for a particular time step $k$: $O(t)$.
- Smoothing a whole sequence (because of caching): $O(t)$.
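Putting the forward and backward recursions together gives the forward-backward smoothing algorithm. The sketch below assumes a discrete state space and reuses the conventions of the filter sketch above (`T[i, j] = P(X_{t+1}=j | X_t=i)`); the names are ours.

```python
import numpy as np

def forward_backward(prior, T, likelihoods):
    """Smoothed posteriors P(X_k | e_{1:t}) for k = 0..t.

    likelihoods[k][i] = P(e_{k+1} | X_{k+1} = i), for k = 0..t-1.
    """
    t = len(likelihoods)
    # Forward pass: f[k] = P(X_k | e_{1:k}).
    f = [prior]
    for k in range(t):
        msg = likelihoods[k] * (T.T @ f[-1])
        f.append(msg / msg.sum())
    # Backward pass: b[k] = P(e_{k+1:t} | X_k), with b[t] an all-one vector.
    b = [np.ones_like(prior)]
    for k in reversed(range(t)):
        b.insert(0, T @ (likelihoods[k] * b[0]))
    # Combine: P(X_k | e_{1:t}) is proportional to f[k] * b[k].
    return [fk * bk / np.sum(fk * bk) for fk, bk in zip(f, b)]
```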
class: middle
???
Solve on blackboard.
.center.width-100[]
.footnote[Credits: CS188, UC Berkeley.]
class: middle
Suppose that an umbrella sequence is observed over the first five days.
- What is the weather sequence that is most likely to explain this?
- Among all $2^5$ sequences, is there an (efficient) way to find the most likely one?
.footnote[Credits: CS188, UC Berkeley.]
class: middle
The most likely sequence is not the sequence of the most likely states!
The most likely path to each state $\mathbf{x}_{t+1}$ consists of the most likely path to some state $\mathbf{x}_t$, followed by a transition to $\mathbf{x}_{t+1}$.
???
Let's focus in particular on paths that reach the state $\text{Rain}_5 = \text{true}$.
Because of the Markov property, it follows that the most likely path to the state $\text{Rain}_5 = \text{true}$ consists of the most likely path to some state at time 4, followed by a transition to $\text{Rain}_5 = \text{true}$; and the state at time 4 that will become part of the path to $\text{Rain}_5 = \text{true}$ is whichever maximizes the likelihood of that path.
class: middle
This is identical to filtering, except that
- the forward message $\mathbf{f}_{1:t} = {\bf P}(\mathbf{X}_t | \mathbf{e}_{1:t})$ is replaced with $$\mathbf{m}_{1:t} = \max_{\mathbf{x}_{1:t-1}} {\bf P}(\mathbf{x}_{1:t-1}, \mathbf{X}_{t} | \mathbf{e}_{1:t}),$$ where $\mathbf{m}_{1:t}(i)$ gives the probability of the most likely path to state $i$;
- the update has its sum replaced by a max.
The resulting algorithm is called the Viterbi algorithm. It computes the most likely explanation through the recursion $$\mathbf{m}_{1:t+1} = {\bf P}(\mathbf{e}_{t+1} | \mathbf{X}_{t+1}) \max_{\mathbf{x}_{t}} \left( {\bf P}(\mathbf{X}_{t+1} | \mathbf{x}_{t})\, \mathbf{m}_{1:t}(\mathbf{x}_t) \right).$$
???
Naive procedure: use smoothing to compute the posterior ${\bf P}(\mathbf{X}_k | \mathbf{e}_{1:t})$ at each time step and pick the most likely state; this does not give the most likely sequence.
class: middle
???
[Q] How do you retrieve the path, in addition to its likelihood?
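One way to answer this: store, for each state, its best predecessor (a backpointer) during the forward max pass, then walk the pointers back from the best final state. A minimal sketch with our own conventions (`likelihoods[k][i] = P(e_{k+1} | X_{k+1} = i)`), not the course code:

```python
import numpy as np

def viterbi(prior, T, likelihoods):
    """Most likely state sequence x_{1:t} explaining the evidence."""
    m = likelihoods[0] * (T.T @ prior)  # m_{1:1}
    backpointers = []
    for lik in likelihoods[1:]:
        scores = T * m[:, None]                   # scores[i, j] = P(X_{t+1}=j | x_t=i) m(i)
        backpointers.append(np.argmax(scores, axis=0))
        m = lik * np.max(scores, axis=0)          # the sum of filtering is replaced by max
    path = [int(np.argmax(m))]                    # best final state
    for bp in reversed(backpointers):
        path.insert(0, int(bp[path[0]]))          # follow backpointers to recover the path
    return path, float(np.max(m))
```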
Hidden Markov models
So far, we described Markov processes over arbitrary sets of state variables $\mathbf{X}_t$ and evidence variables $\mathbf{E}_t$.
- A hidden Markov model (HMM) is a Markov process in which the state $\mathbf{X}_t$ and the evidence $\mathbf{E}_t$ are both single discrete random variables:
    - $\mathbf{X}_t = X_t$, with domain $D_{X_t} = \{1, ..., S\}$,
    - $\mathbf{E}_t = E_t$, with domain $D_{E_t} = \{1, ..., R\}$.
- This restricted structure allows for a reformulation of the forward-backward algorithm in terms of matrix-vector operations.
class: middle
Some authors instead divide Markov models into two classes, depending on the observability of the system state:
- Observable system state: Markov chains
- Partially-observable system state: Hidden Markov models.
We follow here instead the terminology of the textbook, as defined in the previous slide.
class: middle
- The prior ${\bf P}(X_0)$ becomes a (normalized) column vector $\mathbf{f}_0 \in \mathbb{R}_+^S$.
- The transition model ${\bf P}(X_t | X_{t-1})$ becomes an $S \times S$ transition matrix $\mathbf{T}$, such that $$\mathbf{T}_{ij} = P(X_t=j | X_{t-1}=i).$$
- The sensor model ${\bf P}(E_t | X_t)$ is defined as an $S \times R$ sensor matrix $\mathbf{B}$, such that $$\mathbf{B}_{ij} = P(E_t=j | X_t=i).$$
class: middle
- Let the observation matrix $\mathbf{O}_t$ be a diagonal matrix whose elements correspond to the column $e_t$ of the sensor matrix $\mathbf{B}$.
- If we use column vectors to represent forward and backward messages, then we have $$\mathbf{f}_{1:t+1} = \alpha \mathbf{O}_{t+1} \mathbf{T}^T \mathbf{f}_{1:t}$$ $$\mathbf{b}_{k+1:t} = \mathbf{T} \mathbf{O}_{k+1} \mathbf{b}_{k+2:t},$$ where $\mathbf{b}_{t+1:t}$ is an all-one vector of size $S$.
- Therefore, the forward-backward algorithm needs time $O(S^2 t)$ and space $O(St)$.
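These two updates map one-to-one onto matrix-vector code. A minimal sketch (names ours), with $\mathbf{B}$ the $S \times R$ sensor matrix defined on the previous slide:

```python
import numpy as np

def forward_step(f, T, B, e):
    """f_{1:t+1} = alpha O_{t+1} T^T f_{1:t}, for observed evidence value e."""
    O = np.diag(B[:, e])   # observation matrix: diagonal of column e of B
    f = O @ T.T @ f
    return f / f.sum()     # alpha normalizes the message

def backward_step(b, T, B, e):
    """b_{k+1:t} = T O_{k+1} b_{k+2:t}."""
    return T @ np.diag(B[:, e]) @ b
```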
class: middle
Suppose that
See code/lecture6-forward-backward.ipynb for the execution.
class: middle
The stationary distribution is the fixed point of the forward recursion in the absence of evidence, i.e., a vector $\mathbf{f}_\infty$ such that $\mathbf{f}_\infty = \mathbf{T}^T \mathbf{f}_\infty$: a (normalized) eigenvector of $\mathbf{T}^T$ with eigenvalue $1$.
class: middle
class: middle
Suppose we want to track the position and velocity of a robot from noisy observations collected over time.
Formally, we want to estimate continuous state variables such as
- the position $\mathbf{X}_t$ of the robot at time $t$,
- the velocity $\mathbf{\dot{X}}_t$ of the robot at time $t$.
We assume discrete time steps.
.footnote[Credits: CS188, UC Berkeley.]
Let $X$ be a random variable with domain $D_X$.
- When $D_X$ is uncountably infinite (e.g., $D_X = \mathbb{R}$), $X$ is called a continuous random variable.
- If $X$ is absolutely continuous, its probability distribution is described by a density function $p$ that assigns a probability to any interval $[a,b] \subseteq D_X$, such that $$P(a < X \leq b) = \int_a^b p(x) dx,$$ where $p$ is non-negative piecewise continuous and such that $$\int_{D_X} p(x)dx=1.$$
class: middle
The uniform distribution $\mathcal{U}(a,b)$ is described by the density function $$p(x) = \frac{1}{b-a} \mathbb{1}_{[a,b]}(x),$$ where $\mathbb{1}_{[a,b]}$ is the indicator function of the interval $[a,b]$.
class: middle
The normal (or Gaussian) distribution $\mathcal{N}(\mu, \sigma^2)$ is described by the density function $$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$ where $\mu$ is the mean and $\sigma^2$ the variance.
???
Comment that
- $\mu$ is the location,
- $\sigma$ is the width of the normal.
class: middle
The multivariate normal distribution generalizes the normal distribution to $d$ dimensions. It is described by the density function $$p(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^d |\mathbf{\Sigma}|}} \exp\left(-\frac{1}{2} (\mathbf{x}-\mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{x}-\mathbf{\mu})\right),$$ where $\mathbf{\mu}$ is the mean vector and $\mathbf{\Sigma}$ the covariance matrix.
class: middle
If $X \sim \mathcal{N}(\mu, \sigma^2)$, then the linear transformation $aX + b$ (for $a \neq 0$) is also Gaussian, with $aX + b \sim \mathcal{N}(a\mu + b, a^2\sigma^2)$.
class: middle
If the random variables $X$ and $Y$ are jointly Gaussian, then both the marginal distribution of $X$ and the conditional distribution of $X$ given $Y$ are Gaussian.
The Bayes filter extends to continuous state and evidence variables.
The summations are replaced with integrals and the probability mass functions with probability densities, giving the recursive Bayesian relation
$$p(\mathbf{x}_{t+1}| \mathbf{e}_{1:t+1}) \propto p(\mathbf{e}_{t+1}| \mathbf{x}_{t+1}) \int p(\mathbf{x}_{t+1}|\mathbf{x}_t) p(\mathbf{x}_t | \mathbf{e}_{1:t}) d\mathbf{x}_t,$$
where the normalization constant is $$Z = p(\mathbf{e}_{t+1} | \mathbf{e}_{1:t}) = \int p(\mathbf{e}_{t+1} | \mathbf{x}_{t+1}) p(\mathbf{x}_{t+1} | \mathbf{e}_{1:t}) d\mathbf{x}_{t+1}.$$
The Kalman filter is a special case of the Bayes filter, which assumes:
- Gaussian prior
- Linear Gaussian transition model
- Linear Gaussian sensor model
Transition model
Sensor model ] ]
class: middle
- .italic[Prediction step:] If the distribution $p(\mathbf{x}_t | \mathbf{e}_{1:t})$ is Gaussian and the transition model $p(\mathbf{x}_{t+1} | \mathbf{x}_{t})$ is linear Gaussian, then the one-step predicted distribution given by $$p(\mathbf{x}_{t+1} | \mathbf{e}_{1:t}) = \int p(\mathbf{x}_{t+1} | \mathbf{x}_{t}) p(\mathbf{x}_{t} | \mathbf{e}_{1:t}) d\mathbf{x}_t$$ is also a Gaussian distribution.
- .italic[Update step:] If the prediction $p(\mathbf{x}_{t+1} | \mathbf{e}_{1:t})$ is Gaussian and the sensor model $p(\mathbf{e}_{t+1} | \mathbf{x}_{t+1})$ is linear Gaussian, then after conditioning on new evidence, the updated distribution $$p(\mathbf{x}_{t+1} | \mathbf{e}_{1:t+1}) \propto p(\mathbf{e}_{t+1} | \mathbf{x}_{t+1}) p(\mathbf{x}_{t+1} | \mathbf{e}_{1:t})$$ is also a Gaussian distribution.
class: middle
Therefore, for the Kalman filter, the belief state remains Gaussian at all times, i.e., $p(\mathbf{x}_t | \mathbf{e}_{1:t}) = \mathcal{N}(\mathbf{x}_t | \mathbf{\mu}_t, \mathbf{\Sigma}_t)$.
- Filtering reduces to the computation of the parameters $\mathbf{\mu}_t$ and $\mathbf{\Sigma}_t$.
- By contrast, for general (non-linear, non-Gaussian) processes, the description of the posterior grows unboundedly as $t \to \infty$.
class: middle
Gaussian random walk:
- Gaussian prior: $$p(x_0) = \mathcal{N}(x_0 | \mu_0, \sigma_0^2)$$
- The transition model adds random perturbations of constant variance: $$p(x_{t+1}|x_t) = \mathcal{N}(x_{t+1}|x_t, \sigma_x^2)$$
- The sensor model yields measurements with Gaussian noise of constant variance: $$p(e_{t}|x_t) = \mathcal{N}(e_t | x_t, \sigma_e^2)$$
class: middle
The one-step predicted distribution is given by $$ \begin{aligned} p(x_1) &= \int p(x_1 | x_0) p(x_0) dx_0 \\ &\propto \int \exp\left(-\frac{1}{2} \frac{(x_{1} - x_0)^2}{\sigma_x^2}\right) \exp\left(-\frac{1}{2} \frac{(x_0 - \mu_0)^2}{\sigma_0^2}\right) dx_0 \\ &\propto \int \exp\left( -\frac{1}{2} \frac{\sigma_0^2 (x_1 - x_0)^2 + \sigma_x^2(x_0 - \mu_0)^2}{\sigma_0^2 \sigma_x^2} \right) dx_0 \\ &\,\,\ldots \quad \text{(simplify by completing the square)} \\ &\propto \exp\left( -\frac{1}{2} \frac{(x_1 - \mu_0)^2}{\sigma_0^2 + \sigma_x^2} \right) \\ &= \mathcal{N}(x_1 | \mu_0, \sigma_0^2 + \sigma_x^2) \end{aligned} $$
Note that the same result can be obtained by using the Gaussian model identities instead.
class: middle
For the update step, we need to condition on the observation at the first time step: $$ \begin{aligned} p(x_1 | e_1) &\propto p(e_1 | x_1) p(x_1) \\ &\propto \exp\left(-\frac{1}{2} \frac{(e_{1} - x_1)^2}{\sigma_e^2}\right) \exp\left( -\frac{1}{2} \frac{(x_1 - \mu_0)^2}{\sigma_0^2 + \sigma_x^2} \right) \\ &\propto \exp\left( -\frac{1}{2} \frac{\left(x_1 - \frac{(\sigma_0^2 + \sigma_x^2) e_1 + \sigma_e^2 \mu_0}{\sigma_0^2 + \sigma_x^2 + \sigma_e^2}\right)^2}{\frac{(\sigma_0^2 + \sigma_x^2)\sigma_e^2}{\sigma_0^2 + \sigma_x^2 + \sigma_e^2}} \right) \\ &= \mathcal{N}\left(x_1 \bigg\vert \frac{(\sigma_0^2 + \sigma_x^2) e_1 + \sigma_e^2 \mu_0}{\sigma_0^2 + \sigma_x^2 + \sigma_e^2}, \frac{(\sigma_0^2 + \sigma_x^2)\sigma_e^2}{\sigma_0^2 + \sigma_x^2 + \sigma_e^2}\right) \end{aligned} $$
class: middle
In summary, the update equations given a new evidence $e_{t+1}$ are $$\mu_{t+1} = \frac{(\sigma_t^2 + \sigma_x^2) e_{t+1} + \sigma_e^2 \mu_t}{\sigma_t^2 + \sigma_x^2 + \sigma_e^2}, \qquad \sigma_{t+1}^2 = \frac{(\sigma_t^2 + \sigma_x^2)\sigma_e^2}{\sigma_t^2 + \sigma_x^2 + \sigma_e^2}.$$
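These scalar updates are a two-liner. A minimal sketch of one filtering step of the Gaussian random walk (function and variable names ours):

```python
def kalman_1d_step(mu, sigma2, e, sigma_x2, sigma_e2):
    """Parameters of p(x_{t+1} | e_{1:t+1}) = N(mu_new, sigma2_new)."""
    s = sigma2 + sigma_x2  # variance of the one-step predicted distribution
    mu_new = (s * e + sigma_e2 * mu) / (s + sigma_e2)  # weighted mean of evidence and old mean
    sigma2_new = (s * sigma_e2) / (s + sigma_e2)
    return mu_new, sigma2_new
```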
???
We can interpret the calculation for the new mean $\mu_{t+1}$ as a weighted average of the new evidence and the old mean:
- If the observation is unreliable, then $\sigma_e^2$ is large and we pay more attention to the old mean.
- If the observation is reliable, then we pay more attention to the evidence and less to the old mean.
- If the old mean is unreliable ($\sigma_t^2$ is large) or the process is highly unpredictable ($\sigma_x^2$ is large), then we pay more attention to the observation.
class: middle
The same derivations generalize to multivariate normal distributions.
Assuming the transition and sensor models
$$
\begin{aligned}
p(\mathbf{x}_{t+1} | \mathbf{x}_t) &= \mathcal{N}(\mathbf{x}_{t+1} | \mathbf{F} \mathbf{x}_t, \mathbf{\Sigma}_{\mathbf{x}}) \\
p(\mathbf{e}_{t} | \mathbf{x}_t) &= \mathcal{N}(\mathbf{e}_{t} | \mathbf{H} \mathbf{x}_t, \mathbf{\Sigma}_{\mathbf{e}}),
\end{aligned}
$$
we arrive at the following general update equations:
$$
\begin{aligned}
\mathbf{\mu}_{t+1} &= \mathbf{F}\mathbf{\mu}_t + \mathbf{K}_{t+1} (\mathbf{e}_{t+1} - \mathbf{H} \mathbf{F} \mathbf{\mu}_t) \\
\mathbf{\Sigma}_{t+1} &= (\mathbf{I} - \mathbf{K}_{t+1} \mathbf{H}) (\mathbf{F}\mathbf{\Sigma}_t \mathbf{F}^T + \mathbf{\Sigma}_x) \\
\mathbf{K}_{t+1} &= (\mathbf{F}\mathbf{\Sigma}_t \mathbf{F}^T + \mathbf{\Sigma}_x) \mathbf{H}^T (\mathbf{H}(\mathbf{F}\mathbf{\Sigma}_t \mathbf{F}^T + \mathbf{\Sigma}_x)\mathbf{H}^T + \mathbf{\Sigma}_e)^{-1}
\end{aligned}$$
where $\mathbf{K}_{t+1}$ is the Kalman gain matrix.
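These update equations translate directly into a few lines of numpy. A minimal sketch of one predict-update cycle (names ours, not a reference implementation):

```python
import numpy as np

def kalman_step(mu, Sigma, e, F, H, Sigma_x, Sigma_e):
    """One predict-update cycle of the multivariate Kalman filter."""
    mu_pred = F @ mu                        # predicted state at t+1
    Sigma_pred = F @ Sigma @ F.T + Sigma_x  # predicted covariance
    # Kalman gain: how seriously to take the observation relative to the prediction.
    K = Sigma_pred @ H.T @ np.linalg.inv(H @ Sigma_pred @ H.T + Sigma_e)
    mu_new = mu_pred + K @ (e - H @ mu_pred)  # correct by the observation error
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma_pred
    return mu_new, Sigma_new
```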
???
Note that $\mathbf{\Sigma}_{t+1}$ and $\mathbf{K}_{t+1}$ do not depend on the observed evidence; they can therefore be computed offline.
These equations intuitively make sense. Consider the update for the mean state estimate $\mathbf{\mu}_{t+1}$:
- The term $\mathbf{F}\mathbf{\mu}_t$ is the predicted state at $t+1$,
- so $\mathbf{H} \mathbf{F} \mathbf{\mu}_t$ is the predicted observation.
- Therefore, the term $\mathbf{e}_{t+1} - \mathbf{H} \mathbf{F} \mathbf{\mu}_t$ represents the error in the predicted observation.
- This is multiplied by $\mathbf{K}_{t+1}$ to correct the predicted state; hence, $\mathbf{K}_{t+1}$ is a measure of how seriously to take the new observation relative to the prediction.
class: middle
The Apollo Guidance Computer used a Kalman filter to estimate the position of the spacecraft. The Kalman filter was used to merge new data with past position measurements to produce an optimal position estimate of the spacecraft.
.grid[
.kol-1-3[.width-100[]]
.kol-2-3[.width-100[
]]
]
.footnote[Credits: Apollo-11 source code]
class: middle
.center[Demo: tracking an object in space using the Kalman Filter.]
class: middle
In weather forecasting, filtering is used to combine observations of the atmosphere with numerical models to estimate its current state. This is called data assimilation.
Then, the model is used to predict the future states of the atmosphere.
class: middle, black-slide
.center[
<iframe width="640" height="400" src="https://www.youtube.com/embed/9c4kXW7btBE?cc_load_policy=1&hl=en&version=3" frameborder="0" allowfullscreen></iframe> ]
.grid[
.kol-2-3[.center.width-100[]]
.kol-1-3[.center.width-80[
]]
]
Dynamic Bayesian networks (DBNs) can be used for tracking multiple variables over time, using multiple sources of evidence. Idea:
- Repeat a fixed Bayes net structure at each time $t$.
- Variables from time $t$ condition on those from $t-1$.
DBNs are a generalization of HMMs and of the Kalman filter.
.footnote[Credits: CS188, UC Berkeley.]
class: middle
class: middle
Unroll the network through time and run any exact inference algorithm (e.g., variable elimination).
- Problem: the inference cost for each update grows with $t$.
- Rollup filtering: add slice $t+1$, then sum out slice $t$ using variable elimination.
    - The largest factor is $O(d^{n+k})$ and the total update cost per step is $O(nd^{n+k})$.
    - This is better than HMMs, which require $O(d^{2n})$, but still infeasible for large numbers of variables.
Basic idea:
- Maintain a finite population of samples, called particles.
    - The representation of our beliefs is a list of $N$ particles.
- Ensure the particles track the high-likelihood regions of the state space.
    - Throw away samples that have very low weight, according to the evidence.
    - Replicate those that have high weight.

This scales to high dimensions! (See the sketch below.)
.footnote[Credits: CS188, UC Berkeley.]
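A minimal sketch of one elapse-weight-resample cycle of a bootstrap particle filter; the function names and interfaces are illustrative, not those of the course code.

```python
import numpy as np

def particle_filter_step(particles, transition_sample, likelihood, rng):
    """particles: N state samples approximating P(X_t | e_{1:t})."""
    particles = transition_sample(particles, rng)  # propagate each particle through the dynamics
    weights = likelihood(particles)                # weight by P(e_{t+1} | x)
    weights = weights / weights.sum()
    # Resample: replicate high-weight particles, drop low-weight ones.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```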
class: middle
.footnote[Credits: CS188, UC Berkeley.]
class: middle
class: middle
.center[(See demo)]
- Temporal models use state and sensor variables replicated over time.
- Their purpose is to maintain a belief state as time passes and as more evidence is collected.
- The Markov and stationarity assumptions imply that we only need to specify
    - a transition model ${\bf P}(\mathbf{X}_{t+1} | \mathbf{X}_t)$,
    - a sensor model ${\bf P}(\mathbf{E}_t | \mathbf{X}_t)$.
- Inference tasks include filtering, prediction, smoothing and finding the most likely sequence.
- Filtering algorithms are all based on the core idea of
    - projecting the current belief state through the transition model,
    - updating the prediction according to the new evidence.
class: end-slide, center count: false
The end.