@@ -511,7 +511,7 @@ class: middle
The training objective for $s\_ \theta(\mathbf{x}\_ t, t)$ is then a weighted sum of Fisher divergences for all noise levels $t$,
$$ \sum\_{t=1}^T \lambda(t) \mathbb{E}\_{p\_{t}(\mathbf{x}\_t)} \left[ || \nabla\_{\mathbf{x}\_t} \log p\_{t}(\mathbf{x}\_t) - s\_\theta(\mathbf{x}\_t, t) ||\_2^2 \right] $$
- where $\lambda(t)$ is a weighting function that increases with $t$ to give more importance to the noisier samples .
+ where $\lambda(t)$ is a weighting function.
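A minimal numerical sketch of this weighted objective (illustrative, not from the slides): with standard-normal toy data the marginal score is known in closed form, since $\mathbf{x}\_0 \sim \mathcal{N}(0, 1)$ and a variance-preserving forward process give $\mathbf{x}\_t \sim \mathcal{N}(0, 1)$, hence $\nabla\_{\mathbf{x}\_t} \log p\_t(\mathbf{x}\_t) = -\mathbf{x}\_t$ at every $t$. The noise schedule and weighting below are hypothetical choices.

```python
import numpy as np

# Monte Carlo estimate of the weighted sum of Fisher divergences, using
# standard-normal toy data so the marginal score is exactly -x_t at every t.
rng = np.random.default_rng(0)
T = 10
abar = np.cumprod(np.linspace(0.99, 0.9, T))   # hypothetical noise schedule (cumulative alphas)
lam = np.linspace(0.1, 1.0, T)                 # hypothetical weighting lambda(t)

def s_theta(x, t):
    # The exact marginal score for this toy data distribution.
    return -x

loss = 0.0
for t in range(T):
    x0 = rng.normal(size=10_000)
    xt = np.sqrt(abar[t]) * x0 + np.sqrt(1 - abar[t]) * rng.normal(size=10_000)
    true_score = -xt                           # closed-form marginal score
    loss += lam[t] * np.mean((true_score - s_theta(xt, t)) ** 2)

assert loss == 0.0                             # the exact score attains the minimum
```

The exact score drives every term of the sum to zero, confirming that the objective is minimized by the true marginal score.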
---
@@ -530,12 +530,12 @@ class: middle
## Interpretation 3: Denoising score matching
A third interpretation of VDMs can be obtained by reparameterizing $\mathbf{x}\_ 0$ using Tweedie's formula, as
- $$ \mathbf{x}\_0 = \frac{\mathbf{x}\_t + (1-\bar{\alpha}\_t) \nabla\_{\mathbf{x}\_t} \log q(\mathbf{x}\_t | \mathbf{x}\_0 ) }{\sqrt{\bar{\alpha}\_t}}, $$
+ $$ \mathbf{x}\_0 = \frac{\mathbf{x}\_t + (1-\bar{\alpha}\_t) \nabla\_{\mathbf{x}\_t} \log q(\mathbf{x}\_t) }{\sqrt{\bar{\alpha}\_t}}, $$
which we can plug into the mean of the tractable posterior to obtain
$$ \begin{aligned}
\mu\_q(\mathbf{x}\_t, \mathbf{x}\_0, t) &= \frac{\sqrt{\alpha\_t}(1-\bar{\alpha}\_{t-1})}{1-\bar{\alpha}\_t}\mathbf{x}\_t + \frac{\sqrt{\bar{\alpha}\_{t-1}}(1-\alpha\_t)}{1-\bar{\alpha}\_t}\mathbf{x}\_0 \\\\
&= ... \\\\
- &= \frac{1}{\sqrt{\alpha}\_t} \mathbf{x}\_t + \frac{1-\alpha\_t}{\sqrt{\alpha\_t}} \nabla\_{\mathbf{x}\_t} \log q(\mathbf{x}\_t | \mathbf{x}\_0 ).
+ &= \frac{1}{\sqrt{\alpha}\_t} \mathbf{x}\_t + \frac{1-\alpha\_t}{\sqrt{\alpha\_t}} \nabla\_{\mathbf{x}\_t} \log q(\mathbf{x}\_t).
\end{aligned} $$
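The reparameterization of $\mathbf{x}\_0$ above can be checked numerically (a sketch, not from the slides): for the Gaussian forward kernel $q(\mathbf{x}\_t | \mathbf{x}\_0) = \mathcal{N}(\sqrt{\bar{\alpha}\_t}\mathbf{x}\_0, (1-\bar{\alpha}\_t)I)$, the conditional score is $-(\mathbf{x}\_t - \sqrt{\bar{\alpha}\_t}\mathbf{x}\_0)/(1-\bar{\alpha}\_t)$, and substituting it into the formula recovers $\mathbf{x}\_0$ exactly by algebra. The value of `abar_t` is an arbitrary illustrative choice.

```python
import numpy as np

# Verify Tweedie-style recovery of x_0 from x_t and the conditional score.
rng = np.random.default_rng(0)
abar_t = 0.6                                   # hypothetical cumulative alpha at step t
x0 = rng.normal(size=5)                        # a "clean" data point
eps = rng.normal(size=5)
xt = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps

# Conditional score grad_{x_t} log q(x_t | x_0) of the Gaussian kernel.
score = -(xt - np.sqrt(abar_t) * x0) / (1.0 - abar_t)
x0_rec = (xt + (1.0 - abar_t) * score) / np.sqrt(abar_t)

assert np.allclose(x0_rec, x0)                 # exact recovery
```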
???
@@ -552,17 +552,19 @@ $$\mu\_\theta(\mathbf{x}\_t, t) = \frac{1}{\sqrt{\alpha}\_t} \mathbf{x}\_t + \fr
Under this parameterization, the minimization of the expected KL divergence $L\_ {t-1}$ can be rewritten as
$$ \begin{aligned}
&\arg \min\_\theta \mathbb{E}\_{q(\mathbf{x}\_t | \mathbf{x}\_0)}\text{KL}(q(\mathbf{x}\_{t-1}|\mathbf{x}\_t, \mathbf{x}\_0) || p\_\theta(\mathbf{x}\_{t-1} | \mathbf{x}\_t) )\\\\
- =&\arg \min\_\theta \mathbb{E}\_{q(\mathbf{x}\_t | \mathbf{x}\_0)} \frac{1}{2\sigma^2\_t} \frac{(1-\alpha\_t)^2}{\alpha\_t} || s\_\theta(\mathbf{x}\_t, t) - \nabla\_{\mathbf{x}\_t} \log q(\mathbf{x}\_t | \mathbf{x}\_0 ) ||_2^2
+ =&\arg \min\_\theta \mathbb{E}\_{q(\mathbf{x}\_t | \mathbf{x}\_0)} \frac{1}{2\sigma^2\_t} \frac{(1-\alpha\_t)^2}{\alpha\_t} || s\_\theta(\mathbf{x}\_t, t) - \nabla\_{\mathbf{x}\_t} \log q(\mathbf{x}\_t) ||_2^2
\end{aligned} $$
- .success[ Optimizing a score-based model amounts to learning a neural network that predicts the score $\nabla\_ {\mathbf{x}\_ t} \log q(\mathbf{x}\_ t | \mathbf{x}\_ 0)$ of the tractable posterior.]
+ .success[ Optimizing a score-based model amounts to learning a neural network that predicts the score $\nabla\_ {\mathbf{x}\_ t} \log q(\mathbf{x}\_ t)$.]
---
class: middle
- Since $s\_ \theta(\mathbf{x}\_ t, t)$ is learned in expectation over the data distribution $q(\mathbf{x}\_ 0)$, the score network will eventually approximate the score of the marginal distribution $q(\mathbf{x}\_ t)$, for each noise level $t$, that is
- $$ s\_\theta(\mathbf{x}\_t, t) \approx \nabla\_{\mathbf{x}\_t} \log q(\mathbf{x}\_t). $$
+ Unfortunately, $\nabla\_ {\mathbf{x}\_ t} \log q(\mathbf{x}\_ t)$ is not tractable in general.
+ However, since $s\_ \theta(\mathbf{x}\_ t, t)$ is learned in expectation over the data distribution $q(\mathbf{x}\_ 0)$, minimizing instead
+ $$ \mathbb{E}\_{q(\mathbf{x}\_0)} \mathbb{E}\_{q(\mathbf{x}\_t | \mathbf{x}\_0)} \frac{1}{2\sigma^2\_t} \frac{(1-\alpha\_t)^2}{\alpha\_t} || s\_\theta(\mathbf{x}\_t, t) - \nabla\_{\mathbf{x}\_t} \log q(\mathbf{x}\_t | \mathbf{x}\_0) ||\_2^2 $$
+ ensures that $s\_ \theta(\mathbf{x}\_ t, t) \approx \nabla\_ {\mathbf{x}\_ t} \log q(\mathbf{x}\_ t)$.
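This replacement of the intractable marginal score by the tractable conditional score can be illustrated with a toy regression (a sketch under assumed standard-normal data, not from the slides): fitting a linear model $s\_\theta(\mathbf{x}\_t) = w\,\mathbf{x}\_t$ to the per-sample conditional scores $-(\mathbf{x}\_t - \sqrt{\bar{\alpha}\_t}\mathbf{x}\_0)/(1-\bar{\alpha}\_t)$ recovers $w \approx -1$, i.e. the marginal score $-\mathbf{x}\_t$, even though each individual target depends on $\mathbf{x}\_0$.

```python
import numpy as np

# Regress on the tractable conditional score; the least-squares minimizer
# matches the marginal score of the (standard-normal) toy data distribution.
rng = np.random.default_rng(1)
abar_t = 0.5                                   # hypothetical cumulative alpha at step t
x0 = rng.normal(size=200_000)
xt = np.sqrt(abar_t) * x0 + np.sqrt(1 - abar_t) * rng.normal(size=200_000)

# Per-sample regression targets: grad_{x_t} log q(x_t | x_0).
target = -(xt - np.sqrt(abar_t) * x0) / (1 - abar_t)

# Closed-form least-squares fit of s_theta(x) = w * x to the targets.
w = np.sum(xt * target) / np.sum(xt * xt)
assert abs(w + 1.0) < 0.02                     # w ~ -1, the marginal score slope
```

Averaging the conditional targets over $q(\mathbf{x}\_0 | \mathbf{x}\_t)$ is what makes the two objectives share the same minimizer.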
---