
Commit 14cd32a

lecture 3
1 parent a43e891 commit 14cd32a

5 files changed: +473 −61 lines

code/lec2-space-stretching.ipynb

+363 −18
Large diffs are not rendered by default.

code/lec2-spiral-classification.ipynb

+83 −31
Large diffs are not rendered by default.

lecture2.md

+1 −2
@@ -69,7 +69,7 @@ class: middle, center, black-slide

<iframe width="600" height="450" src="https://www.youtube.com/embed/cNxadbrN_aI" frameborder="0" allowfullscreen></iframe>

-The machine could classify simple images.
+The machine could learn to classify simple images.

---

@@ -767,7 +767,6 @@ $$\nabla \mathcal{\ell}(\theta) =
$$
i.e., a vector that gathers the partial derivatives of the loss for each model parameter $\theta\_k$ for $k=0, \ldots, K-1$.

-
These derivatives can be evaluated automatically from the *computational graph* of $\ell$ using **automatic differentiation**.

---
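The gradient described in this hunk is exactly what a `backward()` call produces in PyTorch; a minimal sketch (not part of the commit, with toy data and shapes assumed purely for illustration):

```python
import torch

theta = torch.randn(3, requires_grad=True)   # model parameters theta_k, k = 0, ..., K-1
x, y = torch.randn(10, 3), torch.randn(10)   # toy inputs and targets (illustrative only)

loss = ((x @ theta - y) ** 2).mean()         # scalar loss, recorded as a computational graph
loss.backward()                              # automatic differentiation through that graph

print(theta.grad)                            # vector of partial derivatives of the loss w.r.t. each theta_k
```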

lecture3.md

+26 −10
@@ -201,7 +201,7 @@ class: middle

Let us assume a function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ that decomposes as a chain composition
$$\mathbf{f} = \mathbf{f}\_t \circ \mathbf{f}\_{t-1} \circ \ldots \circ \mathbf{f}\_1,$$
-for functions $\mathbf{f}\_k : \mathbb{R}^{n\_{k-1}} \times \mathbb{R}^{n\_k}$, for $k=1, \ldots, t$.
+for functions $\mathbf{f}\_k : \mathbb{R}^{n\_{k-1}} \to \mathbb{R}^{n\_k}$, for $k=1, \ldots, t$.

---

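A chain composition like the one fixed in this hunk can be evaluated by threading intermediate values forward; a minimal scalar sketch (not part of the commit, with the functions chosen arbitrarily):

```python
import math

fs = [math.sin, math.exp, lambda x: x ** 2]   # f_1, f_2, f_3 (arbitrary scalar examples)

def f(x):
    for f_k in fs:    # x_k = f_k(x_{k-1}), for k = 1, ..., t
        x = f_k(x)
    return x          # x_t = f(x_0)

print(f(0.5))
```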
@@ -445,7 +445,6 @@ $$\frac{\partial \mathbf{x}\_t}{\partial \mathbf{x}\_k} = \underbrace{\frac{\par
The Jacobian $\left[ \frac{\partial \mathbf{x}\_m}{\partial \mathbf{x}\_k} \right]$ is never explicitly built. It is usually simpler, faster, and more memory efficient to compute the VJP directly.
- Most reverse mode AD systems compose VJPs backward to compute $\frac{\partial \mathbf{x}\_t}{\partial \mathbf{x}\_1}$.

-
---

class: middle
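The point made in this hunk, that the VJP is obtained without ever materializing the Jacobian, is what `torch.autograd.grad` does when given `grad_outputs`; a minimal sketch (not part of the commit, shapes chosen arbitrarily):

```python
import torch

x = torch.randn(5, requires_grad=True)
W = torch.randn(5, 3)
y = torch.tanh(x) @ W                                 # an intermediate mapping from R^5 to R^3

v = torch.randn(3)                                    # the vector in the vector-Jacobian product
vjp = torch.autograd.grad(y, x, grad_outputs=v)[0]    # v^T (dy/dx), shape (5,); no 3x5 Jacobian is built

print(vjp.shape)
```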
@@ -460,6 +459,31 @@ is usually implemented in terms of **Jacobian-vector products** (JVP) locally de

class: middle

+## Checkpointing
+
+Checkpointing consists in marking intermediate variables for which the forward values are not stored in memory. These are recomputed during the backward pass, which can save memory at the cost of recomputation.
+
+```python
+class MLP(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.layer1 = nn.Linear(100, 200)
+        self.relu = nn.ReLU()
+        self.layer2 = nn.Linear(200, 10)
+
+    def forward(self, x):
+        x = checkpoint(self.layer1, x)  # x is not stored;
+                                        # it will be recomputed
+                                        # during the backward pass
+        x = self.relu(x)
+        x = self.layer2(x)
+        return x
+```
+
+---
+
+class: middle
+
## Higher-order derivatives

```python
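The checkpointing snippet added above assumes the usual PyTorch imports; a minimal, self-contained usage sketch (not part of the commit, layer shapes copied from the added `MLP`):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint   # the `checkpoint` function used in the snippet above

layer1 = nn.Linear(100, 200)                     # same shapes as in the added MLP
rest = nn.Sequential(nn.ReLU(), nn.Linear(200, 10))

x = torch.randn(32, 100, requires_grad=True)     # requires_grad so gradients flow through the checkpoint
h = checkpoint(layer1, x)                        # layer1's output is not kept in memory...
loss = rest(h).sum()
loss.backward()                                  # ...it is recomputed here, during the backward pass

print(layer1.weight.grad.shape)
```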
@@ -518,14 +542,6 @@ Optimizing a wing (Sam Greydanus, 2020)

---

-class: middle, center
-
-.width-75[![](figures/lec3/tweet.png)]
-
-... and plenty of other applications! (See this [thread](https://twitter.com/glouppe/status/1361941266901131265))
-
----
-
# Summary

- Automatic differentiation is one of the keys that enabled the deep learning revolution.

pdf/lec3.pdf (724 KB)

Binary file not shown.
