The Jacobian $\left[ \frac{\partial \mathbf{x}\_m}{\partial \mathbf{x}\_k} \right]$ is never explicitly built. It is usually simpler, faster, and more memory efficient to compute the VJP directly.
- Most reverse mode AD systems compose VJPs backward to compute $\frac{\partial \mathbf{x}\_t}{\partial \mathbf{x}\_1}$.
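As an illustration, the VJP $v^T \left[ \frac{\partial \mathbf{x}\_m}{\partial \mathbf{x}\_k} \right]$ can be evaluated in a single backward pass, without ever forming the Jacobian (a minimal sketch, assuming PyTorch's `torch.autograd.functional.vjp`):

```python
import torch

def f(x):
    return torch.tanh(x) * x.sum()

x = torch.randn(3)
v = torch.randn(3)  # the vector in v^T J

# One reverse pass computes v^T J directly;
# the 3x3 Jacobian of f at x is never materialized.
y, vjp = torch.autograd.functional.vjp(f, x, v)
```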
---
class: middle
Forward mode automatic differentiation is usually implemented in terms of **Jacobian-vector products** (JVP) locally defined for each primitive.
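Symmetrically, a JVP ($J v$) can be evaluated without materializing the Jacobian (a minimal sketch, assuming PyTorch's `torch.autograd.functional.jvp`):

```python
import torch

def f(x):
    return torch.tanh(x) * x.sum()

x = torch.randn(3)
v = torch.randn(3)  # the tangent vector in J v

# J v is computed directly; the Jacobian is never materialized.
y, jvp = torch.autograd.functional.jvp(f, x, v)
```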
---

class: middle

## Checkpointing
Checkpointing consists in marking intermediate variables whose forward values are not stored in memory. They are instead recomputed on the fly during the backward pass, trading extra computation for a smaller memory footprint.
```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(100, 200)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(200, 10)

    def forward(self, x):
        x = checkpoint(self.layer1, x)  # x is not stored;
                                        # it will be recomputed
                                        # during the backward pass
        x = self.relu(x)
        x = self.layer2(x)
        return x
```
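Gradients flow through the checkpointed layer as usual; its activation is simply rebuilt during the backward pass (a minimal usage sketch, assuming the `MLP` above):

```python
model = MLP()
x = torch.randn(32, 100, requires_grad=True)

loss = model(x).sum()
loss.backward()  # re-runs layer1's forward to recompute its output
```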
---
class: middle
## Higher-order derivatives
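Gradients of gradients can be computed by differentiating through the backward pass itself (a minimal sketch, assuming PyTorch's `torch.autograd.grad`):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# First derivative: dy/dx = 3x^2 = 12.
# create_graph=True builds a graph of the gradient itself,
# so that it can be differentiated again.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)

# Second derivative: d2y/dx2 = 6x = 12.
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)
```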
---
# Summary
- Automatic differentiation is one of the keys that enabled the deep learning revolution.