
Commit df73957

save

1 parent 79d1dca

4 files changed: +18 -7 lines changed

lecture0.md (-7 lines)

@@ -8,13 +8,6 @@ Lecture 0: Introduction
 Prof. Gilles Louppe<br>


-???
-
-AI in medecine https://www.youtube.com/watch?v=AbdVsi1VjQY
-
-Solving tasks
-Solving new problems
-
 ---

 # Today

lecture1.md (+18 lines)

@@ -98,6 +98,8 @@ This is the framing we will adopt in this course (starting from Lecture 2).

 # Empirical risk minimization

+The traditional perspective on supervised learning is empirical risk minimization.
+
 Consider a function $f : \mathcal{X} \to \mathcal{Y}$ produced by some learning algorithm. The predictions
 of this function can be evaluated through a loss
 $$\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R},$$
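As an aside on the definition this hunk extends: the empirical risk is the average of the loss $\ell$ over the training set, $\hat{R}(f) = \frac{1}{N} \sum_{i=1}^{N} \ell(y_i, f(x_i))$. A minimal Python sketch, not part of this commit; the squared loss and the toy data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def squared_loss(y_true, y_pred):
    # ell : Y x Y -> R, here the squared error (an assumed choice of loss).
    return (y_true - y_pred) ** 2

def empirical_risk(f, xs, ys, loss=squared_loss):
    # R_hat(f) = (1/N) * sum_i ell(y_i, f(x_i))
    return np.mean([loss(y, f(x)) for x, y in zip(xs, ys)])

# Hypothetical example: evaluate a linear predictor on toy linear data.
xs = rng.uniform(-1.0, 1.0, size=50)
ys = 3.0 * xs + rng.normal(scale=0.1, size=50)
f = lambda x: 3.0 * x
print(empirical_risk(f, xs, ys))  # small, since f matches the data
```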
@@ -312,6 +314,12 @@ $f\_3(x) = \sum\_{j=0}^{10^4} w\_j x^j$
 ]
 ]

+???
+
+In this course, we will argue for $f_3$.
+
+Large parameter spaces are not a problem, as long as the capacity of the hypothesis space is controlled. For example, by using stochastic gradient descent, we can optimize $f_3$ without overfitting.
+
 ---

 class: middle
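The new speaker note claims that $f_3$ can be optimized with stochastic gradient descent without overfitting. A minimal sketch of that experiment, illustrative only and not code from the repository; degree 50 stands in for $10^4$ to keep the monomial features numerically tame:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a simple function on [-1, 1].
n = 30
x = rng.uniform(-1.0, 1.0, size=n)
y = np.cos(3.0 * x) + rng.normal(scale=0.1, size=n)

# Over-parameterized polynomial model, far more parameters than data points.
degree = 50
Phi = np.vander(x, degree + 1, increasing=True)  # columns x^0 .. x^degree

# Plain SGD from zero initialization. Despite the large parameter space,
# the trajectory stays in a low-norm region, which keeps the fit smooth.
w = np.zeros(degree + 1)
lr = 0.01
for step in range(20_000):
    i = rng.integers(n)
    residual = Phi[i] @ w - y[i]
    w -= lr * residual * Phi[i]  # gradient of 0.5 * (Phi[i] @ w - y_i)^2

print("training MSE:", np.mean((Phi @ w - y) ** 2))
```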
@@ -402,6 +410,10 @@ In practice, capacity can be controlled through hyper-parameters of the learning
 - The number of training iterations;
 - Regularization terms.

+???
+
+We talk about the capacity of the hypothesis space induced by the learning algorithm (parametric model + optimization algorithm). This is different from the capacity of the model itself.
+
 ---

 class: middle
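The distinction in this note can be made concrete: with the model class fixed, the optimization budget alone changes the hypotheses the algorithm can effectively reach, so the number of gradient steps acts as a capacity hyper-parameter. A hypothetical sketch, with all data and settings invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed parametric model: degree-15 polynomial features on 20 noisy points.
x = rng.uniform(-1.0, 1.0, size=20)
y = np.sin(2.0 * x) + rng.normal(scale=0.3, size=20)
Phi = np.vander(x, 16, increasing=True)

def fit(n_steps, lr=0.05):
    # Full-batch gradient descent from zero, stopped after n_steps.
    w = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        grad = Phi.T @ (Phi @ w - y) / len(y)
        w -= lr * grad
    return w

for n_steps in (50, 50_000):
    w = fit(n_steps)
    print(n_steps, "steps -> training MSE:", np.mean((Phi @ w - y) ** 2))
# Few steps keep w near the origin (a smaller effective hypothesis space);
# many steps approach the unregularized least-squares fit of the noise.
```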
@@ -600,6 +612,12 @@ class: middle

 .footnote[Credits: [Belkin et al, 2018](https://arxiv.org/abs/1812.11118).]

+???
+
+This plot is known as the "double descent" curve. It shows that the test error can decrease as the number of parameters increases, even after the model has enough capacity to fit the training data.
+
+The x-axis is misleading, as the number of parameters is not the same as the capacity.
+
 ---

 class: middle
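A hedged sketch of how such a curve can be reproduced: minimum-norm least squares on random Fourier features typically shows a test-error peak near the interpolation threshold $p \approx n$, followed by a second descent. None of this is from the repository, and the exact shape depends on the random seed:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy double-descent experiment with random Fourier features.
n, n_test = 40, 500
x = rng.uniform(-1.0, 1.0, size=n)
x_test = rng.uniform(-1.0, 1.0, size=n_test)
f_true = lambda t: np.sin(4.0 * t)
y = f_true(x) + rng.normal(scale=0.2, size=n)
y_test = f_true(x_test)

freqs = rng.normal(scale=5.0, size=200)
phases = rng.uniform(0.0, 2.0 * np.pi, size=200)

def features(t, p):
    # First p random Fourier features cos(freq * t + phase).
    return np.cos(np.outer(t, freqs[:p]) + phases[:p])

for p in (5, 20, 40, 80, 200):
    Phi, Phi_test = features(x, p), features(x_test, p)
    w = np.linalg.pinv(Phi) @ y  # minimum-norm least-squares solution
    mse = np.mean((Phi_test @ w - y_test) ** 2)
    print(f"p={p:4d}  test MSE={mse:.3f}")
# Expect the test MSE to spike around p = n = 40 and drop again for large p.
```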

pdf/lec0.pdf (0 Bytes)

Binary file not shown.

pdf/lec1.pdf (231 Bytes)

Binary file not shown.
