mirror of
https://github.com/Andreaierardi/Master-DataScience-Notes.git
synced 2024-11-30 15:43:09 +01:00
lecture 3 completed
This commit is contained in:
parent
bfb9d200d9
commit
c829b0fdd0
@ -0,0 +1,140 @@
\documentclass[../main.tex]{subfiles}

\begin{document}

\section{Lecture 1 - 09-03-2020}

\subsection{Introduction}

MACHINE LEARNING

In this course we look at the principles behind the design of machine learning algorithms: not just coding, but having an idea of which algorithms can work with the data. We have to fix a mathematical framework: some statistics and mathematics. We work on ML at a higher level.

ML is data inference: making predictions about the future using data about the past.
\begin{itemize}
\item Clustering $\rightarrow$ grouping according to similarity
\item Planning $\rightarrow$ a robot learns to interact with a certain environment
\item Classification $\rightarrow$ assigning meaning to data; example: spam filtering
\end{itemize}
For instance, I may want to predict the outcome for an individual, or predict whether a person will click or not on a certain advertisement.
Examples. Classify data into categories:
\begin{itemize}
\item Medical diagnosis: data are medical records and categories are diseases
\item Document analysis: data are texts and categories are topics
\item Image analysis: data are digital images and categories are the names of the objects in the image (but could be different)
\item Spam filtering: data are emails, categories are spam vs non spam
\item Advertising prediction: data are features of web site visitors and categories could be click/non click on banners
\end{itemize}
Classification is different from clustering, since in clustering we do not have semantic labels (spam or not spam, the meaning of an image); in classification I have a semantic label. In clustering I want to group data using a similarity function. Planning is learning what to do next.
\begin{itemize}
\item Clustering: learn a similarity function
\item Classification: learn semantic labels (the meaning of data)
\item Planning: learn actions given a state
\end{itemize}
Classification is an easier task than planning, since I only have to predict the semantic label that goes with a data point. If I can do classification, I can do clustering. If you can do planning, you can probably classify (since you understand the meaning of your position), and then you can probably also do clustering. We will focus on classification because many tasks are about classification.
To classify data into categories we can imagine a set of categories. For instance, in the tasks ``predict the income of a person'' or ``predict tomorrow's price for a stock'', the label is a number and not an abstract thing. We can distinguish two cases for the label set, the set of possible categories for each data point:
\begin{itemize}
\item A finite set of abstract symbols (the case of document classification, medical diagnosis). So the task is classification.
\item A real number (no bound on how many of them). My prediction will be a real number and not a category. In this case we talk about a regression task.
\end{itemize}
Classification: a task where we want to give a data point a predefined label from abstract categories (like YES or NO).

Regression: a task where we want to give labels to data points, but these labels are numbers.

``Prediction task'' is used for both classification and regression tasks.

Supervised learning: labels attached to data (classification, regression).

Unsupervised learning: no labels attached to data (clustering).
In unsupervised learning the mathematical modelling, and the way algorithms are scored and can learn from mistakes, is a little bit harder: the problem of clustering is harder to model mathematically. You can cast planning as supervised learning: I can show the robot which is the right action to take in each state. But that depends on how the planning task is formalised. Planning is a higher level of learning, since it includes tasks of supervised and unsupervised learning.
Why is this important? The algorithm has to know how it is given the labels. In ML we want to teach the algorithm to perform predictions correctly. Initially the algorithm will make mistakes in classifying data. We want to tell the algorithm that the classification was wrong, and we want to give it a score: like giving the algorithm a grade, so it understands whether it did badly or really badly. So we have mistakes! The algorithm predicts, sometimes it makes a mistake $\rightarrow$ we can correct it. Then the algorithm can become more precise. We have to define this mistake.
Mistakes in the case of classification: in the simple case, the predicted category is the wrong one. We have a binary signal telling us that the category is wrong. How do we communicate it? We can use the loss function: we can tell the algorithm whether it is wrong or not.

Loss function: a measure of the discrepancy between the ``true'' label and the predicted label. So we may assume that every data point has a true label: if we have a set of topics, this is the true topic the document is talking about. This is typical of supervised learning.
\\\\
How well did the algorithm do?
\\
\[\ell(y,\hat{y})\geq 0 \]
where $y$ is the true label and $\hat{y}$ is the predicted label
\\\\
We want to build a spam filter where $0$ is not spam and $1$ is spam, and that is a classification task:
\\\\
$
\ell(y,\hat{y}) = \begin{cases} 0, & \mbox{if } \hat{y} = y
\\ 1, &
\mbox{if }\hat{y} \neq y
\end{cases}
$
\\\\
The loss function is the ``interface'' between algorithm and data, so the algorithm knows about the data through the loss function. If we give it a useless loss function the algorithm will not perform well: it is important to have a good loss function.
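As a minimal sketch, the zero-one loss above can be written in a few lines of Python (the function name is ours, not from the course):

```python
def zero_one_loss(y, y_hat):
    """Zero-one loss: 0 if the prediction matches the true label, 1 otherwise."""
    return 0 if y_hat == y else 1

# y = 1 means spam, y = 0 means not spam
print(zero_one_loss(1, 1))  # correct prediction -> 0
print(zero_one_loss(1, 0))  # wrong prediction   -> 1
```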
Spam filtering. We have two main kinds of mistake. Are they the same mistake? No: if I have an important email and you classify it as spam, that's bad, while if you show me a spam email, that's ok. So we have to assign different weights: even in binary classification, mistakes are not equal.
--- DRAWING ---\\
(The handwritten sketch showed the zero-one loss for the spam vs non spam binary classification task, with the two kinds of mistake: a false positive, $\hat{y} = $ spam while $y = $ non spam, and a false negative, $\hat{y} = $ non spam while $y = $ spam, together with a weighted loss charging $2$ for a false positive, $1$ for a false negative, and $0$ otherwise.)
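A sketch of such a weighted loss in Python; the weights 2 (false positive) and 1 (false negative) are illustrative assumptions, as are the function and parameter names:

```python
def weighted_loss(y, y_hat, fp_weight=2, fn_weight=1):
    """Weighted binary loss for spam filtering.

    Labels: 1 = spam, 0 = not spam. A false positive (a good email
    classified as spam) is charged more than a false negative.
    """
    if y_hat == y:
        return 0
    if y_hat == 1 and y == 0:   # false positive: good mail flagged as spam
        return fp_weight
    return fn_weight            # false negative: spam shown to the user
```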
\end{document}
@ -0,0 +1,92 @@
\section{Lecture 10 - 07-04-2020}
\subsection{TO BE DEFINED}

$\barra{E}[z] = \barra{E}[\barra{E}[z|X]]$
\\\\
$\barra{E}[X] = \sum_{t = 1}^{m} \barra{E}[X \cdot \Pi(A_t)]$
\\\\
$x \in \mathbb{R}^d$
\\
$\mathbb{P}(Y_{\Pi(s,x)} = 1) = \mathbb{E}[\Pi \{ Y_{\Pi(s,x)} = 1 \} ] = \\\\
= \sum_{t = 1}^{m} \mathbb{E}[\Pi\{Y_t = 1\} \cdot \Pi \{ \Pi(s,x) = t\}] = \\\\
= \sum_{t = 1}^{m} \mathbb{E}[\mathbb{E}[\Pi\{Y_t = 1\} \cdot \Pi\{\Pi(s,x) = t\} \mid X_t]] = \\\\
$
given the fact that $Y_t \sim \eta(X_t)$ $\Rightarrow$ it gives me the probability, \\
and $Y_t = 1$ and $\Pi(s,x) = t$ are independent given $X_t$ (e.g.\ $\mathbb{E}[ZX] = \mathbb{E}[X] \cdot \mathbb{E}[Z]$)\\\\
$
= \sum_{t = 1}^{m} \barra{E}[\barra{E}[\Pi\{Y_t = 1\} \mid X_t] \cdot \barra{E} [ \Pi\{\Pi(s,x) = t\} \mid X_t]] = \\\\
= \sum_{t = 1}^{m} \barra{E}[\eta(X_t) \cdot \Pi \{\Pi (s,x) = t \}] = \\\\
= \barra{E} [ \eta(X_{\Pi(s,x)})]
$

\[ \barra{P} (Y_{\Pi(s,x)} = 1 \mid X = x) = \barra{E}[\eta(X_{\Pi(s,x)})] \]
\\\\
$
\barra{P} (Y_{\Pi(s,x)} = 1, y = -1 ) = \\\\
= \barra{E}[\Pi\{Y_{\Pi(s,x)} = 1\} \cdot \Pi\{y = -1\} ] = \\\\
= \barra{E}[\barra{E}[\Pi \{ Y_{\Pi(s,x)} = 1\} \cdot \Pi \{ y = -1 \} \mid X ]] = \\\\
$

\[ \textit{when } X = x: \quad \barra{P}(y = -1 \mid X = x) = 1- \eta(x) \]

$
\\\\ = \barra{E}[\barra{E}[\Pi \{Y_{\Pi(s,x)} = 1\} \mid X] \cdot \barra{E}[\Pi \{y = -1\} \mid X ]] = \\\\
= \barra{E}[\eta_{\Pi(s,x)} \cdot (1-\eta(x))] \\\\
\textit{similarly:} \quad \barra{P}(Y_{\Pi(s,x)} = -1 , y = 1) = \\
\barra{E} [(1- \eta_{\Pi(s,x)}) \cdot \eta(x)]
\\\\
\barra{E} [ \ell_D (\hat{h}_s)] = \barra{P}(Y_{\Pi(s,x)} \neq y ) =
\\\\
= \barra{P}(Y_{\Pi(s,x)} = 1, y = -1) + \barra{P}(Y_{\Pi(s,x)} = -1, y = 1) =
\\\\
= \barra{E} [\eta_{\Pi(s,x)} \cdot (1-\eta(x))] + \barra{E}[( 1- \eta_{\Pi(s,x)})\cdot \eta(x)]$
\\\\
Make assumptions on $D_x$ and $\eta$: \\

--- MISSING MATERIAL ---
\\\\
$
\eta(x') \leq \eta(x) + c \|x-x'\| \quad \rightarrow \textit{ (Euclidean distance)}
\\\\
1-\eta(x') \leq 1- \eta(x) + c\|x-x'\|
\\\\
$
$
x' = X_{\Pi(s,x)}
\\\\
\eta(x) \cdot (1-\eta(x')) + (1-\eta(x))\cdot \eta(x') \leq
\\\\
\leq \eta(x) \cdot (1-\eta(x)) + \eta(x)\cdot c\|x-x'\| + (1-\eta(x))\cdot \eta(x) + (1-\eta(x))\cdot c\|x-x'\| =
\\\\
= 2 \cdot \eta(x) \cdot (1- \eta(x)) + c\|x-x'\| \\\\
\barra{E}[\ell_D (\hat{h}_s)] \leq 2 \cdot \barra{E} [\eta(X) \cdot (1-\eta(X))] + c \cdot \barra{E}[\|X-X_{\Pi(s,x)}\|]
$
\\ where $\leq$ means ``at most''
\\\\
Compare with the risk for the zero-one loss
\\
$
\barra{E}[\min\{\eta(X),1-\eta(X)\}] = \ell_D (f^*)
\\\\
\eta(x) \cdot ( 1- \eta(x)) \leq \min\{\eta(x), 1-\eta(x) \} \quad \forall x
\\\\
\barra{E}[\eta(X)\cdot(1-\eta(X))] \leq \ell_D(f^*)
\\\\
\barra{E}[\ell_D(\hat{h}_s)] \leq 2 \cdot \ell_D(f^*) + c \cdot \barra{E}[\|X-X_{\Pi(s,x)}\|]
$
\\\\
This depends on the dimension: curse of dimensionality.
\\\\--- DRAWING ---
\\\\
$
\ell_D(f^*) = 0 \iff \min\{ \eta(x), 1-\eta(x)\} = 0 \quad$ with probability $1$,
\\
and for this to be true $\eta(x) \in \{0,1\}$
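The derivation above concerns the 1-nearest-neighbour classifier $\hat{h}_s(x) = Y_{\Pi(s,x)}$, where $\Pi(s,x)$ is the index of the training point closest to $x$. A minimal Python sketch of this predictor (all names are ours):

```python
import math

def nearest_neighbour_predict(s, x):
    """1-NN classifier: return the label Y_t of the training point X_t
    closest to x in Euclidean distance (the index Pi(s, x) in the notes).

    s is the training set: a list of (X_t, Y_t) pairs, with X_t a tuple
    of floats and Y_t in {-1, +1}.
    """
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    _, label = min(s, key=lambda pair: dist(pair[0], x))
    return label

s = [((0.0, 0.0), -1), ((1.0, 1.0), 1)]
print(nearest_neighbour_predict(s, (0.9, 0.8)))  # -> 1
```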
@ -0,0 +1,195 @@
\section{Lecture 2 - 07-04-2020}

\subsection{Topic}
Classification tasks\\
Semantic label space $Y$\\
Categorization: $Y$ finite and small\\
Regression: $Y \subseteq \barra{R}$\\
How to predict labels?\\
Using the loss function $\rightarrow$ ..\\
Binary classification\\
Label space is $Y = \{ -1, +1 \}$\\
Zero-one loss\\
$
\ell(y,\hat{y}) = \begin{cases} 0, & \mbox{if } \hat{y} = y
\\ 1, &
\mbox{if }\hat{y} \neq y
\end{cases}
\\\\
FP \quad \hat{y} = 1,\quad y = -1\\
FN \quad \hat{y} = -1, \quad y = 1
$
\\\\
Losses for regression?\\
$y$ and $\hat{y} \in \barra{R}$, \\ so they are numbers!\\
One example of loss is the absolute loss: the absolute difference between the numbers\\
\subsection{Loss}
\subsubsection{Absolute Loss}
$$\ell(y,\hat{y}) = | y - \hat{y} | \Rightarrow \textit{absolute loss} $$
--- DRAWING ---\\\\
Some inconvenient properties:

\begin{itemize}
\item ...
\item The derivative takes only two values (not much information)
\end{itemize}
\subsubsection{Square Loss}
$$ \ell(y,\hat{y}) = ( y - \hat{y} )^2 \Rightarrow \textit{square loss}$$
--- DRAWING ---\\
Derivative:
\begin{itemize}
\item more informative
\item and differentiable
\end{itemize}
Real numbers as labels $\rightarrow$ regression.\\
Whenever taking the difference between two predictions makes sense (the values are numbers), we are talking about a regression problem.\\
Classification is categorization, when we have a small finite set.\\\\
\subsubsection{Example of information of square loss}

$\ell(y,\hat{y}) = ( y - \hat{y} )^2 = F(\hat{y})
\\
F'(\hat{y}) = -2 \cdot (y-\hat{y})
$
\begin{itemize}
\item whether I'm undershooting or overshooting, and by how much
\item how far away I am from the truth
\end{itemize}
$ \ell(y,\hat{y}) = | y- \hat{y}| = F(\hat{y}), \quad F'(\hat{y}) = -\mathrm{sign}(y-\hat{y})\\\\ $
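The difference in informativeness between the two derivatives can be checked numerically; a small Python sketch (function names are ours):

```python
def square_loss_grad(y, y_hat):
    """d/dy_hat of (y - y_hat)^2: gives direction AND magnitude of the error."""
    return -2 * (y - y_hat)

def abs_loss_grad(y, y_hat):
    """d/dy_hat of |y - y_hat|: only a sign, -1 or +1 (undefined at y_hat = y)."""
    return -1 if y > y_hat else 1

print(square_loss_grad(3.0, 1.0))  # -4.0: undershooting by a lot
print(square_loss_grad(3.0, 2.9))  # about -0.2: almost right
print(abs_loss_grad(3.0, 1.0))     # -1: same signal regardless of distance
```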
Question about the future\\
Will it rain tomorrow?\\
We have a label, and this is a binary classification problem.\\
My label space will be $Y = \{ \textit{``rain''}, \textit{``no rain''} \}$\\
We don't output a binary prediction; we need another space, called the prediction space (or decision space). \\
$
Z = [0,1] \\
\hat{y} \in Z \qquad \hat{y} \textit{ is my prediction of rain tomorrow}
\\
\hat{y} = \barra{P} (y = \textit{``rain''}) \quad \rightarrow \textit{my guess is that tomorrow it will rain (not sure)}\\\\
y \in Y \qquad \hat{y} \in Z \quad \textit{How can we manage the loss?}
\\
\textit{Put numbers in our space:}\\
\{1,0\} \quad \textit{where 1 is rain and 0 is no rain}\\\\
$
I measure how far I am from reality.\\
So with the absolute loss the punishment grows linearly.\\
However this is sometimes inconvenient: sometimes I prefer to punish more, so I go quadratically instead of linearly.\\
There are other ways to punish this.\\
One is called the \textbf{logarithmic loss}\\
We are extending a lot the range of our loss function.\\
$$
\ell(y,\hat{y}) = | y- \hat{y}| \in [0,1] \qquad \ell(y,\hat{y}) = ( y- \hat{y})^2 \in [0,1]
$$
\\
If I want to expand the punishment I use the logarithmic loss\\
\\
$ \ell(y,\hat{y}) = \begin{cases} \ln \frac{1}{\hat{y}}, & \mbox{if } y = 1 \textit{ (rain)}
\\ \ln \frac{1}{1-\hat{y}}, &
\mbox{if } y = 0 \textit{ (no rain)}
\end{cases}
\\\\
\ell(y,\hat{y}) \rightarrow \textit{can be 0 if I predict with certainty}
\\ \textit{If}\quad \hat{y} = 0.5 \qquad \ell(y, \tfrac{1}{2}) = \ln 2 \quad \textit{constant loss for each prediction}\\\\
\lim_{\hat{y} \to 0^+}{\ell(1,\hat{y})} = + \infty \\
\textit{We give a vanishing probability of rain but tomorrow it will rain.}
\\ \textit{So this is } +\infty \\
\lim_{\hat{y}\to 1^-} \ell(0,\hat{y}) = + \infty
\\\\
$
The algorithm will be punished more heavily the further its prediction is from reality. The algorithm will never output exactly 0 or 1, because, for example, it is impossible to make a perfect prediction.\\
This loss is useful to give this information to the algorithm.\\\\
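A minimal Python sketch of this logarithmic loss (the function name is ours):

```python
import math

def log_loss(y, y_hat):
    """Logarithmic loss for y in {0, 1} and a probability prediction y_hat in (0, 1)."""
    return math.log(1 / y_hat) if y == 1 else math.log(1 / (1 - y_hat))

print(log_loss(1, 0.5))   # ln 2: the "no idea" prediction, a constant loss
print(log_loss(1, 0.99))  # small loss: confident and right
print(log_loss(1, 0.01))  # large loss: confident and wrong
```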
Now we talk about labels and losses\\
\subsubsection{Labels and losses}
Data points: they have some semantic labels that denote something true about these data points, and we want to predict these labels.\\
We need to define what data points are: numbers? Strings? Files? Typically they are stored in database records \\
They can have a very precise structure or be structured more loosely \\
A data point can be viewed as a vector in some $d$-dimensional real space, so it's a vector of numbers
\\
$$
X = (x_1,x_2, \ldots, x_d) \in \barra{R}^d
$$
\\
An image can be viewed as a vector of pixel values (grey scale 0--255).\\
I can use geometry to learn, because points are in my Euclidean space. Data can be represented as points in Euclidean space. Images are lists of pixels with pretty much the same range and structure (from 0 to 255). It's very natural to put them in a space.\\\\
Assume $X$ can be a record with heterogeneous fields:\\
For example medical records: we have several values and each field has a meaning of its own (sex, weight, height, age, zip code)\\
Each one has a different range; in some cases it is numerical, but some, like age, behave differently ..\\
Does it make any sense to see a medical record as a point, since the coordinates have different meanings?\\
\textbf{Fields are not comparable.}\\
This is something that you do: when you want to solve some inference problem, you have to decide what the labels are and what the label space is, and we have to encode the data points.\\\\
Learning algorithms expect some homogeneous interface.
In this case the algorithm has to work with records whose fields have different kinds of values.\\
This is something that we have to pay attention to.\\
You can always map each range of values to numbers. So age is a number, sex can be given as 0 and 1, weight is a number and zip code is a number.\\
However, geometry doesn't make sense, since I cannot compare these coordinates.\\
In a linear space I can sum vectors: I can make linear combinations of vectors.\\
Inner products measure angles! (We will see this in linear classifiers).\\\\
I can scramble the digits of my zip code.\\
So we get problems with sex and zip code\\\\
Why do we care about geometry? I can use geometry to learn.\\
However there is more to it: geometry will carry some semantic information that I'm going to preserve during prediction.\\
I want to encode my images as vectors in a space. Images with dog.....\\\\
PCA doesn't work here, because it assumes we encode in a linear space.\\
We hope geometry will help us predict labels correctly, and sometimes it is hard to convert data into geometric points.\\
Examples of comparable data: images, or documents. \\
Assume we have a corpus (a set of documents).\\
Maybe in English, talking about different things with different words.\\
$X$ is a document and I want to encode $X$ into a point in a fixed-dimensional space.\\
There is a way to encode a set of documents as points in a fixed-dimensional space in such a way that the coordinates are comparable.\\
I can represent fields with $[0,1]$ for a neural network, for example. But they have no geometrical meaning\\
\subsubsection{Example TF(idf) documents encoding}
TF encoding of docs:
\begin{enumerate}
\item Extract all the words from the docs
\item Normalize the words (nouns, adjectives, verbs ...)
\item Build a dictionary of normalized words
\end{enumerate}
Doc $x = (x_1, \ldots, x_d) $\\
I associate a coordinate with each word in the dictionary.\\
$d$ = number of words in the dictionary\\
I can decide that \\
$x_i = 1 \qquad \textit{if the i-th word of the dictionary occurs in the doc}\\
x_i = 0 \qquad \textit{else}
$\\

Or: $x_i \quad \textit{= number of times the i-th word occurs in the doc}\\ $
Longer documents will then have higher values on their nonzero coordinates.\\
Now I can do the TF encoding, in which $x_i$ = the frequency with which the $i$-th word occurs in the doc.\\
You cannot sum dog and cat, but since we are considering their frequencies we are summing frequencies of words.\\
This encoding works well in the real world.\\
I can choose different ways of encoding my data, and sometimes I can encode it as a real vector\\\\
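The TF encoding above can be sketched in Python; this is a minimal illustration in which ``normalization'' is reduced to lowercasing and the dictionary is assumed to be a fixed word list:

```python
from collections import Counter

def tf_encode(doc, dictionary):
    """TF encoding: x_i = frequency of the i-th dictionary word in the doc.

    One coordinate per dictionary word; words outside the dictionary
    only contribute to the total word count.
    """
    words = doc.lower().split()
    counts = Counter(words)
    total = len(words)
    return [counts[w] / total for w in dictionary]

dictionary = ["dog", "cat", "runs"]
print(tf_encode("the dog runs and the dog sleeps", dictionary))
```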
I want
\begin{enumerate}
\item A predictor $f: X \longrightarrow Y$ (in the weather example, $f: X \longrightarrow Z$)
\item $X$ is our data space (where points live)
\item $X = \barra{R}^d$ for images
\item $X = X_1 \times \ldots \times X_d$ for medical records
\item $\hat{y} = f(x) $ is the prediction for $x$
\end{enumerate}
$(x,y)$\\\\
We want to predict a label that is as close as possible to the true label. How? With a loss function. So this is my setting, and it is called an example:\\
a data point together with its label is an ``example''\\
We can get a collection of examples by making measurements or asking people, so we can always recover the true label.\\
We want to replace this process with a predictor (so we don't have to bother a person).\\
$y$ is the ground truth for $x$ $\rightarrow$ it means reality!\\
If I want to predict a stock price for tomorrow, I will wait until tomorrow to see the ground truth.
Binary file not shown.
Binary file not shown.
@ -0,0 +1,234 @@
\documentclass[../main.tex]{subfiles}

\begin{document}
\section{Lecture 3 - 07-04-2020}
A data point $x$ is represented as a sequence of measurements, and we call these measurements \textit{features} (or attributes).\\
$$ x = (x_1,\dots,x_d) \qquad x_i \ \textit{feature value} \qquad x \in X \qquad X = X_1 \times \dots \times X_d \qquad \textit{e.g. } X = \barra{R}^d $$
\\
$
\textit{Label space } Y\\
\textit{Predictor } f : X \rightarrow Y \\
$
\\
Example $(x,y)$: \quad $y$ is the label associated with $x$\\
($\rightarrow y$ is the correct label, the ground truth)\\
\\
Learning with examples: $(x_1,y_1),\dots,(x_m,y_m)$ is the \textit{training set}.\\\\
The training set is a set of examples from which a learning algorithm can learn.\\\\
A learning algorithm takes the training set as input and produces a predictor as output.\\\\
...... [drawing] \\\\
In image recognition the measurements we use are pixels.\\
How do we measure the power of a predictor?\\
A learning algorithm looks at the training set and generates a predictor; the problem is then to verify its score.\\
For this we consider a test set, another collection of examples:
\\
$$ \textit{Test set} \qquad (x'_1,y'_1),\dots,(x'_n,y'_n) $$
Typically we collect a big dataset and then split it randomly into a training set and a test set.\\
\textbf{Training and test sets are typically disjoint.}
\\
How do we measure the score of a predictor? We compute the average loss over the elements of the test set:\\
$$
\textit{Test error} \qquad \frac{1}{n} \sum_{t=1}^{n} \ell(f(x'_t),y'_t)
$$
To estimate a predictor's quality we thus collect a test set and take the predictor's average loss on it; this gives us an idea of how well it will do on future data.\\
The proportion of test to training data depends on how big the dataset is in general.
Our \textbf{Goal}: a learning algorithm $A$ must output $f$ with a small test error.
$A$ does not have access to the test set (the test set is not part of the input of $A$).\\
Now we can think in general about how a learning algorithm should be designed.
We have a training set, so the algorithm can reason as follows:\\
\textbf{$A$ may choose $f$ based on its performance on the training set.}

$$
\textit{Training error} \qquad \hat{\ell}(f) = \frac{1}{m} \sum_{t=1}^{m} \ell(f(x_t),y_t)
$$
given the training set $(x_1,y_1),\dots,(x_m,y_m)$.
\\
The hope is that if $\hat{\ell}(f)$ is small for some $f$, then the test error of $f$ is also small.
\\
Fix a set $F$ of predictors and output $\hat{f}$:\\
$$ \hat{f} = \arg \min_{f \in F} \hat{\ell}(f) $$
\\
\textbf{This algorithm is called Empirical Risk Minimiser (ERM)}
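Over a finite class $F$, the ERM rule can be sketched directly (a minimal illustration with hypothetical names, again assuming the zero-one loss):

```python
def training_error(f, training_set):
    # empirical risk: average zero-one loss on the training set
    return sum(1 for x, y in training_set if f(x) != y) / len(training_set)

def erm(F, training_set):
    # Empirical Risk Minimiser: the predictor in F with the smallest
    # training error (min breaks ties by order of appearance in F)
    return min(F, key=lambda f: training_error(f, training_set))

S = [(0, 0), (1, 1), (2, 1)]
F = [lambda x: 0, lambda x: 1, lambda x: int(x >= 1)]
f_hat = erm(F, S)
print(training_error(f_hat, S))  # 0.0: the threshold classifier fits S exactly
```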
\\
When does this strategy (ERM) fail?\\
ERM may fail if, for the given training set, there are\\
many $f \in F$ with small $\hat{\ell}(f)$, but not all of them have small test error.
\\\\
There can be many predictors with small training error, but some of them may have a big test error: the predictor with the smallest training error does not necessarily have the smallest test error.\\
I would like to pick $f^*$ such that:
$$ f^* = \arg \min_{f \in F} \frac{1}{n} \sum_{t=1}^{n} \ell(f(x'_t),y'_t) $$
where $\frac{1}{n} \sum_{t=1}^{n} \ell(f(x'_t),y'_t)$ is the test error.
\\
ERM works if this $f^*$ also satisfies $f^* = \arg \min_{f \in F} \hat{\ell}(f)$,
\\
i.e. if minimising the training error also minimises the test error (check video lecture).\\
We can think of $F$ as finite, since we are working on a finite computer.\\
We want to see why ERM can fail, and we want to formalise a model in which we can avoid this by design: we want ERM, when we run it, to choose a good predictor with ...... PD\\\\


\subsection{Overfitting}
We call this situation overfitting: the learning algorithm $A$ overfits if the $f$ output by $A$ tends to have a training error much smaller than its test error.\\
$A$ is then not doing its job (it outputs a predictor with large test error); this happens because the training error is misleading.\\
Minimising the training error does not mean minimising the test error. Overfitting is bad.\\
Why does this happen?\\
It happens because we have \textbf{noise in the data}.\\

\subsubsection{Noise in the data}

Noise in the data: $y_t$ is not deterministically associated with $x_t$.\\\\
It can happen that the same data point appears more than once in the dataset with different labels; training set and test set then do not agree, and I am misled:
minimising the training error can take me away from the predictor that minimises the test error.\\
Why can this be the case?
\begin{itemize}
\item Some \textbf{human in the loop}: labels assigned by people (e.g. whether an image contains a certain object: humans are not objective, and different people may have different opinions).
\item \textbf{Lack of information}: in weather prediction I want to predict tomorrow's weather. Weather is determined by a large, complicated system; if all I have is today's humidity, it is difficult to say for sure whether it will rain tomorrow.
\end{itemize}
When the data are not noisy I should be OK.
\\
\textbf{Labels are not noisy}\\\\
Fix a test set and a training set; then
$$ \exists f^* \in F \qquad y'_t = f^*(x'_t) \qquad \forall (x'_t,y'_t) \quad \textit{in the test set} $$
$$ \qquad \qquad \qquad \qquad y_t = f^+(x_t) \qquad \forall (x_t,y_t) \quad \textit{in the training set} $$
\\
Consider a problem in which we have 5 data points (vectors):\\
$
\vec{x}_1,\dots,\vec{x}_5 \qquad \textit{in some space } X
$
\\
We have a binary classification problem $Y = \{0,1\}$:
\\
$
\{ \vec{x}_1,\dots,\vec{x}_5 \} \subset X \qquad Y = \{0,1\}
$
\\
$F$ contains all $2^5 = 32$ possible classifiers $f : \{x_1,\dots,x_5\} \rightarrow \{0,1\}$.
\\\\
\begin{tabular}{ |p{2cm}||p{2cm}|p{2cm}|p{2cm}|p{2cm}|p{2cm}| }
\hline
\multicolumn{6}{|c|}{Example} \\
\hline
& $x_1$ & $x_2$ & $x_3$ & $x_4$ & $x_5$ \\
\hline
$f$ & 0 & 0 & 0 & 0 & 0 \\
$f'$ & 0 & 0 & 0 & 0 & 1 \\
$f''$ & .. & .. & .. & .. & .. \\
\hline
\end{tabular}
\\\\
\[
\textit{Training set} \quad \{x_1,x_2,x_3\} \quad f^+
\]
\[
\textit{Test set} \quad \{x_4,x_5\} \quad f^*
\]
\\
$4$ \textit{classifiers} $f \in F$ \textit{will have} $\hat{\ell}(f) = 0$:
\\\\
$
(x_1,0) \quad (x_2,1) \quad (x_3,0) \\
(x_4,?) \quad (x_5,?) \\
f^*(x_4) \quad f^*(x_5)
$
\\
Without noise I have deterministic data, yet in this example (worst case) we still get a problem:\\
I have 32 classifiers to choose from, so I need a larger training set: with these 3 training points I cannot distinguish, among the classifiers with zero training error, those with small test error from those with large test error.
So overfitting can happen with noisy data, or with no noise but too few points in the dataset to
determine which predictor is good.\\
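The counting argument above can be checked by enumeration (a sketch with hypothetical training labels matching the example table): exactly four classifiers agree with the training set, and training error alone cannot tell them apart.

```python
from itertools import product

# every classifier on 5 points is a tuple of labels for x_1..x_5
F = list(product([0, 1], repeat=5))

# hypothetical training labels for x_1, x_2, x_3 (as in the example table)
train_labels = {0: 0, 1: 1, 2: 0}

# classifiers with zero training error: they agree on x_1..x_3
consistent = [f for f in F if all(f[i] == y for i, y in train_labels.items())]
print(len(F), len(consistent))  # 32 4: free choice of labels on x_4 and x_5
```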



\subsection{Underfitting}
$A$ underfits when the $f$ output by $A$ has a training error close to its test error, but both are large.\\
Training error close to test error is good, but here they are both large.
\\
$$
A \equiv ERM \textit{, then } A \textit{ underfits if } F \textit{ is too small} \rightarrow \textit{not containing enough predictors}
$$
\\
In general, given a certain training set size:
\begin{itemize}
\item Overfitting when $|F|$ is too large (not enough points in the training set)
\item Underfitting when $|F|$ is too small
\end{itemize}
Proportion between predictors and training set size:
\\
$$
\textit{given } |F| \textit{, I need } \ln |F| \quad \textit{bits of information to uniquely determine } f^* \in F
$$
$$
m \gg \ln |F| \qquad \textit{when} \quad |F| < \infty \textit{, where } m \textit{ is the size of the training set}
$$
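For the 32-classifier example above this proportion is easy to check numerically (a sketch; $\log_2$ counts bits, the natural log counts nats):

```python
import math

F_size = 32
bits = math.log2(F_size)  # information needed to single out one classifier
nats = math.log(F_size)   # the ln|F| used in the note
print(bits, round(nats, 3))  # 5.0 3.466
```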
\\
\subsection{Nearest neighbour}
This is completely different from ERM and is one of the first learning algorithms. It exploits the geometry of the data.
Assume that our data space $X$ is:
\\
$ X \equiv \barra{R}^d \qquad x = (x_1,\dots,x_d) \qquad y \in \{-1,1\} $
\\
$S$ is the training set $(x_1,y_1),\dots,(x_m,y_m)$, with $x_t \in \barra{R}^d$ and $y_t \in \{-1,1\}$.\\\\
$d = 2 \rightarrow$ \textit{2-dimensional vectors}\\
\\
.... -- [drawing] -- ....
\\
where $+$ and $-$ are labels
\\\\
\textbf{Point of the test set}
\\
What if I want to predict the label of this point?
\\
Maybe, if the point is close to a point whose label I know, then they have the same label:
\\
$\hat{y} = + \quad \textit{or} \quad \hat{y} = -$
\\\\
.... -- [drawing] -- ....
\\
So I can come up with some sort of classifier.
\\\\
Given the training set $S$, I can define $h_{NN} : X \rightarrow \{-1,1\}$\\
$h_{NN}(x) = $ label $y_t$ of the point $x_t$ in $S$ closest to $x$\\
\textbf{(with some tie-breaking rule)}
\\
By closest we mean in Euclidean distance
\\
$ X = \barra{R}^d $
$$
\| x - x_t \| = \sqrt{\sum_{e=1}^{d} (x_e - x_{t,e})^2}
$$
\\
$$
\hat{\ell}(h_{NN}) = 0
$$
since
$$
h_{NN}(x_t) = y_t
$$
\\
\textbf{The training error is 0!}
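A minimal sketch of $h_{NN}$ (assuming Euclidean distance, with ties broken by the first closest training point):

```python
import math

def h_nn(S, x):
    # S: training set of (point, label) pairs; return the label of the
    # training point closest to x in Euclidean distance
    def dist(p):
        return math.sqrt(sum((pe - xe) ** 2 for pe, xe in zip(p, x)))
    point, label = min(S, key=lambda example: dist(example[0]))
    return label

S = [((0.0, 0.0), -1), ((1.0, 1.0), 1), ((2.0, 0.0), -1)]
# every training point is its own nearest neighbour, so the training error is 0
print(all(h_nn(S, x) == y for x, y in S))  # True
```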
\end{document}
@ -0,0 +1 @@
\section{Lecture 4 - 07-04-2020}
@ -0,0 +1 @@
\section{Lecture 5 - 07-04-2020}
@ -0,0 +1 @@
\section{Lecture 6 - 07-04-2020}
@ -0,0 +1 @@
\section{Lecture 7 - 07-04-2020}
@ -0,0 +1 @@
\section{Lecture 8 - 07-04-2020}
@ -0,0 +1 @@
\section{Lecture 9 - 07-04-2020}
|
||||
jknappen/ec/dpi600\tcrm1200.pk><E:/Program Files/MiKTeX 2.9/fonts/type1/public/
|
||||
amsfonts/cm/cmbx10.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts
|
||||
/cm/cmbx12.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmex
|
||||
10.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmmi12.pfb><
|
||||
E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmmi8.pfb><E:/Progra
|
||||
m Files/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr10.pfb><E:/Program Files/M
|
||||
iKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr12.pfb><E:/Program Files/MiKTeX 2.9
|
||||
/fonts/type1/public/amsfonts/cm/cmr17.pfb><E:/Program Files/MiKTeX 2.9/fonts/ty
|
||||
pe1/public/amsfonts/cm/cmr6.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public
|
||||
/amsfonts/cm/cmr8.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts/
|
||||
cm/cmsy10.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmsy6
|
||||
.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmsy8.pfb><E:/
|
||||
Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmti12.pfb><E:/Program
|
||||
Files/MiKTeX 2.9/fonts/type1/public/amsfonts/symbols/msbm10.pfb>
|
||||
Output written on main.pdf (20 pages, 197911 bytes).
|
||||
30i,11n,31p,321b,206s stack positions out of 5000i,500n,10000p,200000b,50000s
|
||||
<C:\Users\AndreDany\AppData\Local\MiKTeX\2.9\fonts/pk/ljfou
|
||||
r/jknappen/ec/dpi600\tcrm1200.pk><E:/Program Files/MiKTeX 2.9/fonts/type1/publi
|
||||
c/amsfonts/cm/cmbx12.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfon
|
||||
ts/cm/cmex10.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cm
|
||||
mi12.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmmi8.pfb>
|
||||
<E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr12.pfb><E:/Progr
|
||||
am Files/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr17.pfb><E:/Program Files/
|
||||
MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmr6.pfb><E:/Program Files/MiKTeX 2.9
|
||||
/fonts/type1/public/amsfonts/cm/cmr8.pfb><E:/Program Files/MiKTeX 2.9/fonts/typ
|
||||
e1/public/amsfonts/cm/cmsy10.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/publi
|
||||
c/amsfonts/cm/cmsy6.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfont
|
||||
s/cm/cmsy8.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmti
|
||||
12.pfb><E:/Program Files/MiKTeX 2.9/fonts/type1/public/amsfonts/symbols/msbm10.
|
||||
pfb>
|
||||
Output written on main.pdf (27 pages, 198551 bytes).
|
||||
PDF statistics:
|
||||
132 PDF objects out of 1000 (max. 8388607)
|
||||
146 PDF objects out of 1000 (max. 8388607)
|
||||
0 named destinations out of 1000 (max. 500000)
|
||||
1 words of extra memory for PDF output out of 10000 (max. 10000000)
|
||||
|
||||
|
Binary file not shown.
Binary file not shown.
@ -11,480 +11,57 @@
\usepackage{amsmath}
\usepackage{systeme}
\usepackage{amssymb}
\usepackage{subfiles}

\newcommand\barra[1]{\mathbb{#1}}
\newcommand\hnn{h_{NN}}

\begin{document}
\maketitle

\newpage
\begin{abstract}
This is the paper's abstract \ldots
\end{abstract}

\section{Lecture 1 - 09-03-2020}

\subsection{Introduction}
In this course we look at the principles behind the design of machine learning algorithms. The goal is not just coding, but having an idea of which algorithms can work with the data. We fix a mathematical framework, using some statistics and mathematics, and work on ML at a higher level.

ML is data inference: making predictions about the future using data about the past.
\begin{itemize}
\item Clustering $\rightarrow$ grouping according to similarity
\item Planning $\rightarrow$ a robot learns to interact with a certain environment
\item Classification $\rightarrow$ assigning meaning to data; example: spam filtering
\end{itemize}
I want to predict the outcome for this individual, or whether a person clicks or not on a certain advertisement.

Examples of classifying data into categories:
\begin{itemize}
\item Medical diagnosis: data are medical records and categories are diseases.
\item Document analysis: data are texts and categories are topics.
\item Image analysis: data are digital images and categories are the names of the objects in the image (but it could be different).
\item Spam filtering: data are emails, categories are spam vs.\ non-spam.
\item Advertising prediction: data are features of web site visitors and categories could be click/non-click on banners.
\end{itemize}
Classification is different from clustering, since in clustering we do not have a semantic label (spam or not spam), like the meaning of an image; in classification I have a semantic label. In clustering I want to group data with a similarity function. Planning is learning what to do next.
\begin{itemize}
\item Clustering: learn a similarity function
\item Classification: learn semantic labels (the meaning of the data)
\item Planning: learn actions given states
\end{itemize}
Classification is an easier task than planning, since I only have to predict the semantic label that goes with a data point. If I can do classification, I can probably do clustering. If I can do planning, I can probably also classify (since I understand the meaning of my position) and then probably also cluster. We will focus on classification, because many tasks are about classification.
Classify data into categories: we can imagine a set of categories. For instance, consider the tasks ``predict the income of a person'' or ``predict tomorrow's price for a stock'': here the label is a number, not an abstract thing. For the label set (the set of possible categories for each data point) we can distinguish two cases:
\begin{itemize}
\item A finite set of abstract symbols (as in document classification or medical diagnosis): the task is classification.
\item A real number (no bound on how many of them): my prediction will be a real number and not a category, and we talk about a task of regression.
\end{itemize}
Classification: a task where we want to give a data point a label from predefined abstract categories (like YES or NO).
Regression: a task where we want to give data points labels that are numbers.
When we say ``prediction task'', we mean both classification and regression tasks.
Supervised learning: labels attached to the data (classification, regression).
Unsupervised learning: no labels attached to the data (clustering).
In unsupervised learning, the mathematical modelling, and the way algorithms are scored and can learn from mistakes, is a little bit harder: the problem of clustering is harder to model mathematically.
You can cast planning as supervised learning: I can show the robot the right action to take in each state. But that depends on how the planning task is formalised.
Planning is a higher level of learning, since it includes tasks of supervised and unsupervised learning.

Why is this important? The algorithm has to know how good its labels are.
In ML we want to teach the algorithm to perform predictions correctly. Initially the algorithm will make mistakes in classifying data. We want to tell the algorithm that its classification was wrong, and assign a score, like giving a grade to the algorithm so it understands whether it did badly or really badly.
So we have mistakes! The algorithm predicts, sometimes makes a mistake, and we can correct it; then the algorithm can become more precise.
We have to define this mistake.
Mistakes in the case of classification: in the simple case, the chosen category is the wrong one, and we have a binary signal telling us that the category is wrong.
How do we communicate it?
We can use the loss function: we can tell the algorithm whether it is wrong or not.
Loss function: a measure of the discrepancy between the ``true'' label and the predicted label.
So we may assume that every data point has a true label: if we have a set of topics, this is the true topic the document is talking about.
This is typical of supervised learning.
\\\\
How well did the algorithm do?
\\
\[\ell(y,\hat{y})\geq 0 \]
where $y$ is the true label and $\hat{y}$ is the predicted label.
\\\\
We want to build a spam filter, where $0$ is not spam and $1$ is spam. For this classification task:
\\\\
$
\ell(y,\hat{y}) = \begin{cases} 0, & \mbox{if } \hat{y} = y
\\ 1, &
\mbox{if }\hat{y} \neq y
\end{cases}
$
\\\\
The loss function is the ``interface'' between algorithm and data:
the algorithm knows about the data through the loss function.
If we give it a useless loss function the algorithm will not perform well, so it is
important to have a good loss function.
Spam filtering:
we have two main mistakes. Are they the same mistake? No: if I have an important email and you classify it as spam,
that's bad, while if you show me a spam email, that's ok.
So we have to assign different weights.
Even in binary classification, mistakes are not equal.
\\\\
(Figure: handwritten scheme of the zero-one loss for the spam classification task with $Y = \{\text{spam}, \text{non-spam}\}$ and the two kinds of mistake: a false positive is $\hat{y} = \text{spam}$ with $y = \text{non-spam}$, a false negative is $\hat{y} = \text{non-spam}$ with $y = \text{spam}$; a weighted variant charges $2$ for a false positive, $1$ for a false negative and $0$ otherwise.)
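The asymmetric treatment of the two mistakes can be sketched as a small weighted zero-one loss. The particular weights (2 for a false positive, 1 for a false negative) are the ones suggested by the scheme above, used here purely as an illustration:

```python
def weighted_zero_one_loss(y, y_hat, fp_cost=2.0, fn_cost=1.0):
    """Zero-one loss with different costs for the two kinds of mistake.

    Labels: +1 = spam, -1 = non-spam. A false positive (a legitimate
    email classified as spam) is punished more than a false negative.
    """
    if y_hat == y:
        return 0.0
    if y_hat == 1 and y == -1:   # false positive: important email lost
        return fp_cost
    return fn_cost               # false negative: spam shown to the user

print(weighted_zero_one_loss(-1, 1))  # false positive -> 2.0
print(weighted_zero_one_loss(1, -1))  # false negative -> 1.0
```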
\newpage
\section{Lecture 2 - 07-04-2020}

\subsection{Topics}
Classification tasks\\
Semantic label space $Y$\\
Categorization: $Y$ finite and small\\
Regression: $Y \subseteq \barra{R}$\\
How to predict labels?\\
Using the loss function $\rightarrow$ \ldots\\
Binary classification\\
Label space is $Y = \{-1, +1\}$\\
Zero-one loss\\
\tableofcontents
\newpage
$
\ell(y,\hat{y}) = \begin{cases} 0, & \mbox{if } \hat{y} = y
\\ 1, &
\mbox{if }\hat{y} \neq y
\end{cases}
\\\\
FP \quad \hat{y} = 1,\quad y = -1\\
FN \quad \hat{y} = -1, \quad y = 1
$
\\\\
Losses for regression?\\
$y$ and $\hat{y} \in \barra{R}$, so they are numbers!\\
One example of loss is the absolute loss: the absolute difference between the two numbers.\\
\subsection{Loss}
\subsubsection{Absolute Loss}
$$\ell(y,\hat{y}) = | y - \hat{y} | \Rightarrow \textit{absolute loss} $$
(figure: plot of the absolute loss as a function of $\hat{y}$)\\\\
Some inconvenient properties:

\begin{itemize}
\item \ldots
\item the derivative takes only two values (not much information)
\end{itemize}
\subsubsection{Square Loss}
$$ \ell(y,\hat{y}) = ( y - \hat{y} )^2 \Rightarrow \textit{square loss}$$
(figure: plot of the square loss as a function of $\hat{y}$)\\
Derivative:
\begin{itemize}
\item more informative
\item and differentiable
\end{itemize}
Real numbers as labels $\rightarrow$ regression.\\
Whenever taking the difference between two predictions makes sense (the values are numbers), we are talking about a regression problem.\\
Classification is categorization, when we have a small finite set.\\\\
\subsubsection{Example: the information given by the square loss}

$\ell(y,\hat{y}) = ( y - \hat{y} )^2 = F(\hat{y})
\\
F'(\hat{y}) = -2 \cdot (y-\hat{y})
$
\begin{itemize}
\item it tells me whether I am undershooting or overshooting, and by how much
\item how far away I am from the truth
\end{itemize}
$ \ell(y,\hat{y}) = | y- \hat{y}| = F(\hat{y}), \qquad F'(\hat{y}) = -\mathrm{sign}(y-\hat{y})$: the absolute loss only gives the direction of the mistake.\\\\
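A quick numerical sketch of this point (the values of $y$ and $\hat{y}$ are made up): the square-loss derivative scales with the distance from the truth, while the absolute-loss derivative only carries its sign.

```python
def square_loss_grad(y, y_hat):
    # d/d y_hat of (y - y_hat)^2  is  -2 * (y - y_hat)
    return -2.0 * (y - y_hat)

def absolute_loss_grad(y, y_hat):
    # d/d y_hat of |y - y_hat|  is  -sign(y - y_hat)  (undefined at y_hat = y)
    return -1.0 if y > y_hat else 1.0

y = 3.0
for y_hat in (1.0, 2.9, 5.0):
    print(y_hat, square_loss_grad(y, y_hat), absolute_loss_grad(y, y_hat))
# square loss gradients: -4.0, ~-0.2, 4.0 -> magnitude says how far off we are
# absolute loss gradients: -1.0, -1.0, 1.0 -> only the direction of the mistake
```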
A question about the future: will it rain tomorrow?\\
We have a label, and this is a binary classification problem.\\
My label space will be $Y = \{\text{``rain''}, \text{``no rain''}\}$.\\
We don't output a binary prediction; we need another space, called the prediction space (or decision space). \\
$
Z = [0,1] \\
\hat{y} \in Z \qquad \hat{y} \textit{ is my prediction of rain tomorrow}
\\
\hat{y} = \barra{P} (y = \text{``rain''}) \quad \rightarrow \textit{my guess is that tomorrow it will rain (not sure)}\\\\
y \in Y \qquad \hat{y} \in Z \quad \textit{How can we manage the loss?}
\\
\textit{Put numbers in our space:}\\
\{1,0\} \quad \textit{where 1 is rain and 0 is no rain}\\\\
$
I measure how far I am from reality.\\
With the absolute loss the punishment grows linearly.\\
(figure: absolute loss for the rain prediction)\\
However that is sometimes inconvenient: sometimes I prefer to punish more, so I go quadratically instead of linearly.\\
There are other ways to punish, such as the \textbf{logarithmic loss},\\
with which we extend the range of our loss function a lot.\\

$$
\ell(y,\hat{y}) = | y- \hat{y}| \in [0,1] \qquad \ell(y,\hat{y}) = ( y- \hat{y})^2 \in [0,1]
$$
\\
If I want to expand the punishment, I use the logarithmic loss:\\
\\
$ \ell(y,\hat{y}) = \begin{cases} \ln \frac{1}{\hat{y}}, & \mbox{if } y = 1 \textit{ (rain)}
\\ \ln \frac{1}{1-\hat{y}}, &
\mbox{if } y = 0 \textit{ (no rain)}
\end{cases}
\\\\
\ell(y,\hat{y}) \rightarrow \textit{can be 0 if I predict with certainty}
\\ \textit{If}\quad \hat{y} = \frac{1}{2} \qquad \ell(y, \frac{1}{2}) = \ln 2 \quad \textit{constant loss for either outcome}\\\\
\lim_{\hat{y} \to 0^+}{\ell(1,\hat{y}) = + \infty} \\
\textit{We give a vanishing probability of rain, but tomorrow it rains:}
\\ \textit{so this is } +\infty \\
\lim_{\hat{y}\to 1^-} \ell(0,\hat{y}) = + \infty
\\\\
$
The algorithm is punished more, the further its prediction is from reality. The algorithm will not output exactly 0 or 1 because, for example, a perfect prediction is impossible, and being certain and wrong costs an infinite loss.\\
This loss is useful to give this information to the algorithm.\\\\
Now we talk about labels and losses.\\
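The behaviour just described (a constant $\ln 2$ loss at $\hat{y} = \frac{1}{2}$, a loss that blows up as a confident prediction turns out wrong) can be sketched as:

```python
import math

def log_loss(y, y_hat):
    """Logarithmic loss for y in {0, 1} (rain / no rain) and y_hat in (0, 1)."""
    if y == 1:                             # it rains
        return math.log(1.0 / y_hat)
    return math.log(1.0 / (1.0 - y_hat))   # it does not rain

print(log_loss(1, 0.5))    # ln 2 ~ 0.693: hedging costs the same either way
print(log_loss(1, 0.99))   # near 0: confident and right
print(log_loss(1, 1e-6))   # ~ 13.8: confident and wrong, -> +inf as y_hat -> 0+
```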
\subsubsection{Labels and losses}
Data points have semantic labels that denote some truth about those data points, and we want to predict these labels.\\
We need to define what data points are: numbers? Strings? Files? Typically they are stored in database records. \\
They can have a very precise structure or be more heterogeneously structured. \\
A data point can be viewed as a vector in some $d$-dimensional real space, i.e.\ a vector of numbers:
\\
$$
X = (x_1,x_2, \ldots, x_d) \in \barra{R}^d
$$
\\
An image can be viewed as a vector of pixel values (grey scale 0--255).\\
I can use geometry to learn, because the points live in a Euclidean space: data can be represented as points in Euclidean space. Images are lists of pixels with pretty much the same range and structure (from 0 to 255), so it is very natural to put them in such a space.\\\\
Assume $X$ can be a record with heterogeneous fields:\\
for example medical records: we have several values, and each field has a meaning of its own (sex, weight, height, age, zip code).\\
Each one has a different range: in some cases it is numerical, but some fields are different, like age\ldots\\
Does it make sense to see a medical record as a point, given that the coordinates
have different meanings?\\
\textbf{Fields are not comparable.}\\
This is something that you do: when you want to solve some inference task, you have to decide what the labels are, what the label space is, and how to encode the data points.\\\\
Learning algorithms expect some homogeneous interface;
in this case the algorithm has to work with records whose fields hold different kinds of values.\\
This is something we have to pay attention to.\\
You can always encode each range of values as numbers: age is a number, sex you
can encode as 0 and 1, weight is a number and the zip code is a number.\\
However the geometry doesn't make sense, since I cannot compare these
coordinates.\\
In a linear space I can sum vectors: I can make linear combinations of
vectors.\\
Inner products measure angles! (We will see this with linear classifiers.)\\\\
I can scramble the digits of my zip code.\\
So we get problems with fields like sex and zip code.\\\\
Why do we care about geometry? I can use geometry to learn.\\
But there is more to it: geometry carries some semantic information that I want to preserve during prediction.\\
I want to encode my images as vectors in a space. Images with dogs\ldots\\\\
PCA doesn't work here, because it assumes the data are already encoded in a linear space.\\
We hope geometry will help us to predict labels correctly, and sometimes it is hard to convert data into geometric points.\\
Examples of comparable data: images, or documents. \\
Assume we have a corpus (a set of documents),\\
maybe in English, talking about different things with different words.\\
$X$ is a document, and I want to encode $X$ as a point in a fixed dimensional space.\\
There is a way to encode a set of documents as points in a fixed dimensional space in such a way that the coordinates are comparable.\\
I can represent fields with $[0,1]$ for neural networks, for example, but then they have no geometrical meaning.\\
\subsubsection{Example: TF(-IDF) document encoding}
TF encoding of docs:
\begin{enumerate}
\item Extract all the words from the docs
\item Normalize the words (nouns, adjectives, verbs, \ldots)
\item Build a dictionary of normalized words
\end{enumerate}
Doc $x = (x_1, \ldots, x_d) $\\
I associate a coordinate with each word in the dictionary.\\
$d$ = number of words in the dictionary.\\
I can decide that \\
$x_i = 1 \qquad \textit{if the $i$-th word of the dictionary occurs in the doc,}\\
x_i = 0 \qquad \textit{otherwise.}
$\\

Alternatively, $x_i = $ \textit{number of times the $i$-th word occurs in the doc.}\\
Longer documents will then have higher values on the coordinates that are not zero.\\
Now I can do the TF encoding, in which $x_i$ is the frequency with which the $i$-th word occurs in the doc.\\
You cannot sum dog and cat, but we are considering their frequencies, so we are summing frequencies of words.\\
This encoding works well in the real world.\\
I can choose different ways of encoding my data, and often I can encode them as a real vector.\\\\
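The three steps above can be sketched as follows. The tiny corpus and the trivial normalization are made-up illustrations; a real pipeline would use stemming or lemmatization for step 2:

```python
def tf_encode(docs):
    """Term-frequency encoding: each doc becomes a vector over the dictionary."""
    normalize = lambda w: w.lower().strip(".,!?")    # toy word normalization
    tokenized = [[normalize(w) for w in doc.split()] for doc in docs]
    # Dictionary of all normalized words; d = len(dictionary) coordinates.
    dictionary = sorted({w for doc in tokenized for w in doc})
    vectors = []
    for doc in tokenized:
        # x_i = frequency of the i-th dictionary word in this doc,
        # so longer documents are not favoured over shorter ones.
        vectors.append([doc.count(w) / len(doc) for w in dictionary])
    return dictionary, vectors

dictionary, vectors = tf_encode(["the dog chased the cat", "the cat slept"])
print(dictionary)   # ['cat', 'chased', 'dog', 'slept', 'the']
print(vectors[0])   # [0.2, 0.2, 0.2, 0.0, 0.4]
```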
I want
\begin{enumerate}
\item a predictor $f: X \longrightarrow Y$ (in the weather example, $f: X \longrightarrow Z$)
\item $X$ is our data space (where the points live)
\item $X = \barra{R}^d$ for images
\item $ X = X_1 \times \dots \times X_d$ for medical records
\item $\hat{y} = f(x) $ is the prediction for $x$
\end{enumerate}
A data point together with its label, $(x,y)$, is called an ``example''.\\\\
We want to predict a label that is as close as possible to the true label. How? Through the loss function: this is my setting.\\
We can get a collection of examples by making measurements or asking people, so
we can always recover the true label.\\
We want to replace this process with a predictor (so we don't have to bother a
person).\\
$y$ is the ground truth for $x$ $\rightarrow$ it means reality!\\
If I want to predict a stock price for tomorrow, I will wait until tomorrow to see the ground truth.
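Putting the pieces together (data space, predictor, examples, loss), a minimal sketch; the threshold predictor and the three examples are made up for illustration:

```python
def f(x):
    """Toy predictor f: X -> Y on X = R^2: predict +1 iff the coordinates sum to > 0."""
    return 1 if x[0] + x[1] > 0 else -1

def zero_one_loss(y, y_hat):
    """Compare the ground truth y with the prediction y_hat."""
    return 0 if y == y_hat else 1

# Examples (x, y): data points together with their ground-truth labels.
examples = [((2.0, 1.0), 1), ((-3.0, 0.5), -1), ((1.0, -2.0), 1)]
avg_loss = sum(zero_one_loss(y, f(x)) for x, y in examples) / len(examples)
print(avg_loss)  # 1/3: the predictor is wrong on the third example only
```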
\subfile{lectures/lecture1}

\newpage
\section{Lecture 3 - 07-04-2020}
\subfile{lectures/lecture2}

\newpage
\subfile{lectures/lecture3}

\newpage
\subfile{lectures/lecture4}

\newpage
\section{Lecture 4 - 07-04-2020}
\subfile{lectures/lecture5}

\newpage
\section{Lecture 5 - 07-04-2020}
\subfile{lectures/lecture6}

\newpage
\section{Lecture 6 - 07-04-2020}
\subfile{lectures/lecture7}

\newpage
\section{Lecture 7 - 07-04-2020}
\subfile{lectures/lecture8}

\newpage
\section{Lecture 8 - 07-04-2020}
\subfile{lectures/lecture9}

\newpage
\section{Lecture 9 - 07-04-2020}
\subfile{lectures/lecture10}

\newpage
\section{Lecture 10 - 07-04-2020}
\subsection{TO BE DEFINED}

$\barra{E}[Z] = \barra{E}[\barra{E}[Z|X]]$
\\\\
$\barra{E}[X] = \sum_{t = 1}^{m} \barra{E}[X \cdot \Pi\{A_t\}] \quad$ (for a partition $A_1, \dots, A_m$)
\\\\
$x \in \barra{R}^d
$
\\
$\barra{P}(Y_{\Pi(s,x)} = 1) = \\\\ \barra{E}[\Pi \{ Y_{\Pi(s,x)} = 1 \} ] = \\\\
= \sum_{t = 1}^{m} \barra{E}[\Pi\{Y_t = 1\} \cdot \Pi \{ \Pi(s,x) = t\}] = \\\\
= \sum_{t = 1}^{m} \barra{E}[\barra{E}[\Pi\{Y_t = 1\} \cdot \Pi\{\Pi(s,x) = t\} \mid X_t]] = \\\\
$
given the fact that $Y_t \sim \eta(X_t)$ ($\eta$ gives me the probability), and that $Y_t = 1$ and $\Pi(s,x) = t$ are independent given $X_t$ (for independent $Z$ and $X$, $\barra{E}[ZX] = \barra{E}[X] \cdot \barra{E}[Z]$):
$\\\\
= \sum_{t = 1}^{m} \barra{E}[\barra{E}[\Pi\{Y_t = 1\}|X_t] \cdot \barra{E} [ \Pi\{\Pi(s,x) = t\} \mid X_t]] = \\\\
= \sum_{t = 1}^{m} \barra{E}[\eta(X_t) \cdot \Pi \{\Pi (s,x) = t \}] = \\\\
= \barra{E} [ \eta(X_{\Pi(s,x)})]
$
\[ \barra{P} (Y_{\Pi(s,x)} = 1 \mid X=x) = \barra{E}[\eta(X_{\Pi(s,x)}) \mid X = x] \]
\\\\

$
\barra{P} (Y_{\Pi(s,x)} = 1, y = -1 ) = \\\\
= \barra{E}[\Pi\{Y_{\Pi(s,x) }= 1\} \cdot \Pi\{y= -1\} ] = \\\\
= \barra{E}[\barra{E}[\Pi \{ Y_{\Pi(s,x)} = 1\} \cdot \Pi \{ y = -1 \} \mid X ]] = \\\\
$

\[ \text{the event } y = -1 \text{ has probability } 1- \eta(x) \text{ when } X = x\]

$
\\\\ = \barra{E}[\barra{E}[\Pi \{Y_{\Pi(s,x)} = 1\} \mid X] \cdot \barra{E}[\Pi \{y = -1\} \mid X ]] = \\\\
= \barra {E}[\eta_{\Pi(s,x)} \cdot (1-\eta(X))] \\\\
\text{similarly:} \quad \barra{P}(Y_{\Pi(s,x)} = -1 , y = 1) = \\
\barra{E} [(1- \eta_{\Pi(s,x)}) \cdot \eta(X)]
\\\\
\barra{E} [ \ell_D (\hat{h}_s)] = \barra{P}(Y_{\Pi(s,x)} \neq y ) =
\\\\
= \barra{P}(Y_{\Pi(s,x)} = 1, y = -1) + \barra{P}(Y_{\Pi(s,x)} = -1, y = 1) =
\\\\
= \barra{E} [\eta_{\Pi(s,x)} \cdot (1-\eta(X))] + \barra{E}[( 1- \eta_{\Pi(s,x)})\cdot \eta(X)]$
\\\\
where $\eta_{\Pi(s,x)}$ is shorthand for $\eta(X_{\Pi(s,x)})$.
\\\\
Make assumptions on $D_x$ and $\eta$: \\

(some material is missing here in the notes)
\\\\

Assume $\eta$ satisfies a Lipschitz-type condition with constant $c$:
$
\eta(x') \leq \eta(x) + c\,\|x-x'\| \quad \text{(Euclidean distance)}
\\\\
1-\eta(x') \leq 1- \eta(x) + c\,\|x-x'\|
$

$
x' = X_{\Pi(s,x)}
\\\\
\eta(x) \cdot (1-\eta(x')) + (1-\eta(x))\cdot \eta(x') \leq
\\\\
\leq \eta(x) \cdot(1-\eta(x)) + \eta(x)\cdot c\,\|x-x'\| + (1-\eta(x))\cdot \eta(x) + (1-\eta(x))\cdot c\,\|x-x'\| =
\\\\
= 2 \cdot \eta(x) \cdot (1- \eta(x)) + c\,\|x-x'\| \\\\
\barra{E}[\ell_D (\hat{h}_s)] \leq 2 \cdot \barra{E} [\eta(X) \cdot (1-\eta(X))] + c \cdot \barra{E}[\|X-X_{\Pi(s,x)}\|]
$
\\\\
Compare with the risk of the Bayes optimal predictor for the zero-one loss:
\\
$
\barra{E}[\min\{\eta(X),1-\eta(X)\}] = \ell_D (f^*)
\\\\
\eta(x) \cdot( 1- \eta(x)) \leq \min\{\eta(x), 1-\eta(x) \} \quad \forall x
\\\\
\barra{E}[\eta(X)\cdot(1-\eta(X))] \leq \ell_D(f^*)
\\\\
\barra{E}[\ell_D(\hat{h}_s)] \leq 2 \cdot \ell_D(f^*) + c \cdot \barra{E}[\|X-X_{\Pi(s,x)}\|]
$
\\\\
The last term depends on the dimension: curse of dimensionality.
\\\\
(figure)
\\\\
$
\ell_D(f^*) = 0 \iff \min\{ \eta(x), 1-\eta(x)\} =0 \quad$ with probability 1;
\\
for this to be true, $\eta(x) \in \{0,1\}$.
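A small simulation of the bound above under the assumption $\eta(x) \in \{0,1\}$, so that $\ell_D(f^*) = 0$ and the excess risk reduces to the $c \cdot \barra{E}[\|X-X_{\Pi(s,x)}\|]$ term. The data distribution (uniform on $[-1,1]^2$, label given by the sign of the second coordinate) is a made-up example:

```python
import math
import random

def label(x):
    """Deterministic labels: eta(x) in {0,1}, so the Bayes risk ell_D(f*) is 0."""
    return 1 if x[1] > 0 else -1

def nn_predict(train, x):
    """1-NN predictor h_s: copy the label of the closest training point."""
    nearest_point, nearest_label = min(train, key=lambda p: math.dist(p[0], x))
    return nearest_label

random.seed(0)
draw = lambda: (random.uniform(-1, 1), random.uniform(-1, 1))
train = [(p, label(p)) for p in (draw() for _ in range(2000))]
test = [(p, label(p)) for p in (draw() for _ in range(500))]

# Only test points very close to the boundary x_2 = 0 can be misclassified,
# and that region shrinks as the typical nearest-neighbour distance shrinks.
error = sum(nn_predict(train, x) != y for x, y in test) / len(test)
print(error)  # a small test error, far below chance level
```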
\bibliographystyle{abbrv}
\bibliography{main}

@ -0,0 +1,23 @@
\contentsline {section}{\numberline {1}Lecture 1 - 09-03-2020}{3}%
\contentsline {subsection}{\numberline {1.1}Introduction}{3}%
\contentsline {section}{\numberline {2}Lecture 2 - 07-04-2020}{6}%
\contentsline {subsection}{\numberline {2.1}Argomento}{6}%
\contentsline {subsection}{\numberline {2.2}Loss}{6}%
\contentsline {subsubsection}{\numberline {2.2.1}Absolute Loss}{6}%
\contentsline {subsubsection}{\numberline {2.2.2}Square Loss}{7}%
\contentsline {subsubsection}{\numberline {2.2.3}Example of information of square loss}{7}%
\contentsline {subsubsection}{\numberline {2.2.4}labels and losses}{9}%
\contentsline {subsubsection}{\numberline {2.2.5}Example TF(idf) documents encoding}{10}%
\contentsline {section}{\numberline {3}Lecture 3 - 07-04-2020}{12}%
\contentsline {subsection}{\numberline {3.1}Overfitting}{14}%
\contentsline {subsubsection}{\numberline {3.1.1}Noise in the data}{14}%
\contentsline {subsection}{\numberline {3.2}Underfitting}{16}%
\contentsline {subsection}{\numberline {3.3}Nearest neighbour}{16}%
\contentsline {section}{\numberline {4}Lecture 4 - 07-04-2020}{18}%
\contentsline {section}{\numberline {5}Lecture 5 - 07-04-2020}{19}%
\contentsline {section}{\numberline {6}Lecture 6 - 07-04-2020}{20}%
\contentsline {section}{\numberline {7}Lecture 7 - 07-04-2020}{21}%
\contentsline {section}{\numberline {8}Lecture 8 - 07-04-2020}{22}%
\contentsline {section}{\numberline {9}Lecture 9 - 07-04-2020}{23}%
\contentsline {section}{\numberline {10}Lecture 10 - 07-04-2020}{24}%
\contentsline {subsection}{\numberline {10.1}TO BE DEFINE}{24}%