\documentclass[12pt]{article}

\usepackage{amsmath}
\usepackage{systeme}
\usepackage{amssymb}

\newcommand\barra[1]{\mathbb{#1}}

\title{Statistical Methods for Machine Learning}

\author{
Andrea Ierardi \\
Data Science and Economics\\
Università degli Studi di Milano\\
}

\date{\today}

\begin{document}

\maketitle
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\begin{abstract}
|
|
|
|
|
This is the paper's abstract \ldots
|
|
|
|
|
\end{abstract}
|
|
|
|
|
|
|
|
|
|
\section{Lecture 1 - 09-03-2020}
|
|
|
|
|
|
|
|
|
|
\subsection{Introduction}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\textbf{Machine Learning}

In this course we look at the principles behind the design of machine learning algorithms: the point is not just coding, but understanding which algorithms can work with the data. We have to fix a mathematical framework, using some statistics and mathematics, so that we can work on ML at a higher level.

ML is data inference: making predictions about the future using data about the past.

\begin{itemize}
\item Clustering $\rightarrow$ grouping data according to similarity
\item Planning $\rightarrow$ a robot learns to interact with a certain environment
\item Classification $\rightarrow$ assigning meaning to data; example: spam filtering
\end{itemize}

For instance, I want to predict the outcome for an individual, or whether a person will click or not on a certain advertisement.
|
|
|
|
|
\textbf{Examples}

Classify data into categories:
\begin{itemize}
\item Medical diagnosis: data are medical records and categories are diseases.
\item Document analysis: data are texts and categories are topics.
\item Image analysis: data are digital images and categories are the names of the objects in the image (but they could be different).
\item Spam filtering: data are emails, categories are spam vs.\ non-spam.
\item Advertising prediction: data are features of web site visitors and categories could be click/non-click on banners.
\end{itemize}
|
|
|
|
|
Classification is different from clustering: in classification I have a semantic label (spam or not spam, the meaning of an image), while in clustering I only want to group data according to a similarity function.

\begin{itemize}
\item Clustering: learn a similarity function
\item Classification: learn the semantic labels (the meaning) of the data
\item Planning: learn what to do next, i.e.\ which action to take in a given state
\end{itemize}
|
|
|
|
|
Classification is an easier task than planning, since I only have to predict the semantic label that goes with a data point. If I can do classification, I can also do clustering. If I can do planning, I can probably also classify (since I understand the meaning of my position), and then I can probably also do clustering.

We will focus on classification, because many tasks are about classification.

To classify data into categories, we can imagine a set of categories.
|
|
|
|
|
For instance, consider the tasks ``predict the income of a person'' or ``predict tomorrow's price for a stock''. Here the label is a number and not an abstract symbol.

We can distinguish two cases for the label set, i.e.\ the set of possible labels for each data point:
\begin{itemize}
\item A finite set of abstract symbols (as in document classification or medical diagnosis). In this case the task is classification.
\item Real numbers (with no bound on how many of them). My prediction is a real number and not a category. In this case we talk about a regression task.
\end{itemize}
|
|
|
|
|
Classification: a task where we want to assign to data points a label from a predefined set of abstract categories (like YES or NO).

Regression: a task where we want to assign labels to data points, but these labels are numbers.

When we say ``prediction task'' we refer to both classification and regression tasks.
|
|
|
|
|
Supervised learning: Label attached to data (classification, regression)
|
|
|
|
|
Unsupervised learning: No labels attached to data (clustering)
|
|
|
|
|
In unsupervised learning, the mathematical modelling, and the way algorithms are scored and can learn from their mistakes, is a bit harder: the problem of clustering is harder to model mathematically.

Planning can be cast as supervised learning: I can show the robot the right action to take in each state. But this depends on how the planning task is formalised. Planning is a higher level of learning, since it includes tasks of supervised and unsupervised learning.
|
|
|
|
|
Why is this important? The algorithm has to learn how to assign the labels. In ML we want to teach the algorithm to make predictions correctly. Initially the algorithm will make mistakes in classifying the data: we want to tell the algorithm that its classification was wrong, and give it a score, like a grade, so that it knows whether it did badly or really badly.

So we have mistakes! The algorithm predicts, sometimes it makes a mistake, and we can correct it. Then the algorithm can become more precise. We have to define what a mistake is.
|
|
|
|
|
Mistakes in the case of classification: in the simplest case, a mistake is made when the predicted category is the wrong one, so we have a binary signal telling us whether the category is wrong.

How do we communicate it? We can use a loss function: through it we can tell the algorithm whether it is wrong or not.

Loss function: a measure of the discrepancy between the ``true'' label and the predicted label.

So we assume that every data point has a true label. If we have a set of topics, the true label is the topic the document is actually talking about. This is typical of supervised learning.
|
|
|
|
|
\\\\
|
|
|
|
|
How well did the algorithm do?
|
|
|
|
|
\\
|
|
|
|
|
|
|
|
|
|
\[\ell(y,\hat{y})\geq 0 \]
|
|
|
|
|
|
|
|
|
|
where $y$ is the true label and $\hat{y}$ is the predicted label.
|
|
|
|
|
\\\\
|
|
|
|
|
We want to build a spam filter where $0$ means non-spam and $1$ means spam. For this classification task the zero-one loss is:
|
|
|
|
|
\\\\
|
|
|
|
|
$
\ell(y,\hat{y}) = \begin{cases} 0, & \mbox{if } \hat{y} = y
\\ 1, &
\mbox{if }\hat{y} \neq y
\end{cases}
$
|
|
|
|
|
\\\\
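To make the zero-one loss concrete, here is a minimal sketch in Python (the function names and the tiny example data are my own illustration, not from the lecture): the loss is 1 when the predicted label differs from the true one and 0 otherwise, and averaging it over a dataset gives the fraction of mistakes.

\begin{verbatim}
# Minimal sketch of the zero-one loss (illustrative example).

def zero_one_loss(y_true, y_pred):
    """Return 1 if the prediction is wrong, 0 if it is correct."""
    return 0 if y_pred == y_true else 1

def average_loss(labels, predictions):
    """Average zero-one loss over a dataset: the fraction of mistakes."""
    losses = [zero_one_loss(y, yhat) for y, yhat in zip(labels, predictions)]
    return sum(losses) / len(losses)

# Example: spam filtering with 0 = non-spam, 1 = spam.
labels      = [0, 1, 1, 0, 1]
predictions = [0, 1, 0, 0, 0]
print(average_loss(labels, predictions))   # 0.4: two mistakes out of five
\end{verbatim}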
|
|
|
|
|
The loss function is the ``interface'' between the algorithm and the data: the algorithm knows about the data only through the loss function. If we give it a useless loss function, the algorithm will not perform well; it is important to have a good loss function.
|
|
|
|
|
\textbf{Spam filtering}

We have two main types of mistake. Are they the same mistake? No: if I have an important email and you classify it as spam, that is bad; if you show me a spam email, that is merely annoying. So we have to assign a different weight to each mistake: even in binary classification, mistakes are not all equal.
|
|
|
|
|
--- FIGURE: handwritten notes on the spam classification task (zero-one loss, binary classification; false positive: $\hat{y} = $ spam while $y = $ non-spam; false negative: $\hat{y} = $ non-spam while $y = $ spam; a weighted loss such as 2 for a false positive, 1 for a false negative, 0 otherwise) ---
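One possible way to encode the fact that the two mistakes have different weights is a cost-sensitive variant of the zero-one loss. The sketch below is only an illustration: the weights 2 (false positive) and 1 (false negative) follow the handwritten notes above, and in practice they depend on the application.

\begin{verbatim}
# Sketch of a cost-sensitive loss for spam filtering.
# Labels: 1 = spam, 0 = non-spam. The weights are assumptions.

FP_COST = 2.0   # an important email ends up in the spam folder: worse
FN_COST = 1.0   # a spam email is shown to the user: annoying but tolerable

def weighted_loss(y_true, y_pred):
    if y_pred == y_true:
        return 0.0
    if y_pred == 1 and y_true == 0:    # false positive
        return FP_COST
    return FN_COST                     # false negative

print(weighted_loss(0, 1))   # 2.0, false positive
print(weighted_loss(1, 0))   # 1.0, false negative
\end{verbatim}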
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\paragraph{Outline}
|
|
|
|
|
The remainder of this article is organized as follows.
|
|
|
|
|
Section~\ref{previous work} gives account of previous work.
|
|
|
|
|
Our new and exciting results are described in Section~\ref{results}.
|
|
|
|
|
Finally, Section~\ref{conclusions} gives the conclusions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\section{Lecture 2 - 07-04-2020}
|
|
|
|
|
|
|
|
|
|
\subsection{Topic}
|
|
|
|
|
Classification tasks\\
Semantic label space $Y$\\
Categorization: $Y$ finite and small\\
Regression: $Y \subseteq \barra{R}$\\
How to predict labels?\\
Using the loss function.\\
Binary classification\\
Label space is $Y = \{ -1, +1 \}$\\
Zero-one loss:\\
|
|
|
|
|
|
|
|
|
|
$
\ell(y,\hat{y}) = \begin{cases} 0, & \mbox{if } \hat{y} = y
\\ 1, &
\mbox{if }\hat{y} \neq y
\end{cases}
\\\\
FP \quad \hat{y} = 1,\quad y = -1\\
FN \quad \hat{y} = -1, \quad y = 1
$
|
|
|
|
|
\\\\
|
|
|
|
|
Losses for regression?\\
|
|
|
|
|
$y$ and $\hat{y} \in \barra{R}$, so they are numbers!\\
One example of loss is the absolute loss: the absolute difference between the two numbers.\\
|
|
|
|
|
\subsection{Loss}
|
|
|
|
|
\subsubsection{Absolute Loss}
|
|
|
|
|
$$\ell(y,\hat{y}) = | y - \hat{y} | \Rightarrow \textit{absolute loss} $$
|
|
|
|
|
--- FIGURE ---\\\\
|
|
|
|
|
Some inconvenient properties:
|
|
|
|
|
|
|
|
|
|
\begin{itemize}
|
|
|
|
|
\item ...
|
|
|
|
|
\item The derivative takes only two values (not much information)
|
|
|
|
|
\end{itemize}
|
|
|
|
|
|
|
|
|
|
\subsubsection{Square Loss}
|
|
|
|
|
$$ \ell(y,\hat{y}) = ( y - \hat{y} )^2 \Rightarrow \textit{square loss}$$
|
|
|
|
|
--- FIGURE ---\\
|
|
|
|
|
Derivative:
|
|
|
|
|
\begin{itemize}
|
|
|
|
|
\item more informative
|
|
|
|
|
\item and differentiable
|
|
|
|
|
\end{itemize}
|
|
|
|
|
Real numbers as labels $\rightarrow$ regression.\\
Whenever taking the difference between two predictions makes sense (the values are numbers), we are talking about a regression problem.\\
Classification is categorization: we have a small finite set of labels.\\\\
|
|
|
|
|
|
|
|
|
|
\subsubsection{Example: the information given by the square loss}
|
|
|
|
|
|
|
|
|
|
$\ell(y,\hat{y}) = ( y - \hat{y} )^2 = F(\hat{y})
\\
F'(\hat{y}) = -2 \cdot (y-\hat{y})
$
|
|
|
|
|
\begin{itemize}
|
|
|
|
|
\item I can tell whether I am undershooting or overshooting, and by how much
\item I can tell how far away from the truth I am
|
|
|
|
|
\end{itemize}
|
|
|
|
|
$ \ell(y,\hat{y}) = | y- \hat{y}| = F(\hat{y}), \qquad F'(\hat{y}) = -\mathrm{sign}(y-\hat{y})\\\\ $
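As a small numerical illustration (my own example, not from the lecture) of why the square loss is more informative: its derivative with respect to the prediction also tells how large the error is, while the derivative of the absolute loss only gives its sign.

\begin{verbatim}
# Compare the derivatives (w.r.t. the prediction) of square and absolute loss.
import math

def d_square_loss(y, y_hat):
    # d/dy_hat (y - y_hat)^2 = -2 (y - y_hat)
    return -2.0 * (y - y_hat)

def d_absolute_loss(y, y_hat):
    # d/dy_hat |y - y_hat| = -sign(y - y_hat), for y_hat != y
    return -math.copysign(1.0, y - y_hat)

y = 3.0
for y_hat in (2.9, 1.0, -5.0):
    print(y_hat, d_square_loss(y, y_hat), d_absolute_loss(y, y_hat))
# Square-loss gradients: -0.2, -4.0, -16.0 (they grow with the error).
# Absolute-loss gradients: always -1.0 (direction only, no magnitude).
\end{verbatim}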
|
|
|
|
|
Question about the future\\
|
|
|
|
|
Will it rain tomorrow?\\
|
|
|
|
|
We have a label and this is a binary classification problem.\\
|
|
|
|
|
My label space will be $Y = \{\textit{rain}, \textit{no rain}\}$\\
We don't output a binary prediction: we need another space, called the prediction space (or decision space), $Z = [0,1]$\\
|
|
|
|
|
$
Z = [0,1] \\
\hat{y} \in Z \qquad \hat{y} \textit{ is my prediction of rain tomorrow}
\\
\hat{y} = \barra{P} (y = \textit{rain}) \quad \rightarrow \textit{my guess is that tomorrow it will rain (not sure)}\\\\
y \in Y \qquad \hat{y} \in Z \quad \textit{How can we manage the loss?}
\\
\textit{Put numbers in our space:}\\
\{1,0\} \quad \textit{where 1 is rain and 0 is no rain}\\\\
$
|
|
|
|
|
I measure how far I am from reality.\\
With the absolute loss the punishment grows linearly.\\
However this is sometimes inconvenient: sometimes I prefer to punish more, going quadratically instead of linearly.\\
There are other ways to punish mistakes.\\
One of them is the \textbf{logarithmic loss}.\\
With it we extend the range of our loss function a lot.\\
|
|
|
|
|
|
|
|
|
|
$$
|
|
|
|
|
\ell(y,\hat{y}) = | y- \hat{y}| \in [0,1] \qquad \ell(y,\hat{y}) = ( y- \hat{y})^2 \in [0,1]
|
|
|
|
|
$$
|
|
|
|
|
\\
|
|
|
|
|
If I want to expand the punishment, I use the logarithmic loss.\\
|
|
|
|
|
\\
|
|
|
|
|
$ \ell(y,\hat{y}) = \begin{cases} \ln \dfrac{1}{\hat{y}}, & \mbox{if } y = 1 \textit{ (rain)}
\\ \ln \dfrac{1}{1-\hat{y}}, &
\mbox{if } y = 0 \textit{ (no rain)}
\end{cases}
\\\\
\ell(y,\hat{y}) \rightarrow \textit{ can be 0 if I predict with certainty}
\\
\mbox{if } \hat{y} = 0.5 \qquad \ell\left(y, \dfrac{1}{2}\right) = \ln 2 \quad \textit{constant loss on each prediction}\\\\
\lim_{\hat{y}\to 0^+} \ell(1,\hat{y}) = + \infty
$
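A minimal sketch of the logarithmic loss in Python (my own illustration): for $y = 1$ it charges $\ln(1/\hat{y})$, for $y = 0$ it charges $\ln(1/(1-\hat{y}))$, so a confident but wrong prediction is punished very heavily, while $\hat{y} = 0.5$ always costs $\ln 2$.

\begin{verbatim}
# Sketch of the logarithmic loss for a probability prediction y_hat in (0,1).
import math

def log_loss(y, y_hat):
    """y is 1 (rain) or 0 (no rain); y_hat is the predicted probability of rain."""
    if y == 1:
        return math.log(1.0 / y_hat)
    return math.log(1.0 / (1.0 - y_hat))

print(log_loss(1, 0.5))    # ln 2 ~ 0.693: the "no information" prediction
print(log_loss(1, 0.99))   # ~ 0.01: confident and right, almost no loss
print(log_loss(1, 0.01))   # ~ 4.61: confident and wrong, huge loss
\end{verbatim}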
|
|
|
|
|
|
|
|
|
|
\section{Lecture 3 - 07-04-2020}
|
|
|
|
|
\section{Lecture 4 - 07-04-2020}
|
|
|
|
|
\section{Lecture 5 - 07-04-2020}
|
|
|
|
|
\section{Lecture 6 - 07-04-2020}
|
|
|
|
|
\section{Lecture 7 - 07-04-2020}
|
|
|
|
|
\section{Lecture 8 - 07-04-2020}
|
|
|
|
|
\section{Lecture 9 - 07-04-2020}
|
|
|
|
|
|
|
|
|
|
\section{Lecture 10 - 07-04-2020}
|
|
|
|
|
|
|
|
|
|
\subsection{TO BE DEFINED}
|
|
|
|
|
|
|
|
|
|
$\barra{E}[Z] = \barra{E}[\barra{E}[Z \mid X]]$
|
|
|
|
|
\\\\
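As a quick sanity check of the tower rule above, here is a small worked example with numbers of my own choosing (not from the lecture): take $X$ uniform on $\{0,1\}$ and suppose $\barra{E}[Z \mid X=0] = 1$ and $\barra{E}[Z \mid X=1] = 3$. Then
\[ \barra{E}[\barra{E}[Z \mid X]] = \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot 3 = 2 = \barra{E}[Z] \]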
|
|
|
|
|
$\barra{E}[X] = \sum_{t = 1}^{m} \barra{E}[X \, \Pi(A_t) ]$
|
|
|
|
|
\\\\
|
|
|
|
|
$x \in \mathbb{R}^d
|
|
|
|
|
$
|
|
|
|
|
\\
|
|
|
|
|
$\mathbb{P}(Y_{\Pi(s,x)} = 1) = \mathbb{E}[\Pi \{ Y_{\Pi(s,x)} = 1 \} ] = \\\\
= \sum_{t = 1}^{m} \mathbb{E}[\Pi\{Y_t = 1\} \cdot \Pi \{ \Pi(s,x) = t\}] = \\\\
= \sum_{t = 1}^{m} \mathbb{E}[\mathbb{E}[\Pi\{Y_t = 1\} \cdot \Pi\{\Pi(s,x) = t\} \mid X_t]]
$
\\\\
given the fact that $Y_t \sim \eta(X_t)$ (which gives me the probability), and that $Y_t = 1$ and $\Pi(s,x) = t$ are independent given $X_t$ (so that $\mathbb{E}[ZX] = \mathbb{E}[Z] \cdot \mathbb{E}[X]$),
\\\\
$
= \sum_{t = 1}^{m} \barra{E}[\barra{E}[\Pi\{Y_t = 1\} \mid X_t] \cdot \barra{E} [ \Pi\{\Pi(s,x) = t\} \mid X_t]] = \\\\
= \sum_{t = 1}^{m} \barra{E}[\eta(X_t) \cdot \Pi \{\Pi (s,x) = t \}] = \\\\
= \barra{E} [ \eta(X_{\Pi(s,x)})]
$
|
|
|
|
|
|
|
|
|
|
\[ \barra{P} (Y_{\Pi(s,x)} = 1 \mid X=x) = \barra{E}[\eta(X_{\Pi(s,x)})] \]
|
|
|
|
|
\\\\
|
|
|
|
|
|
|
|
|
|
$
\barra{P} (Y_{\Pi(s,x)} = 1, y = -1 ) = \\\\
= \barra{E}[\Pi \{ Y_{\Pi(s,x)} = 1\} \cdot \Pi \{ y = -1 \} ] = \\\\
= \barra{E}[\barra{E}[\Pi \{ Y_{\Pi(s,x)} = 1\} \cdot \Pi \{ y = -1 \} \mid X ]] = \\\\
$
|
|
|
|
|
|
|
|
|
|
where, given $X = x$, the event $y = -1$ has probability $1- \eta(x)$.
|
|
|
|
|
|
|
|
|
|
$
\\\\ = \barra{E}[\barra{E}[\Pi \{Y_{\Pi(s,x)} = 1\} \mid X] \cdot \barra{E}[\Pi \{y = -1\} \mid X ]] = \\\\
= \barra{E}[\eta_{\Pi(s,x)} \cdot (1-\eta(x))] \\\\
\textit{similarly:} \quad \barra{P}(Y_{\Pi(s,x)} = -1 , y = 1) = \\
\barra{E} [(1- \eta_{\Pi(s,x)}) \cdot \eta(x)]
\\\\
\barra{E} [ \ell_D (\hat{h}_s)] = \barra{P}(Y_{\Pi(s,x)} \neq y ) =
\\\\
= \barra{P}(Y_{\Pi(s,x)} = 1, y = -1) + \barra{P}(Y_{\Pi(s,x)} = -1, y = 1) =
\\\\
= \barra{E} [\eta_{\Pi(s,x)} \cdot (1-\eta(x))] + \barra{E}[( 1- \eta_{\Pi(s,x)})\cdot \eta(x)]$
|
|
|
|
|
\\\\
|
|
|
|
|
Make assumptions on $D_X$ and $\eta$: \\
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
MISSING CONTENT
|
|
|
|
|
\\\\
|
|
|
|
|
|
|
|
|
|
$
\eta(x') \leq \eta(x) + c \, \|X-x'\| \rightarrow \textit{ Euclidean distance}
\\\\
1-\eta(x') \leq 1- \eta(x) + c \, \|X-x'\|
\\\\
$
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
$
x' = X_{\Pi(s,x)}
\\\\
\eta(x) \cdot (1-\eta(x')) + (1-\eta(x))\cdot \eta(x') \leq
\\\\
\leq \eta(x) \cdot(1-\eta(x) + c\,\|X-x'\|) + (1-\eta(x))\cdot (\eta(x) + c\,\|X-x'\|) =
\\\\
= 2 \cdot \eta(x) \cdot (1- \eta(x)) + c\,\|X-x'\| \\\\
\barra{E}[\ell_D (\hat{h}_s)] \leq 2 \cdot \barra{E} [\eta(x) \cdot (1-\eta(x))] + c \cdot \barra{E}[\|X-X_{\Pi(s,x)}\|]
$
|
|
|
|
|
\\ where $\leq$ means ``at most''
|
|
|
|
|
\\\\
|
|
|
|
|
Compare risk for zero-one loss
|
|
|
|
|
\\
|
|
|
|
|
$
\barra{E}[\min\{\eta(x),1-\eta(x)\}] = \ell_D (f^*)
\\\\
\eta(x) \cdot( 1- \eta(x)) \leq \min\{\eta(x), 1-\eta(x) \} \quad \forall x
\\\\
\barra{E}[\eta(x)\cdot(1-\eta(x))] \leq \ell_D(f^*)
\\\\
\barra{E}[\ell_D(\hat{h}_s)] \leq 2 \cdot \ell_D(f^*) + c \cdot \barra{E}[\|X-X_{\Pi(s,x)}\|]
\\\\
\eta(x) \in \{0,1\}
$
|
|
|
|
|
\\\\
|
|
|
|
|
Depends on dimension: curse of dimensionality
|
|
|
|
|
\\\\--- FIGURE ---
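The effect of the dimension can be seen with a small Monte Carlo experiment (my own sketch, with arbitrary sample sizes): drawing $m$ training points uniformly in $[0,1]^d$, the distance from a query point to its nearest training point, which is the quantity $\|X-X_{\Pi(s,x)}\|$ controlling the bound above, grows quickly with $d$.

\begin{verbatim}
# Monte Carlo sketch: average distance to the nearest of m random points
# in [0,1]^d grows with the dimension d (curse of dimensionality).
import random

def nearest_distance(d, m):
    x = [random.random() for _ in range(d)]        # query point
    best = float("inf")
    for _ in range(m):
        p = [random.random() for _ in range(d)]    # training point
        dist = sum((a - b) ** 2 for a, b in zip(x, p)) ** 0.5
        best = min(best, dist)
    return best

m, trials = 100, 200
for d in (1, 2, 10, 100):
    avg = sum(nearest_distance(d, m) for _ in range(trials)) / trials
    print(d, round(avg, 3))   # average nearest-neighbour distance grows with d
\end{verbatim}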
|
|
|
|
|
\\\\
|
|
|
|
|
$
\ell_D(f^*) = 0 \iff \min\{ \eta(x), 1-\eta(x)\} =0 \quad$ with probability 1
\\
For this to be true, $\eta(x) \in \{0,1\}$
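For completeness, here is a minimal sketch (written by me as an illustration, not taken from the lecture) of the nearest-neighbour predictor $\hat{h}_s$ analysed above: given a training set $s$, it predicts for a point $x$ the label $Y_{\Pi(s,x)}$ of the closest training point in Euclidean distance.

\begin{verbatim}
# Minimal 1-nearest-neighbour classifier: h_s(x) = Y_{Pi(s,x)},
# the label of the training point closest to x (Euclidean distance).

def nearest_neighbour_predict(training_set, x):
    """training_set is a list of (point, label) pairs; point and x are tuples."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    _, nearest_label = min(training_set, key=lambda pl: dist2(pl[0], x))
    return nearest_label

s = [((0.0, 0.0), -1), ((1.0, 1.0), +1), ((0.9, 0.2), +1)]
print(nearest_neighbour_predict(s, (0.1, 0.1)))   # -1: closest point is (0, 0)
print(nearest_neighbour_predict(s, (0.8, 0.9)))   # +1: closest point is (1, 1)
\end{verbatim}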
|
|
|
|
|
|
|
|
|
|
\section{Previous work}\label{previous work}
|
|
|
|
|
A much longer \LaTeXe{} example was written by Gil~\cite{Gil:02}.
|
|
|
|
|
|
|
|
|
|
\section{Results}\label{results}
|
|
|
|
|
In this section we describe the results.
|
|
|
|
|
|
|
|
|
|
\section{Conclusions}\label{conclusions}
|
|
|
|
|
We worked hard, and achieved very little.
|
|
|
|
|
|
|
|
|
|
\bibliographystyle{abbrv}
|
|
|
|
|
\bibliography{main}
|
|
|
|
|
|
|
|
|
|
\end{document}
|
|
|
|
|
This is never printed
|