where $\red{\frac{\nl'}{\nl}}=\alpha$ and $\red{\frac{\nl''}{\nl}}=1-\alpha$,\qquad so \quad$\red{\frac{\nl'}{\nl}}+\red{\frac{\nl''}{\nl}}=1$\qquad$\longrightarrow\quad\alpha+(1-\alpha)=1$
\\\\
$N_{\ell'}^+ + N_{\ell''}^+=\nl$
\\\\
I want to check that the function $\min$ is concave between 0 and 1.\\
Unless the leaves are \textit{``pure''}, the training error will be greater than 0.
\\\\
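As a worked check of the concavity claim, here is a short sketch. It assumes that $\min$ refers to $\psi(p)=\min\{p,1-p\}$, the fraction of mistakes made by a majority vote in a leaf whose fraction of positive examples is $p$; the symbols $\psi$, $p$, $q$ are introduced here only for illustration. For any $p,q\in[0,1]$ and $\alpha\in[0,1]$,
$$
\psi\big(\alpha p+(1-\alpha)q\big)=\min\big\{\alpha p+(1-\alpha)q,\ \alpha(1-p)+(1-\alpha)(1-q)\big\}
$$
$$
\geq\alpha\min\{p,1-p\}+(1-\alpha)\min\{q,1-q\}=\alpha\,\psi(p)+(1-\alpha)\,\psi(q)
$$
The inequality holds because the minimum of two sums is at least the sum of the two minima. This is exactly the definition of concavity on $[0,1]$.
\\\\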
In general, I can always drive $\hat{\ell}(h_t)$ to 0 by growing the tree enough, unless there are points $x_t$ in the training set such that $(x_t, y_t)$ and $(x_t,y_t')$ with $y_t \neq y_t'$ both occur.
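\\\\
To see why such repeated points force a positive training error, here is a small worked example (the counts $n^+$ and $n^-$ are hypothetical symbols used only here). Suppose the same data point $x$ appears $n^+$ times with label $+1$ and $n^-$ times with label $-1$. Any tree classifier assigns a single label to $x$, so it makes at least
$$
\min\{n^+, n^-\}
$$
mistakes on these examples, no matter how much the tree is grown. For instance, with $n^+=3$ and $n^-=1$, predicting $+1$ on $x$ gives exactly one mistake, and no tree can do better.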
\\
--- FIGURE ---
\\
$$
\textit{if}\quad (x_1=\alpha)\wedge(x_2\geq\alpha)\vee(x_1= b)\vee(x_1= c)\wedge(x_3= y)\quad\textit{then predict } 1
$$
$$
\textit{else predict } -1
$$
\\
--- Picture of tree classifier of iris dataset. ---\\
I'm using two attributes at a time.\\
Each data point is a flower, and I can measure how long its petals and sepals are.
I can pick two attributes and test on these two. In the plot of the tree
classifier (the second one), the tests split the data space into regions that have
a ``blocky'' shape (like boxes: blue box, red box, yellow box).\\
A good exercise is to reconstruct the tree from this picture.
\\\\
\subsection{Statistical model for Machine Learning}
To understand tree classifiers, nearest neighbour, and other algorithms,
it is important to have a statistical model: it is the only way to get a guideline on which
model to choose.\\\\
\textbf{This mathematical model is developed to study learning and to choose learning
algorithms.}\\\\
Now let us start with the theoretical model.
\begin{itemize}
\item How are the examples $(x,y)$ generated to create the test set and the training set?\\
We are given the dataset, but we need a mathematical model for this
process.
The pairs $(x,y)$ are drawn from a fixed but unknown probability distribution on $X \times Y$
($X$ is the data space, $Y$ is the label set or label space).
\item Why should $x$ be random? \\
In general we assume that not all the $x$ in $X$ are equally likely to be observed.
There is some distribution over the data points, which says that I am more likely to
get some data points than others.
\item What about the labels?\\
Often labels are not uniquely determined by their data points, because labels
are given by humans, who have subjective opinions, and by natural
phenomena. Labels are stochastic given a data point: I will have a
distribution over labels.
\end{itemize}
We write $(X, Y)$ in capital letters since they are random variables drawn
from $D$ on $X \times Y$.
A dataset $(X_1, Y_1), \dots, (X_m, Y_m)$ is drawn independently from $D$
(the distribution on examples).\\
This is the abstraction of the process of collecting a training set.\\
$D$ is a joint probability distribution over $X\times Y$,\\
where $D_x$ is the marginal over $X$ and $D_{y|x}$ is the conditional of $Y$ given $X$.\\
I can divide each draw into two parts:
first I draw the data point from the marginal, then I draw its label from the conditional.\\
Any dataset (training or test) is a random sample in the
statistical sense $\longrightarrow$ so we can use all statistical tools to make inference.
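\\\\
The two-part draw can be written out as a short sketch, assuming for simplicity that $X$ and $Y$ are discrete (an assumption made only for this illustration). Writing $D(x,y)$ for the probability of the pair $(x,y)$,
$$
D(x,y)=D_x(x)\,D_{y|x}(y\mid x)
$$
and, since the examples are drawn independently, the probability of observing a whole dataset is the product
$$
\prod_{t=1}^{m} D(x_t,y_t)=\prod_{t=1}^{m} D_x(x_t)\,D_{y|x}(y_t\mid x_t)
$$
For example, if $D_x(x)=0.5$ and $D_{y|x}(+1\mid x)=0.8$, then $D(x,+1)=0.5\cdot 0.8=0.4$.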