\ell_D\left(\hat{h}\right) \leq \min_{h \,\in\, H}\,\ell_D\left( h \right) + \sqrt[]{\frac{2}{m}\,\ln\,\frac{2 \, |H|}{\delta}}\qquad\textit{ with prob. at least $1-\delta$}
$$
\\
We now apply this bound to tree predictors.\\
\section{Tree predictors}
$$
X = \{ 0,1\}^d \longrightarrow\blue{Binary classification}
$$
$$
h : \{0, 1 \}^d \longrightarrow \{-1,1\} \qquad\blue{Binary classification, class $H$}
$$
How big is this class?
\\Raise the size of the codomain to the size of the domain: $|H| = 2^{2^d}$, since the domain has $2^d$ points and each of them can receive either label.\\
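As a quick sanity check of this count (the values of $d$ are chosen only for illustration):
$$
d=2:\ |H| = 2^{2^2} = 16, \qquad d=3:\ |H| = 2^{2^3} = 256, \qquad d=5:\ |H| = 2^{2^5} = 2^{32} \approx 4.3\cdot 10^{9}
$$
so the class grows doubly exponentially in $d$.\\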
Can a tree predictor realize every $h$ in this class?
\\
For every $h : \{0,1\}^d \longrightarrow \{-1,1\}$ \quad $\exists\, T$\\\\
We can \bred{build a tree} $T$ such that \quad$h_T = h$
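\\
One way to see this, sketched under the assumption that we allow a complete binary tree of depth $d$: test $x_1$ at the root, $x_2$ at every node of the next level, and so on, so that each of the $2^d$ leaves corresponds to exactly one $x \in \{0,1\}^d$ and can be labelled with $h(x)$. Such a tree has
$$
\underbrace{2^d - 1}_{\text{internal nodes}} + \underbrace{2^d}_{\text{leaves}} = 2^{d+1}-1 \quad\text{nodes.}
$$
For instance, with $d=2$ and $h(x) = 1$ iff $x_1 \neq x_2$, the root tests $x_1$, both children test $x_2$, and the four leaves for $(0,0),(0,1),(1,0),(1,1)$ get labels $-1,1,1,-1$, for a total of $7$ nodes.\\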
\textit{\# of binary trees with $N$ nodes, called \bred{Catalan Number}}
\subsection{Catalan Number}
The count is a binomial coefficient, which we bound from above:
$$
\frac{1}{N}\binom{2 \, N -2}{N-1}\quad\leq\quad\frac{1}{N}\,\left(e \,\frac{\left(2\, N -2 \right)}{N-1}\right)^{N-1} = \frac{1}{N}\,\left( 2 \, e \right)^{N-1}
$$
using
$$
\binom{n}{k}\quad\leq\quad\left( \frac{e\, n}{k}\right)^k \qquad\textit{ from Stirling approximation}
$$
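A small numeric check of the inequality, with $N=4$ picked only as an example:
$$
\frac{1}{4}\binom{6}{3} = \frac{20}{4} = 5 \quad\leq\quad \frac{1}{4}\,(2\,e)^{3} = 2\,e^{3} \approx 40.2
$$
The bound is loose, but its logarithm is linear in $N$, which is the property used once we take the $\log$.\\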
We are counting tree structures, i.e. binary trees with exactly $N$ nodes.
The Catalan number gives this count exactly, \blue{but we need a quantity that is easier to interpret}, so we use the upper bound instead.
\\
Now we can put everything together.
\\
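$$ | H _N |
\quad\leq\quad\blue{$
\frac{1}{N}$}\,\left(2\, e \right)^{N-1}\, d^M \,\bred{$2^{N-M}$}
\quad\leq\quad
\left( 2 \, e \, d \right)^N
$$
Here $M$ is the number of internal nodes, each testing one of the $d$ features, and the remaining $N-M$ nodes are leaves, each carrying one of the two labels. The second inequality uses \bred{$d \geq2$}, which gives \bred{$2^{N-M} \leq\, d^{N-M}$},\\
and \blue{we drop the factor $\frac{1}{N}$, which can only increase the bound, since we are going to take the $\log$}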
\\\\
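To get a feeling for the sizes involved, here is the chain of bounds on $|H_N|$ evaluated for the illustrative values $N=3$, $M=1$, $d=2$ (hypothetical numbers, not taken from the lecture):
$$
\frac{1}{3}\,(2\,e)^{2}\cdot 2^{1}\cdot 2^{2} \approx 78.8 \quad\leq\quad (2\cdot e\cdot 2)^{3} = 64\,e^{3} \approx 1285.5
$$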
Let $\hat{h}$ be the ERM on $H_N$. Then, with probability at least $1-\delta$,
$$\ell_D \left(\hat{h}\right)\,\leq\,\min_{\mathbf{h \,\in\, H_N}}\,\ell_D \left( h \right)+\sqrt[]{\frac{2}{m}\,\left(\bred{$ N \cdot\left(1+\ln\left(2\cdot d \right)\right)$}+\ln\frac{2}{\delta}\right)}
$$
\\
where \bred{$ N \cdot\left(1+\ln\left(2\cdot d \right)\right)$}\quad$=\quad\ln\left( 2 \, e \, d \right)^N \;\geq\; \ln |H_N|$.\\
We can design an instantaneous code $\sigma : H \longrightarrow\{0,1\}^*$ with codeword length $|\,\sigma(h)\,|$.\\
$
\ln |H_N| = O\left(N \cdot\ln d\right)
$\\
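The $O\left(N \cdot\ln d\right)$ claim can be checked directly from the bound $|H_N| \leq \left(2\,e\,d\right)^N$, using $d \geq 2$ so that $1 \leq \frac{\ln d}{\ln 2}$ and $\ln 2 \leq \ln d$:
$$
\ln |H_N| \,\leq\, N\left(1 + \ln 2 + \ln d\right) \,\leq\, N\left(\frac{1}{\ln 2} + 2\right)\ln d \,\approx\, 3.44\, N \ln d
$$
As an illustration with assumed values $N=20$, $d=10$, $m=10^4$, $\delta=0.05$, the estimation term of the ERM bound is
$$
\sqrt{\frac{2}{10^4}\left(20\,(1+\ln 20) + \ln 40\right)} \approx \sqrt{\frac{2}{10^4}\cdot 83.6} \approx 0.129
$$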
\bred{the number of bits needed is determined by the number of nodes in $h$}
\\\\
Even if we insist on an instantaneous code, we do not lose ... -- PART MISSING --
\\
$$
| \,\sigma (h) \, | = O \left( N \cdot\ln d\right)
$$\\
Using this $\sigma$ and the weights $w(h)=2^{-|\,\sigma(h)\,|}$, we get, simultaneously for every $h \in H$ with $N$ nodes,
$$
\ell_D\left(h\right) \,\leq\,\hat{\ell}_S \left( h \right) + \sqrt[]{\frac{1}{2 \, m}\cdot\left( \red{c}\cdot N \cdot\ln d + \ln\frac{2}{\delta}\right) }\qquad\textit{w. p. at least $1-\delta$}
$$
where \red{$c$} is a constant
\\
$$
\hat{h} = \arg\min_{h\in H}\left( \hat{\ell}_S \left( h \right) + \sqrt[]{\frac{1}{2 \, m}\cdot\left( \red{c}\cdot N \cdot\ln d + \ln\frac{2}{\delta}\right) }\,\right)
$$
where $N$ is the number of nodes of $h$. The bound is informative when \bred{$m \gg N \cdot\ln d$}.
\\
If the training set is very small, the penalty term dominates the objective, so you should not run this algorithm.
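\\
A toy comparison, with all numbers hypothetical (including $c=1$), shows how the penalised objective trades training error against tree size. Take $m=1000$, $d=10$, $\delta=0.05$, a tree $h_A$ with $N=5$ nodes and $\hat{\ell}_S(h_A)=0.10$, and a tree $h_B$ with $N=50$ nodes and $\hat{\ell}_S(h_B)=0.05$:
$$
h_A:\ 0.10 + \sqrt{\frac{5\ln 10 + \ln 40}{2000}} \approx 0.10 + 0.087 = 0.187
\qquad
h_B:\ 0.05 + \sqrt{\frac{50\ln 10 + \ln 40}{2000}} \approx 0.05 + 0.244 = 0.294
$$
Under these assumptions the smaller tree $h_A$ minimises the objective even though $h_B$ fits the sample better.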