$$w \in \barra{Q}^d \ \textbf{minimising the number of indices}\ t = 1, \dots, m \ \text{s.t.}\
y_t w^T x_t \leq 0
$$
$Opt(S)$ is the smallest number of misclassified examples in $S$ by any linear classifier in $H_D$,
\\
so $\frac{Opt(S)}{m}$ is the training error of the $ERM$.
\\\\
\bred{Theorem}: if $P \neq NP$, then for all $c > 0$ there is no polytime algorithm (with respect to the input size $\Theta(md)$) that approximately solves every instance $S$ of MinDisOpt with a number of mistakes bounded by $c \cdot Opt(S)$.\\
In other words, no fixed approximation factor is achievable in polynomial time: any achievable factor must grow with the size of the dataset.
\\\\
$$\forall A \ \textbf{(polytime)}\ \textbf{and}\ \forall c > 0 \quad\exists S \qquad \hat{\ell}_S\left(A\left(S\right)\right) \geq c \cdot \hat{\ell}_S\left(\hat{h}_S\right)\ \textbf{(where $\hat{h}_S$ is the $ERM$)}
$$
$$
Opt(S) = \hat{\ell}_S(\hat{h}_S)
$$
\\
This is not the same phenomenon as the no-free-lunch theorem, which concerns the amount of \emph{information} needed for a learning problem: there, we may need arbitrarily much information (i.e.\ data) to drive the error down.
Here, instead, the obstacle is \emph{computation}: we need a lot of computation even to approximate the $ERM$.
\\\\
Assume $Opt(S)=0$, i.e.\ the $ERM$ has zero training error on $S$:\\
$\exists U \in\barra{R}^d$\ s.t. \ $\forall t =1, \dots, m$\qquad$y_tU^Tx_t > 0$\qquad\bred{$S$ is linearly separable}\\
\textbf{When $Opt(S)=0$, we can implement the $ERM$ efficiently using LP (Linear Programming).}\\Such classifiers may underfit, since the class of linear classifiers has a lot of bias. When the condition $Opt(S)=0$ is not satisfied, we cannot do this efficiently.
LP algorithms can be complicated, so we turn to a different family of algorithms.
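As a sketch of the LP approach: finding $U$ with $y_t U^T x_t > 0$ for all $t$ is a feasibility problem, and by rescaling $U$ the strict inequalities can be replaced by $y_t U^T x_t \geq 1$. The following Python snippet (an illustration, not part of the lecture; it assumes SciPy's `linprog` solver) checks linear separability this way:

```python
import numpy as np
from scipy.optimize import linprog

def is_linearly_separable(X, y):
    """Check whether (X, y) is linearly separable by solving an LP.

    We look for u with y_t * u^T x_t >= 1 for all t (a rescaling of the
    strict condition y_t * u^T x_t > 0). The LP is feasible iff Opt(S) = 0.
    """
    m, d = X.shape
    # Constraints in linprog form: -(y_t * x_t)^T u <= -1 for every t.
    A_ub = -(y[:, None] * X)
    b_ub = -np.ones(m)
    # Any feasible point will do, so the objective is the zero vector.
    res = linprog(c=np.zeros(d), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * d)
    return res.status == 0, res.x

# Toy example in R^2 (hypothetical data): two separable points.
X = np.array([[1.0, 2.0], [1.0, -3.0]])
y = np.array([1, -1])
separable, u = is_linearly_separable(X, y)
```

When the data is separable, `u` is a witness vector with positive margin on every example.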
\section{The Perceptron Algorithm}
The Perceptron dates back to the late 1950s; it was originally designed as a model in psychology, but it has general utility in other fields.
\\\\
Perceptron Algorithm\\
Input: training set $S = \{(x_1,y_1), \dots, (x_m, y_m)\}$\qquad$x_t \in\barra{R}^d \qquad y_t \in\{-1, +1\}$\\
Init: $w = (0, \dots, 0)$\\
Repeat\\
\quad read next $(x_t,y_t)$\\
\quad If $y_t w^T x_t \leq 0$ then $w \longleftarrow w + y_t x_t$\\
Until $\gamma(w) > 0$ \quad // $w$ separates $S$\\
Output $w$
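The pseudocode above can be sketched in Python as follows. The `max_epochs` cap is an assumption of this sketch (the pseudocode as stated loops forever on non-separable data):

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Perceptron as in the pseudocode: cycle through S and update
    w <- w + y_t * x_t whenever y_t * w^T x_t <= 0, stopping once a
    full pass makes no update (i.e. min_t y_t w^T x_t > 0)."""
    m, d = X.shape
    w = np.zeros(d)                       # Init: w = (0, ..., 0)
    for _ in range(max_epochs):
        updated = False
        for t in range(m):
            if y[t] * (w @ X[t]) <= 0:    # mistake (or on the boundary)
                w = w + y[t] * X[t]       # update step
                updated = True
        if not updated:                   # gamma(w) > 0: w separates S
            return w
    return w                              # cap reached: S may not be separable

# Linearly separable toy set in R^2 (hypothetical data).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
```

On separable data the returned `w` classifies every training example with strictly positive margin.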
\\\\
Here $\gamma(w) = \min_t y_t w^T x_t$ is the margin of $w$ on $S$; as long as some example is misclassified, we have $\gamma(w) \leq 0$.
The question is, will it terminate if $S$ is linearly separable?
\\
If $y_t w^T x_t \leq 0$, then $w \longleftarrow w + y_t x_t$\\