課程名稱︰機器學習特論
課程性質︰資工系選修
課程教師︰林智仁
開課學院:電機資訊學院
開課系所︰資訊工程學系
考試日期(年月日)︰2018/06/26
考試時限(分鐘):170
試題 :
● Please give details of your answer. A direct answer without explanation is
not counted.
● Your answers must be in English.
● Please carefully read problem statements.
● During the exam you are not allowed to borrow others' class notes.
● Try to work on easier questions first.
● Exam time: 170 mimutes.
Problem 1 (30 pts)
Consider the following two problems from slide 8-10:
\begin{equation}\label{eq1}
\begin{aligned}
\max_{\bm\lambda\in\mathbb R^N,\bm\mu\in\mathbb R^M}
&&&\bm1^T\bm\lambda+\bm1^T\bm\mu\\
\text{subject to}&&&2\left\|\sum_{i=1}^N\lambda_i\bm x_i
-\sum_{i=1}^M\mu_i\bm y_i\right\|_2\le1\\
&&&\bm1^T\bm\lambda=\bm1^T\bm\mu,\bm\lambda\succeq\bm0,\bm\mu\succeq\bm0,
\end{aligned}
\end{equation}
\begin{equation}\label{eq2}
\begin{aligned}
\min_{t\in\mathbb R,\bm\theta\in\mathbb R^N,\bm\gamma\in\mathbb R^M}&&&t\\
\text{subject to}&&&\left\|\sum_{i=1}^N\theta_i\bm x_i
=\sum_{i=1}^M\gamma_i\bm y_i\right\|_2\le t\\
&&&\bm\theta\succeq\bm0,\bm1^T\bm\theta=1,\bm\gamma\succeq\bm0,
\bm1^T\bm\gamma=1.
\end{aligned}
\end{equation}
Assume (\ref{eq1}) has an optimal solution
\begin{equation}\label{eq3}
(\bm\lambda^*,\bm\mu^*)\ne\bm0.
\end{equation}
(a) (10 pts) Formally prove that
\[\bm\lambda^*\ne\bm0\]
\[\bm\mu^*\ne\bm0.]
Note that in any step of the proof you must specify the condition or the
property of (\ref{eq1}) used.
(b) (10 pts) Prove that under (\ref{eq3}), there is no feasible solution of
(\ref{eq2}) with t = 0.
(c) (10 pts) Generate a feasible solution for (\ref{eq2}) and formally prove
that it is an optimal solution of (\ref{eq2}).
Problem 2 (30 pts)
Note that (\ref{eq1}) is the dual problem of the following primal problem
\begin{equation}\label{eq4}
\begin{aligned}
\min_{\bm a,b}&&&\frac12\|\bm a\|_2\\
\text{subject to}&&&\bm a^T\bm x_i+b\ge1,i=1,\ldots,N\\
&&&\bm a^T\bm y_i+b\le-1,i=1,\ldots,M.
\end{aligned}
\end{equation}
After changing to use our notation, we know that if we take the square norm of
the objective function to have
\[\begin{aligned}
\min_{\bm x,b}&&&\frac12\|\bm w\|_2\\
\text{subject to}&&&y_i(\bm w^T\bm x_i+b)\ge1\ \forall i,
\end{aligned}\]
then the dual is
\begin{equation}\label{eq5}
\begin{aligned}
\min_{\bm\alpha}&&&\frac12\bm\alpha^TQ\bm\alpha-\bm1^T\bm\alpha\\
\text{subject to}&&&\bm\alpha\succeq\bm0,\bm y^T\bm\alpha=0.
\end{aligned}
\end{equation}
Assume that (\ref{eq5}) has an optimal solution
\begin{equation}\label{eq6}
\bm\alpha^*\ne\bm0.
\end{equation}
Through the following sub-problems we aim to show that in fact (\ref{eq5}) is
equivalent to the dual problem (\ref{eq2}) of (\ref{eq4}).
(a) (10 pts) Prove that under (\ref{eq6}), there is no feasible α ≠ 0 such
that
\[\bm\alpha^TQ\bm\alpha = 0.\]
(b) (10 pts) Assume that
\[y_i=\begin{cases}
1 & \text{if }i \in \{1, \ldots, N\}\\
-1 & \text{if }i \in \{N+1, \ldots, M\}
\end{cases}\]
Since $\bm y^T\bm\alpha = 0$ implies $\sum_{i=1}^N \alpha_i = \sum_{i=N+1}^M
\alpha_i$, we can introduce a new variable λ and rewrite $\bm y^T\bm\alpha
= 0$ as two constraints
\[\sum_{i=1}^N \alpha_i = \lambda, \sum_{i=N+1}^M \alpha_i = \lambda\]
From (\ref{eq6}), $\bm\alpha^*$ is an optimal solution of both (\ref{eq5})
and the following optimization problem
\[\begin{aligned}
\min_{\bm\alpha}&&&\frac12\bm\alpha^TQ\bm\alpha-\bm1^T\bm\alpha\\
\text{subject to}&&&\bm\alpha\succ\bm0,\bm y^T\bm\alpha=0.
\end{aligned}\]
Then we can define
\[\bm\beta = \frac{\bm\alpha}\lambda.\]
and the dual problem is rewritten as
\[\begin{aligned}
\min_{\bm\beta,\lambda}&&&\frac12\lambda^2\bm\beta^TQ\bm\beta-2\lambda\\
\text{subject to}&&&\bm\beta\succeq\bm0,\sum_{i=1}^N\beta_i=1
\text{ and }\sum_{i=N+1}^M\beta_i=1.
\end{aligned}\]
This is an optimization problem with variables β and λ. Can you eliminate
the variable λ to get a new equivalent convex optimization problem of β?
(c) (10 pts) Show that the optimization problem obtained in (b) can be converted
to (\ref{eq2}) in problem 1.
Problem 3 (10 pts)
Consider the following function
\[f(x) = e^x + e^{-x}, x \in \mathbb R.\]
(a) (5 pts) Prove that this function is strongly convex by showing that there
exists an m > 0 such that
f''(x) ≧ m, ∀x.
Find the largest possible m.
(b) (5 pts) Suppose $x^* \in \mathbb R$ minimizes f(x), directly prove that
\[f(x)-f(x^*) \le \frac1{2m}\|f'(x)\|^2.\]
Here m is the largest possible one obtained in (a).
Problem 4 (30 pts)
Consider the following three (label, feature-vector) pairs:
\[y_1 = -1, \bm x_1 = [0, 0]^T\]
\[y_2 = 1, \bm x_2 = [1, 0]^T\]
\[y_3 = 1, \bm x_3 = [0, 1]^T\]
That is, we have
https://i.imgur.com/ff9A3wP.png
\begin{tikzpicture}
\draw[->] (-1, 0)