class: center, middle, inverse, title-slide

# Stat Workshop – Extra Reading
## Advanced R for Bioinformatics. Visby, 2018.
### Bengt Sennblad
### 19 June, 2018

---
name: extra_notation

## Matrix notation for regression models

Let's establish some convenient notation for the variables in regression as matrices and vectors<sup>.small[1]</sup>. Let

`$$\begin{array}{ccc} Y=\left(\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array}\right), & X = \left(\begin{array}{ccccc} 1&x_{1,1} & x_{1,2}&\ldots& x_{1,k}\\ 1& x_{2,1} & x_{2,2}&\ldots& x_{2,k}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&x_{n,1}& x_{n,2}&\ldots&x_{n,k} \end{array}\right), \text{ and} & \boldsymbol{\beta} = \left(\begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{array}\right). \end{array}$$`

Notice that we have now included the intercept `\(\beta_0\)` in `\(\boldsymbol{\beta}\)` and a leading column of 1's in `\(X\)`. Making use of matrix-vector multiplication<sup>.small[2]</sup>, this allows us to write the model

`$$\begin{array}{rcl} y_1 &=& \beta_0 + \beta_1 x_{1,1}+\beta_2 x_{1,2}+\ldots+\beta_k x_{1,k}\\ y_2 &=& \beta_0 + \beta_1 x_{2,1}+\beta_2 x_{2,2}+\ldots+\beta_k x_{2,k}\\ &\vdots&\\ y_n &=& \beta_0 + \beta_1 x_{n,1}+\beta_2 x_{n,2}+\ldots+\beta_k x_{n,k} \end{array}$$`

more compactly as

`$$Y=X\boldsymbol{\beta}.$$`

.small[ <sup>1</sup> Here, we have chosen to have variables as columns and subjects as rows -- the opposite is possible and would then affect how we write the matrix multiplication. ]

.small[ <sup>2</sup> See the next slide. ]

<!-- Moreover, for convenience, we will frequently refer to the `\(i\)`th column of `\(X\)` as `\(X_i\)`. -->
<!-- `$$X_i = \left(\begin{array}{c} 1\\x_{1,i}\\x_{2,i}\\ \vdots \\ x_{n,i} \end{array}\right)$$` -->

---
name: matrixmult1

## Matrix algebra

* Matrices can be multiplied -- if their dimensions match.
  More precisely, for the multiplication `\(AB\)` (or more explicitly `\(A \times B\)`), the number of columns in `\(A\)` must match the number of rows in `\(B\)`.
* Formally, let `\(A\)` be an `\(n \times m\)` matrix and `\(B\)` be an `\(m\times p\)` matrix; then `\(C=AB\)` is an `\(n\times p\)` matrix
  `$$\begin{array}{ccc} C = \left(\begin{array}{cccc} c_{1,1} & c_{1,2}&\ldots& c_{1,p}\\ c_{2,1} & c_{2,2}&\ldots& c_{2,p}\\ \vdots&\vdots&\ddots&\vdots\\ c_{n,1}& c_{n,2}&\ldots&c_{n,p} \end{array}\right), & \text{where} & c_{i,j} = \sum_{k=1}^{m} a_{i,k} \times b_{k,j} \end{array}$$`
  that is, you multiply row `\(i\)` of `\(A\)` and column `\(j\)` of `\(B\)` element-wise, and sum the products to get element `\(i,j\)` of `\(C\)`.
* Notice that vectors can be viewed as matrices with a single column (or a single row), so you can, e.g., multiply an `\((n,m)\)` matrix with an `\((m,1)\)` vector. This is what was done in the `\(X\boldsymbol{\beta}\)` multiplication on the previous slide. Try to perform that multiplication and see if you get the expected result.
* You can also multiply a `\((1,m)\)` row vector with an `\((m,1)\)` column vector; the result is a single number (why?).
* Notice that matrix multiplication is, in general, not commutative; that is, typically `\(AB \neq BA\)`.

---
name: matrixmult2

## Matrix algebra (cont'd)

### Transpose
* The transpose of a matrix, written `\(A^T\)` (sometimes `\(A'\)`), can be viewed as "flipping the matrix over along the diagonal".
    + Formally, the transpose of `\(A\)` is a matrix `\(A^T\)` such that `\(\left(a^T_{i,j}\right) = \left(a_{j,i}\right)\)`
    + Notice that the dimensions of the matrix are switched too, so that, e.g., the transpose of a column vector is a row vector.

### Identity matrix
* The identity matrix, `\(I\)`, is a square matrix with 1's along the diagonal and 0's elsewhere
    + This means that `\(AI=A\)` (i.e., it corresponds to multiplying a number by 1).

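The operations described so far can be tried out directly in R. The following is only a minimal sketch: the matrices `A` and `B`, the covariate values in `X`, and the coefficients in `beta` are made-up illustration values, not data from the workshop.

```r
# A is 2 x 3 and B is 3 x 2, so AB is defined and has dimensions 2 x 2
A <- matrix(1:6, nrow = 2, ncol = 3)  # filled column-wise: rows (1,3,5) and (2,4,6)
B <- matrix(1:6, nrow = 3, ncol = 2)  # columns (1,2,3) and (4,5,6)

C <- A %*% B                          # matrix product (not the element-wise `*`)

# Element c_{1,2} by hand: row 1 of A times column 2 of B, element-wise, summed
c12 <- sum(A[1, ] * B[, 2])

# The transpose switches the dimensions
dim(t(A))                             # 3 2

# Multiplying by the identity matrix leaves A unchanged: AI = A
I3 <- diag(3)
all.equal(A %*% I3, A)

# Multiplication is not commutative: here BA is even a 3 x 3 matrix
dim(B %*% A)

# Regression notation: X has a leading column of 1's, beta includes the intercept
X    <- cbind(1, c(0.5, 1.5, 2.5))    # n = 3 subjects, k = 1 covariate (hypothetical values)
beta <- c(2, 3)                       # beta_0 = 2, beta_1 = 3 (hypothetical values)
Y    <- X %*% beta                    # each row gives beta_0 + beta_1 * x_i
```

Comparing `c12` with `C[1, 2]` is a quick way to check the summation formula for `\(c_{i,j}\)` against R's built-in `%*%` operator.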
### Inverse matrix
* You can also perform a form of "matrix division" using the inverse matrix:
  `$$C=AB \Leftrightarrow CB^{-1}=A$$`
    + Formally, the inverse matrix `\(B^{-1}\)` of a square matrix `\(B\)` is the matrix such that `\(BB^{-1}=B^{-1}B=I\)`, where `\(I\)` is the identity matrix.
    + The inverse matrix is often hard to compute and might not exist.

---
name: SVD

### Singular value decomposition (SVD)

* With the above matrix operations, we have touched upon the area of mathematics called *linear algebra*. Without going into details, we will mention another linear algebra concept that you might run into, so that you have some intuition of what it means.
* Any matrix `\(A\)` can be expressed as a product of three matrices
  `$$A = U\Sigma V^T$$`
  This *decomposition* into three matrices can simplify some operations, such as computing the inverse matrix, and is often used, e.g., in dimensionality reduction techniques (PCA, etc.)
* SVD is closely related to the eigenvalues and eigenvectors of a matrix `\(A\)`, when `\(A\)` is square (i.e., has dimensions `\(k\times k\)` for some `\(k\)`)
    + Eigenvalues and eigenvectors provide a decomposition of `\(A\)`
        - Formally, an eigenvalue `\(\lambda\)` and its corresponding eigenvector `\(v\neq 0\)` satisfy `\(Av=\lambda v\)`. Collecting the eigenvectors as the columns of a matrix `\(Q\)` and the eigenvalues on the diagonal of a matrix `\(\Lambda\)` gives `\(AQ=Q\Lambda \Leftrightarrow A=Q\Lambda Q^{-1}\)`, provided `\(Q\)` is invertible
        - Notice the structural similarity of the last equation to that of SVD
    + Eigenvalues and eigenvectors, similarly to SVD, have numerous applications in mathematics and statistics.

---
name: extra_norm1

## *Norms*

<!-- * A concept that is tightly connected to regularization is the *norm* of a vector `\(V\)`. -->

* A norm is a function on the vector `\(V=(v_1,\ldots, v_k)\)` that returns a single number representing some kind of *length* of that vector.
  There are different types of norms; the most important ones for us are:
    + the `\(L_2\)` norm: `\(||V||_2=\sqrt{ \sum_{i=1}^k v_i^2 }\)`
    + the `\(L_1\)` norm: `\(||V||_1 = \sum_{i=1}^k |v_i|\)`, where `\(|v_i|\)` is the absolute value of `\(v_i\)`
* Geometrically, a vector can be viewed as defining a point in a `\(k\)`-dimensional coordinate system (where `\(k\)` = the number of elements in the vector).
    + The norms of a vector then represent different measures of the length of the vector from its start at the origin
        - `\(L_2\)` measures the Euclidean length
        - `\(L_1\)` measures the Manhattan length

[Figure: plot accompanying the Euclidean and Manhattan lengths above; embedded image data omitted]

---
name: extra_norm2

## *Norms* (cont'd)

* Uses of norms
    + The method of least-squares used in regression analysis builds on an `\(L_2\)` norm.
        - the least squares method minimizes the sum of the squared residuals over the `\(n\)` individuals, and can be expressed using a squared `\(L_2\)` norm, i.e., `\(\min_{\boldsymbol{\beta}} \left\{\sum_{i=1}^n (y_i-X_{i,\cdot}\boldsymbol{\beta})^2\right\} \equiv \min_{\boldsymbol{\beta}}\left\{ ||Y-X\boldsymbol{\beta}||_2^2\right\}\)`, where `\(X_{i,\cdot}\)` denotes row `\(i\)` of `\(X\)`
    + Norms are used abundantly in regularization notation
        - The regularization term in our very simple `\(pL\)` toy example, `\(\#(V\neq 0)\)`, is the *cardinality* of the set of non-zero elements in `\(V\)`, but could also be viewed as a norm of the boolean vector `\(\left(v_1\neq0, \ldots, v_k\neq 0\right)\)` (its `\(L_1\)` norm and its squared `\(L_2\)` norm coincide in this case).
        - The regularization term in the general `\(pL\)` toy example, `\(||\beta-m||_2^2\)`, is a squared `\(L_2\)` norm
        - The regularization term in Lasso is an `\(L_1\)` norm, while ridge regression uses a squared `\(L_2\)` norm.
        - Many feature selection methods have regularization terms that boil down to some type of norm.

<!-- * Let -->
<!-- `$$(V\neq 0) = \Big(I(v_1\neq 0), I(v_2\neq 0), \ldots, I(v_k\neq 0)\Big),$$` -->
<!-- where `\(I(x)\)` is an *indicator function* that takes the value `\(1\)` if the expression `\(x\)` is true and `\(0\)` otherwise. -->
<!-- * Then, `\(\#(V\neq 0)\)` (i.e., the *cardinality* of the vector `\((V\neq 0)\)`) that was used in our simplest toy example can be viewed as either an `\(L_1\)` or a squared `\(L_2\)` norm of `\((V\neq 0)\)`. -->
<!-- * That is -->
<!-- `$$\#(V\neq 0) = \sum_{i=1}^k I(v_i\neq 0) = ||(V\neq 0)||_1 = ||(V\neq 0)||_2^2$$` -->

---
name: report

## Session

* This presentation was created in RStudio using the [`remarkjs`](https://github.com/gnab/remark) framework through the R package [`xaringan`](https://github.com/yihui/xaringan).
* For R Markdown, see <http://rmarkdown.rstudio.com>
* For R Markdown presentations, see <https://rmarkdown.rstudio.com/lesson-11.html>

```r
R.version
```

```
##                _
## platform       x86_64-apple-darwin15.6.0
## arch           x86_64
## os             darwin15.6.0
## system         x86_64, darwin15.6.0
## status
## major          3
## minor          4.4
## year           2018
## month          03
## day            15
## svn rev        74408
## language       R
## version.string R version 3.4.4 (2018-03-15)
## nickname       Someone to Lean On
```

---
name: end-slide
class: end-slide

# Thank you