class: center, middle, inverse, title-slide

# Stat Workshop – Extra Reading
## Advanced R for Bioinformatics. Visby, 2018.
### Bengt Sennblad
### 19 June, 2018

---
name: extra_notation

## Matrix notation for regression models

Let's establish some convenient notation for the variables in regression as matrices and vectors<sup>.small[1]</sup>. Let
`$$\begin{array}{ccc} Y=\left(\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array}\right), & X = \left(\begin{array}{ccccc} 1&x_{1,1} & x_{1,2}&\ldots& x_{1,k}\\ 1& x_{2,1} & x_{2,2}&\ldots& x_{2,k}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&x_{n,1}& x_{n,2}&\ldots&x_{n,k} \end{array}\right), \ \mathrm{and} & \boldsymbol{\beta} = \left(\begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{array}\right). \end{array}$$`

Notice that we have now included the intercept `\(\beta_0\)` in `\(\boldsymbol{\beta}\)` and a leading column of 1's in `\(X\)`. Making use of matrix-vector multiplication<sup>.small[2]</sup>, this allows us to write the model
`$$\begin{array}{rcl} y_1 &=& \beta_0 + \beta_1 x_{1,1}+\beta_2 x_{1,2}+\ldots+\beta_k x_{1,k}\\ y_2 &=& \beta_0 + \beta_1 x_{2,1}+\beta_2 x_{2,2}+\ldots+\beta_k x_{2,k}\\ &\vdots&\\ y_n &=& \beta_0 + \beta_1 x_{n,1}+\beta_2 x_{n,2}+\ldots+\beta_k x_{n,k}\\ \end{array}$$`
more compactly as
`$$Y=X\boldsymbol{\beta}.$$`

.small[ <sup>1</sup> Here, we have chosen to have variables as columns and subjects as rows; the opposite is possible and will then affect how we write the matrix multiplication. ]

.small[ <sup>2</sup> See the next slide. ]

<!-- Moreover, for convenience, we will frequently refer to the `\(i\)`th column of `\(X\)` as `\(X_i\)`. -->
<!-- `$$X_i = \left(\begin{array}{c} 1\\x_{1,i}\\x_{2,i}\\ \vdots \\ x_{n,i} \end{array}\right)$$` -->

---
name: matrixmult1

## Matrix algebra

* Matrices can be multiplied -- if their dimensions match.
More precisely, for the multiplication `\(AB\)` (or more explicitly `\(A \times B\)`), the number of columns in `\(A\)` must match the number of rows in `\(B\)`.
* Formally, let `\(A\)` be an `\(n \times m\)` matrix and `\(B\)` be an `\(m\times p\)` matrix; then `\(C=AB\)` is an `\(n\times p\)` matrix
`$$\begin{array}{ccc} C = \left(\begin{array}{cccc} c_{1,1} & c_{1,2}&\ldots& c_{1,p}\\ c_{2,1} & c_{2,2}&\ldots& c_{2,p}\\ \vdots&\vdots&\ddots&\vdots\\ c_{n,1}& c_{n,2}&\ldots&c_{n,p} \end{array}\right), & \mathrm{where} & c_{i,j} = \sum_{k=1}^{m} a_{i,k} \times b_{k,j} \end{array}$$`
that is, you multiply row `\(i\)` of `\(A\)` and column `\(j\)` of `\(B\)` element-wise, and sum the products to get element `\(i,j\)` of `\(C\)`.
* Notice that vectors can be viewed as matrices with a single column (or a single row), so you can, e.g., multiply an `\((n,m)\)` matrix with an `\((m,1)\)` vector. This is what was done in the `\(X\boldsymbol{\beta}\)` multiplication in the previous slide. Try to perform that multiplication and see if you get the expected result.
* You can also multiply a `\((1,m)\)` row vector with an `\((m,1)\)` column vector; the result is a single number (why?).
* Notice that matrix multiplication is, in general, not commutative; that is, `\(AB \neq BA\)` for most matrices.

---
name: matrixmult2

## Matrix algebra (cont'd)

### Transpose

* The transpose of a matrix, written `\(A^T\)` (sometimes `\(A'\)`), can be viewed as "flipping the matrix over along the diagonal".
  + Formally, the transpose of `\(A\)` is a matrix `\(A^T\)`, such that `\(\left(a^T_{i,j}\right) = \left(a_{j,i}\right)\)`
  + Notice that the dimensions of the matrix are switched too, so that, e.g., the transpose of a column vector is a row vector.

### Identity matrix

* The identity matrix, `\(I\)`, is a square matrix with 1's along the diagonal and 0's elsewhere
  + This means that `\(AI=A\)` (i.e., it corresponds to multiplying a number by 1).
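
As a rough illustration of the rules above, the following R sketch multiplies a small design matrix by a coefficient vector, checks the result against the element-wise definition, and tries out the transpose and the identity matrix. All numbers and the names `X`, `beta`, `A`, `I3`, `P` and `Q` are made up for this example and are not part of the workshop material.

```r
# Hypothetical toy data: 3 subjects, 2 explanatory variables.
X <- cbind(1, c(2, 4, 6), c(1, 0, 1))  # leading column of 1's for the intercept
beta <- c(0.5, 2, -1)                  # (beta_0, beta_1, beta_2)

# Matrix-vector product: X is 3 x 3 and R treats beta as a 3 x 1 column here.
Y <- X %*% beta

# The same result from the definition: row i of X times beta, summed.
Y_manual <- sapply(seq_len(nrow(X)), function(i) sum(X[i, ] * beta))

# Transposing a 2 x 3 matrix gives a 3 x 2 matrix.
A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
dim(t(A))

# Multiplying by the identity matrix leaves A unchanged.
I3 <- diag(3)
all.equal(A %*% I3, A)

# Matrix multiplication is generally not commutative.
P <- matrix(c(1, 2, 3, 4), nrow = 2)
Q <- matrix(c(0, 1, 1, 0), nrow = 2)
identical(P %*% Q, Q %*% P)
```

Note that `%*%` is R's matrix product; a plain `*` multiplies element by element instead.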
### Inverse matrix

* You can also perform a form of "matrix division" using the inverse matrix (provided it exists):
`$$C=AB \Leftrightarrow CB^{-1}=A$$`
  + Formally, the inverse matrix `\(B^{-1}\)` of `\(B\)` is the matrix such that `\(BB^{-1}= B^{-1}B = I\)`, where `\(I\)` is the identity matrix.
  + The inverse is only defined for square matrices, is often hard to compute, and might not exist.

---
name: SVD

### Singular value decomposition (SVD)

* With the above matrix operations, we have touched upon the area of mathematics called *linear algebra*. Without going into details, we will mention another linear algebra concept that you might run into, so that you have some intuition of what it means.
* Any matrix `\(A\)` can be expressed as a product of three matrices
`$$A = U\Sigma V^T$$`
where `\(\Sigma\)` is a diagonal matrix holding the *singular values* of `\(A\)`. This *decomposition* into three matrices can simplify some operations, such as identifying the inverse matrix, and is often used, e.g., in dimensionality reduction techniques (PCA, etc.)
* SVD is closely related to the eigenvalues and eigenvectors of a matrix `\(A\)`, when `\(A\)` is square (i.e., has dimensions `\(k\times k\)` for some `\(k\)`)
  + Eigenvalues and eigenvectors provide a decomposition of `\(A\)`
    - Formally, an eigenvalue `\(\lambda\)` and its corresponding eigenvector `\(v\)` satisfy `\(Av=\lambda v\)`. Collecting the eigenvectors as the columns of a matrix `\(Q\)` and the eigenvalues on the diagonal of a matrix `\(\Lambda\)` gives `\(AQ=Q\Lambda \Leftrightarrow A=Q\Lambda Q^{-1}\)` (when `\(Q\)` is invertible)
    - Notice the structural similarity of the last equation to that of SVD
  + Eigenvalues and eigenvectors, similarly to SVD, have numerous applications in mathematics and statistics.

---
name: extra_norm1

## *Norms*

<!-- * A concept that is tightly connected to regularization is the *norm* of a vector `\(V\)`. -->

* A norm is a function on the vector `\(V=(v_1,\ldots, v_k)\)` that returns a single number representing some kind of *length* of that vector.
There are different types of norms; the most important ones for us are:
  + the `\(L_2\)` norm: `\(||V||_2=\sqrt{ \sum_{i=1}^k v_i^2 }\)`
  + the `\(L_1\)` norm: `\(||V||_1 = \sum_{i=1}^k |v_i|\)`, where `\(|v_i|\)` is the absolute value of `\(v_i\)`
* Geometrically, a vector can be viewed as defining a point in a `\(k\)`-dimensional coordinate system (where `\(k\)` is the number of elements in the vector).
  + The norms of a vector then represent different measures of the length of the vector, measured from its start at the origin
    - `\(L_2\)` measures the Euclidean length
    - `\(L_1\)` measures the Manhattan length

---
name: extra_norm2

## *Norms* (cont'd)

* Uses of norms
  + The method of least squares used in regression analysis builds on an `\(L_2\)` norm.
    - The least-squares method minimizes the sum of the squared residuals over the `\(n\)` individuals, and can be expressed using a squared `\(L_2\)` norm, i.e., `\(\min_{\boldsymbol{\beta}} \left\{\sum_{i=1}^n (y_i-X_{i,\cdot}\boldsymbol{\beta})^2\right\} \equiv \min_{\boldsymbol{\beta}}\left\{ ||Y-X\boldsymbol{\beta}||_2^2\right\}\)`, where `\(X_{i,\cdot}\)` is row `\(i\)` of `\(X\)`.
  + Norms are used abundantly in regularization notation
    - The regularization term in our very simple `\(pL\)` toy example, `\(\#(V\neq 0)\)`, is the number of non-zero elements in `\(V\)`, but could also be viewed as a norm of the boolean vector `\(\left(v_1\neq0, \ldots, v_k\neq 0\right)\)` (for such a 0/1 vector, the `\(L_1\)` norm and the squared `\(L_2\)` norm coincide).
    - The regularization term in the general `\(pL\)` toy example in , `\(||\beta-m||_2^2\)`, is a squared `\(L_2\)` norm
    - The regularization term in Lasso is an `\(L_1\)` norm, while ridge regression uses a squared `\(L_2\)` norm.
    - Many feature selection methods have regularization terms that boil down to some type of norm.
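
The norms and the least-squares criterion above can be tried out in a few lines of R. This is only a sketch under stated assumptions: the vector `V`, the toy data `X` and `Y`, and the helper `rss()` are invented for illustration, and `lm.fit()` is used simply as one convenient way to obtain the least-squares coefficients.

```r
# Hypothetical example vector.
V <- c(3, -4, 0)

l2 <- sqrt(sum(V^2))  # L2 norm (Euclidean length): 5
l1 <- sum(abs(V))     # L1 norm (Manhattan length): 7

# Base R's norm() works on matrices; for a one-column matrix,
# type "F" gives the L2 norm and type "O" gives the L1 norm.
norm(as.matrix(V), type = "F")
norm(as.matrix(V), type = "O")

# Number of non-zero elements, #(V != 0): 2.
# For the 0/1 indicator vector, the L1 norm and the squared L2 norm agree.
b <- as.numeric(V != 0)
c(count = sum(V != 0), l1 = sum(abs(b)), l2_squared = sum(b^2))

# Hypothetical regression data: intercept column plus one variable.
X <- cbind(1, c(1, 2, 3, 4))
Y <- c(3.1, 4.9, 7.2, 8.8)

# Least-squares criterion: squared L2 norm of the residual vector Y - X beta.
rss <- function(beta) sum((Y - X %*% beta)^2)

# Least-squares coefficients; any other beta gives a larger criterion value.
beta_hat <- lm.fit(X, Y)$coefficients
rss(beta_hat)
rss(beta_hat + c(0.1, 0))
```

Moving `beta_hat` in any direction increases `rss()`, which is what "minimizing the squared `\(L_2\)` norm of the residuals" means in practice.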
<!-- * Let -->
<!-- `$$(V\neq 0) = \Big(I(v_1\neq 0), I(v_2\neq 0), \ldots, I(v_k\neq 0)\Big),$$` -->
<!-- where `\(I(x)\)` is an *indicator function* that takes the value `\(1\)` if the expression `\(x\)` is true and `\(0\)` otherwise. -->
<!-- * Then, `\(\#(V\neq 0)\)` (i.e., the *cardinality* of the vector `\((V\neq 0)\)`) that was used in our simplest toy example can be viewed as either an `\(L_1\)` or a squared `\(L_2\)` norm of `\((V\neq 0)\)`. -->
<!-- * That is -->
<!-- `$$\#(V\neq 0) = \sum_{i=1}^k I(v_i\neq 0) = ||(V\neq 0)||_1 = ||(V\neq 0)||_2^2$$` -->

---
name: report

## Session

* This presentation was created in RStudio using the [`remarkjs`](https://github.com/gnab/remark) framework through the R package [`xaringan`](https://github.com/yihui/xaringan).
* For R Markdown, see <http://rmarkdown.rstudio.com>
* For R Markdown presentations, see <https://rmarkdown.rstudio.com/lesson-11.html>

```r
R.version
```

```
##                _
## platform       x86_64-apple-darwin15.6.0
## arch           x86_64
## os             darwin15.6.0
## system         x86_64, darwin15.6.0
## status
## major          3
## minor          4.4
## year           2018
## month          03
## day            15
## svn rev        74408
## language       R
## version.string R version 3.4.4 (2018-03-15)
## nickname       Someone to Lean On
```

---
name: end-slide
class: end-slide

# Thank you