**(C) 2018-2019 by Damir Cavar**

**Version:** 0.2, September 2019

**Download:** This and various other Jupyter notebooks are available from my GitHub repo.

This is a tutorial related to the L665 course on Machine Learning for NLP focusing on Deep Learning, Spring 2018 at Indiana University.

The following material is based on *Linear Algebra Review and Reference* by Zico Kolter (updated by Chuong Do) from September 30, 2015. This means that many passages are copied literally and many are rewritten; I do not mark sections that are added or different. Consider this notebook an extended annotation of Kolter's (and Do's) notes. See also James E. Gentle (2017) *Matrix Algebra: Theory, Computations and Applications in Statistics*. Second edition. Springer. Another good resource is Philip N. Klein (2013) *Coding the Matrix: Linear Algebra through Applications to Computer Science*, Newtonian Press.

For an alternative tutorial on this topic, see also the HMC Mathematics Online Tutorial on Eigenvalues and Eigenvectors.

For an introduction to Linear Algebra, see the notebook "Linear Algebra".

**Introduction in Philip Klein's Coding the Matrix:** Chapter 12

Assume we have two interest-bearing accounts. The first pays an interest rate of 5%, the second 3%, both compounded annually.

Assume that after $t$ years the amounts in the two accounts are represented by a 2-vector:

$x^{(t)} = \begin{bmatrix} \text{amount in Account 1} \\[0.3em] \text{amount in Account 2} \end{bmatrix}$

The growth of the amounts in one year can be described in a matrix:

$x^{(t+1)} = \begin{bmatrix} a_{11} & a_{12} \\[0.3em] a_{21} & a_{22} \end{bmatrix} x^{(t)}$

Given the specification of the interest rate above, this simple case gives us:

$x^{(t+1)} = \begin{bmatrix} 1.05 & 0 \\[0.3em] 0 & 1.03 \end{bmatrix} x^{(t)}$

Let $A$ denote the matrix: $\begin{bmatrix} 1.05 & 0 \\[0.3em] 0 & 1.03 \end{bmatrix}$

$A$ is a diagonal matrix.

If we initially put \$100 in each account, we can compute the result of the accumulated interest after one year as:

In [10]:

```
import numpy as np

x = np.array([[100],
              [100]])
A = np.array([[1.05, 0],
              [0, 1.03]])
A.dot(x)
```

Out[10]:

After two years the accounts would be:

In [11]:

```
A.dot(A.dot(x))
```

Out[11]:

If we want to know how $x^{(100)}$ compares to $x^{(0)}$, we could iterate:

$\begin{align} x^{(100)} & = A x^{(99)} \\ & = A(Ax^{(98)}) \\ & = A(A(Ax^{(97)})) \\ & \vdots \\ & = \underbrace{A \cdot A \dots A}_\text{100 times} \ x^{(0)} \end{align}$

We can also write the product as $A^{100}$.
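We can compute $A^{100}$ directly with NumPy's `matrix_power` (a quick sketch using the growth matrix from above):

```python
import numpy as np

A = np.array([[1.05, 0.0],
              [0.0, 1.03]])

# A^100: one hundred years of compound growth in a single matrix
A100 = np.linalg.matrix_power(A, 100)
print(A100)
```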

Note that $A$ is a diagonal matrix, so the diagonal entries of $A^{100}$ are simply $1.05^{100}$ and $1.03^{100}$:

$A^{100} = \begin{bmatrix} 131.50125784630401 & 0 \\[0.3em] 0 & 19.218631980856298 \end{bmatrix}$

We can see that account 1 dominates account 2, with account 2 becoming less and less relevant over time.

Now consider the definition below:

**Basic definition:**

Given $A \in \mathbb{R}^{n\times n}$, $\lambda$ is an **eigenvalue** of $A$ if there is a non-zero vector $x$, the corresponding **eigenvector**, such that:

$Ax = \lambda x, x \neq 0$

Formally, given a square matrix $A \in \mathbb{R}^{n\times n}$, we say that $\lambda \in \mathbb{C}$ is an **eigenvalue** of $A$ and $x \in \mathbb{C}^n$ is the corresponding **eigenvector** if $Ax = \lambda x$ holds for some non-zero $x$.

Intuitively, this definition means that multiplying $A$ by the vector $x$ results in a new vector that points in the same direction as $x$, but is scaled by a factor $\lambda$.

Also note that for any **eigenvector** $x \in \mathbb{C}^n$ and scalar $c \in \mathbb{C}$, $A(cx) = cAx = c\lambda x = \lambda(cx)$, so $cx$ is also an **eigenvector**. For this reason, when we talk about **“the” eigenvector** associated with $\lambda$, we usually assume that the **eigenvector** is normalized to have length $1$ (this still creates some ambiguity, since $x$ and $−x$ will both be **eigenvectors**, but we will have to live with this).

For any eigenvalue $\lambda$ of $A$ there is a vector space, the **eigenspace**, that corresponds to $\lambda$:

$\{ x : A x = \lambda x \}$

Any non-zero vector in this space is an eigenvector. A convenient convention is to require that the eigenvector has norm $1$.

Coming back to the account example, the eigenvalues of the matrix $\begin{bmatrix} 1.05 & 0 \\[0.3em] 0 & 1.03 \end{bmatrix}$ are $1.05$ and $1.03$.

The eigenvector for the eigenvalue $1.05$ is $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$. The eigenvector for the eigenvalue $1.03$ is $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.
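We can confirm these two eigenpairs numerically (a minimal check using the account matrix from above):

```python
import numpy as np

A = np.array([[1.05, 0.0],
              [0.0, 1.03]])
e1 = np.array([1.0, 0.0])  # claimed eigenvector for eigenvalue 1.05
e2 = np.array([0.0, 1.0])  # claimed eigenvector for eigenvalue 1.03

print(A.dot(e1))  # equals 1.05 * e1
print(A.dot(e2))  # equals 1.03 * e2
```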

We can rewrite the equation above to state that $(\lambda, x)$ is an eigenvalue-eigenvector pair of $A$ if:

$(\lambda I − A)x = 0, x \neq 0$

But $(\lambda I − A)x = 0$ has a non-zero solution for $x$ if and only if $(\lambda I − A)$ has a non-trivial nullspace, which is the case if and only if $(\lambda I − A)$ is singular:

$|(\lambda I − A)| = 0$

We can now use the previous definition of the determinant to expand this expression into a (very large) polynomial in $\lambda$ of maximum degree $n$, the characteristic polynomial of $A$.

We then find the $n$ (possibly complex) roots of this polynomial to find the $n$ eigenvalues $\lambda_1, \dots{}, \lambda_n$. To find the eigenvector corresponding to the eigenvalue $\lambda_i$, we simply solve the linear equation $(\lambda_iI − A)x = 0$. It should be noted that this is not the method which is actually used in practice to numerically compute the eigenvalues and eigenvectors (remember that the complete expansion of the determinant has $n!$ terms); it is rather a mathematical argument.
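For a small matrix we can still illustrate the characteristic-polynomial route (a sketch only; as noted above, numerical libraries do not compute eigenvalues this way). NumPy's `np.poly` returns the coefficients of $|\lambda I − A|$ and `np.roots` finds its roots:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Coefficients of the characteristic polynomial |lambda I - A|,
# here lambda^2 - 4 lambda + 3
coeffs = np.poly(A)
print(coeffs)

# The roots of the characteristic polynomial are the eigenvalues (3 and 1)
print(np.roots(coeffs))
```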

The following are properties of eigenvalues and eigenvectors (in all cases assume $A \in \mathbb{R}^{n\times n}$ has eigenvalues $\lambda_1, \dots{}, \lambda_n$ and associated eigenvectors $x_1, \dots{}, x_n$):

- The trace of $A$ is equal to the sum of its eigenvalues: $\mathrm{tr}A = \sum_{i=1}^n \lambda_i$
- The determinant of $A$ is equal to the product of its eigenvalues: $|A| = \prod_{i=1}^n \lambda_i$
- The rank of $A$ is equal to the number of non-zero eigenvalues of $A$.
- If $A$ is non-singular then $1/\lambda_i$ is an eigenvalue of $A^{−1}$ with associated eigenvector $x_i$, i.e., $A^{−1}x_i = (1/\lambda_i)x_i$. (To prove this, take the eigenvector equation, $Ax_i = \lambda_i x_i$, and left-multiply each side by $A^{−1}$.)
- The eigenvalues of a diagonal matrix $D = \mathrm{diag}(d_1, \dots{}, d_n)$ are just the diagonal entries $d_1, \dots{}, d_n$.
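The trace and determinant properties can be verified numerically (a minimal sketch, using a small symmetric example matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals = np.linalg.eigvals(A)  # eigenvalues 3 and 1 for this matrix

print(np.trace(A), eigvals.sum())        # trace equals sum of eigenvalues
print(np.linalg.det(A), eigvals.prod())  # determinant equals product of eigenvalues
```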

We can write all the eigenvector equations simultaneously as:

$A X = X \Lambda$

where the columns of $X \in \mathbb{R}^{n\times n}$ are the eigenvectors of $A$ and $\Lambda$ is a diagonal matrix whose entries are the eigenvalues of $A$:

$X \in \mathbb{R}^{n\times n} = \begin{bmatrix} \big| & \big| & & \big| \\[0.3em] x_1 & x_2 & \cdots & x_n \\[0.3em] \big| & \big| & & \big| \end{bmatrix} , \Lambda = \mathrm{diag}(\lambda_1, \dots{}, \lambda_n)$

If the eigenvectors of $A$ are linearly independent, then the matrix $X$ will be invertible, so $A = X \Lambda X^{−1}$. A matrix that can be written in this form is called **diagonalizable**.
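A quick numerical sketch of diagonalization (assuming the small symmetric example matrix used below, whose eigenvectors are linearly independent):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, X = np.linalg.eig(A)  # eigenvalues, and eigenvectors as columns of X
Lam = np.diag(lam)         # Lambda = diag(lambda_1, ..., lambda_n)

# Reconstruct A = X Lambda X^{-1}
A_rebuilt = X.dot(Lam).dot(np.linalg.inv(X))
print(np.allclose(A_rebuilt, A))  # True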

We can compute the eigenvalues and eigenvectors in numpy in the following way:

In [12]:

```
A = np.array([[2., 1.],
              [1., 2.]])
A
```

Out[12]:

In [13]:

```
from numpy import linalg as lg
l = lg.eigvals(A)
print(l)
```

We could now compute the eigenvector for each eigenvalue. For an example, see the HMC Mathematics Online Tutorial on Eigenvalues and Eigenvectors.
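One way to get the eigenvectors directly in NumPy (a sketch, using `lg.eig` instead of `lg.eigvals`) is to request them along with the eigenvalues; `eig` returns the unit-norm eigenvectors as the columns of its second result:

```python
import numpy as np
from numpy import linalg as lg

A = np.array([[2., 1.],
              [1., 2.]])
l, X = lg.eig(A)  # eigenvalues, and unit-norm eigenvectors as columns of X
print(l)
print(X)

# Check A x_i = lambda_i x_i for every eigenpair
for i in range(len(l)):
    print(np.allclose(A.dot(X[:, i]), l[i] * X[:, i]))  # True
```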

Two remarkable properties come about when we look at the eigenvalues and eigenvectors of a symmetric matrix $A \in \mathbb{S}^n$.

- It can be shown that all the eigenvalues of $A$ are real.
- The eigenvectors of $A$ are orthonormal, i.e., the matrix $X$ defined above is an orthogonal matrix (for this reason, we denote the matrix of eigenvectors as $U$ in this case).

We can therefore represent $A$ as $A = U \Lambda U^T$, remembering from above that the inverse of an orthogonal matrix is just its transpose.

Using this, we can show that the definiteness of a matrix depends entirely on the sign of its eigenvalues. Suppose $A \in \mathbb{S}^n$ with $A = U \Lambda U^T$. Then:

$x^T A x = x^T U \Lambda U^T x = y^T \Lambda y = \sum_{i=1}^n \lambda_i y^2_i$

where $y = U^T x$ (and since $U$ is full rank, any vector $y \in \mathbb{R}^n$ can be represented in this form).

Because $y^2_i$ is always non-negative, the sign of this expression depends entirely on the $\lambda_i$'s. If all $\lambda_i > 0$, then the matrix is positive definite; if all $\lambda_i \geq 0$, it is positive semidefinite. Likewise, if all $\lambda_i < 0$ or all $\lambda_i \leq 0$, then $A$ is negative definite or negative semidefinite respectively. Finally, if $A$ has both positive and negative eigenvalues, it is indefinite.
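These sign conditions can be turned into a small classifier (a sketch; `np.linalg.eigvalsh` returns the eigenvalues of a symmetric matrix, which are always real):

```python
import numpy as np

def definiteness(A):
    # Classify a symmetric matrix by the signs of its eigenvalues
    lam = np.linalg.eigvalsh(A)
    if np.all(lam > 0):
        return "positive definite"
    if np.all(lam >= 0):
        return "positive semidefinite"
    if np.all(lam < 0):
        return "negative definite"
    if np.all(lam <= 0):
        return "negative semidefinite"
    return "indefinite"

print(definiteness(np.array([[2.0, 1.0], [1.0, 2.0]])))  # eigenvalues 3, 1
print(definiteness(np.array([[1.0, 2.0], [2.0, 1.0]])))  # eigenvalues 3, -1
```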

An application where eigenvalues and eigenvectors come up frequently is in maximizing some function of a matrix. In particular, for a matrix $A \in \mathbb{S}^n$, consider the following maximization problem,

$\mathrm{max}_{x\in \mathbb{R}^n} x^T A x \mbox{ subject to } \|x\|^2_2 = 1$

i.e., we want to find the vector (of norm $1$) which maximizes the quadratic form. Assuming the eigenvalues are ordered as $\lambda_1 \geq \lambda_2 \geq \dots{} \geq \lambda_n$, the optimal $x$ for this optimization problem is $x_1$, the eigenvector corresponding to $\lambda_1$. In this case the maximal value of the quadratic form is $\lambda_1$. Similarly, the optimal solution to the minimization problem,

$\mathrm{min}_{x\in \mathbb{R}^n} x^T A x \mbox{ subject to } \|x\|^2_2 = 1$

is $x_n$, the eigenvector corresponding to $\lambda_n$, and the minimal value is $\lambda_n$. This can be proved by appealing to the eigenvector-eigenvalue form of $A$ and the properties of orthogonal matrices. However, in the next section we will see a way of showing it directly using matrix calculus.
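We can check this claim numerically (a sketch; `np.linalg.eigh` returns the eigenvalues of a symmetric matrix in ascending order):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, U = np.linalg.eigh(A)  # eigenvalues in ascending order, eigenvectors as columns

# The quadratic form at the top eigenvector attains the largest eigenvalue
x_top = U[:, -1]
print(x_top.dot(A).dot(x_top))  # ~ lam[-1]

# Random unit vectors stay within [lambda_min, lambda_max]
rng = np.random.default_rng(0)
vs = rng.standard_normal((1000, 2))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)
qs = np.einsum('ij,jk,ik->i', vs, A, vs)  # quadratic form for each unit vector
print(qs.min(), qs.max())  # bounded by lam[0] and lam[-1]
```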

...
