# Numerical Linear Algebra

The following diagram shows a graph that represents a series of web pages (numbered from 1 to 8).

An arrow from a node to another indicates the existence of a link from the web page represented by the sending node, to the page represented by the receiving node. For example, the arrow from node 2 to node 1 indicates that there is a link in web page 2 pointing to web page 1. Notice how web page 4 has two outer links (to pages 2 and 8), and there are three pages that link to web page 4 (pages 2, 6 and 7). The pages represented by nodes 2, 4 and 8 seem to be the most *popular* at first sight.

Is there a mathematical way to actually express the popularity of a web page within a network? Researchers at Google came up with the idea of a **PageRank** to roughly estimate this concept by counting the number and quality of links to a page. It goes like this:

We construct a *transition matrix* of this graph \( T=\big( a_{i,j} \big) \) in the following fashion: the entry
\( a_{i,j} \) is \( 1/k \) if there is a link from web page \( i \) to web page \( j \), and the total number of outer links in web page \( i \) amounts to \( k \). Otherwise, the entry is just zero. The size of a transition matrix of \( N \) web pages is always \( N \times N \). In our case, the matrix has size \( 8 \times 8 \):

Let us open an `ipython`

session and load this particular matrix to memory. Remember that in `python`

, indices start from zero, not one:

From the transition matrix, we create a *Page Rank matrix*, \( G \) (also known as the **Google matrix**), by fixing a positive constant \( 0 < p \leq 1 \), and following the formula \( G = (1-p) \cdot T + p \cdot B \). Here, \( B \) is a matrix with the same size as \( T \), with all its entries equal to \( 1/N \). For example, if we choose \( p=0.15 \), we obtain the following Google matrix

Google matrices have some interesting properties:

- 1 is an eigenvalue of multiplicity one.
- 1 is actually the largest eigenvalue: all the other eigenvalues are in modulus smaller than 1.
- The eigenvector corresponding to eigenvalue 1 has all entries positive. In particular, for the eigenvalue 1 there exists a unique eigenvector with the sum of its entries equal to one. This is what we call the
**Page Rank vector**.

A quick computation with `scipy.linalg.eig`

finds that eigenvector for us:

1
2
3

eigenvalues, eigenvectors = spla.eig(G)
print eigenvalues

1
2
3
4

PageRank = eigenvectors[:,0]
PageRank /= sum(PageRank)
print PageRank.real

Those values correspond to the Page Ranks of each of the eight web pages depicted on the graph. As expected, the maximum value of those is associated to the second web page (0.232), closely followed by the fourth (0.219), and then the eighth web page (0.157).

Note how this problem of networks of web pages has been translated into mathematical objects, to an equivalent problem involving matrices, eigenvalues and eigenvectors, and has been solved with techniques of Linear Algebra.

The transition matrix is **sparse**: most of its entries are zeros. Sparse matrices with extremely large size are of special importance in *Numerical Linear Algebra*, not only because they encode challenging scientific problems, but also because it is extremely hard to manipulate them with basic algorithms.

Rather than storing to memory all values in the matrix, it makes sense to collect only the non-zero values instead, and use algorithms with exploit these smart storage schemes. The gain in memory management is obvious. These methods are usually faster for this kind of matrices and give less roundoff errors, since there are usually far less operations involved. This is another advantage of `scipy`

, since it contains numerous procedures to attack different problems where data is stored in this fashion. Let us observe its power with another example:

The **University of Florida Sparse Matrix Collection** is the largest database of matrices accessible online. As of January 2014, it contains 157 groups of matrices arising from all sorts of scientific disciplines. The sizes of the matrices range from very small (1-by-2) to insanely large (28-million-by-28-million). More matrices are expected to be added constantly, as they arise in different engineering problems.

More information about this database can be found in ACM Transactions on Mathematical Software, vol 38, Issue 1, 2011, pp 1:1 - 1:25, by T.A. Davis and Y.Hu, or online at www.cise.ufl.edu/research/sparse/matrices

For example, the group with the most matrices in the database is the original Harwell-Boeing Collection, with 292 different sparse matrices. This group can also be accessed online at the **Matrix Market**: math.nist.gov/MatrixMarket

Each matrix in the database comes in three formats:

**Matrix Market Exchange Format**[Boisvert et al. 1997]**Rutherford-Boeing Exchange Format**[Duff et al. 1997]- Proprietary
**matlab**`.mat`

format

Let us import to our `ipython`

session two matrices in Matrix Market Exchange format from the Collection, meant to be used in a solution of a least squares problem. These matrices are located at www.cise.ufl.edu/research/sparse/matrices/Bydder/mri2.html

We download the corresponding `tar`

bundle and `untar`

it to get two `ASCII`

files:

`mri2.mtx`

(the main matrix in the least squares problem)`mri2_b.mtx`

(the right-hand side of the equation)

The first twenty lines of the file mri2.mtx read as follows:

The first sixteen lines are comments, and give us some information about the generation of the matrix.

- The computer vision problem where it arose: An MRI reconstruction.
- Author information: Mark Bydder, UCSD.
- Procedures to apply to the data: Solve a least square problem \( A x - b \), and posterior visualization of the result.

The seventeenth line indicates the size of the matrix: 63240 rows by 147456 columns, as well as the number of non-zero entries in the data: 569160.

The rest of the file includes precisely 569160 lines, each containing two integer numbers, and a floating point number: These are the locations of the non-zero elements in the matrix, together with the corresponding values.

We need to take into account that these files use the

`Fortran`

convention of starting arrays from 1, not from 0.

A good way to read this file into an ndarray is by means of the function `loadtxt`

in `numpy`

. We can then use `scipy`

to transform the array into a sparse matrix with the function `coo_matrix`

in the module `scipy.sparse`

(`coo`

stands for *coordinate* internal format).

1
2
3
4
5

rows, cols, data = np.loadtxt("mri2.mtx", skiprows=17, unpack=True)
rows -= 1
cols -= 1
MRI2 = spsp.coo_matrix((data, (rows, cols)), shape=(63240,147456))

The best way to visualize the sparsity of this matrix is by means of the routine `spy`

from the module `matplotlib.pyplot`

.

These are the first ten lines from the second file, `mri2_b.mtx`

, which does not represent a sparse matrix, but a column vector:

Those are six commented lines with information, one more line indicating the shape of the vector (63240 rows and one column), and the rest of the lines contain two columns of floating point values: the real and imaginary parts of the corresponding data. We proceed to read this vector to memory, solve the least squares problem suggested, and obtain the following reconstruction:

1
2
3

r_vals, i_vals = np.loadtxt("mri2_b.mtx", skiprows=7, unpack=True)
%time solution = spspla.lsqr(MRI2, r_vals + 1j*i_vals)

1
2
3
4
5
6

from scipy.fftpack import fft2, fftshift
img = solution[0].reshape(384,384)
img = np.abs(fftshift(fft2(img)))
plt.figure(figsize=(8,8))
plt.imshow(img)

If interested in the theory behind the creation of this matrix and the particulars of this problem, read the article

On the optimality of the Gridding Reconstruction Algorithm, by H. Sedarat and D. G. Nishimura, published in IEEE Trans. Medical Imaging, vol 19, no 4, pp. 306-317, 2000.

For matrices with a good structure, which are going to be exclusively involved in matrix multiplications, it is often possible to store the objects without allocating almost any space. Consider the following example:

A horizontal earthquake oscillation affects each floor of a tall building, depending on the natural frequencies of oscillation of the floors. If we make certain assumptions, a model to quantize the oscillations on buildings with N floors can be obtained as a second-order system of N differential equations by *competition*: the Newton’s second law force is set equal to the sum of Hooke’s forces, and the external force due to the earthquake wave.

These are the assumptions we need:

- Each floor is considered a point of mass located at its center-of-mass. The floors have masses \( m_1, m_2, …, m_N \).
- Each floor is restored to its equilibrium position by a linear restoring force (Hooke’s \( -k \times \text{elongation} \)). The Hooke’s constants for the floors are \( k_1, k_2, …, k_N \).
- The locations of masses representing the oscillation of the floors are \( x_1, x_2, …, x_N \). We assume all of them functions of time, and that at equilibrium they are all equal to zero.
- For simplicity of exposition, we are going to assume no friction: all damping effects on the floors are ignored.
- The equations of a floor depend only on the neighboring floors.

Set \( M \), the mass matrix, to be a diagonal matrix containing the floor masses on its diagonal. Set \( K \), the Hooke’s matrix, to be a tri-diagonal matrix with the following structure: for each row \( j \), all the entries are zero except

- Column \( j-1 \), which we set to be \( k_{j+1} \),
- Column \( j \), which we set to \( -k_{j+1}-k_j \), and
- Column \( j+1 \), which we set to \( k_{j+2} \).

Set \( H \) to be a column vector containing the external force on each floor due to the earthquake, and \( X \) the column vector containing the functions \( x_j \).

We have then the system \( M \cdot X^{\prime\prime} = K \cdot X + H \). The Homogeneous part of this system is the product of the inverse of \( M \) with \( K \), which we denote \( A \).

To solve the homogeneous linear second-order system \( X^{\prime\prime} = A \cdot X \), we define the variable \( Y \) to contain \( 2N \) entries: all \( N \) functions \( x_j \), followed by their derivatives \( x^\prime_j \). Any solution of this second-order linear system has a corresponding solution on the first-order linear system \( Y^\prime = C \cdot Y \), where \( C \) is a block matrix of size \( 2N \times 2N \). This matrix \( C \) is composed by a block of size \( N \times N \) containing only zeros, followed horizontally by the identity (of size \( N \times N \)), and below these two, the matrix \( A \) followed horizontally by another \( N \times N \) block of zeros.

It is not necessary to store this matrix \( C \) into memory, or any of its factors or blocks. Instead, we will make use of its structure, and use a **Linear Operator** to represent it. Minimal data is then needed to generate this operator (only the values of the masses and the Hooke’s coefficients): much less than any matrix representation of it.

Let us show a concrete example with 6 floors. We indicate first their masses and Hooke’s constants, and proceed to construct a representation of \( A \) as a linear operator:

1
2
3
4
5
6
7
8
9
10
11
12
13
14

m = np.array([56., 56., 56., 54., 54., 53.])
k = np.array([561., 562., 560., 541., 542., 530.])
def Axv(v):
global k, m
w = v.copy()
w[0] = (k[1]*v[1] - (k[0]+k[1])*v[0])/m[0]
for j in range(1, len(v)-1):
w[j] = k[j]*v[j-1] + k[j+1]*v[j+1] - (k[j]+k[j+1])*v[j]
w[j] /= m[j]
w[-1] = k[-1]*(v[-2]-v[-1])/m[-1]
return w
A = spspla.LinearOperator((6,6), matvec=Axv, matmat=Axv, dtype=np.float64)

The construction of \( C \) is very simple now (much simpler than its matrix!):

1
2
3
4
5
6
7
8

def Cxv(v):
n = len(v)/2
w = v.copy()
w[:n] = v[n:]
w[n:] = A * v[:n]
return w
C = spspla.LinearOperator((12,12), matvec=Cxv, matmat=Cxv, dtype=np.float64)

A solution of this homogeneous system comes in the form of an *action of the exponential* of \( C \): \( Y(t) = \operatorname{expm}(C t) \cdot Y(0) \), where \( \operatorname{expm}() \) here denotes a matrix exponential function. In `scipy`

, this operation is performed with the routine `expm_multiply`

in the module `scipy.sparse.linalg`

.

For example, in our case, given the initial value containing the values \( x_1(0)=0, \dots, x_N(0)=0, x^\prime_1(0)=1, \dots, x^\prime_N(0)=1 \), if we require a solution \( Y(t) \) for values of \( t \) between 0 and 1 in steps of size 0.1, we could issue the following:

1
2
3
4

initial_condition = np.zeros(12)
initial_condition[6:] = 1
Y = spspla.expm_multiply(C.matmat(np.eye(12)), np.ones(12), start=0, stop=1, num=10)

It has been reported in some installations that, in the next step, a matrix for \( C \) must be given instead of the actual linear operator (thus contradicting the manual). If this is the case in your system, simply change \( C \) in the next lines, to its matrix representation.

The oscillations of the six floors during the first second can then be calculated and plotted as follows.

1
2
3
4
5

plt.figure(figsize=(8,8))
plt.plot(np.linspace(0,1,10), Y[:,:6])
plt.xlabel('time (in seconds)')
plt.ylabel('oscillation')
plt.legend(np.arange(6)+1, loc=2)

For more details about systems of differential equations, and how to solve them with actions of exponentials, read for example the excellent book

Elementary Differential Equations10 ed., by William E. Boyce and Richard C. DiPrima. Wiley, 2012.

These three examples illustrate the goal of this first chapter: **Numerical Linear Algebra**. In `python`

this is accomplished first by storing the data in matrix form, or as a related linear operator, by means of any of the following classes:

`numpy.ndarray`

(making sure that they are two-dimensional)`numpy.matrix`

`scipy.sparse.bsr_matrix`

(Block Sparse Row matrix)`scipy.sparse.coo_matrix`

(Sparse Matrix in**COO**rdinate format)`scipy.sparse.csc_matrix`

(Compressed Sparse Column matrix)`scipy.sparse.csr_matrix`

(Compressed Sparse Row matrix)`scipy.sparse.dia_matrix`

(Sparse matrix with**DIA**gonal storage)`scipy.sparse.dok_matrix`

(Sparse matrix based on a Dictionary of Keys)`scipy.sparse.lil_matrix`

(Sparse matrix based on a linked list)`scipy.sparse.linalg.LinearOperator`

As we have seen in the examples, the choice of different classes obeys mainly to the sparsity of the data, and the algorithms we are to apply on them.

This choice then dictates the modules that we use for the different algorithms: `scipy.linalg`

for generic matrices, and both `scipy.sparse`

, `scipy.sparse.linalg`

, for sparse matrices or linear operators. These three `scipy`

modules are compiled on top of the highly optimized computer libraries `BLAS`

(written in `Fortran77`

), `LAPACK`

(in `Fortran90`

), `ARPACK`

(in `Fortran77`

), and `SuperLU`

(in `C`

).

For a better understanding of these underlying packages, read the description and documentation from their creators:

`BLAS`

: netlib.org/blas/faq.html`LAPACK`

: netlib.org/lapack/lapack-3.2.html`ARPACK`

: www.caam.rice.edu/software/ARPACK/`SuperLU`

: crd-legacy.lbl.gov/~xiaoye/SuperLU/

Most of the routines in these three `scipy`

modules are wrappers to functions in the mentioned libraries. If we so desire, we also have the possibility to call the underlying functions directly. In the module `scipy.linalg`

, we have

`scipy.linalg.get_blas_funcs`

to call routines from`BLAS`

.`scipy.linalg.get_lapack_funcs`

to call routines from`LAPACK`

.

For example, if we want to use the `BLAS`

function `NRM2`

to compute Frobenius norms:

1
2
3

blas_norm = spla.get_blas_funcs('nrm2')
blas_norm(np.float32([1e20]))

## Notebook

For an `ipython`

notebook with the current draft of chapter 1 (Numerical Linear Algebra) of my upcoming book *Mastering Scipy*, follow [this link] to the repository Mastering Scipy in my github.