Matrices

Matrices are a source of many important examples of rings and fields, so in this chapter we're going to get a bit more concrete and spend some time talking about matrices. You may already be aware that matrices have many applications in computing; two examples that spring to mind are computer graphics and machine learning.

Vectors

We begin by talking about vectors in \(\mathbb{R}^n\); if you haven’t seen this before, an element of \(\mathbb{R}^n\) is an ordered collection of \(n\) elements of \(\mathbb{R}\). We usually write vectors in a column, and it’s also conventional to use bold symbols for vectors (to help distinguish them from scalars, which are elements of \(\mathbb{R}\)). For example:

\[\boldsymbol{x} = \begin{bmatrix}1\\0\end{bmatrix}, \qquad \boldsymbol{y} = \begin{bmatrix}4\\-2\end{bmatrix}, \qquad \boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^2\]

Sometimes it’s helpful to be able to write vectors on one line, and we do so by listing the components in parentheses, separated by commas. For example, \(\boldsymbol{x} = (1, 0)\).

We define addition for \(\mathbb{R}^n\) by adding corresponding components:

\[\begin{split}\boldsymbol{x} + \boldsymbol{y} &= \begin{bmatrix}1\\0\end{bmatrix} + \begin{bmatrix}4\\-2\end{bmatrix} \\ &= \begin{bmatrix}1+4\\0+(-2)\end{bmatrix} \\ &= \begin{bmatrix}5\\-2\end{bmatrix}\end{split}\]
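If you'd like to experiment with these operations in code, component-wise addition is easy to express. Here is a minimal Python sketch, representing vectors as plain lists (the function name `add` is just my choice for illustration):

```python
def add(x, y):
    """Add two vectors by adding corresponding components."""
    return [a + b for a, b in zip(x, y)]

print(add([1, 0], [4, -2]))  # [5, -2]
```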

The identity element of vector addition is the zero vector: the vector which has a zero for every component. This is quite an important vector, so we have a short-hand notation for it, which is a bold zero:

\[\begin{split}\boldsymbol{0} = \begin{bmatrix}0\\0\end{bmatrix}\end{split}\]

We can also multiply every component of a vector by some fixed number. This operation is called scalar multiplication:

\[\begin{split}3\boldsymbol{x} &= \begin{bmatrix}3 \times 1\\3 \times 0\end{bmatrix} \\ &= \begin{bmatrix}3\\0\end{bmatrix}\end{split}\]
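Scalar multiplication is just as easy to sketch in Python (again, `scale` is a name I've made up for illustration):

```python
def scale(k, x):
    """Multiply every component of the vector x by the scalar k."""
    return [k * a for a in x]

print(scale(3, [1, 0]))  # [3, 0]
```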

Exercise 5.1. We have seen that \(\mathbb{R}^2\) is closed under vector addition (that is, adding two vectors always gives you another vector), and also that there is an identity element for vector addition in \(\mathbb{R}^2\). Now, show that \((\mathbb{R}^2, +)\) is a monoid by checking the remaining monoid law (associativity).

Exercise 5.2. Show that \((\mathbb{R}^2, +)\) is a group by explaining how to find the inverse of an element.

Note

\((\mathbb{R}^n, +)\) is actually a group for any \(n\), not just \(n = 2\). We will spend the majority of this chapter working with \(\mathbb{R}^2\), but everything we’re doing generalises very naturally to larger choices of \(n\).

Exercise 5.3. Show that scalar multiplication distributes over vector addition in \(\mathbb{R}^2\); that is, \(\forall \boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^2, k \in \mathbb{R}.\; k(\boldsymbol{x} + \boldsymbol{y}) = k\boldsymbol{x} + k\boldsymbol{y}\).

There is one more vector operation we need, called the dot product. It takes two vectors in \(\mathbb{R}^n\) and produces as a result a single element of \(\mathbb{R}\), by multiplying corresponding components together and then adding all of the results:

\[\begin{split}\boldsymbol{x} \cdot \boldsymbol{y} &= \begin{bmatrix}1\\0\end{bmatrix} \cdot \begin{bmatrix}4\\-2\end{bmatrix} \\ &= (1 \times 4) + (0 \times (-2)) \\ &= 4 + 0 \\ &= 4\end{split}\]

The dot product is sometimes also called the scalar product because the result is a scalar.
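Continuing the Python sketches, the dot product multiplies corresponding components and then sums the results:

```python
def dot(x, y):
    """Dot product: multiply corresponding components, then sum."""
    return sum(a * b for a, b in zip(x, y))

print(dot([1, 0], [4, -2]))  # (1*4) + (0*(-2)) = 4
```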

The dot product interacts nicely with the other two vector operations; the following are true for any vectors \(\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{z} \in \mathbb{R}^2\) and scalars \(k_1, k_2 \in \mathbb{R}\):

\[\begin{aligned}\boldsymbol{x} \cdot (\boldsymbol{y} + \boldsymbol{z}) &= \boldsymbol{x} \cdot \boldsymbol{y} + \boldsymbol{x} \cdot \boldsymbol{z}\\(k_1 \boldsymbol{x}) \cdot (k_2 \boldsymbol{y}) &= k_1 k_2 (\boldsymbol{x} \cdot \boldsymbol{y})\end{aligned}\]

We sometimes need to be a bit careful about keeping track of which operations are which; in particular, note that in the first equation, we have vector addition on the left hand side, but scalar addition on the right.

Exercise 5.4. Prove these two identities regarding the interaction of the dot product with vector addition and scalar multiplication respectively. Hint: we already know that multiplication distributes over addition in \(\mathbb{R}\); that is, \(\forall x, y, z \in \mathbb{R}.\; x(y + z) = xy + xz\).

Linear mappings

Let \(f\) be a function from \(\mathbb{R}^2\) to \(\mathbb{R}^2\), such that the following two laws are satisfied for all \(\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^2, k \in \mathbb{R}\):

\[\begin{aligned}f(\boldsymbol{x} + \boldsymbol{y}) &= f(\boldsymbol{x}) + f(\boldsymbol{y})\\f(k \boldsymbol{x}) &= k f(\boldsymbol{x})\end{aligned}\]

That is, if we have a pair of vectors and a function \(f\) defined as above, we can add the vectors together and then apply \(f\), or we can apply \(f\) to each of the vectors individually and then add the results together; in both cases we will always get the same result. Similarly, if we have a vector and a scalar, we can multiply the vector by the scalar and then apply \(f\), or apply \(f\) to the vector first and then do the scalar multiplication on the result, but either way we end up with the same vector.

Functions of this kind are important enough that we have a name for them: linear mappings.

Here is one example of a linear mapping:

\[\begin{split}f(\begin{bmatrix}x_1\\x_2\end{bmatrix}) = \begin{bmatrix} 2x_1 + 3x_2 \\ x_1 - 2x_2 \end{bmatrix}\end{split}\]

Try choosing a couple of vectors in \(\mathbb{R}^2\) and checking that the linear mapping laws are satisfied with those vectors.
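If you prefer to let the computer do some of the checking, here is a quick Python sketch which tests both laws for one particular choice of vectors and scalar (the values of `x`, `y`, and `k` are arbitrary choices of mine):

```python
def f(v):
    """The example mapping: (x1, x2) -> (2*x1 + 3*x2, x1 - 2*x2)."""
    x1, x2 = v
    return (2 * x1 + 3 * x2, x1 - 2 * x2)

x, y, k = (1, 2), (3, -1), 5

# First law: f(x + y) == f(x) + f(y)
lhs = f((x[0] + y[0], x[1] + y[1]))
rhs = tuple(a + b for a, b in zip(f(x), f(y)))
print(lhs == rhs)  # True

# Second law: f(k * x) == k * f(x)
print(f((k * x[0], k * x[1])) == tuple(k * a for a in f(x)))  # True
```

Of course, a check like this only tests one particular choice of inputs; it doesn't prove that the laws hold for all of them.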

Here is an example of a function which fails to be a linear mapping:

\[\begin{split}f(\begin{bmatrix}x_1\\x_2\end{bmatrix}) = \begin{bmatrix} x_1^2 \\ x_2 \end{bmatrix}\end{split}\]

For example, if we take \(\boldsymbol{x} = (2, 0)\) and \(k = 3\), then

\[\begin{split}f(k \boldsymbol{x}) = f(3 \begin{bmatrix}2\\0\end{bmatrix}) = f(\begin{bmatrix}6\\0\end{bmatrix}) = \begin{bmatrix}36\\0\end{bmatrix}\end{split}\]

However, if we apply the function first and then do the scalar multiplication, we get a different result:

\[\begin{split}k f(\boldsymbol{x}) = 3 f(\begin{bmatrix}2\\0\end{bmatrix}) = 3 \begin{bmatrix}4\\0\end{bmatrix} = \begin{bmatrix}12\\0\end{bmatrix}\end{split}\]
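The same failure is easy to reproduce in code (a minimal sketch; `g` is my own name for this non-linear function):

```python
def g(v):
    """Not a linear mapping: it squares the first component."""
    x1, x2 = v
    return (x1 ** 2, x2)

x, k = (2, 0), 3
print(g((k * x[0], k * x[1])))     # (36, 0)
print(tuple(k * a for a in g(x)))  # (12, 0) -- a different result
```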

Describing linear mappings with dot products

Now, suppose we have two vectors \(\boldsymbol{a_1}, \boldsymbol{a_2} \in \mathbb{R}^2\). We can use these to define a function which maps vectors in \(\mathbb{R}^2\) to vectors in \(\mathbb{R}^2\) like this:

\[\begin{split}\boldsymbol{x} \mapsto \begin{bmatrix} \boldsymbol{a_1} \cdot \boldsymbol{x} \\ \boldsymbol{a_2} \cdot \boldsymbol{x} \\ \end{bmatrix}\end{split}\]

That is, we produce a new vector where the first component is the dot product of \(\boldsymbol{a_1}\) with the parameter \(\boldsymbol{x}\), and the second component is the dot product of \(\boldsymbol{a_2}\) with \(\boldsymbol{x}\).

For example, let us take the following vectors for \(\boldsymbol{a_1}\) and \(\boldsymbol{a_2}\):

\[\boldsymbol{a_1} = \begin{bmatrix}1\\0\end{bmatrix}, \qquad \boldsymbol{a_2} = \begin{bmatrix}4\\-2\end{bmatrix}\]

We can now define a function using them:

\[\begin{split}\begin{bmatrix}x_1\\x_2\end{bmatrix} \mapsto \begin{bmatrix} 1x_1 + 0x_2 \\ 4x_1 - 2x_2 \\ \end{bmatrix} = \begin{bmatrix} x_1 \\ 4x_1 - 2x_2 \end{bmatrix}\end{split}\]

This particular function takes \((1,1)\) to \((1,2)\), and it takes \((2,0)\) to \((2, 8)\) — check this!
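Here is how this construction might look in Python (`dot` and `mapping` are names I've chosen for this sketch):

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def mapping(a1, a2):
    """Build a function out of two fixed vectors, using dot products."""
    return lambda x: (dot(a1, x), dot(a2, x))

f = mapping((1, 0), (4, -2))
print(f((1, 1)))  # (1, 2)
print(f((2, 0)))  # (2, 8)
```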

It turns out that functions which can be defined in terms of dot products like this are precisely linear mappings — that is, if you define a function in terms of dot products in this way, it will always be a linear mapping, and conversely, any linear mapping can be described in terms of dot products like we have just done here.

Exercise 5.5. Show that any function defined in terms of dot products will be a linear mapping, using previously given properties of the dot product.

Exercise 5.6. Show that the composition of two linear mappings is itself a linear mapping. That is, if \(f\) and \(g\) are linear mappings, then the function \(f \circ g\), which is defined as \(\boldsymbol{x} \mapsto f(g(\boldsymbol{x}))\), is itself a linear mapping.

Representation of linear mappings as matrices

An \(m \times n\) matrix (read: “\(m\) by \(n\)”) is a rectangular array of things — usually numbers, but not always — with \(m\) rows and \(n\) columns. Here is a \(2 \times 2\) matrix:

\[\begin{split}\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ \end{bmatrix}\end{split}\]

We define matrix addition in more or less the same way as vector addition, i.e. adding corresponding components:

\[\begin{split}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} &= \begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix} \\ &= \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}\end{split}\]
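In a Python sketch, representing a matrix as a list of its rows, matrix addition looks like this:

```python
def mat_add(A, B):
    """Add two matrices of the same dimensions, component-wise."""
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]

print(mat_add([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[6, 8], [10, 12]]
```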

Again, there is a zero matrix which is the identity for matrix addition, and it is also written \(\boldsymbol{0}\). This overloaded notation doesn’t turn out to be too much of a problem in practice, as it’s usually clear from context which is meant.

As you might expect, for any pair of natural numbers \(m, n \in \mathbb{N}\), the set of \(m \times n\) matrices forms an Abelian group under addition. Note that matrices must have the same dimensions if you want to be able to add them together.

We represent a linear mapping from \(\mathbb{R}^2\) to \(\mathbb{R}^2\) as a matrix by taking the vectors \(\boldsymbol{a_1}\) and \(\boldsymbol{a_2}\) which we used to define the linear mapping and putting each of them in the corresponding row of the matrix. So components of \(\boldsymbol{a_1}\) become the first row and components of \(\boldsymbol{a_2}\) become the second row. Here is the matrix representation of the example linear mapping which we saw just a moment ago:

\[\begin{split}\begin{bmatrix} 1 & 0 \\ 4 & -2 \end{bmatrix}\end{split}\]

We can multiply a matrix by a vector by writing them next to each other; this operation corresponds to application of the linear mapping to the vector:

\[\begin{split}\begin{bmatrix} 1 & 0 \\ 4 & -2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} (1 \times 1) + (0 \times 1) \\ (4 \times 1) + (-2 \times 1) \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}\end{split}\]
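Since each row of the matrix holds one of the vectors \(\boldsymbol{a_i}\), multiplying a matrix by a vector amounts to one dot product per row. A minimal Python sketch (`mat_vec` is my own name):

```python
def mat_vec(A, x):
    """Apply the linear mapping represented by A: dot each row of A with x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

print(mat_vec([[1, 0], [4, -2]], [1, 1]))  # [1, 2]
```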

We learned a moment ago that linear mappings can always be defined in terms of dot products, and also that functions defined in terms of dot products are linear mappings. Since a matrix is just another way of writing the vectors \(\boldsymbol{a_1}\) and \(\boldsymbol{a_2}\), matrices and linear mappings are in one-to-one correspondence. This is very useful: if we are asked a question about linear mappings which is difficult to answer, we can translate it into an equivalent question about matrices (and vice versa) because of this correspondence. Sometimes, simply by translating a question about linear mappings to one about matrices, we can make the answer immediately obvious, even for questions which originally seemed very difficult.

We can generalise the operation of multiplying a matrix by a vector to allow us to multiply matrices by other matrices. We do this by splitting the matrix on the right hand side into columns, multiplying the matrix on the left by each of these columns individually, and then joining up the resulting vectors so that they form the columns of a new matrix.

For example, suppose we want to multiply these matrices:

\[A = \begin{bmatrix} 1 & 0 \\ 4 & -2 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 5 \\ 1 & 3 \end{bmatrix}, \qquad AB = \;?\]

We start by splitting the right-hand matrix, \(B\), into columns:

\[\begin{bmatrix}1\\1\end{bmatrix} \quad \begin{bmatrix}5\\3\end{bmatrix}\]

Then we multiply each of these by the left-hand matrix \(A\). We already know that the result of multiplying \(A\) by \((1,1)\) is \((1,2)\). The result of multiplying \(A\) by the other column, \((5,3)\), is \((5,14)\) — again, I recommend checking this. Finally we put these columns back together:

\[\begin{split}AB = \begin{bmatrix} 1 & 5 \\ 2 & 14 \end{bmatrix}\end{split}\]

In general, then, a product of \(2 \times 2\) matrices looks like this:

\[\begin{split}\begin{bmatrix} a_1 & b_1 \\ c_1 & d_1 \end{bmatrix} \begin{bmatrix} a_2 & b_2 \\ c_2 & d_2 \end{bmatrix} = \begin{bmatrix} a_1 a_2 + b_1 c_2 & a_1 b_2 + b_1 d_2 \\ c_1 a_2 + d_1 c_2 & c_1 b_2 + d_1 d_2 \end{bmatrix}\end{split}\]
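The column-splitting description translates quite directly into Python; here is a sketch (all the function names are mine):

```python
def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def mat_mul(A, B):
    """Multiply A by each column of B, then rejoin the resulting columns."""
    cols = [[row[j] for row in B] for j in range(len(B[0]))]
    new_cols = [mat_vec(A, col) for col in cols]
    return [[col[i] for col in new_cols] for i in range(len(A))]

A = [[1, 0], [4, -2]]
B = [[1, 5], [1, 3]]
print(mat_mul(A, B))  # [[1, 5], [2, 14]]
```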

The website http://matrixmultiplication.xyz is an interactive matrix multiplication calculator, which you might like to play around with a bit to get more of a feel for what is going on. I should also add that there are lots of different ways of thinking about matrix multiplication. If what I’ve described makes no sense to you, you might be able to find an alternative way of thinking about it that works better for you with a little googling.

Matrix multiplication turns out to correspond to composition of linear mappings. That is, if the matrix \(A\) represents the linear mapping \(f\), and the matrix \(B\) represents the linear mapping \(g\), then the matrix product \(AB\) represents the linear mapping \(f \circ g\).
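We can spot-check this correspondence with the sketches from above (the test vector `x` is an arbitrary choice of mine):

```python
def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def mat_mul(A, B):
    cols = [[row[j] for row in B] for j in range(len(B[0]))]
    new_cols = [mat_vec(A, col) for col in cols]
    return [[col[i] for col in new_cols] for i in range(len(A))]

A, B, x = [[1, 0], [4, -2]], [[1, 5], [1, 3]], [2, -1]
# Applying AB to x agrees with applying B first and then A:
print(mat_vec(mat_mul(A, B), x))  # [-3, -10]
print(mat_vec(A, mat_vec(B, x)))  # [-3, -10]
```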

Properties of matrix operations

The set of \(n \times n\) matrices under matrix multiplication turns out to be a monoid:

  • The result of multiplying two \(n \times n\) matrices is always an \(n \times n\) matrix.
  • Matrix multiplication is associative; that is, if we have three \(n \times n\) matrices \(A, B, C\), then \((AB)C = A(BC)\).
  • Matrix multiplication has an identity, called the identity matrix. There is an \(n \times n\) identity matrix for every \(n \in \mathbb{N}\); multiplying any matrix by it gives you back the same matrix.

The question of how to prove that matrix multiplication is associative is a very good example of a question which becomes much easier to answer once you translate it into a different one. Although possible, it is extremely tedious to show directly that matrix multiplication is associative. A better approach is simply to say that since matrix multiplication corresponds to composition of linear mappings, and since function composition is associative, matrix multiplication must be associative too.

The \(2 \times 2\) identity matrix looks like this:

\[\begin{split}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\end{split}\]

You might like to try multiplying it with some other matrices to check that it is indeed the identity for multiplication.
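Or, letting code do the checking: this version of `mat_mul` computes each entry directly from the general formula above, rather than splitting into columns (the two are equivalent):

```python
def mat_mul(A, B):
    """Entry (i, j) of AB is the dot product of row i of A with column j of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

I2 = [[1, 0], [0, 1]]  # the 2x2 identity matrix
M = [[1, 0], [4, -2]]
print(mat_mul(I2, M) == M)  # True
print(mat_mul(M, I2) == M)  # True
```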

Matrix multiplication also distributes over matrix addition. That is, for \(n \times n\) matrices \(A, B, C,\) we have that

\[\begin{aligned}A(B+C) &= AB + AC\\(A+B)C &= AC + BC\end{aligned}\]

just like with real numbers. We have now seen that the three ring laws hold for the set of \(n \times n\) matrices under matrix addition and matrix multiplication, so this set is a ring. We denote the ring of \(n \times n\) matrices with entries in \(\mathbb{R}\) by \(\mathrm{Mat}(n; \mathbb{R})\).

However, unlike multiplication of real numbers, matrix multiplication is not commutative. In fact, I promised to show you a non-commutative ring in the previous chapter; here it is! With matrices, \(AB\) does not always equal \(BA\). For example, if we have

\[A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix},\]

then multiplying one way gives us

\[\begin{split}AB = \begin{bmatrix} 0 & 2 \\ 0 & 1 \end{bmatrix}\end{split}\]

but the other way gives us

\[\begin{split}BA = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}.\end{split}\]
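Again, this is easy to confirm in code with the same `mat_mul` sketch:

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 1], [0, 1]]
B = [[0, 1], [0, 1]]
print(mat_mul(A, B))  # [[0, 2], [0, 1]]
print(mat_mul(B, A))  # [[0, 1], [0, 1]]
```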

Since matrices correspond to linear mappings, we can also conclude that linear mappings form a non-commutative ring in which the multiplication operation is function composition. What will the addition operation be? (Hint: it's the linear mapping analogue of matrix addition.)