# Matrices¶

Matrices are a source of many important examples of rings and fields, so we’re going to get a bit more concrete in this chapter and talk about matrices for a bit. You may already be aware that matrices have many applications in computing; two examples that spring to my mind are 3D graphics and machine learning.

## Vectors¶

We begin by talking about *vectors* in \(\mathbb{R}^n\); if you haven’t
seen this before, an element of \(\mathbb{R}^n\) is an ordered collection
of \(n\) elements of \(\mathbb{R}\). We usually write vectors in a
column, and it’s also conventional to use bold symbols for vectors (to help
distinguish them from *scalars*, which are elements of \(\mathbb{R}\)). For
example:

Sometimes it’s helpful to be able to write vectors on one line, and we do so by listing the components in parentheses, separated by commas. For example, \(\boldsymbol{x} = (1, 0)\).

We define addition for \(\mathbb{R}^n\) by adding corresponding components:

The identity element of vector addition is the *zero vector;* the vector which
has a zero for every component. This is quite an important vector so we have a
short-hand notation for it, which is a bold zero:

We can also multiply every component of a vector by some fixed number. This
operation is called *scalar multiplication:*

**Exercise 5.1.** We have seen that \(\mathbb{R}^2\) is closed under vector
addition (that is, adding two vectors always gives you another vector), and
also that there is an identity element for vector addition in
\(\mathbb{R}^2\). Now, show that \((\mathbb{R}^2, +)\) is a monoid
by checking the remaining monoid law (associativity).

**Exercise 5.2.** Show that \((\mathbb{R}^2, +)\) is a group by explaining
how to find the inverse of an element.

Note

\((\mathbb{R}^n, +)\) is actually a group for any \(n\), not just \(n = 2\). We will spend the majority of this chapter working with \(\mathbb{R}^2\), but everything we’re doing generalises very naturally to different choices of \(n\).

**Exercise 5.3.** Show that scalar multiplication distributes over vector
addition in \(\mathbb{R}^2\); that is, \(\forall \boldsymbol{x},
\boldsymbol{y} \in \mathbb{R}^2, k \in \mathbb{R}.\; k(\boldsymbol{x} +
\boldsymbol{y}) = k\boldsymbol{x} + k\boldsymbol{y}\).

There is one more vector operation we need, called the *dot product*. It
takes two vectors in \(\mathbb{R}^n\) and produces as a result a single
element of \(\mathbb{R}\), by multiplying corresponding components together
and then adding all of the results:

The dot product is sometimes also called the *scalar product* because the
result is a scalar.

The dot product interacts nicely with the other two vector operations; the following are true for any vectors \(\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{z} \in \mathbb{R}^2\) and scalars \(k_1, k_2 \in \mathbb{R}\):

We sometimes need to be a bit careful about keeping track of which operations
are which; in particular, note that in the first equation, we have *vector*
addition on the left hand side, but *scalar* addition on the right.

**Exercise 5.4.** Prove these two identities regarding the interaction of the
dot product with vector addition and scalar multiplication respectively. Hint:
we already know that multiplication distributes over addition in
\(\mathbb{R}\); that is, \(\forall x, y, z \in \mathbb{R}.\; x(y + z)
= xy + xz\).

## Linear mappings¶

Let \(f\) be a function from \(\mathbb{R}^2\) to \(\mathbb{R}^2\), such that the following two laws are satisfied for all \(\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^2, k \in \mathbb{R}\):

That is, if we have a pair of vectors and a function \(f\) defined as above, we can add the vectors together and then apply \(f\), or we can apply \(f\) to each of the vectors individually and then add the results together, but in both cases we will always get the same result. Similarly if we have a vector and a scalar, we can multiply the vector by the scalar and then apply \(f\), or apply \(f\) to the vector first and then do the scalar multiplication on the result, but either way the we end up with the same vector.

Functions of this kind are important enough that we have a name for them:
*linear mappings*.

Here is one example of a linear mapping:

Try choosing a couple of vectors in \(\mathbb{R}^2\) and checking that the linear mapping laws are satisfied with those vectors.

Here is an example of a function which fails to be a linear mapping:

For example, if we take \(\boldsymbol{x} = (2, 0)\) and \(k = 3\), then

However, if we apply the function first and then do the scalar multiplication, we get a different result:

### Describing linear mappings with dot products¶

Now, suppose we have 2 vectors \(\boldsymbol{a_1}, \boldsymbol{a_2}, \in \mathbb{R}^2\). We can use these to define a function which maps vectors in \(\mathbb{R}^2\) to vectors in \(\mathbb{R}^2\) like this:

That is, we produce a new vector where the first component is the dot product of \(\boldsymbol{a_1}\) with the parameter \(\boldsymbol{x}\), and the second component is the dot product of \(\boldsymbol{a_2}\) with \(\boldsymbol{x}\).

For example, let us take the following vectors for \(\boldsymbol{a_1}\) and \(\boldsymbol{a_2}\):

We can now define a function using them:

This particular function takes \((1,1)\) to \((1,2)\), and it takes \((2,0)\) to \((2, 8)\) — check this!

It turns out that functions which can be defined in terms of dot products like this are precisely linear mappings — that is, if you define a function in terms of dot products in this way, it will always be a linear mapping, and conversely, any linear mapping can be described in terms of dot products like we have just done here.

**Exercise 5.5.** Show that any function defined in terms of dot products will
be a linear mapping, using previously given properties of the dot product.

**Exercise 5.6.** Show that the composition of two linear mappings is itself
a linear mapping. That is, if \(f\) and \(g\) are linear mappings, then
the function \(f \circ g\), which is defined as \(\boldsymbol{x}
\mapsto f(g(\boldsymbol{x}))\), is itself a linear mapping.

## Representation of linear mappings as matrices¶

An \(m \times n\) matrix (read: “\(m\) by \(n\)”) is a rectangular array of things — usually numbers, but not always — with \(m\) rows and \(n\) columns. Here is a \(2 \times 2\) matrix:

We define matrix addition in more or less the same way as vector addition, i.e. adding corresponding components:

Again, there is a zero matrix which is the identity for matrix addition, and it is also written \(\boldsymbol{0}\). This overloaded notation doesn’t turn out to be too much of a problem in practice, as it’s usually clear from context which is meant.

As you might expect, for any pair of natural numbers \(m, n \in \mathbb{N}\), the set of \(m \times n\) matrices forms an Abelian group under addition. Note that matrices must have the same dimensions if you want to be able to add them together.

We represent a linear mapping from \(\mathbb{R}^2\) to \(\mathbb{R}^2\) as a matrix by taking the vectors \(\boldsymbol{a_1}\) and \(\boldsymbol{a_2}\) which we used to define the linear mapping and putting each of them in the corresponding row of the matrix. So components of \(\boldsymbol{a_1}\) become the first row and components of \(\boldsymbol{a_2}\) become the second row. Here is the matrix representation of the example linear mapping which we saw just a moment ago:

We can multiply a matrix by a vector by writing them next to each other; this
operation corresponds to *application* of the linear mapping to the vector:

We learned a moment ago that linear mappings can always be defined in terms of dot products, and also that functions defined in terms of dot products are linear mappings. Since a matrix is just another way of writing the vectors \(\boldsymbol{a_1}\) and \(\boldsymbol{a_2}\), matrices and linear mappings are in one-to-one correspondence. This is very useful: if we are asked a question about linear mappings which is difficult to answer, we can translate it into an equivalent question about matrices (and vice versa) because of this correspondence. Sometimes, simply by translating a question about linear mappings to one about matrices, we can make the answer immediately obvious, even for questions which originally seemed very difficult.

We can generalise the operation of multiplying a matrix by a vector to allow us to multiply matrices by other matrices. We do this by splitting the matrix on the right hand side into columns, multiplying the matrix on the left by each of these columns individually, and then joining up the resulting vectors so that they form the columns of a new matrix.

For example, suppose we want to multiply these matrices:

We start by splitting the right-hand matrix, \(B\), into columns:

Then we multiply each of these by the left-hand matrix \(A\). We already know that the result of multiplying \(A\) by \((1,1)\) is \((1,2)\). The result of multiplying \(A\) by the other column, \((5,3)\), is \((5,6)\) — again, I recommend checking this. Finally we put these columns back together:

In general, then, a product of \(2 \times 2\) matrices looks like this:

The website http://matrixmultiplication.xyz is an interactive matrix multiplication calculator, which you might like to play around with a bit to get more of a feel for what is going on. I should also add that there are lots of different ways of thinking about matrix multiplication. If what I’ve described makes no sense to you, you might be able to find an alternative way of thinking about it that works better for you with a little googling.

Matrix multiplication turns out to correspond to *composition* of linear
mappings. That is, if the matrix \(A\) represents the linear mapping
\(f\), and the matrix \(B\) represents the linear mapping \(g\),
then the matrix product \(AB\) represents the linear mapping \(f \circ
g\).

## Properties of matrix operations¶

The set of \(n \times n\) matrices under matrix multiplication turns out to be a monoid:

- The result of multiplying two \(n \times n\) matrices is always a \(n \times n\) matrix.
- Matrix multiplication is associative; that is, if we have three \(n \times n\) matrices \(A, B, C\), then \((AB)C = A(BC)\).
- Matrix multiplication has an identity, called the
*identity matrix*. There is an \(n \times n\) identity matrix for every \(n \in \mathbb{N}\); multiplying any matrix by it gives you back the same matrix.

The question of how to prove that matrix multiplication is associative is a very good example of one of those questions it is easy to see the answer to by translating the question into a different one. Although possible, it is extremely tedious to show that matrix multiplication is associative directly. A better approach is to simply say that since matrix multiplication corresponds to composition of linear mappings, and since function composition is associative, matrix multiplication must be associative too.

The \(2 \times 2\) identity matrix looks like this:

You might like to try multiplying it with some other matrices to check that it is indeed the identity for multiplication.

Matrix multiplication also distributes over matrix addition. That is, for \(n \times n\) matrices \(A, B, C,\) we have that

just like with real numbers. Therefore, we have seen that the three ring laws for the set of \(n \times n\) matrices under matrix addition and matrix multiplication hold, and therefore this set is a ring. We denote the ring of \(n \times n\) matrices with entries in \(\mathbb{R}\) by \(\mathrm{Mat}(n; \mathbb{R})\).

However, unlike real numbers, matrix multiplication is not commutative. In fact I promised to show you a non-commutative ring in the previous chapter; here it is! With matrices, \(AB\) does not always equal \(BA\). For example, if we have

then multiplying one way gives us

but the other way gives us

Since matrices correspond to linear mappings, we can also conclude that linear mappings form a noncommutative ring where the multiplication operation is function composition. What will the addition operation be? (Hint: it’s the linear mapping analogue of matrix addition.)