We are all familiar with scalar arithmetic: addition, subtraction, multiplication, and properties such as associativity and distributivity. In the following sections we extend these operations to vectors and matrices, building the algebraic machinery used throughout scientific computing.
Addition and Subtraction
Matrix addition generalizes scalar addition by operating entry-wise. The key constraint is dimensional compatibility: two matrices can be added only when they share the same shape.
Let \(\bA, \bB \in \fR^{m \times n}\). Their sum is defined as \[\bA + \bB = \printmatsq{a} + \printmatsq{b} = \printmatsq{(a+b)},\] where each entry satisfies \((a+b)_{ij} = a_{ij} + b_{ij}\) for \(1 \le i \le m\) and \(1 \le j \le n\).
For \(\bA \in \fR^{m \times n}\) and \(\bB \in \fR^{p \times q}\), the sum \(\bA + \bB\) is defined if and only if \(m = p\) and \(n = q\).
Refer to the result above.
Let \(\bA = \begin{pmatrix}1 & 2 \\ 3 & 4\end{pmatrix}\) and \(\bB = \begin{pmatrix}5 & 6 \\ 7 & 8\end{pmatrix}\). Compute \(\bA + \bB\) and \(2\bA - \bB\) by hand.
In NumPy, verify your answers using np.array. Then check whether np.array([1,2,3]) + np.array([1,2]) raises an error and explain why.
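A quick NumPy sketch of the check above (variable names are our own; the mismatched-shape sum raises a broadcasting error because the arrays have incompatible shapes):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Entry-wise sum and a scalar combination
print(A + B)      # [[ 6  8], [10 12]]
print(2 * A - B)  # [[-3 -2], [-1  0]]

# Shapes (3,) and (2,) are incompatible, so NumPy raises ValueError
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as e:
    print("ValueError:", e)
```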
Commutativity and Associativity of Matrix Addition

Let
\[
\bA = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \text{and} \quad
\bB = \begin{pmatrix} e & f \\ g & h \end{pmatrix}.
\]
Then, \[\bA + \bB = \begin{pmatrix} a+e & b+f \\ c+g & d+h \end{pmatrix}
\quad \text{and} \quad
\bB + \bA = \begin{pmatrix} e+a & f+b \\ g+c & h+d \end{pmatrix}.\] Since addition of real numbers is commutative, \(\bA + \bB = \bB + \bA\). Furthermore, for any third matrix \(\mathbf{C} = \begin{pmatrix} i & j \\ k & l \end{pmatrix}\), \[(\bA+\bB)+\mathbf{C} = \begin{pmatrix} (a+e)+i & (b+f)+j \\ (c+g)+k & (d+h)+l \end{pmatrix},\] and \[\bA+(\bB+\mathbf{C}) = \begin{pmatrix} a+(e+i) & b+(f+j) \\ c+(g+k) & d+(h+l) \end{pmatrix}.\] By associativity of real addition, these are identical. Hence matrix addition is both commutative and associative.
The set \(\fR^{m \times n}\) equipped with matrix addition and scalar multiplication forms a real vector space of dimension \(mn\).
Each axiom follows from the corresponding property of \(\fR\) applied entry-wise. For commutativity: \[(\bA + \bB)_{ij} = a_{ij} + b_{ij} = b_{ij} + a_{ij} = (\bB + \bA)_{ij}.\] The additive identity is \(\mathbf{0}\) since \((\bA + \mathbf{0})_{ij} = a_{ij}\). The additive inverse of \(\bA\) is \(-\bA\) (negate every entry), satisfying \((\bA + (-\bA))_{ij} = 0\).
For the dimension: the \(mn\) matrices \(\bE^{(ij)}\) with a \(1\) in position \((i,j)\) and zeros elsewhere are linearly independent, and every \(\bA \in \fR^{m \times n}\) decomposes as \(\bA = \sum_{i,j} a_{ij}\,\bE^{(ij)}\), so they form a basis of size \(mn\).
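The basis argument can be checked numerically; the sketch below (our own setup, for a small \(2 \times 3\) case) builds the matrices \(\bE^{(ij)}\) and verifies the decomposition \(\bA = \sum_{i,j} a_{ij}\,\bE^{(ij)}\):

```python
import numpy as np

m, n = 2, 3
rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(m, n)).astype(float)

# Standard basis E^{(ij)}: a 1 in position (i, j), zeros elsewhere
basis = []
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n))
        E[i, j] = 1.0
        basis.append(E)

# Every A decomposes as sum_{i,j} a_ij * E^{(ij)}
recon = sum(A[i, j] * basis[i * n + j] for i in range(m) for j in range(n))
print(np.array_equal(A, recon))  # True
print(len(basis))                # 6 = mn basis matrices
```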
Let \(\bA \in \fR^{m \times n}\) and \(c \in \fR\). Then scalar multiplication is defined as \[c\bA = c \printmatsq{a} = \printmatsq{c a},\] where each entry satisfies \((ca)_{ij} = c \cdot a_{ij}\).
Distributivity of Scalar Multiplication over Matrix Addition

Let \(\alpha \in \fR\) and
\[
\bA = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad
\bB = \begin{pmatrix} e & f \\ g & h \end{pmatrix}.
\]
Then,
\[
\alpha(\bA + \bB) = \alpha\begin{pmatrix} a+e & b+f \\ c+g & d+h \end{pmatrix}
= \begin{pmatrix} \alpha(a+e) & \alpha(b+f) \\ \alpha(c+g) & \alpha(d+h) \end{pmatrix}.
\]
Since scalar multiplication distributes over addition for real numbers, this equals
\[
\begin{pmatrix} \alpha a+\alpha e & \alpha b+\alpha f \\ \alpha c+\alpha g & \alpha d+\alpha h \end{pmatrix}
= \alpha\bA + \alpha\bB.
\]
Refer to the results above.
Compute \(3\bA - 2\bB\) for \(\bA = \begin{pmatrix}1 & 0 \\ -1 & 2\end{pmatrix}\) and \(\bB = \begin{pmatrix}2 & 1 \\ 0 & -1\end{pmatrix}\).
The standard basis for \(\fR^{2 \times 2}\) consists of four matrices \(\bE^{(ij)}\). Write them out explicitly and verify that \(\bA = \sum_{i,j} a_{ij}\bE^{(ij)}\) holds for your \(\bA\) above.
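A NumPy sketch for checking both parts of this exercise (our own code; the basis matrices are ordered \((1,1), (1,2), (2,1), (2,2)\)):

```python
import numpy as np

A = np.array([[1, 0], [-1, 2]])
B = np.array([[2, 1], [0, -1]])
print(3 * A - 2 * B)  # [[-1 -2], [-3  8]]

# The four standard basis matrices of R^{2x2}
Es = [np.zeros((2, 2)) for _ in range(4)]
for k, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    Es[k][i, j] = 1.0

# Verify A = sum_{i,j} a_ij * E^{(ij)}
recon = sum(A[i, j] * Es[2 * i + j] for i in range(2) for j in range(2))
print(np.array_equal(A, recon))  # True
```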
Dot Product
Before defining the general matrix product, we introduce the dot product of two vectors. The matrix product will then be understood entry-wise as a collection of dot products.
For vectors \(\ba, \bb \in \fR^{n}\), the dot product (or inner product) is defined as \[
\ba \cdot \bb = \ba^T \bb = \printrowvec{a} \printcolvec{b} = \sum_{i=1}^{n} a_{i} b_{i}.
\]
Refer to the result above.
Compute \(\ba \cdot \bb\) for \(\ba = (1, 2, 3)^T\) and \(\bb = (-1, 0, 2)^T\) by hand.
In NumPy, evaluate this using np.dot(a, b) and also with a @ b. Confirm both give the same result.
Show that \(\ba \cdot \ba \ge 0\) with equality iff \(\ba = \bzero\). What geometric quantity does \(\sqrt{\ba \cdot \ba}\) represent?
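A NumPy check of the dot-product exercise (our own sketch; `np.linalg.norm` computes the Euclidean length \(\sqrt{\ba \cdot \ba}\)):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([-1, 0, 2])

print(np.dot(a, b))  # 5
print(a @ b)         # 5, same result via the @ operator

# a.a >= 0, and sqrt(a.a) is the Euclidean length of a
print(a @ a)          # 14
print(np.sqrt(a @ a)) # 3.7416..., equal to np.linalg.norm(a)
```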
Let \(\bA \in \mathbb{R}^{m \times n}\) and \(\bB \in \mathbb{R}^{n \times p}\). Their product \(\bA\bB\) is an \(m \times p\) matrix defined by \[(\bA\bB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj},\] or equivalently, if \(\ba_i\) denotes the \(i\)-th row of \(\bA\) and \(\bb_j\) the \(j\)-th column of \(\bB\): \[\bA \bB =
\begin{pmatrix}
\ba_1 \cdot \bb_1 & \cdots & \ba_1 \cdot \bb_p \\
\vdots & \ddots & \vdots \\
\ba_m \cdot \bb_1 & \cdots & \ba_m \cdot \bb_p
\end{pmatrix}.\]
Matrix multiplication is defined only when the number of columns of \(\bA\) equals the number of rows of \(\bB\). It is associative and distributive over addition, but in general not commutative: \(\bA\bB \neq \bB\bA\) for most matrices.
Computing \(\bA\bB\) where \(\bA \in \fR^{m \times n}\) and \(\bB \in \fR^{n \times p}\) requires exactly \(mnp\) multiplications and \(m(n-1)p\) additions using the standard algorithm, giving \(O(mnp)\) floating-point operations (flops). For two square \(n \times n\) matrices this reduces to \(O(n^3)\) flops.
The \(O(n^3)\) cost of naive matrix multiplication is a central bottleneck in scientific computing. Strassen’s algorithm reduces this to \(O(n^{\log_2 7}) \approx O(n^{2.807})\), but is rarely used in practice due to numerical stability concerns and cache-unfriendly memory access. Modern BLAS/LAPACK implementations approach peak hardware throughput via blocking strategies, but the asymptotic cost remains \(O(n^3)\). For many applications, avoiding explicit matrix-matrix products entirely (e.g., exploiting sparsity or factored structure) is more impactful than algorithmic improvements.
Compatibility for Matrix Multiplication

Let \(\bA\) be a \(3 \times 4\) matrix and \(\bB\) a \(4 \times 2\) matrix. Since the number of columns of \(\bA\) (4) equals the number of rows of \(\bB\) (4), the product \(\bA\bB\) is defined and results in a \(3 \times 2\) matrix. Conversely, if \(\bB\) were a \(5 \times 2\) matrix, the product would be undefined.
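The entry formula and the dimension rule can be checked by coding the triple-loop product directly and comparing against NumPy's `@` (a sketch with our own function name, using the \(3 \times 4\) times \(4 \times 2\) shapes from the example):

```python
import numpy as np

def matmul_naive(A, B):
    """Standard O(mnp) triple-loop product: (AB)_ij = sum_k a_ik * b_kj."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
print(np.allclose(matmul_naive(A, B), A @ B))  # True
print((A @ B).shape)                           # (3, 2)
```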
Distributive Property of Matrix Multiplication

Let
\[
\bA = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad
\bB = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}, \quad
\mathbf{C} = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}.
\]
Then,
\[
\bB + \mathbf{C} = \begin{pmatrix} b_{11}+c_{11} & b_{12}+c_{12} \\ b_{21}+c_{21} & b_{22}+c_{22} \end{pmatrix},
\]
and by computing the entries using the result above, one can verify that
\[
\bA(\bB + \mathbf{C}) = \bA\bB + \bA\mathbf{C}.
\]
Non-Commutativity of Matrix Multiplication

Consider
\[
\bA = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad
\bB = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
\]
Then, \[\bA\bB = \begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}
\quad \text{and} \quad
\bB\bA = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}.\] Since \(\bA\bB \neq \bB\bA\), matrix multiplication is not commutative.
Refer to the results above. Let \(\bA = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}\) and \(\bB = \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix}\).
Compute \(\bA\bB\) and \(\bB\bA\) by hand. Confirm \(\bA\bB \neq \bB\bA\).
Find a nonzero matrix \(\mathbf{C} \neq \bI\) that commutes with \(\bA\) (i.e., \(\bA\mathbf{C} = \mathbf{C}\bA\)).
Verify associativity: compute \((\bA\bB)\bc\) and \(\bA(\bB\bc)\) for \(\bc = (1,0)^T\) and confirm they are equal.
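A NumPy check of all three parts (our own sketch; for the commuting matrix we use the fact that any polynomial in \(\bA\) commutes with \(\bA\), choosing \(\mathbf{C} = \bA + \bI\)):

```python
import numpy as np

A = np.array([[2, 1], [0, 3]])
B = np.array([[1, -1], [2, 4]])

print(A @ B)  # [[ 4  2], [ 6 12]]
print(B @ A)  # [[ 2 -2], [ 4 14]]
print(np.array_equal(A @ B, B @ A))  # False: not commutative

# Any polynomial in A commutes with A; C = A + I is one choice with C != I
C = A + np.eye(2, dtype=int)
print(np.array_equal(A @ C, C @ A))  # True

# Associativity with a vector c
c = np.array([1, 0])
print(np.array_equal((A @ B) @ c, A @ (B @ c)))  # True
```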
Using the result above, compare the flop count of \((\bA\bB)\bc\) versus \(\bA(\bB\bc)\) for general \(\bA, \bB \in \fR^{n \times n}\) and \(\bc \in \fR^n\). Which order is cheaper and by what factor?
Many properties of scalar arithmetic extend naturally to vectors and matrices. In fact, matrices can be viewed as elements of a higher-dimensional vector space (see the result above), and matrix multiplication can be interpreted as the composition of linear transformations. While the notations for vectors and matrices are similar, their roles differ: a vector typically represents an element of a space, while a matrix represents a linear transformation between spaces.
This conceptual distinction becomes crucial when discussing eigenvalues and linear transformations.
For \(\bA, \bB \in \fR^{n \times n}\) and \(\bx \in \fR^n\), computing \(\bA(\bB\bx)\) via two matrix-vector products costs \(O(n^2)\) flops, whereas forming \((\bA\bB)\bx\) via a matrix-matrix product costs \(O(n^3)\) flops.
In this exercise, we will prove the result above.
Suppose \(\bA, \bB \in \fR^{n \times n}\) and \(\bx \in \fR^n\).
Approach 1: Compute \(\bM = \bA\bB\) first (cost?), then \(\bM\bx\) (cost?). Write the total flop count.
Approach 2: Compute \(\by = \bB\bx\) first (cost?), then \(\bA\by\) (cost?). Write the total flop count.
Conclude the asymptotic improvement factor as \(n \to \infty\). This proves the result above.
Write NumPy code for both approaches using @. Does NumPy automatically choose the optimal evaluation order for a chain like A @ B @ x?
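A sketch of both approaches (our own code and sizes). Python's `@` is left-associative, so a bare `A @ B @ x` evaluates as `(A @ B) @ x`; NumPy does not reorder the chain for you:

```python
import numpy as np
import time

n = 500
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)

t0 = time.perf_counter()
y1 = (A @ B) @ x   # Approach 1: O(n^3) matrix-matrix product first
t1 = time.perf_counter()
y2 = A @ (B @ x)   # Approach 2: two O(n^2) matrix-vector products
t2 = time.perf_counter()

print(np.allclose(y1, y2))  # True: same result, very different cost
print(f"(A@B)@x: {t1-t0:.4f}s, A@(B@x): {t2-t1:.4f}s")
```

If you do want automatic ordering, `np.linalg.multi_dot([A, B, x])` chooses the cheapest parenthesization of the chain for you.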
Matrix Inverse
We now turn to a fundamental question: when can we ``divide'' by a matrix? The matrix inverse plays the role of reciprocal in scalar arithmetic, but it exists only under a specific condition, and computing it explicitly is almost always the wrong approach in practice.
For a square matrix \(\bA \in \fR^{n \times n}\), the inverse of \(\bA\), denoted \(\bA^{-1}\), is the unique matrix satisfying \[\bA \bA^{-1} = \bA^{-1} \bA = \mathbf{I},\] where \(\mathbf{I}\) is the \(n \times n\) identity matrix. A matrix that has an inverse is called invertible or nonsingular.
A square matrix \(\bA\) is invertible if and only if \(\det(\bA) \neq 0\), which also implies that \(\bA\) has full rank.
Uniqueness of the Inverse

Suppose \(\bA\) is invertible and both \(\bB\) and \(\mathbf{C}\) satisfy the conditions of the result above. Then,
\[
\bB = \bB\bI = \bB(\bA\mathbf{C}) = (\bB\bA)\mathbf{C} = \bI\mathbf{C} = \mathbf{C}.
\]
Thus the inverse of \(\bA\) is unique.
Let \(\bA, \bB \in \fR^{n \times n}\) be invertible and \(\alpha \in \fR \setminus \{0\}\). Then:
\((\bA^{-1})^{-1} = \bA\)
\((\bA\bB)^{-1} = \bB^{-1}\bA^{-1}\) \emph{(Reversal Law)}
\((\alpha\bA)^{-1} = \tfrac{1}{\alpha}\bA^{-1}\)
\((\bA^T)^{-1} = (\bA^{-1})^T\)
(1): \(\bA^{-1}\bA = \bI\) and \(\bA\bA^{-1} = \bI\) show \(\bA\) is both a left and right inverse of \(\bA^{-1}\), so \((\bA^{-1})^{-1} = \bA\).
(2): Check the right-inverse condition: \[(\bA\bB)(\bB^{-1}\bA^{-1}) = \bA(\bB\bB^{-1})\bA^{-1} = \bA\bI\bA^{-1}
= \bA\bA^{-1} = \bI.\] Left-inverse: \((\bB^{-1}\bA^{-1})(\bA\bB) = \bB^{-1}(\bA^{-1}\bA)\bB = \bI\). By uniqueness of the inverse, \((\bA\bB)^{-1} = \bB^{-1}\bA^{-1}\).
(3): \((\alpha\bA)\!\left(\tfrac{1}{\alpha}\bA^{-1}\right) =
\tfrac{\alpha}{\alpha}\bA\bA^{-1} = \bI\).
(4): Transpose both sides of \(\bA\bA^{-1} = \bI\) using the identity \((\bA\bB)^T = \bB^T\bA^T\): \((\bA^{-1})^T\bA^T = \bI\), so \((\bA^T)^{-1} = (\bA^{-1})^T\).
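All four properties can be spot-checked numerically (a sketch with our own random matrices; a random Gaussian matrix is invertible with probability one, so the inverses below exist):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
inv = np.linalg.inv
alpha = 2.5

print(np.allclose(inv(inv(A)), A))                  # (1)
print(np.allclose(inv(A @ B), inv(B) @ inv(A)))     # (2) Reversal Law
print(np.allclose(inv(alpha * A), inv(A) / alpha))  # (3)
print(np.allclose(inv(A.T), inv(A).T))              # (4)
```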
Property (2) is critical in scientific computing. To solve \((\bA\bB)\bx = \bc\), we should not compute \((\bA\bB)^{-1}\) explicitly (large constant factor, numerical error amplification). Instead, use the Reversal Law: first solve \(\bA\by = \bc\) for \(\by\) via LU factorization, then solve \(\bB\bx = \by\). The function np.linalg.solve does exactly this.
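The two-solve strategy can be sketched as follows (our own example matrices; random Gaussian matrices are invertible with probability one). The explicit inverse is computed only to confirm the answer, not as the recommended method:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
c = rng.standard_normal(n)

# Solve (AB)x = c without ever forming (AB)^{-1}:
y = np.linalg.solve(A, c)  # first solve A y = c
x = np.linalg.solve(B, y)  # then solve B x = y

# Check against the explicit (but discouraged) inverse
x_inv = np.linalg.inv(A @ B) @ c
print(np.allclose(x, x_inv))        # True
print(np.allclose((A @ B) @ x, c))  # True: x solves the original system
```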
Refer to the results above.
For \(\bA = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}\), compute \(\bA^{-1}\) by hand using the \(2 \times 2\) formula \(\bA^{-1} = \frac{1}{\det\bA}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\). Verify \(\bA\bA^{-1} = \bI\).
Given \(\bB = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}\), use the Reversal Law (part (2) of the result above) to compute \((\bA\bB)^{-1}\) without inverting a new \(2 \times 2\) system.
Explain why np.linalg.solve(A, b) is preferred over np.linalg.inv(A) @ b for solving \(\bA\bx = \bb\). Address both numerical accuracy and computational cost.
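A NumPy check for the first two parts (our own sketch; here \(\det\bA = 2 \cdot 3 - 1 \cdot 5 = 1\), and \(\bB\) is unit lower triangular so its inverse flips the sign of the off-diagonal entry):

```python
import numpy as np

A = np.array([[2, 1], [5, 3]])
a, b, c, d = A.ravel()
det = a * d - b * c  # 2*3 - 1*5 = 1
A_inv = np.array([[d, -b], [-c, a]]) / det
print(A_inv)                              # [[ 3. -1.], [-5.  2.]]
print(np.allclose(A @ A_inv, np.eye(2)))  # True

# (AB)^{-1} via the Reversal Law, without inverting AB directly
B = np.array([[1, 0], [-1, 1]])
B_inv = np.array([[1, 0], [1, 1]])
print(np.allclose(np.linalg.inv(A @ B), B_inv @ A_inv))  # True
```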
A square matrix \(\bA \in \fR^{n \times n}\) is called involutory if \(\bA^2 = \bI\).
In this exercise, we will prove that a matrix is involutory (the result above) if and only if it is its own inverse. Refer to the results above.
Suppose \(\bA^2 = \bI\). Multiply both sides of \(\bA\bA = \bI\) on the right by \(\bA^{-1}\) to deduce \(\bA^{-1} = \bA\).
Conversely, suppose \(\bA^{-1} = \bA\). Show \(\bA^2 = \bI\) directly from the result above.
Give a concrete \(2 \times 2\) example of an involutory matrix other than \(\pm\bI\). Verify it satisfies \(\bA^2 = \bI\).
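One concrete choice is the coordinate-swap matrix (our own pick for this exercise; it is a reflection, and applying a reflection twice restores the identity):

```python
import numpy as np

A = np.array([[0, 1], [1, 0]])  # swaps the two coordinates
print(np.array_equal(A @ A, np.eye(2, dtype=int)))  # True: A^2 = I, so A^{-1} = A
```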
(Identifying Invertibility) For each matrix below, determine without computing the inverse whether it is invertible. Give a one-sentence justification. \[(a)\ \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \qquad
(b)\ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad
(c)\ \begin{pmatrix} 3 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & -1 \end{pmatrix} \qquad
(d)\ \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 2 & 1 \end{pmatrix}\]
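The answers can be confirmed with a rank check (a sketch using `np.linalg.matrix_rank`, which avoids floating-point issues with near-zero determinants): (a) is singular because row 2 is twice row 1, and (d) is singular because row 3 equals row 1 plus row 2.

```python
import numpy as np

matrices = {
    "a": np.array([[1, 2], [2, 4]]),
    "b": np.array([[0, 1], [1, 0]]),
    "c": np.diag([3, 5, -1]),
    "d": np.array([[1, 1, 0], [0, 1, 1], [1, 2, 1]]),
}
for name, M in matrices.items():
    # Invertible <=> full rank <=> nonzero determinant
    full_rank = np.linalg.matrix_rank(M) == M.shape[0]
    print(name, "invertible" if full_rank else "singular")
# a: singular, b: invertible, c: invertible, d: singular
```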