We are all familiar with scalar arithmetic: addition, subtraction, multiplication, and properties such as associativity and distributivity. In the following sections we extend these operations to vectors and matrices, building the algebraic machinery used throughout scientific computing.
Addition and Subtraction
Matrix addition generalizes scalar addition by operating entry-wise. The key constraint is dimensional compatibility: two matrices can be added only when they share the same shape.
Let \(\bA, \bB \in \fR^{m \times n}\). Their sum is defined as \[
\begin{align}
\bA + \bB = \printmatsq{a} + \printmatsq{b} = \printmatsq{(a+b)},
\end{align}
\] where each entry satisfies \((a+b)_{ij} = a_{ij} + b_{ij}\) for \(1 \le i \le m\) and \(1 \le j \le n\).
For \(\bA \in \fR^{m \times n}\) and \(\bB \in \fR^{p \times q}\), the sum \(\bA + \bB\) is defined if and only if \(m = p\) and \(n = q\).
Refer to the result above.
Let \(\bA = \begin{pmatrix}1 & 2 \\ 3 & 4\end{pmatrix}\) and \(\bB = \begin{pmatrix}5 & 6 \\ 7 & 8\end{pmatrix}\). Compute \(\bA + \bB\) and \(2\bA - \bB\) by hand.
In NumPy, verify your answers using np.array. Then check whether np.array([1,2,3]) \(+\) np.array([1,2]) raises an error and explain why.
Commutativity and Associativity of Matrix Addition\[2mm] Let \[
\begin{align}
\bA = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \text{and} \quad
\bB = \begin{pmatrix} e & f \\ g & h \end{pmatrix}.
\end{align}
\] Then, \[
\begin{align}
\bA + \bB = \begin{pmatrix} a+e & b+f \\ c+g & d+h \end{pmatrix}
\quad \text{and} \quad
\bB + \bA = \begin{pmatrix} e+a & f+b \\ g+c & h+d \end{pmatrix}.
\end{align}
\] Since addition of real numbers is commutative, \(\bA + \bB = \bB + \bA\). Furthermore, for any third matrix \(\bC = \begin{pmatrix} i & j \\ k & l \end{pmatrix}\), \[
\begin{align}
(\bA+\bB)+\bC = \begin{pmatrix} (a+e)+i & (b+f)+j \\ (c+g)+k & (d+h)+l \end{pmatrix},
\end{align}
\] and \[
\begin{align}
\bA+(\bB+\bC) = \begin{pmatrix} a+(e+i) & b+(f+j) \\ c+(g+k) & d+(h+l) \end{pmatrix}.
\end{align}
\] By associativity of real addition, these are identical. Hence matrix addition is both commutative and associative.
The set \(\fR^{m \times n}\) equipped with matrix addition and scalar multiplication forms a real vector space of dimension \(mn\).
Each axiom follows from the corresponding property of \(\fR\) applied entry-wise. For commutativity: \[
\begin{align}
(\bA + \bB)_{ij} &= a_{ij} + b_{ij} \\
&= b_{ij} + a_{ij} \\
&= (\bB + \bA)_{ij}.
\end{align}
\] The additive identity is \(\bzero\) since \((\bA + \bzero)_{ij} = a_{ij} + 0 = a_{ij}\). The additive inverse of \(\bA\) is \(-\bA\), satisfying \((\bA + (-\bA))_{ij} = a_{ij} - a_{ij} = 0\).
For the dimension: let \(\bE^{(ij)}\) be matrices with a \(1\) in position \((i,j)\) and zeros elsewhere. Every \(\bA \in \fR^{m \times n}\) decomposes uniquely as: \[
\begin{align}
\bA = \sum_{i,j} a_{ij}\,\bE^{(ij)}.
\end{align}
\] Since these \(mn\) matrices are linearly independent and span \(\fR^{m \times n}\), they form a basis of size \(mn\).
Let \(\bA \in \fR^{m \times n}\) and \(c \in \fR\). Then scalar multiplication is defined as \[
\begin{align}
c\bA = c \printmatsq{a} = \printmatsq{c a},
\end{align}
\] where each entry satisfies \((ca)_{ij} = c \cdot a_{ij}\).
Distributivity of Scalar Multiplication over Matrix Addition\[2mm] Let \(\alpha \in \fR\) and \[
\begin{align}
\bA = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad
\bB = \begin{pmatrix} e & f \\ g & h \end{pmatrix}.
\end{align}
\] Then, \[
\begin{align}
\alpha(\bA + \bB)
= \alpha \begin{pmatrix} a+e & b+f \\ c+g & d+h \end{pmatrix}
= \begin{pmatrix} \alpha(a+e) & \alpha(b+f) \\ \alpha(c+g) & \alpha(d+h) \end{pmatrix}.
\end{align}
\] Since scalar multiplication distributes over addition for real numbers, this equals \[
\begin{align}
\begin{pmatrix} \alpha a+\alpha e & \alpha b+\alpha f \\ \alpha c+\alpha g & \alpha d+\alpha h \end{pmatrix}
= \alpha\bA + \alpha\bB.
\end{align}
\]
Refer to the result above and the result above.
Compute \(3\bA - 2\bB\) for \(\bA = \begin{pmatrix}1 & 0 \\ -1 & 2\end{pmatrix}\) and \(\bB = \begin{pmatrix}2 & 1 \\ 0 & -1\end{pmatrix}\).
The standard basis for \(\fR^{2 \times 2}\) consists of four matrices \(\bE^{(ij)}\). Write them out explicitly and verify that \(\bA = \sum_{i,j} a_{ij}\bE^{(ij)}\) holds for your \(\bA\) above.
Dot Product
Before defining the general matrix product, we introduce the dot product of two vectors. The matrix product will then be understood entry-wise as a collection of dot products.
% For vectors \(\ba, \bb \in \fR^{n}\), the dot product (or inner product) is defined as \[
\ba \cdot \bb = \ba^T \bb = \printrowvec{a} \printcolvec{b} = \sum_{i=1}^{n} a_{i} b_{i}.
\]
Refer to the result above.
Compute \(\ba \cdot \bb\) for \(\ba = (1, 2, 3)^T\) and \(\bb = (-1, 0, 2)^T\) by hand.
In NumPy, evaluate this using np.dot(a, b) and also with a @ b. Confirm both give the same result.
Show that \(\ba \cdot \ba \ge 0\) with equality iff \(\ba = \bzero\). What geometric quantity does \(\sqrt{\ba \cdot \ba}\) represent?
Let \(\bA \in \fR^{m \times n}\) and \(\bB \in \fR^{n \times p}\). Their product \(\bA\bB\) is an \(m \times p\) matrix defined by \[
\begin{align}
(\bA\bB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj},
\end{align}
\] or equivalently, if \(\ba_i\) denotes the \(i\)-th row of \(\bA\) and \(\bb_j\) the \(j\)-th column of \(\bB\): \[
\begin{align}
\bA \bB =
\begin{pmatrix}
\ba_1 \cdot \bb_1 & \cdots & \ba_1 \cdot \bb_p \\
\vdots & \ddots & \vdots \\
\ba_m \cdot \bb_1 & \cdots & \ba_m \cdot \bb_p
\end{pmatrix}.
\end{align}
\]
Matrix multiplication is defined only when the number of columns of \(\bA\) equals the number of rows of \(\bB\). It is associative and distributive over addition, but in general not commutative: \(\bA\bB \neq \bB\bA\) for most matrices.
Computing \(\bA\bB\) where \(\bA \in \fR^{m \times n}\) and \(\bB \in \fR^{n \times p}\) requires exactly \(mnp\) multiplications and \(m(n-1)p\) additions using the standard algorithm, giving \(O(mnp)\) floating-point operations (flops). For two square \(n \times n\) matrices this reduces to \(O(n^3)\) flops.
The \(O(n^3)\) cost of naive matrix multiplication is a central bottleneck in scientific computing. Strassen’s algorithm reduces this to \(O(n^{\log_2 7}) \approx O(n^{2.807})\), but is rarely used in practice due to numerical stability concerns and cache-unfriendly memory access. Modern BLAS/LAPACK implementations approach peak hardware throughput via blocking strategies, but the asymptotic cost remains \(O(n^3)\). For many applications, avoiding explicit matrix-matrix products entirely (e.g., exploiting sparsity or factored structure) is more impactful than algorithmic improvements.
Compatibility for Matrix Multiplication\[2mm] Let \(\bA\) be a \(3 \times 4\) matrix and \(\bB\) a \(4 \times 2\) matrix. Since the number of columns of \(\bA\) (4) equals the number of rows of \(\bB\) (4), the product \(\bA\bB\) is defined and results in a \(3 \times 2\) matrix. Conversely, if \(\bB\) were a \(5 \times 2\) matrix, the product would be undefined.
Distributive Property of Matrix Multiplication\[2mm] Let \[
\begin{align}
\bA = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad
\bB = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}, \quad
\bC = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}.
\end{align}
\] Then, \[
\begin{align}
\bA(\bB + \bC) = \bA \begin{pmatrix} b_{11}+c_{11} & b_{12}+c_{12} \\ b_{21}+c_{21} & b_{22}+c_{22} \end{pmatrix},
\end{align}
\] and by computing the entries, one can verify that \[
\begin{align}
\bA(\bB + \bC) = \bA\bB + \bA\bC.
\end{align}
\]
Non-Commutativity of Matrix Multiplication\[2mm] Consider \[
\begin{align}
\bA = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad
\bB = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
\end{align}
\] Then, \[
\begin{align}
\bA\bB = \begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}
\quad \text{and} \quad
\bB\bA = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}.
\end{align}
\] Since \(\bA\bB \neq \bB\bA\), matrix multiplication is not commutative.
Refer to the result above and the result above. Let \(\bA = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}\) and \(\bB = \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix}\).
Compute \(\bA\bB\) and \(\bB\bA\) by hand. Confirm \(\bA\bB \neq \bB\bA\).
Find a nonzero matrix \(\bC \neq \bI\) that commutes with \(\bA\) (i.e., \(\bA\bC = \bC\bA\)). Hint: try \(\bC = \alpha\bA + \beta\bI\).
Verify associativity: compute \((\bA\bB)\bc\) and \(\bA(\bB\bc)\) for \(\bc = (1,0)^T\) and confirm they are equal.
Compare the flop count of \((\bA\bB)\bc\) versus \(\bA(\bB\bc)\) for general \(\bA, \bB \in \fR^{n \times n}\) and \(\bc \in \fR^n\). Which order is cheaper and by what factor?
Many properties of scalar arithmetic extend naturally to vectors and matrices. In fact, matrices can be viewed as elements of a higher-dimensional vector space, and matrix multiplication can be interpreted as the composition of linear transformations. While the notations for vectors and matrices are similar, their roles differ:
This conceptual distinction becomes crucial when discussing eigenvalues and linear transformations.
For \(\bA, \bB \in \fR^{n \times n}\) and \(\bx \in \fR^n\), computing \(\bA(\bB\bx)\) via two matrix-vector products costs \(O(n^2)\) flops, whereas forming \((\bA\bB)\bx\) via a matrix-matrix product costs \(O(n^3)\) flops.
Suppose \(\bA, \bB \in \fR^{n \times n}\) and \(\bx \in \fR^n\).
Approach 1: Compute \(\bM = \bA\bB\) first (cost?), then \(\bM\bx\) (cost?). Write the total flop count.
Approach 2: Compute \(\by = \bB\bx\) first (cost?), then \(\bA\by\) (cost?). Write the total flop count.
Conclude the asymptotic improvement factor as \(n \to \infty\).
Write NumPy code for both approaches using @. Does NumPy automatically choose the optimal evaluation order for a chain like A @ B @ x?
Matrix Inverse
We now turn to a fundamental question: under what conditions can a matrix operation be defined that is analogous to scalar division? The matrix inverse plays the role of the reciprocal in scalar arithmetic, but it exists only under specific conditions.
For a square matrix \(\bA \in \fR^{n \times n}\), the inverse of \(\bA\), denoted \(\bA^{-1}\), is the unique matrix satisfying \[
\begin{align}
\bA \bA^{-1} = \bA^{-1} \bA = \bI,
\end{align}
\] where \(\bI\) is the \(n \times n\) identity matrix. A matrix that has an inverse is called invertible or nonsingular.
A square matrix \(\bA\) is invertible if and only if \(\det(\bA) \neq 0\), which also implies that \(\bA\) has full rank.
Uniqueness of the Inverse\[2mm] Suppose \(\bA\) is invertible and both \(\bB\) and \(\bC\) satisfy the conditions of the result above. Then, \[
\begin{align}
\bB = \bB\bI = \bB(\bA\bC) = (\bB\bA)\bC = \bI\bC = \bC.
\end{align}
\] Thus the inverse of \(\bA\) is unique.
Let \(\bA, \bB \in \fR^{n \times n}\) be invertible and \(\alpha \in \fR \setminus \{0\}\). Then:
\((\bA^{-1})^{-1} = \bA\)
\((\bA\bB)^{-1} = \bB^{-1}\bA^{-1}\) *(Reversal Law)**
\((\alpha\bA)^{-1} = \tfrac{1}{\alpha}\bA^{-1}\)
\((\bA^T)^{-1} = (\bA^{-1})^T\)
(1): The conditions \(\bA^{-1}\bA = \bI\) and \(\bA\bA^{-1} = \bI\) show that \(\bA\) is both a left and right inverse of \(\bA^{-1}\). By the uniqueness of the inverse: \[
\begin{align}
(\bA^{-1})^{-1} = \bA.
\end{align}
\]
(2): Check the right-inverse condition for \(\bB^{-1}\bA^{-1}\): \[
\begin{align}
(\bA\bB)(\bB^{-1}\bA^{-1}) &= \bA(\bB\bB^{-1})\bA^{-1} \\
&= \bA\bI\bA^{-1} \\
&= \bA\bA^{-1} = \bI.
\end{align}
\] The left-inverse condition \((\bB^{-1}\bA^{-1})(\bA\bB) = \bI\) follows similarly. Thus \((\bA\bB)^{-1} = \bB^{-1}\bA^{-1}\).
(3): For any \(\alpha \neq 0\): \[
\begin{align}
(\alpha\bA)\!\left(\frac{1}{\alpha}\bA^{-1}\right) = \frac{\alpha}{\alpha}(\bA\bA^{-1}) = \bI.
\end{align}
\]
(4): Transpose both sides of \(\bA\bA^{-1} = \bI\) using the Reversal Law for transposes: \[
\begin{align}
(\bA\bA^{-1})^T &= \bI^T \\
(\bA^{-1})^T\bA^T &= \bI.
\end{align}
\] This shows \((\bA^{-1})^T\) is the inverse of \(\bA^T\), so \((\bA^T)^{-1} = (\bA^{-1})^T\).
The Decomposition Principle. In scientific computing, we almost never compute \(\bA^{-1}\) explicitly. To solve \(\bA\bx = \bb\), we decompose \(\bA\) into simpler factors (like LU or QR) and solve a sequence of simpler systems. This is more numerically stable and computationally efficient. The Reversal Law \((\bA\bB)^{-1} = \bB^{-1}\bA^{-1}\) reflects this: to solve \((\bA\bB)\bx = \bb\), we first solve \(\bA\by = \bb\) for \(\by\), then \(\bB\bx = \by\).
Refer to the result above and the result above.
For \(\bA = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}\), compute \(\bA^{-1}\) by hand using the \(2 \times 2\) formula \(\bA^{-1} = \frac{1}{\det\bA}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\). Verify \(\bA\bA^{-1} = \bI\).
Given \(\bB = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}\), use the Reversal Law to compute \((\bA\bB)^{-1}\) without inverting a new \(2 \times 2\) system.
Explain why np.linalg.solve(A, b) is preferred over np.linalg.inv(A) @ b for solving \(\bA\bx = \bb\). Address both numerical accuracy and computational cost.
A square matrix \(\bA \in \fR^{n \times n}\) is called involutory if \(\bA^2 = \bI\).
In this exercise, we will prove that a matrix is involutory (the result above) if and only if it is its own inverse. Refer to the result above and the result above.
Suppose \(\bA^2 = \bI\). Multiply both sides of \(\bA\bA = \bI\) on the right by \(\bA^{-1}\) to deduce \(\bA^{-1} = \bA\).
Conversely, suppose \(\bA^{-1} = \bA\). Show \(\bA^2 = \bI\).
Give a concrete \(2 \times 2\) example of an involutory matrix other than \(\pm\bI\). Verify it satisfies \(\bA^2 = \bI\).
(Identifying Invertibility) For each matrix below, determine without computing the inverse whether it is invertible. Give a one-sentence justification. \[
\begin{align}
(a)\ \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \qquad
(b)\ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad
(c)\ \begin{pmatrix} 3 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & -1 \end{pmatrix} \qquad
(d)\ \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 2 & 1 \end{pmatrix}
\end{align}
\]