We are all familiar with scalar arithmetic: addition, subtraction, multiplication, and properties such as associativity and distributivity. In the following sections we extend these operations to vectors and matrices, building the algebraic machinery used throughout scientific computing.
Addition and Subtraction
Matrix addition generalizes scalar addition by operating entry-wise. The key constraint is dimensional compatibility: two matrices can be added only when they share the same shape.
Let \(\bA, \bB \in \fR^{m \times n}\). Their sum is defined as \[\bA + \bB = \printmatsq{a} + \printmatsq{b} = \printmatsq{(a+b)},\] where each entry satisfies \((a+b)_{ij} = a_{ij} + b_{ij}\) for \(1 \le i \le m\) and \(1 \le j \le n\).
For \(\bA \in \fR^{m \times n}\) and \(\bB \in \fR^{p \times q}\), the sum \(\bA + \bB\) is defined if and only if \(m = p\) and \(n = q\).
Refer to the result above.
Let \(\bA = \begin{pmatrix}1 & 2 \\ 3 & 4\end{pmatrix}\) and \(\bB = \begin{pmatrix}5 & 6 \\ 7 & 8\end{pmatrix}\). Compute \(\bA + \bB\) and \(2\bA - \bB\) by hand.
In NumPy, verify your answers using np.array. Then check whether np.array([1,2,3]) + np.array([1,2]) raises an error and explain why.
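A quick NumPy sketch of the check above (variable names are our own; the mismatched-shape sum raises a broadcasting error because the arrays have incompatible shapes):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Entry-wise sum and a scalar combination
print(A + B)      # [[ 6  8], [10 12]]
print(2 * A - B)  # [[-3 -2], [-1  0]]

# Shapes (3,) and (2,) are incompatible, so NumPy raises ValueError
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as e:
    print("ValueError:", e)
```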
Commutativity and Associativity of Matrix Addition

Let
\[
\bA = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \text{and} \quad
\bB = \begin{pmatrix} e & f \\ g & h \end{pmatrix}.
\]
Then, \[\bA + \bB = \begin{pmatrix} a+e & b+f \\ c+g & d+h \end{pmatrix}
\quad \text{and} \quad
\bB + \bA = \begin{pmatrix} e+a & f+b \\ g+c & h+d \end{pmatrix}.\] Since addition of real numbers is commutative, \(\bA + \bB = \bB + \bA\). Furthermore, for any third matrix \(\mathbf{C} = \begin{pmatrix} i & j \\ k & l \end{pmatrix}\), \[(\bA+\bB)+\mathbf{C} = \begin{pmatrix} (a+e)+i & (b+f)+j \\ (c+g)+k & (d+h)+l \end{pmatrix},\] and \[\bA+(\bB+\mathbf{C}) = \begin{pmatrix} a+(e+i) & b+(f+j) \\ c+(g+k) & d+(h+l) \end{pmatrix}.\] By associativity of real addition, these are identical. Hence matrix addition is both commutative and associative.
The set \(\fR^{m \times n}\) equipped with matrix addition and scalar multiplication forms a real vector space of dimension \(mn\).
Each axiom follows from the corresponding property of \(\fR\) applied entry-wise. For commutativity: \[(\bA + \bB)_{ij} = a_{ij} + b_{ij} = b_{ij} + a_{ij} = (\bB + \bA)_{ij}.\] The additive identity is \(\mathbf{0}\) since \((\bA + \mathbf{0})_{ij} = a_{ij}\). The additive inverse of \(\bA\) is \(-\bA\) (negate every entry), satisfying \((\bA + (-\bA))_{ij} = 0\).
For the dimension: the \(mn\) matrices \(\bE^{(ij)}\) with a \(1\) in position \((i,j)\) and zeros elsewhere are linearly independent, and every \(\bA \in \fR^{m \times n}\) decomposes as \(\bA = \sum_{i,j} a_{ij}\,\bE^{(ij)}\), so they form a basis of size \(mn\).
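The basis argument can be checked numerically; the sketch below (our own setup, for a small \(2 \times 3\) case) builds the matrices \(\bE^{(ij)}\) and verifies the decomposition \(\bA = \sum_{i,j} a_{ij}\,\bE^{(ij)}\):

```python
import numpy as np

m, n = 2, 3
rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(m, n)).astype(float)

# Standard basis E^{(ij)}: a 1 in position (i, j), zeros elsewhere
basis = []
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n))
        E[i, j] = 1.0
        basis.append(E)

# Every A decomposes as sum_{i,j} a_ij * E^{(ij)}
recon = sum(A[i, j] * basis[i * n + j] for i in range(m) for j in range(n))
print(np.array_equal(A, recon))  # True
print(len(basis))                # 6 = mn basis matrices
```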
Let \(\bA \in \fR^{m \times n}\) and \(c \in \fR\). Then scalar multiplication is defined as \[c\bA = c \printmatsq{a} = \printmatsq{c a},\] where each entry satisfies \((ca)_{ij} = c \cdot a_{ij}\).
Distributivity of Scalar Multiplication over Matrix Addition

Let \(\alpha \in \fR\) and
\[
\bA = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad
\bB = \begin{pmatrix} e & f \\ g & h \end{pmatrix}.
\]
Then,
\[
\alpha(\bA + \bB) = \alpha\begin{pmatrix} a+e & b+f \\ c+g & d+h \end{pmatrix}
= \begin{pmatrix} \alpha(a+e) & \alpha(b+f) \\ \alpha(c+g) & \alpha(d+h) \end{pmatrix}.
\]
Since scalar multiplication distributes over addition for real numbers, this equals
\[
\begin{pmatrix} \alpha a+\alpha e & \alpha b+\alpha f \\ \alpha c+\alpha g & \alpha d+\alpha h \end{pmatrix}
= \alpha\bA + \alpha\bB.
\]
Refer to the results above.
Compute \(3\bA - 2\bB\) for \(\bA = \begin{pmatrix}1 & 0 \\ -1 & 2\end{pmatrix}\) and \(\bB = \begin{pmatrix}2 & 1 \\ 0 & -1\end{pmatrix}\).
The standard basis for \(\fR^{2 \times 2}\) consists of four matrices \(\bE^{(ij)}\). Write them out explicitly and verify that \(\bA = \sum_{i,j} a_{ij}\bE^{(ij)}\) holds for your \(\bA\) above.
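A NumPy sketch for checking both parts of this exercise (our own code; the basis matrices are ordered \((1,1), (1,2), (2,1), (2,2)\)):

```python
import numpy as np

A = np.array([[1, 0], [-1, 2]])
B = np.array([[2, 1], [0, -1]])
print(3 * A - 2 * B)  # [[-1 -2], [-3  8]]

# The four standard basis matrices of R^{2x2}
Es = [np.zeros((2, 2)) for _ in range(4)]
for k, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    Es[k][i, j] = 1.0

# Verify A = sum_{i,j} a_ij * E^{(ij)}
recon = sum(A[i, j] * Es[2 * i + j] for i in range(2) for j in range(2))
print(np.array_equal(A, recon))  # True
```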
Dot Product
Before defining the general matrix product, we introduce the dot product of two vectors. The matrix product will then be understood entry-wise as a collection of dot products.
For vectors \(\ba, \bb \in \fR^{n}\), the dot product (or inner product) is defined as \[
\ba \cdot \bb = \ba^T \bb = \printrowvec{a} \printcolvec{b} = \sum_{i=1}^{n} a_{i} b_{i}.
\]
Refer to the result above.
Compute \(\ba \cdot \bb\) for \(\ba = (1, 2, 3)^T\) and \(\bb = (-1, 0, 2)^T\) by hand.
In NumPy, evaluate this using np.dot(a, b) and also with a @ b. Confirm both give the same result.
Show that \(\ba \cdot \ba \ge 0\) with equality iff \(\ba = \bzero\). What geometric quantity does \(\sqrt{\ba \cdot \ba}\) represent?
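A NumPy check of the dot-product exercise (our own sketch; `np.linalg.norm` computes the Euclidean length \(\sqrt{\ba \cdot \ba}\)):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([-1, 0, 2])

print(np.dot(a, b))  # 5
print(a @ b)         # 5, same result via the @ operator

# a.a >= 0, and sqrt(a.a) is the Euclidean length of a
print(a @ a)          # 14
print(np.sqrt(a @ a)) # 3.7416..., equal to np.linalg.norm(a)
```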
Let \(\bA \in \mathbb{R}^{m \times n}\) and \(\bB \in \mathbb{R}^{n \times p}\). Their product \(\bA\bB\) is an \(m \times p\) matrix defined by \[(\bA\bB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj},\] or equivalently, if \(\ba_i\) denotes the \(i\)-th row of \(\bA\) and \(\bb_j\) the \(j\)-th column of \(\bB\): \[\bA \bB =
\begin{pmatrix}
\ba_1 \cdot \bb_1 & \cdots & \ba_1 \cdot \bb_p \\
\vdots & \ddots & \vdots \\
\ba_m \cdot \bb_1 & \cdots & \ba_m \cdot \bb_p
\end{pmatrix}.\]
Matrix multiplication is defined only when the number of columns of \(\bA\) equals the number of rows of \(\bB\). It is associative and distributive over addition, but in general not commutative: \(\bA\bB \neq \bB\bA\) for most matrices.
Computing \(\bA\bB\) where \(\bA \in \fR^{m \times n}\) and \(\bB \in \fR^{n \times p}\) requires exactly \(mnp\) multiplications and \(m(n-1)p\) additions using the standard algorithm, giving \(O(mnp)\) floating-point operations (flops). For two square \(n \times n\) matrices this reduces to \(O(n^3)\) flops.
The \(O(n^3)\) cost of naive matrix multiplication is a central bottleneck in scientific computing. Strassen’s algorithm reduces this to \(O(n^{\log_2 7}) \approx O(n^{2.807})\), but is rarely used in practice due to numerical stability concerns and cache-unfriendly memory access. Modern BLAS/LAPACK implementations approach peak hardware throughput via blocking strategies, but the asymptotic cost remains \(O(n^3)\). For many applications, avoiding explicit matrix-matrix products entirely (e.g., exploiting sparsity or factored structure) is more impactful than algorithmic improvements.
Compatibility for Matrix Multiplication

Let \(\bA\) be a \(3 \times 4\) matrix and \(\bB\) a \(4 \times 2\) matrix. Since the number of columns of \(\bA\) (4) equals the number of rows of \(\bB\) (4), the product \(\bA\bB\) is defined and results in a \(3 \times 2\) matrix. Conversely, if \(\bB\) were a \(5 \times 2\) matrix, the product would be undefined.
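The entry formula and the dimension rule can be checked by coding the triple-loop product directly and comparing against NumPy's `@` (a sketch with our own function name, using the \(3 \times 4\) times \(4 \times 2\) shapes from the example):

```python
import numpy as np

def matmul_naive(A, B):
    """Standard O(mnp) triple-loop product: (AB)_ij = sum_k a_ik * b_kj."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
print(np.allclose(matmul_naive(A, B), A @ B))  # True
print((A @ B).shape)                           # (3, 2)
```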
Distributive Property of Matrix Multiplication

Let
\[
\bA = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad
\bB = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}, \quad
\mathbf{C} = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}.
\]
Then,
\[
\bB + \mathbf{C} = \begin{pmatrix} b_{11}+c_{11} & b_{12}+c_{12} \\ b_{21}+c_{21} & b_{22}+c_{22} \end{pmatrix},
\]
and by computing the entries using the result above, one can verify that
\[
\bA(\bB + \mathbf{C}) = \bA\bB + \bA\mathbf{C}.
\]
Non-Commutativity of Matrix Multiplication

Consider
\[
\bA = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad
\bB = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
\]
Then, \[\bA\bB = \begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}
\quad \text{and} \quad
\bB\bA = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}.\] Since \(\bA\bB \neq \bB\bA\), matrix multiplication is not commutative.
Refer to the results above. Let \(\bA = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}\) and \(\bB = \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix}\).
Compute \(\bA\bB\) and \(\bB\bA\) by hand. Confirm \(\bA\bB \neq \bB\bA\).
Find a nonzero matrix \(\mathbf{C} \neq \bI\) that commutes with \(\bA\) (i.e., \(\bA\mathbf{C} = \mathbf{C}\bA\)).
Verify associativity: compute \((\bA\bB)\bc\) and \(\bA(\bB\bc)\) for \(\bc = (1,0)^T\) and confirm they are equal.
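A NumPy check of all three parts (our own sketch; for the commuting matrix we use the fact that any polynomial in \(\bA\) commutes with \(\bA\), choosing \(\mathbf{C} = \bA + \bI\)):

```python
import numpy as np

A = np.array([[2, 1], [0, 3]])
B = np.array([[1, -1], [2, 4]])

print(A @ B)  # [[ 4  2], [ 6 12]]
print(B @ A)  # [[ 2 -2], [ 4 14]]
print(np.array_equal(A @ B, B @ A))  # False: not commutative

# Any polynomial in A commutes with A; C = A + I is one choice with C != I
C = A + np.eye(2, dtype=int)
print(np.array_equal(A @ C, C @ A))  # True

# Associativity with a vector c
c = np.array([1, 0])
print(np.array_equal((A @ B) @ c, A @ (B @ c)))  # True
```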
Using the result above, compare the flop count of \((\bA\bB)\bc\) versus \(\bA(\bB\bc)\) for general \(\bA, \bB \in \fR^{n \times n}\) and \(\bc \in \fR^n\). Which order is cheaper and by what factor?
Many properties of scalar arithmetic extend naturally to vectors and matrices. In fact, matrices can be viewed as elements of a higher-dimensional vector space (see the result above), and matrix multiplication can be interpreted as the composition of linear transformations. While the notations for vectors and matrices are similar, their roles differ: a vector typically represents an element of a space, while a matrix represents a linear transformation between spaces.
This conceptual distinction becomes crucial when discussing eigenvalues and linear transformations.
For \(\bA, \bB \in \fR^{n \times n}\) and \(\bx \in \fR^n\), computing \(\bA(\bB\bx)\) via two matrix-vector products costs \(O(n^2)\) flops, whereas forming \((\bA\bB)\bx\) via a matrix-matrix product costs \(O(n^3)\) flops.
In this exercise, we will prove the result above.
Suppose \(\bA, \bB \in \fR^{n \times n}\) and \(\bx \in \fR^n\).
Approach 1: Compute \(\bM = \bA\bB\) first (cost?), then \(\bM\bx\) (cost?). Write the total flop count.
Approach 2: Compute \(\by = \bB\bx\) first (cost?), then \(\bA\by\) (cost?). Write the total flop count.
Conclude the asymptotic improvement factor as \(n \to \infty\). This proves the result above.
Write NumPy code for both approaches using @. Does NumPy automatically choose the optimal evaluation order for a chain like A @ B @ x?
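A sketch of both approaches (our own code and sizes). Python's `@` is left-associative, so a bare `A @ B @ x` evaluates as `(A @ B) @ x`; NumPy does not reorder the chain for you:

```python
import numpy as np
import time

n = 500
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)

t0 = time.perf_counter()
y1 = (A @ B) @ x   # Approach 1: O(n^3) matrix-matrix product first
t1 = time.perf_counter()
y2 = A @ (B @ x)   # Approach 2: two O(n^2) matrix-vector products
t2 = time.perf_counter()

print(np.allclose(y1, y2))  # True: same result, very different cost
print(f"(A@B)@x: {t1-t0:.4f}s, A@(B@x): {t2-t1:.4f}s")
```

If you do want automatic ordering, `np.linalg.multi_dot([A, B, x])` chooses the cheapest parenthesization of the chain for you.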
Matrix Inverse
We now turn to a fundamental question: when can we ``divide'' by a matrix? The matrix inverse plays the role of reciprocal in scalar arithmetic, but it exists only under a specific condition, and computing it explicitly is almost always the wrong approach in practice.
For a square matrix \(\bA \in \fR^{n \times n}\), the inverse of \(\bA\), denoted \(\bA^{-1}\), is the unique matrix satisfying \[\bA \bA^{-1} = \bA^{-1} \bA = \mathbf{I},\] where \(\mathbf{I}\) is the \(n \times n\) identity matrix. A matrix that has an inverse is called invertible or nonsingular.
A square matrix \(\bA\) is invertible if and only if \(\det(\bA) \neq 0\), which also implies that \(\bA\) has full rank.
Uniqueness of the Inverse

Suppose \(\bA\) is invertible and both \(\bB\) and \(\mathbf{C}\) satisfy the conditions of the result above. Then,
\[
\bB = \bB\bI = \bB(\bA\mathbf{C}) = (\bB\bA)\mathbf{C} = \bI\mathbf{C} = \mathbf{C}.
\]
Thus the inverse of \(\bA\) is unique.
Let \(\bA, \bB \in \fR^{n \times n}\) be invertible and \(\alpha \in \fR \setminus \{0\}\). Then:
\((\bA^{-1})^{-1} = \bA\)
\((\bA\bB)^{-1} = \bB^{-1}\bA^{-1}\) \emph{(Reversal Law)}
\((\alpha\bA)^{-1} = \tfrac{1}{\alpha}\bA^{-1}\)
\((\bA^T)^{-1} = (\bA^{-1})^T\)
(1): \(\bA^{-1}\bA = \bI\) and \(\bA\bA^{-1} = \bI\) show \(\bA\) is both a left and right inverse of \(\bA^{-1}\), so \((\bA^{-1})^{-1} = \bA\).
(2): Check the right-inverse condition: \[(\bA\bB)(\bB^{-1}\bA^{-1}) = \bA(\bB\bB^{-1})\bA^{-1} = \bA\bI\bA^{-1}
= \bA\bA^{-1} = \bI.\] Left-inverse: \((\bB^{-1}\bA^{-1})(\bA\bB) = \bB^{-1}(\bA^{-1}\bA)\bB = \bI\). By uniqueness of the inverse, \((\bA\bB)^{-1} = \bB^{-1}\bA^{-1}\).
(3): \((\alpha\bA)\!\left(\tfrac{1}{\alpha}\bA^{-1}\right) =
\tfrac{\alpha}{\alpha}\bA\bA^{-1} = \bI\).
(4): Transpose both sides of \(\bA\bA^{-1} = \bI\) using the identity \((\bA\bB)^T = \bB^T\bA^T\): \((\bA^{-1})^T\bA^T = \bI\), so \((\bA^T)^{-1} = (\bA^{-1})^T\).
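All four properties can be spot-checked numerically (a sketch with our own random matrices; a random Gaussian matrix is invertible with probability one, so the inverses below exist):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
inv = np.linalg.inv
alpha = 2.5

print(np.allclose(inv(inv(A)), A))                  # (1)
print(np.allclose(inv(A @ B), inv(B) @ inv(A)))     # (2) Reversal Law
print(np.allclose(inv(alpha * A), inv(A) / alpha))  # (3)
print(np.allclose(inv(A.T), inv(A).T))              # (4)
```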
Property (2) is critical in scientific computing. To solve \((\bA\bB)\bx = \bc\), we should not compute \((\bA\bB)^{-1}\) explicitly (large constant factor, numerical error amplification). Instead, use the Reversal Law: first solve \(\bA\by = \bc\) for \(\by\) via LU factorization, then solve \(\bB\bx = \by\). The function np.linalg.solve does exactly this.
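The two-solve strategy can be sketched as follows (our own example matrices; random Gaussian matrices are invertible with probability one). The explicit inverse is computed only to confirm the answer, not as the recommended method:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
c = rng.standard_normal(n)

# Solve (AB)x = c without ever forming (AB)^{-1}:
y = np.linalg.solve(A, c)  # first solve A y = c
x = np.linalg.solve(B, y)  # then solve B x = y

# Check against the explicit (but discouraged) inverse
x_inv = np.linalg.inv(A @ B) @ c
print(np.allclose(x, x_inv))        # True
print(np.allclose((A @ B) @ x, c))  # True: x solves the original system
```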
Refer to the results above.
For \(\bA = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}\), compute \(\bA^{-1}\) by hand using the \(2 \times 2\) formula \(\bA^{-1} = \frac{1}{\det\bA}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\). Verify \(\bA\bA^{-1} = \bI\).
Given \(\bB = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}\), use the Reversal Law (part (2) of the result above) to compute \((\bA\bB)^{-1}\) without inverting a new \(2 \times 2\) system.
Explain why np.linalg.solve(A, b) is preferred over np.linalg.inv(A) @ b for solving \(\bA\bx = \bb\). Address both numerical accuracy and computational cost.
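A NumPy check for the first two parts (our own sketch; here \(\det\bA = 2 \cdot 3 - 1 \cdot 5 = 1\), and \(\bB\) is unit lower triangular so its inverse flips the sign of the off-diagonal entry):

```python
import numpy as np

A = np.array([[2, 1], [5, 3]])
a, b, c, d = A.ravel()
det = a * d - b * c  # 2*3 - 1*5 = 1
A_inv = np.array([[d, -b], [-c, a]]) / det
print(A_inv)                              # [[ 3. -1.], [-5.  2.]]
print(np.allclose(A @ A_inv, np.eye(2)))  # True

# (AB)^{-1} via the Reversal Law, without inverting AB directly
B = np.array([[1, 0], [-1, 1]])
B_inv = np.array([[1, 0], [1, 1]])
print(np.allclose(np.linalg.inv(A @ B), B_inv @ A_inv))  # True
```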
A square matrix \(\bA \in \fR^{n \times n}\) is called involutory if \(\bA^2 = \bI\).
In this exercise, we will prove that a matrix is involutory (the result above) if and only if it is its own inverse. Refer to the results above.
Suppose \(\bA^2 = \bI\). Multiply both sides of \(\bA\bA = \bI\) on the right by \(\bA^{-1}\) to deduce \(\bA^{-1} = \bA\).
Conversely, suppose \(\bA^{-1} = \bA\). Show \(\bA^2 = \bI\) directly from the result above.
Give a concrete \(2 \times 2\) example of an involutory matrix other than \(\pm\bI\). Verify it satisfies \(\bA^2 = \bI\).
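One concrete choice is the coordinate-swap matrix (our own pick for this exercise; it is a reflection, and applying a reflection twice restores the identity):

```python
import numpy as np

A = np.array([[0, 1], [1, 0]])  # swaps the two coordinates
print(np.array_equal(A @ A, np.eye(2, dtype=int)))  # True: A^2 = I, so A^{-1} = A
```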
(Identifying Invertibility) For each matrix below, determine without computing the inverse whether it is invertible. Give a one-sentence justification. \[(a)\ \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \qquad
(b)\ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad
(c)\ \begin{pmatrix} 3 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & -1 \end{pmatrix} \qquad
(d)\ \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 2 & 1 \end{pmatrix}\]
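The answers can be confirmed with a rank check (a sketch using `np.linalg.matrix_rank`, which avoids floating-point issues with near-zero determinants): (a) is singular because row 2 is twice row 1, and (d) is singular because row 3 equals row 1 plus row 2.

```python
import numpy as np

matrices = {
    "a": np.array([[1, 2], [2, 4]]),
    "b": np.array([[0, 1], [1, 0]]),
    "c": np.diag([3, 5, -1]),
    "d": np.array([[1, 1, 0], [0, 1, 1], [1, 2, 1]]),
}
for name, M in matrices.items():
    # Invertible <=> full rank <=> nonzero determinant
    full_rank = np.linalg.matrix_rank(M) == M.shape[0]
    print(name, "invertible" if full_rank else "singular")
# a: singular, b: invertible, c: invertible, d: singular
```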