6 Computational Complexity
Algorithmic cost measured in arithmetic operations (flops). Determines scalability and feasibility.
6.1 Big-O Notation
\(f(n) = O(g(n))\) if \(\exists C, n_0 > 0\) s.t. \(|f(n)| \leq C|g(n)|\) for \(n \geq n_0\).
\(\Omega(g)\): Grows at least as fast.
\(\Theta(g)\): Grows at the same rate.
\(o(g)\): Grows strictly slower.
Typical Rates. \(O(1) \ll O(\log n) \ll O(n) \ll O(n \log n) \ll O(n^2) \ll O(n^3) \ll O(2^n)\). For \(n=1000\): \(O(n^2)\) is \(\sim 10^6\) ops (ms); \(O(n^3)\) is \(\sim 10^9\) ops (s).
Properties. 1. Transitivity: \(f=O(g), g=O(h) \Rightarrow f=O(h)\). 2. Sum Rule: \(O(f) + O(g) = O(\max(f, g))\). 3. Product Rule: \(O(f) \cdot O(g) = O(fg)\). 4. Constant Factors: \(O(cf) = O(f)\).
Show \(3n^2 + 100n + 5 = O(n^2)\).
Compare growth: \(n \log n\) vs \(n^2\).
Rank by cost: \(5n^3+n, 100n^2, n^2 \log n, 2^{10}n^2\).
Use Master Theorem: \(T(n) = 2T(n/2) + n\).
6.2 Flop Counting
Hardware-independent measure of work.
A single floating-point operation (\(+, -, \times, \div\)).
For \(\bx, \by \in \fR^n\) and \(\bA \in \fR^{m \times n}, \bB \in \fR^{n \times p}\):
Dot product: \(2n\) flops.
Matrix-vector product: \(2mn\) flops.
Matrix-matrix product: \(2mnp\) flops (\(2n^3\) for square).
Triangular solve: \(n^2\) flops.
Data Locality and Performance: On modern architectures, computational throughput often significantly exceeds memory bandwidth. Consequently, algorithms with high cache locality, such as blocked matrix multiplication (GEMM), may achieve superior performance compared to algorithms with lower theoretical operation counts but poorer data reuse.
Benchmarks.
LU Factorization: \(\frac{2}{3}n^3\).
Cholesky: \(\frac{1}{3}n^3\).
QR (Householder): \(\frac{4}{3}n^3\).
Verify \(2mn\) flops for \(\bA\bx\) by counting multiply-adds.
Estimate wall-clock time for \(10^9\) flops on a \(10^9\) flop/s CPU.
Python: Plot
np.dot(A, B)time vs \(n\) (log-log). Verify exponent \(\approx 3\).Why might the empirical exponent be less than 3? (Cache hierarchy, vectorization).