6 Computational Complexity

Algorithmic cost measured in arithmetic operations (flops). Determines scalability and feasibility.

6.1 Big-O Notation

Definition: Big-O Notation

\(f(n) = O(g(n))\) if \(\exists C, n_0 > 0\) s.t. \(|f(n)| \leq C|g(n)|\) for \(n \geq n_0\).

\(\Omega(g)\): Grows at least as fast.
\(\Theta(g)\): Grows at the same rate.
\(o(g)\): Grows strictly slower.

Remark

(Typical Rates) \(O(1) \ll O(\log n) \ll O(n) \ll O(n \log n) \ll O(n^2) \ll O(n^3) \ll O(2^n)\). For \(n=1000\): \(O(n^2)\) is \(\sim 10^6\) ops (ms); \(O(n^3)\) is \(\sim 10^9\) ops (s).

Remark

(Properties) 1. Transitivity: \(f=O(g), g=O(h) \Rightarrow f=O(h)\). 2. Sum Rule: \(O(f) + O(g) = O(\max(f, g))\). 3. Product Rule: \(O(f) \cdot O(g) = O(fg)\). 4. Constant Factors: \(O(cf) = O(f)\).

Exercise

Show \(3n^2 + 100n + 5 = O(n^2)\).
Compare growth: \(n \log n\) vs \(n^2\).
Rank by cost: \(5n^3+n, 100n^2, n^2 \log n, 2^{10}n^2\).
Use Master Theorem: \(T(n) = 2T(n/2) + n\).

6.2 Flop Counting

Hardware-independent measure of work.

Definition: Flop

A single floating-point operation (\(+, -, \times, \div\)).

Theorem: Basic Costs

For \(\bx, \by \in \fR^n\) and \(\bA \in \fR^{m \times n}, \bB \in \fR^{n \times p}\):

Dot product: \(2n\) flops.
Matrix-vector product: \(2mn\) flops.
Matrix-matrix product: \(2mnp\) flops (\(2n^3\) for square).
Triangular solve: \(n^2\) flops.

Remark

Data Locality and Performance: On modern architectures, computational throughput often significantly exceeds memory bandwidth. Consequently, algorithms with high cache locality, such as blocked matrix multiplication (GEMM), may achieve superior performance compared to algorithms with lower theoretical operation counts but poorer data reuse.

Remark

(Benchmarks)

LU Factorization: \(\frac{2}{3}n^3\).
Cholesky: \(\frac{1}{3}n^3\).
QR (Householder): \(\frac{4}{3}n^3\).

Exercise

Verify \(2mn\) flops for \(\bA\bx\) by counting multiply-adds.
Estimate wall-clock time for \(10^9\) flops on a \(10^9\) flop/s CPU.
Python: Plot np.dot(A, B) time vs \(n\) (log-log). Verify exponent \(\approx 3\).
Why might the empirical exponent be less than 3? (Cache hierarchy, vectorization).