6  Computational Complexity

Algorithmic cost measured in arithmetic operations (flops). Determines scalability and feasibility.

6.1 Big-O Notation

NoteDefinition: Big-O Notation

\(f(n) = O(g(n))\) if \(\exists C, n_0 > 0\) s.t. \(|f(n)| \leq C|g(n)|\) for \(n \geq n_0\).

  • \(\Omega(g)\): Grows at least as fast.

  • \(\Theta(g)\): Grows at the same rate.

  • \(o(g)\): Grows strictly slower.

Tip

Typical Rates. \(O(1) \ll O(\log n) \ll O(n) \ll O(n \log n) \ll O(n^2) \ll O(n^3) \ll O(2^n)\). For \(n=1000\): \(O(n^2)\) is \(\sim 10^6\) ops (ms); \(O(n^3)\) is \(\sim 10^9\) ops (s).

Tip

Properties. 1. Transitivity: \(f=O(g), g=O(h) \Rightarrow f=O(h)\). 2. Sum Rule: \(O(f) + O(g) = O(\max(f, g))\). 3. Product Rule: \(O(f) \cdot O(g) = O(fg)\). 4. Constant Factors: \(O(cf) = O(f)\).

WarningExercise
  1. Show \(3n^2 + 100n + 5 = O(n^2)\).

  2. Compare growth: \(n \log n\) vs \(n^2\).

  3. Rank by cost: \(5n^3+n, 100n^2, n^2 \log n, 2^{10}n^2\).

  4. Use Master Theorem: \(T(n) = 2T(n/2) + n\).

6.2 Flop Counting

Hardware-independent measure of work.

NoteDefinition: Flop

A single floating-point operation (\(+, -, \times, \div\)).

NoteTheorem: Basic Costs

For \(\bx, \by \in \fR^n\) and \(\bA \in \fR^{m \times n}, \bB \in \fR^{n \times p}\):

  • Dot product: \(2n\) flops.

  • Matrix-vector product: \(2mn\) flops.

  • Matrix-matrix product: \(2mnp\) flops (\(2n^3\) for square).

  • Triangular solve: \(n^2\) flops.

Tip

Data Locality and Performance: On modern architectures, computational throughput often significantly exceeds memory bandwidth. Consequently, algorithms with high cache locality, such as blocked matrix multiplication (GEMM), may achieve superior performance compared to algorithms with lower theoretical operation counts but poorer data reuse.

Tip

Benchmarks.

  • LU Factorization: \(\frac{2}{3}n^3\).

  • Cholesky: \(\frac{1}{3}n^3\).

  • QR (Householder): \(\frac{4}{3}n^3\).

WarningExercise
  1. Verify \(2mn\) flops for \(\bA\bx\) by counting multiply-adds.

  2. Estimate wall-clock time for \(10^9\) flops on a \(10^9\) flop/s CPU.

  3. Python: Plot np.dot(A, B) time vs \(n\) (log-log). Verify exponent \(\approx 3\).

  4. Why might the empirical exponent be less than 3? (Cache hierarchy, vectorization).