|
numerics
|
Given a sequence \(x[0], x[1], \ldots, x[n-1] \in \mathbb{C}\), the Discrete Fourier Transform (DFT) maps it to a frequency-domain sequence \(X[0], \ldots, X[n-1]\):
\[ X[k] = \sum_{j=0}^{n-1} x[j]\, e^{-2\pi i j k / n}, \qquad k = 0, 1, \ldots, n-1. \]
The inverse DFT recovers \(x\) from \(X\):
\[ x[j] = \frac{1}{n} \sum_{k=0}^{n-1} X[k]\, e^{+2\pi i j k / n}. \]
Convention note: both backends follow the FFTW sign convention. ifft returns the unnormalised inverse – divide by \(n\) to recover \(x\) exactly.
For a signal sampled at rate \(f_s\) (samples per second) with \(n\) samples, bin \(k\) corresponds to frequency
\[ f_k = \frac{k \cdot f_s}{n}. \]
Bins \(k = 0, \ldots, n/2\) are non-negative frequencies; bins \(k = n/2+1, \ldots, n-1\) are the negative frequencies (aliased by periodicity). For real input the spectrum is Hermitian: \(X[n-k] = \overline{X[k]}\), so only the first \(n/2+1\) bins carry independent information – this is what rfft returns.
Energy is conserved up to the factor \(n\):
\[ \sum_{j=0}^{n-1} |x[j]|^2 = \frac{1}{n} \sum_{k=0}^{n-1} |X[k]|^2. \]
Evaluating all \(n\) bins directly costs \(O(n^2)\) multiplications – impractical for large \(n\). The key observation that breaks this barrier is the periodicity and symmetry of the twiddle factors \(W_n^{jk} = e^{-2\pi i jk/n}\).
Define the primitive \(n\)-th root of unity \(\omega_n = e^{-2\pi i / n}\). Then \(W_n^{jk} = \omega_n^{jk}\), and these factors satisfy:
\[ \omega_{2n}^{2k} = \omega_n^k \quad \text{(halving)}, \qquad \omega_n^{k+n/2} = -\omega_n^k \quad \text{(symmetry)}. \]
These two identities are the engine of the Cooley-Tukey algorithm.
The Cooley-Tukey radix-2 decimation-in-time (DIT) algorithm, published in 1965, reduces an \(n\)-point DFT to two \(n/2\)-point DFTs by splitting the input into even- and odd-indexed elements.
Split \(x\) into even and odd halves:
\[ X[k] = \sum_{j=0}^{n/2-1} x[2j]\,\omega_n^{2jk} + \sum_{j=0}^{n/2-1} x[2j+1]\,\omega_n^{(2j+1)k}. \]
Using \(\omega_n^{2jk} = \omega_{n/2}^{jk}\):
\[ X[k] = \underbrace{\sum_{j=0}^{n/2-1} x[2j]\,\omega_{n/2}^{jk}}_{E[k]} + \omega_n^k \underbrace{\sum_{j=0}^{n/2-1} x[2j+1]\,\omega_{n/2}^{jk}}_{O[k]}. \]
where \(E[k]\) and \(O[k]\) are themselves DFTs of length \(n/2\). The symmetry property \(\omega_n^{k+n/2} = -\omega_n^k\) means the second half of the output costs nothing extra:
\[ X[k] = E[k] + \omega_n^k\, O[k], \\ X[k+n/2] = E[k] - \omega_n^k\, O[k]. \]
This is the butterfly operation: given the pair \((u, v)\) and twiddle factor \(w = \omega_n^k\),
\[ (u,\; vw) \;\longrightarrow\; (u + vw,\; u - vw). \]
Each butterfly costs 1 complex multiply and 2 complex adds. Applying the recursion \(\log_2 n\) times yields \(O(n \log n)\) total work.
Let \(T(n)\) be the operation count. The radix-2 split gives:
\[ T(n) = 2\,T(n/2) + O(n), \qquad T(1) = O(1). \]
By the Master theorem (case 2): \(T(n) = O(n \log n)\).
For \(n = 2^{20} \approx 10^6\) this is roughly \(20 \times 10^6\) operations versus \(10^{12}\) for the naive DFT – a factor of \(50{,}000\) speedup.
The recursive split reorders the input so that element \(j\) ends up at position equal to the bit-reversal of \(j\) in \(\log_2 n\) bits. For example with \(n = 8\):
| Original index | Binary | Bit-reversed | Permuted index |
|---|---|---|---|
| 0 | 000 | 000 | 0 |
| 1 | 001 | 100 | 4 |
| 2 | 010 | 010 | 2 |
| 3 | 011 | 110 | 6 |
| 4 | 100 | 001 | 1 |
| 5 | 101 | 101 | 5 |
| 6 | 110 | 011 | 3 |
| 7 | 111 | 111 | 7 |
The iterative DIT algorithm applies this permutation in-place first, then processes the butterfly stages bottom-up (length-2 butterflies, then length-4, ..., then length- \(n\)).
For the inverse DFT, replace \(-2\pi\) with \(+2\pi\) in the twiddle angle. The output is unnormalised; divide each element by \(n\) for the true inverse.
Complexity: \(O(n \log n)\) multiplications; \(O(n)\) extra memory for the in-place permutation and butterfly stages.
When the input \(x\) is real, \(X[n-k] = \overline{X[k]}\) (Hermitian symmetry), so only the \(n/2 + 1\) bins \(k = 0, \ldots, n/2\) are independent.
rfft returns exactly these \(n/2 + 1\) complex values. irfft reconstructs the full real output from them by:
This halves the storage compared to the complex DFT and, in the FFTW backend, uses the dedicated fftw_plan_dft_r2c_1d / fftw_plan_dft_c2r_1d plans which exploit the symmetry for an additional speed improvement.
For repeated transforms of the same length, the planning cost (twiddle precomputation or FFTW measurement) is paid once:
For the seq backend, FFTPlanImpl stores all twiddle factors for every butterfly stage, removing the cos/sin calls from the hot loop. For the fftw backend, FFTPlanImpl wraps an fftw_plan and calls fftw_execute_dft with the new-array API so the same plan can be reused across different input buffers.
| Backend | Algorithm | Size constraint | When selected |
|---|---|---|---|
seq | Iterative Cooley-Tukey radix-2 DIT | Power of two | Always available |
fftw | FFTW3 (mixed-radix, SIMD) | Any \(n\) | NUMERICS_HAS_FFTW defined at configure time |
Backend selection: default_fft_backend is fftw when FFTW3 is found, seq otherwise. An explicit backend can be passed to any function:
Implements the iterative radix-2 DIT described above. Requires \(n\) to be a power of two; throws std::invalid_argument otherwise. Twiddle factors are recomputed on each call; FFTPlan precomputes and caches them.
Wraps FFTW3 (libfftw3). One-shot calls use FFTW_ESTIMATE (no measurement). FFTPlan uses FFTW_MEASURE – FFTW benchmarks several internal decompositions at plan construction time and selects the fastest. The plan is then reused via fftw_execute_dft (new-array API) which is thread-safe for different input/output buffers.
FFTW supports arbitrary \(n\) via mixed-radix decomposition; performance is best for highly composite \(n\) (especially powers of two, but also products of small primes such as 2, 3, 5, 7).
Size and allocation requirements:
| Function | in.size() | out pre-allocated to |
|---|---|---|
fft, ifft | any power of two (seq) / any n (fftw) | in.size() |
rfft | same constraints | in.size() / 2 + 1 |
irfft | n / 2 + 1 | n |
FFTPlan::execute | must equal plan size | plan size |
Passing a size mismatch throws std::invalid_argument.
Compute the spectrum of \(x[j] = \cos(2\pi \cdot 4 \cdot j / 64)\), a pure tone at frequency bin 4 out of 64 samples.
Expected output (up to floating-point rounding): X[4] = 32.0, all others ~0.0. The factor of \(n/2 = 32\) arises from the unnormalised convention: \(\cos(\theta) = (e^{i\theta} + e^{-i\theta})/2\), so bin 4 receives \(n/2\) from the \(e^{i\theta}\) component.