numerics
Loading...
Searching...
No Matches
Fast Fourier Transform

The Discrete Fourier Transform

Given a sequence \(x[0], x[1], \ldots, x[n-1] \in \mathbb{C}\), the Discrete Fourier Transform (DFT) maps it to a frequency-domain sequence \(X[0], \ldots, X[n-1]\):

\[ X[k] = \sum_{j=0}^{n-1} x[j]\, e^{-2\pi i j k / n}, \qquad k = 0, 1, \ldots, n-1. \]

The inverse DFT recovers \(x\) from \(X\):

\[ x[j] = \frac{1}{n} \sum_{k=0}^{n-1} X[k]\, e^{+2\pi i j k / n}. \]

Convention note: both backends follow the FFTW sign convention. ifft returns the unnormalised inverse – divide by \(n\) to recover \(x\) exactly.

Interpretation of frequency bins

For a signal sampled at rate \(f_s\) (samples per second) with \(n\) samples, bin \(k\) corresponds to frequency

\[ f_k = \frac{k \cdot f_s}{n}. \]

Bins \(k = 0, \ldots, n/2\) are non-negative frequencies; bins \(k = n/2+1, \ldots, n-1\) are the negative frequencies (aliased by periodicity). For real input the spectrum is Hermitian: \(X[n-k] = \overline{X[k]}\), so only the first \(n/2+1\) bins carry independent information – this is what rfft returns.

Parseval's theorem

Energy is conserved up to the factor \(n\):

\[ \sum_{j=0}^{n-1} |x[j]|^2 = \frac{1}{n} \sum_{k=0}^{n-1} |X[k]|^2. \]


Naive DFT: O(n^2) cost

Evaluating all \(n\) bins directly costs \(O(n^2)\) multiplications – impractical for large \(n\). The key observation that breaks this barrier is the periodicity and symmetry of the twiddle factors \(W_n^{jk} = e^{-2\pi i jk/n}\).

Define the primitive \(n\)-th root of unity \(\omega_n = e^{-2\pi i / n}\). Then \(W_n^{jk} = \omega_n^{jk}\), and these factors satisfy:

\[ \omega_{2n}^{2k} = \omega_n^k \quad \text{(halving)}, \qquad \omega_n^{k+n/2} = -\omega_n^k \quad \text{(symmetry)}. \]

These two identities are the engine of the Cooley-Tukey algorithm.


Cooley-Tukey Radix-2 DIT

The Cooley-Tukey radix-2 decimation-in-time (DIT) algorithm, published in 1965, reduces an \(n\)-point DFT to two \(n/2\)-point DFTs by splitting the input into even- and odd-indexed elements.

Derivation

Split \(x\) into even and odd halves:

\[ X[k] = \sum_{j=0}^{n/2-1} x[2j]\,\omega_n^{2jk} + \sum_{j=0}^{n/2-1} x[2j+1]\,\omega_n^{(2j+1)k}. \]

Using \(\omega_n^{2jk} = \omega_{n/2}^{jk}\):

\[ X[k] = \underbrace{\sum_{j=0}^{n/2-1} x[2j]\,\omega_{n/2}^{jk}}_{E[k]} + \omega_n^k \underbrace{\sum_{j=0}^{n/2-1} x[2j+1]\,\omega_{n/2}^{jk}}_{O[k]}. \]

where \(E[k]\) and \(O[k]\) are themselves DFTs of length \(n/2\). The symmetry property \(\omega_n^{k+n/2} = -\omega_n^k\) means the second half of the output costs nothing extra:

\[ X[k] = E[k] + \omega_n^k\, O[k], \\ X[k+n/2] = E[k] - \omega_n^k\, O[k]. \]

This is the butterfly operation: given the pair \((u, v)\) and twiddle factor \(w = \omega_n^k\),

\[ (u,\; vw) \;\longrightarrow\; (u + vw,\; u - vw). \]

Each butterfly costs 1 complex multiply and 2 complex adds. Applying the recursion \(\log_2 n\) times yields \(O(n \log n)\) total work.

Recurrence and complexity

Let \(T(n)\) be the operation count. The radix-2 split gives:

\[ T(n) = 2\,T(n/2) + O(n), \qquad T(1) = O(1). \]

By the Master theorem (case 2): \(T(n) = O(n \log n)\).

For \(n = 2^{20} \approx 10^6\) this is roughly \(20 \times 10^6\) operations versus \(10^{12}\) for the naive DFT – a factor of \(50{,}000\) speedup.

Bit-reversal permutation

The recursive split reorders the input so that element \(j\) ends up at position equal to the bit-reversal of \(j\) in \(\log_2 n\) bits. For example with \(n = 8\):

Original index Binary Bit-reversed Permuted index
0 000 000 0
1 001 100 4
2 010 010 2
3 011 110 6
4 100 001 1
5 101 101 5
6 110 011 3
7 111 111 7

The iterative DIT algorithm applies this permutation in-place first, then processes the butterfly stages bottom-up (length-2 butterflies, then length-4, ..., then length- \(n\)).


Algorithm (iterative DIT)

function FFT(x[0..n-1]):
bit_reverse_permute(x)
for len = 2, 4, 8, ..., n:
w_len = exp(-2*pi*i / len) // principal twiddle
for i = 0, len, 2*len, ..., n-len:
w = 1
for j = 0 to len/2 - 1:
u = x[i + j]
v = x[i + j + len/2] * w
x[i + j] = u + v // butterfly top
x[i + j + len/2] = u - v // butterfly bottom
w *= w_len

For the inverse DFT, replace \(-2\pi\) with \(+2\pi\) in the twiddle angle. The output is unnormalised; divide each element by \(n\) for the true inverse.

Complexity: \(O(n \log n)\) multiplications; \(O(n)\) extra memory for the in-place permutation and butterfly stages.


Real-to-Complex Transform (rfft)

When the input \(x\) is real, \(X[n-k] = \overline{X[k]}\) (Hermitian symmetry), so only the \(n/2 + 1\) bins \(k = 0, \ldots, n/2\) are independent.

rfft returns exactly these \(n/2 + 1\) complex values. irfft reconstructs the full real output from them by:

  1. Reconstructing the negative-frequency half via conjugate symmetry: \(X[n-k] = \overline{X[k]}\) for \(k = 1, \ldots, n/2 - 1\).
  2. Running the full complex inverse DFT on the reconstructed spectrum.
  3. Taking the real part (imaginary part is zero to floating-point precision for truly real input).

This halves the storage compared to the complex DFT and, in the FFTW backend, uses the dedicated fftw_plan_dft_r2c_1d / fftw_plan_dft_c2r_1d plans which exploit the symmetry for an additional speed improvement.


FFTPlan: amortizing planning cost

For repeated transforms of the same length, the planning cost (twiddle precomputation or FFTW measurement) is paid once:

num::spectral::FFTPlan plan(1024); // precomputes twiddles (seq) or
// runs FFTW_MEASURE (fftw)
for (auto& frame : frames)
plan.execute(frame, spectrum); // O(n log n), no allocation
Precomputed plan for repeated complex transforms of a fixed size.
Definition fft.hpp:102

For the seq backend, FFTPlanImpl stores all twiddle factors for every butterfly stage, removing the cos/sin calls from the hot loop. For the fftw backend, FFTPlanImpl wraps an fftw_plan and calls fftw_execute_dft with the new-array API so the same plan can be reused across different input buffers.


Backends

Backend Algorithm Size constraint When selected
seq Iterative Cooley-Tukey radix-2 DIT Power of two Always available
fftw FFTW3 (mixed-radix, SIMD) Any \(n\) NUMERICS_HAS_FFTW defined at configure time

Backend selection: default_fft_backend is fftw when FFTW3 is found, seq otherwise. An explicit backend can be passed to any function:

using namespace num::spectral;
fft(in, out, seq); // always use native Cooley-Tukey
fft(in, out, fftw); // always use FFTW3 (throws if not compiled in)

seq backend

Implements the iterative radix-2 DIT described above. Requires \(n\) to be a power of two; throws std::invalid_argument otherwise. Twiddle factors are recomputed on each call; FFTPlan precomputes and caches them.

fftw backend

Wraps FFTW3 (libfftw3). One-shot calls use FFTW_ESTIMATE (no measurement). FFTPlan uses FFTW_MEASURE – FFTW benchmarks several internal decompositions at plan construction time and selects the fastest. The plan is then reused via fftw_execute_dft (new-array API) which is thread-safe for different input/output buffers.

FFTW supports arbitrary \(n\) via mixed-radix decomposition; performance is best for highly composite \(n\) (especially powers of two, but also products of small primes such as 2, 3, 5, 7).


API Reference

namespace num::spectral {
// One-shot transforms
// Reusable plan
class FFTPlan {
explicit FFTPlan(int n, bool forward = true, FFTBackend b = default_fft_backend);
void execute(const CVector& in, CVector& out) const;
int size() const;
};
} // namespace num::spectral
Dense vector with optional GPU storage, templated over scalar type T.
Definition vector.hpp:23
FFTBackend backend() const
Definition fft.hpp:117
int size() const
Definition fft.hpp:116
void execute(const CVector &in, CVector &out) const
Definition fft.cpp:124
void ifft(const CVector &in, CVector &out, FFTBackend b=default_fft_backend)
Inverse complex DFT (unnormalised: result = n * true_inverse).
Definition fft.cpp:31
void fft(const CVector &in, CVector &out, FFTBackend b=default_fft_backend)
Forward complex DFT. out must be pre-allocated to in.size().
Definition fft.cpp:15
FFTBackend
Selects which backend handles an FFT operation.
Definition fft.hpp:26
void irfft(const CVector &in, int n, Vector &out, FFTBackend b=default_fft_backend)
Complex-to-real inverse DFT (unnormalised).
Definition fft.cpp:61
constexpr FFTBackend default_fft_backend
Automatically selected at configure time: fftw > simd > seq.
Definition fft.hpp:65
void rfft(const Vector &in, CVector &out, FFTBackend b=default_fft_backend)
Real-to-complex forward DFT. out must be pre-allocated to n/2+1.
Definition fft.cpp:46
constexpr T ipow(T x) noexcept
Compute x^N at compile time via repeated squaring.

Size and allocation requirements:

Function in.size() out pre-allocated to
fft, ifft any power of two (seq) / any n (fftw) in.size()
rfft same constraints in.size() / 2 + 1
irfft n / 2 + 1 n
FFTPlan::execute must equal plan size plan size

Passing a size mismatch throws std::invalid_argument.


Worked Example

Compute the spectrum of \(x[j] = \cos(2\pi \cdot 4 \cdot j / 64)\), a pure tone at frequency bin 4 out of 64 samples.

#include "spectral/fft.hpp"
#include <cmath>
#include <iostream>
int main() {
using namespace num::spectral;
const int n = 64;
num::Vector xr(n);
num::CVector X(n / 2 + 1);
for (int j = 0; j < n; ++j)
xr[j] = std::cos(2.0 * M_PI * 4 * j / n);
rfft(xr, X); // n/2+1 = 33 complex bins
// Bin 4 should be non-zero; all others ~0
for (int k = 0; k < 8; ++k)
std::cout << "X[" << k << "] = " << std::abs(X[k]) << "\n";
}
int main()
Definition main.cpp:84
FFT interface with backend dispatch.

Expected output (up to floating-point rounding): X[4] = 32.0, all others ~0.0. The factor of \(n/2 = 32\) arises from the unnormalised convention: \(\cos(\theta) = (e^{i\theta} + e^{-i\theta})/2\), so bin 4 receives \(n/2\) from the \(e^{i\theta}\) component.