numerics
Handwritten AVX2 / NEON butterfly for the FFT.
#include "spectral/fft.hpp"
#include "../seq/impl.hpp"
#include <cmath>
#include <stdexcept>
#include <vector>
Classes
| struct | backends::opt::FFTPlanImpl | Precomputed twiddle factors + SIMD butterfly execution. |
Namespaces
| namespace | backends |
| namespace | backends::opt |
Functions
| void | backends::opt::fft (const num::CVector &in, num::CVector &out) |
| void | backends::opt::ifft (const num::CVector &in, num::CVector &out) |
| void | backends::opt::rfft (const num::Vector &in, num::CVector &out) |
| void | backends::opt::irfft (const num::CVector &in, int n, num::Vector &out) |
Handwritten AVX2 / NEON butterfly for the FFT.
Processes two complex butterflies per SIMD iteration using 256-bit (AVX2) or 128-bit (NEON) registers. Falls back to the scalar butterfly from the seq backend when neither ISA extension is available at compile time.
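The compile-time fallback described above could be arranged roughly as follows. This is a minimal sketch, not the code in this file: the `kBackend` constant, the `scalar_butterfly` and `check_scalar_butterfly` helpers, and the choice of the compiler-predefined `__AVX2__` and `__ARM_NEON` macros as the selection test are all assumptions made for illustration.

```cpp
#include <cassert>
#include <complex>
#include <utility>

// Assumed selection test: compiler-predefined ISA macros (hypothetical choice).
#if defined(__AVX2__)
static const char* const kBackend = "avx2";
#elif defined(__ARM_NEON)
static const char* const kBackend = "neon";
#else
static const char* const kBackend = "scalar";
#endif

using cplx = std::complex<double>;

// Scalar radix-2 butterfly, the kind of code a fallback path would run.
// Returns {u + v*w, u - v*w}.
inline std::pair<cplx, cplx> scalar_butterfly(cplx u, cplx v, cplx w) {
    const cplx t = v * w;  // twiddle multiply
    return std::make_pair(u + t, u - t);
}

// Small self-check: with u = 1, v = i, w = i the product v*w is -1,
// so the outputs are 0 and 2.
inline void check_scalar_butterfly() {
    const std::pair<cplx, cplx> r =
        scalar_butterfly(cplx(1.0, 0.0), cplx(0.0, 1.0), cplx(0.0, 1.0));
    assert(r.first == cplx(0.0, 0.0));
    assert(r.second == cplx(2.0, 0.0));
}
```

Because the choice is made by the preprocessor, a binary built without the matching target flags contains only the scalar path; there is no runtime CPU detection in this sketch.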
AVX2 complex butterfly (2 pairs per __m256d):
    u = [ur0, ui0, ur1, ui1]
    v = [vr0, vi0, vr1, vi1]
    w = [wr0, wi0, wr1, wi1]

    w_re = unpacklo(w, w)      -> [wr0, wr0, wr1, wr1]
    w_im = unpackhi(w, w)      -> [wi0, wi0, wi1, wi1]
    v_sw = permute(v, 0b0101)  -> [vi0, vr0, vi1, vr1]   (swap re/im)
    t    = addsub(v*w_re, v_sw*w_im)
         = [vr0*wr0 - vi0*wi0, vr0*wi0 + vi0*wr0,
            vr1*wr1 - vi1*wi1, vr1*wi1 + vi1*wr1]
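The lane arithmetic above can be checked without AVX2 hardware by modelling each operation on a four-element array. The sketch below does that and, when the compiler targets AVX2, also compiles an intrinsic version of the same steps. The helper names (`Lanes`, `swap_pairs`, `complex_mul_lanes`, `butterfly_lanes`, `complex_mul_avx2`) are hypothetical, and the final `u + t` / `u - t` step assumes a standard radix-2 butterfly, which the description above does not spell out.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Stand-in for one __m256d holding two interleaved complex doubles:
// [re0, im0, re1, im1].
using Lanes = std::array<double, 4>;

// Scalar models of the lane operations (helper names are hypothetical).
inline Lanes unpacklo(const Lanes& a, const Lanes& b) { return Lanes{{a[0], b[0], a[2], b[2]}}; }
inline Lanes unpackhi(const Lanes& a, const Lanes& b) { return Lanes{{a[1], b[1], a[3], b[3]}}; }
inline Lanes swap_pairs(const Lanes& a) { return Lanes{{a[1], a[0], a[3], a[2]}}; }  // permute(a, 0b0101)
inline Lanes mul(const Lanes& a, const Lanes& b) { return Lanes{{a[0]*b[0], a[1]*b[1], a[2]*b[2], a[3]*b[3]}}; }
// addsub subtracts in even lanes and adds in odd lanes.
inline Lanes addsub(const Lanes& a, const Lanes& b) { return Lanes{{a[0]-b[0], a[1]+b[1], a[2]-b[2], a[3]+b[3]}}; }

// t = v * w for two complex pairs at once, following the steps listed above.
inline Lanes complex_mul_lanes(const Lanes& v, const Lanes& w) {
    const Lanes w_re = unpacklo(w, w);
    const Lanes w_im = unpackhi(w, w);
    const Lanes v_sw = swap_pairs(v);
    return addsub(mul(v, w_re), mul(v_sw, w_im));
}

// Assumed radix-2 butterfly: hi = u + t, lo = u - t.
struct ButterflyOut { Lanes hi, lo; };
inline ButterflyOut butterfly_lanes(const Lanes& u, const Lanes& v, const Lanes& w) {
    const Lanes t = complex_mul_lanes(v, w);
    ButterflyOut out;
    for (int i = 0; i < 4; ++i) { out.hi[i] = u[i] + t[i]; out.lo[i] = u[i] - t[i]; }
    return out;
}

#if defined(__AVX2__)
// Intrinsic version of the same multiply; compiled only when targeting AVX2.
inline __m256d complex_mul_avx2(__m256d v, __m256d w) {
    const __m256d w_re = _mm256_unpacklo_pd(w, w);
    const __m256d w_im = _mm256_unpackhi_pd(w, w);
    const __m256d v_sw = _mm256_permute_pd(v, 0x5);
    return _mm256_addsub_pd(_mm256_mul_pd(v, w_re), _mm256_mul_pd(v_sw, w_im));
}
#endif

// Compares the lane model against std::complex multiplication.
inline void check_against_std_complex() {
    const Lanes v = {{1.0, 2.0, -3.0, 0.5}};
    const Lanes w = {{0.0, -1.0, 0.5, 0.25}};
    const Lanes t = complex_mul_lanes(v, w);
    for (int k = 0; k < 2; ++k) {
        const std::complex<double> ref =
            std::complex<double>(v[2*k], v[2*k+1]) * std::complex<double>(w[2*k], w[2*k+1]);
        assert(std::fabs(ref.real() - t[2*k]) < 1e-12);
        assert(std::fabs(ref.imag() - t[2*k+1]) < 1e-12);
    }
}
```

The array model is only a correctness aid; it has none of the performance properties of the intrinsic path.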
NEON deinterleaves with vld2q_f64 (SoA load), multiplies component-wise, and re-interleaves with vst2q_f64.
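The NEON data flow can be sketched the same way. Below, a scalar model mirrors the deinterleaving load, component-wise multiply, and re-interleaving store, and an intrinsic version is compiled only on AArch64 with NEON. The names `Split`, `load_deinterleaved`, `store_interleaved`, `complex_mul_pairs`, and `complex_mul_neon` are hypothetical; only `vld2q_f64` and `vst2q_f64` come from the description above, and the remaining intrinsics are an assumed way of writing the component-wise product.

```cpp
#include <array>
#include <cassert>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Scalar stand-in for float64x2x2_t: re holds two real parts,
// im holds the two matching imaginary parts (structure-of-arrays layout).
struct Split {
    std::array<double, 2> re;
    std::array<double, 2> im;
};

// Models vld2q_f64: reads [re0, im0, re1, im1] and deinterleaves.
inline Split load_deinterleaved(const double* p) {
    Split s;
    s.re = {{p[0], p[2]}};
    s.im = {{p[1], p[3]}};
    return s;
}

// Models vst2q_f64: writes the two components back interleaved.
inline void store_interleaved(double* p, const Split& s) {
    p[0] = s.re[0]; p[1] = s.im[0];
    p[2] = s.re[1]; p[3] = s.im[1];
}

// out = v * w for two interleaved complex doubles, multiplied component-wise.
inline void complex_mul_pairs(const double* v, const double* w, double* out) {
    assert(v != nullptr && w != nullptr && out != nullptr);
    const Split a = load_deinterleaved(v);
    const Split b = load_deinterleaved(w);
    Split t;
    for (int k = 0; k < 2; ++k) {
        t.re[k] = a.re[k] * b.re[k] - a.im[k] * b.im[k];
        t.im[k] = a.re[k] * b.im[k] + a.im[k] * b.re[k];
    }
    store_interleaved(out, t);
}

#if defined(__aarch64__) && defined(__ARM_NEON)
// Intrinsic version of the same steps; compiled only on AArch64 with NEON.
inline void complex_mul_neon(const double* v, const double* w, double* out) {
    const float64x2x2_t a = vld2q_f64(v);  // val[0] = re, val[1] = im
    const float64x2x2_t b = vld2q_f64(w);
    float64x2x2_t t;
    t.val[0] = vsubq_f64(vmulq_f64(a.val[0], b.val[0]), vmulq_f64(a.val[1], b.val[1]));
    t.val[1] = vaddq_f64(vmulq_f64(a.val[0], b.val[1]), vmulq_f64(a.val[1], b.val[0]));
    vst2q_f64(out, t);
}
#endif
```

Working on separate real and imaginary vectors avoids the shuffle steps of the AVX2 path, at the cost of the deinterleave on load and the re-interleave on store.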
Definition in file impl.hpp.