numerics
SIMD backend – hand-written AVX2/NEON matmul and matvec.
Namespaces
  namespace num
  namespace num::backends
  namespace num::backends::simd
Functions
  void num::backends::simd::matmul (const Matrix &A, const Matrix &B, Matrix &C, idx block_size)
  void num::backends::simd::matvec (const Matrix &A, const Vector &x, Vector &y)
Compile-time dispatch:
  NUMERICS_HAS_AVX2 -> AVX2 (256-bit) + FMA (x86-64, 4 doubles/register)
  NUMERICS_HAS_NEON -> ARM NEON (AArch64, 2 doubles/register)
  neither           -> falls back to the cache-blocked scalar implementation
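The dispatch above could look roughly like the following sketch. Only the macro names NUMERICS_HAS_AVX2 and NUMERICS_HAS_NEON come from this page; the helper axpy_sketch and the constant kLanesSketch are hypothetical names chosen for illustration, and how the build system defines the macros is not shown here. The AVX2 path also assumes the compiler is invoked with AVX2 and FMA enabled (for example -mavx2 -mfma on GCC or Clang).

```cpp
#include <cstddef>

#if defined(NUMERICS_HAS_AVX2)
  #include <immintrin.h>   // x86-64 AVX2 + FMA intrinsics
#elif defined(NUMERICS_HAS_NEON)
  #include <arm_neon.h>    // AArch64 NEON intrinsics
#endif

// Hypothetical constant: doubles per vector register on the selected path.
#if defined(NUMERICS_HAS_AVX2)
constexpr std::size_t kLanesSketch = 4;
#elif defined(NUMERICS_HAS_NEON)
constexpr std::size_t kLanesSketch = 2;
#else
constexpr std::size_t kLanesSketch = 1;  // scalar fallback
#endif

// Hypothetical helper: y[i] += a * x[i] for i in [0, n).
// The vector body is selected at compile time; the scalar loop at the
// end handles the tail, or everything when neither macro is defined.
inline void axpy_sketch(double a, const double* x, double* y, std::size_t n) {
    std::size_t i = 0;
#if defined(NUMERICS_HAS_AVX2)
    const __m256d va = _mm256_set1_pd(a);
    for (; i + 4 <= n; i += 4) {
        __m256d vy = _mm256_loadu_pd(y + i);
        vy = _mm256_fmadd_pd(va, _mm256_loadu_pd(x + i), vy);  // va * x + vy
        _mm256_storeu_pd(y + i, vy);
    }
#elif defined(NUMERICS_HAS_NEON)
    const float64x2_t va = vdupq_n_f64(a);
    for (; i + 2 <= n; i += 2) {
        float64x2_t vy = vld1q_f64(y + i);
        vy = vfmaq_f64(vy, va, vld1q_f64(x + i));              // vy + va * x
        vst1q_f64(y + i, vy);
    }
#endif
    for (; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Because the selection happens in the preprocessor, exactly one code path is compiled into the binary and no runtime CPU check is involved.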
Both backends use the same register-tile structure:
  Outer cache tile: ii -> jj -> kk (B tile stays in L2)
  Inner reg tile:   4 rows x 4 cols (AVX: 4 YMM regs; NEON: 8 Q-regs)
  Hot k loop:       one vector FMA per row, zero loop overhead for j
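The tile structure can be illustrated with a portable scalar sketch. This is not the code in matrix.cpp: the function name matmul_tiled_sketch, the use of row-major std::vector<double> in place of Matrix, and the requirement that M, N and block be multiples of 4 are all assumptions made to keep the example short. The 4 x 4 array acc stands in for the accumulator registers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: C (M x N) = A (M x K) * B (K x N), row-major storage.
void matmul_tiled_sketch(const std::vector<double>& A,
                         const std::vector<double>& B,
                         std::vector<double>& C,
                         std::size_t M, std::size_t K, std::size_t N,
                         std::size_t block) {
    // Simplifying assumptions: no edge tiles to handle.
    assert(block > 0 && block % 4 == 0 && M % 4 == 0 && N % 4 == 0);
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    std::fill(C.begin(), C.end(), 0.0);

    // Outer cache tiles in the order ii -> jj -> kk.
    for (std::size_t ii = 0; ii < M; ii += block)
    for (std::size_t jj = 0; jj < N; jj += block)
    for (std::size_t kk = 0; kk < K; kk += block) {
        const std::size_t i_end = std::min(ii + block, M);
        const std::size_t j_end = std::min(jj + block, N);
        const std::size_t k_end = std::min(kk + block, K);

        // Inner register tile: 4 rows x 4 columns of C.
        for (std::size_t i = ii; i < i_end; i += 4)
        for (std::size_t j = jj; j < j_end; j += 4) {
            double acc[4][4];
            for (std::size_t r = 0; r < 4; ++r)
                for (std::size_t c = 0; c < 4; ++c)
                    acc[r][c] = C[(i + r) * N + j + c];

            // Hot k loop: each row update is a broadcast of A(i+r, k)
            // times four contiguous B values. The c loop below maps to
            // one 4-wide FMA with AVX2, or two 2-wide FMAs with NEON.
            for (std::size_t k = kk; k < k_end; ++k) {
                const double* b = &B[k * N + j];
                for (std::size_t r = 0; r < 4; ++r) {
                    const double a = A[(i + r) * K + k];
                    for (std::size_t c = 0; c < 4; ++c)
                        acc[r][c] += a * b[c];
                }
            }

            for (std::size_t r = 0; r < 4; ++r)
                for (std::size_t c = 0; c < 4; ++c)
                    C[(i + r) * N + j + c] = acc[r][c];
        }
    }
}
```

Keeping the 16 accumulators live across the whole k loop means C is read and written once per tile per kk block, rather than once per multiply-add.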
Definition in file matrix.cpp.