Tier-1 kernel: raw-pointer, inline, zero-overhead inner loops. More...

#include "core/types.hpp"
#include <cmath>
#include <cstring>

Namespaces
namespace	num

namespace	num::kernel

namespace	num::kernel::raw

Macros
#define	NUM_K_AINLINE inline

#define	NUM_K_RESTRICT

#define	NUM_K_IVDEP

Functions
NUM_K_AINLINE void	num::kernel::raw::scale (real *NUM_K_RESTRICT x, real alpha, idx n) noexcept
	x[i] *= alpha

NUM_K_AINLINE void	num::kernel::raw::axpy (real NUM_K_RESTRICT y, const real NUM_K_RESTRICT x, real alpha, idx n) noexcept
	y[i] += alpha * x[i]

NUM_K_AINLINE void	num::kernel::raw::axpby (real NUM_K_RESTRICT y, const real NUM_K_RESTRICT x, real a, real b, idx n) noexcept
	y[i] = ax[i] + by[i] (fused scale-and-add, one memory pass)

NUM_K_AINLINE void	num::kernel::raw::axpbyz (real NUM_K_RESTRICT z, const real NUM_K_RESTRICT x, const real *NUM_K_RESTRICT y, real a, real b, idx n) noexcept
	z[i] = ax[i] + by[i] (fused, three-array, one pass each)

NUM_K_AINLINE real	num::kernel::raw::dot (const real NUM_K_RESTRICT x, const real NUM_K_RESTRICT y, idx n) noexcept
	dot product: return sum x[i] * y[i]

NUM_K_AINLINE real	num::kernel::raw::norm_sq (const real *NUM_K_RESTRICT x, idx n) noexcept
	sum x[i]^2 (no sqrt; use for convergence checks to avoid sqrt cost)

NUM_K_AINLINE real	num::kernel::raw::norm (const real *NUM_K_RESTRICT x, idx n) noexcept
	Euclidean norm: sqrt(sum x[i]^2)

NUM_K_AINLINE real	num::kernel::raw::l1_norm (const real *NUM_K_RESTRICT x, idx n) noexcept
	L1 norm: sum \|x[i]\|.

NUM_K_AINLINE real	num::kernel::raw::linf_norm (const real *NUM_K_RESTRICT x, idx n) noexcept
	L-infinity norm: max \|x[i]\|.

NUM_K_AINLINE real	num::kernel::raw::sum (const real *NUM_K_RESTRICT x, idx n) noexcept
	Scalar sum: return sum x[i].

NUM_K_AINLINE void	num::kernel::raw::matvec (real NUM_K_RESTRICT y, const real NUM_K_RESTRICT A, const real *NUM_K_RESTRICT x, idx m, idx n) noexcept
	y[i] = sum_j A[in + j] x[j] (m x n row-major matrix)

NUM_K_AINLINE void	num::kernel::raw::ger (real NUM_K_RESTRICT A, const real NUM_K_RESTRICT x, const real *NUM_K_RESTRICT y, real alpha, idx m, idx n) noexcept
	Rank-1 update: A[in + j] += alpha x[i] * y[j] (m x n row-major)

NUM_K_AINLINE void	num::kernel::raw::trsv_lower (real NUM_K_RESTRICT x, const real NUM_K_RESTRICT L, const real *NUM_K_RESTRICT b, idx n) noexcept
	Forward substitution: solve Lx = b, L lower triangular (n x n, row-major).

NUM_K_AINLINE void	num::kernel::raw::trsv_upper (real NUM_K_RESTRICT x, const real NUM_K_RESTRICT U, const real *NUM_K_RESTRICT b, idx n) noexcept
	Back substitution: solve Ux = b, U upper triangular (n x n, row-major).

Detailed Description

Tier-1 kernel: raw-pointer, inline, zero-overhead inner loops.

These are the lowest-level building blocks in the numerics stack. Every higher tier (type-safe wrappers, operator interface, solvers) ultimately reduces to these loops.

Design invariants:

All functions are NUM_K_AINLINE: the compiler always sees the loop body and can vectorize, fuse, or eliminate it during inlining.
NUM_K_RESTRICT on every pointer: tells the compiler inputs do not alias, which is necessary for auto-vectorization of most loops.
noexcept throughout: no exception machinery in hot paths.
No heap allocation: callers own all memory.
BLAS dispatch (#ifdef NUMERICS_HAS_BLAS): routes to cblas Level-1/2 when available. BLAS handles its own internal threading and is safe to call from any context (no nested-OMP issue).
No OpenMP: OMP at raw level causes nested-parallelism problems when a higher-level kernel wraps these in its own parallel region. OMP parallelism lives at Tier 2.

Operations: — Level 1 (vector) — scale(x, alpha, n) x[i] *= alpha axpy(y, x, alpha, n) y[i] += alpha * x[i] axpby(y, x, a, b, n) y[i] = a*x[i] + b*y[i] [fused, no BLAS eq.] axpbyz(z, x, y, a, b, n) z[i] = a*x[i] + b*y[i] [fused, no BLAS eq.] dot(x, y, n) return sum x[i]*y[i] norm_sq(x, n) return sum x[i]^2 [no BLAS sqrt cost] norm(x, n) return sqrt(norm_sq) l1_norm(x, n) return sum |x[i]| linf_norm(x, n) return max |x[i]| sum(x, n) return sum x[i] [no BLAS eq.]

— Level 2 (matrix-vector, row-major) — matvec(y, A, x, m, n) y[i] = sum_j A[i*n+j] * x[j] ger(A, x, y, alpha, m, n) A[i*n+j] += alpha * x[i] * y[j] trsv_lower(x, L, b, n) solve Lx = b (L lower triangular) trsv_upper(x, U, b, n) solve Ux = b (U upper triangular)

Include this header when writing a fused custom kernel that needs to compose multiple operations into a single memory pass. For ordinary user code prefer num::kernel:: (Tier 2).

Definition in file raw.hpp.

Macro Definition Documentation

◆ NUM_K_AINLINE

#define NUM_K_AINLINE inline

Definition at line 70 of file raw.hpp.

◆ NUM_K_IVDEP

#define NUM_K_IVDEP

Definition at line 72 of file raw.hpp.

◆ NUM_K_RESTRICT

#define NUM_K_RESTRICT

Definition at line 71 of file raw.hpp.

Namespaces

Macros

Functions

Detailed Description

Macro Definition Documentation

◆ NUM_K_AINLINE

◆ NUM_K_IVDEP

◆ NUM_K_RESTRICT