numerics 0.1.0
Loading...
Searching...
No Matches
raw.hpp File Reference

Tier-1 kernel: raw-pointer, inline, zero-overhead inner loops. More...

#include "core/types.hpp"
#include <cmath>
#include <cstring>

Go to the source code of this file.

Namespaces

namespace  num
 
namespace  num::kernel
 
namespace  num::kernel::raw
 

Macros

#define NUM_K_AINLINE   inline
 
#define NUM_K_RESTRICT
 
#define NUM_K_IVDEP
 

Functions

NUM_K_AINLINE void num::kernel::raw::scale (real *NUM_K_RESTRICT x, real alpha, idx n) noexcept
 x[i] *= alpha
 
NUM_K_AINLINE void num::kernel::raw::axpy (real *NUM_K_RESTRICT y, const real *NUM_K_RESTRICT x, real alpha, idx n) noexcept
 y[i] += alpha * x[i]
 
NUM_K_AINLINE void num::kernel::raw::axpby (real *NUM_K_RESTRICT y, const real *NUM_K_RESTRICT x, real a, real b, idx n) noexcept
 y[i] = a*x[i] + b*y[i] (fused scale-and-add, one memory pass)
 
NUM_K_AINLINE void num::kernel::raw::axpbyz (real *NUM_K_RESTRICT z, const real *NUM_K_RESTRICT x, const real *NUM_K_RESTRICT y, real a, real b, idx n) noexcept
 z[i] = a*x[i] + b*y[i] (fused, three-array, one pass each)
 
NUM_K_AINLINE real num::kernel::raw::dot (const real *NUM_K_RESTRICT x, const real *NUM_K_RESTRICT y, idx n) noexcept
 dot product: return sum x[i] * y[i]
 
NUM_K_AINLINE real num::kernel::raw::norm_sq (const real *NUM_K_RESTRICT x, idx n) noexcept
 sum x[i]^2 (no sqrt; use for convergence checks to avoid sqrt cost)
 
NUM_K_AINLINE real num::kernel::raw::norm (const real *NUM_K_RESTRICT x, idx n) noexcept
 Euclidean norm: sqrt(sum x[i]^2)
 
NUM_K_AINLINE real num::kernel::raw::l1_norm (const real *NUM_K_RESTRICT x, idx n) noexcept
 L1 norm: sum |x[i]|.
 
NUM_K_AINLINE real num::kernel::raw::linf_norm (const real *NUM_K_RESTRICT x, idx n) noexcept
 L-infinity norm: max |x[i]|.
 
NUM_K_AINLINE real num::kernel::raw::sum (const real *NUM_K_RESTRICT x, idx n) noexcept
 Scalar sum: return sum x[i].
 
NUM_K_AINLINE void num::kernel::raw::matvec (real *NUM_K_RESTRICT y, const real *NUM_K_RESTRICT A, const real *NUM_K_RESTRICT x, idx m, idx n) noexcept
 y[i] = sum_j A[i*n + j] * x[j] (m x n row-major matrix)
 
NUM_K_AINLINE void num::kernel::raw::ger (real *NUM_K_RESTRICT A, const real *NUM_K_RESTRICT x, const real *NUM_K_RESTRICT y, real alpha, idx m, idx n) noexcept
 Rank-1 update: A[i*n + j] += alpha * x[i] * y[j] (m x n row-major)
 
NUM_K_AINLINE void num::kernel::raw::trsv_lower (real *NUM_K_RESTRICT x, const real *NUM_K_RESTRICT L, const real *NUM_K_RESTRICT b, idx n) noexcept
 Forward substitution: solve Lx = b, L lower triangular (n x n, row-major).
 
NUM_K_AINLINE void num::kernel::raw::trsv_upper (real *NUM_K_RESTRICT x, const real *NUM_K_RESTRICT U, const real *NUM_K_RESTRICT b, idx n) noexcept
 Back substitution: solve Ux = b, U upper triangular (n x n, row-major).
 

Detailed Description

Tier-1 kernel: raw-pointer, inline, zero-overhead inner loops.

These are the lowest-level building blocks in the numerics stack. Every higher tier (type-safe wrappers, operator interface, solvers) ultimately reduces to these loops.

Design invariants:

  • All functions are NUM_K_AINLINE: the compiler always sees the loop body and can vectorize, fuse, or eliminate it during inlining.
  • NUM_K_RESTRICT on every pointer: tells the compiler inputs do not alias, which is necessary for auto-vectorization of most loops.
  • noexcept throughout: no exception machinery in hot paths.
  • No heap allocation: callers own all memory.
  • BLAS dispatch (#ifdef NUMERICS_HAS_BLAS): routes to cblas Level-1/2 when available. BLAS handles its own internal threading and is safe to call from any context (no nested-OMP issue).
  • No OpenMP: OMP at raw level causes nested-parallelism problems when a higher-level kernel wraps these in its own parallel region. OMP parallelism lives at Tier 2.

Operations: — Level 1 (vector) — scale(x, alpha, n) x[i] *= alpha axpy(y, x, alpha, n) y[i] += alpha * x[i] axpby(y, x, a, b, n) y[i] = a*x[i] + b*y[i] [fused, no BLAS eq.] axpbyz(z, x, y, a, b, n) z[i] = a*x[i] + b*y[i] [fused, no BLAS eq.] dot(x, y, n) return sum x[i]*y[i] norm_sq(x, n) return sum x[i]^2 [no BLAS sqrt cost] norm(x, n) return sqrt(norm_sq) l1_norm(x, n) return sum |x[i]| linf_norm(x, n) return max |x[i]| sum(x, n) return sum x[i] [no BLAS eq.]

— Level 2 (matrix-vector, row-major) — matvec(y, A, x, m, n) y[i] = sum_j A[i*n+j] * x[j] ger(A, x, y, alpha, m, n) A[i*n+j] += alpha * x[i] * y[j] trsv_lower(x, L, b, n) solve Lx = b (L lower triangular) trsv_upper(x, U, b, n) solve Ux = b (U upper triangular)

Include this header when writing a fused custom kernel that needs to compose multiple operations into a single memory pass. For ordinary user code prefer num::kernel:: (Tier 2).

Definition in file raw.hpp.

Macro Definition Documentation

◆ NUM_K_AINLINE

#define NUM_K_AINLINE   inline

Definition at line 70 of file raw.hpp.

◆ NUM_K_IVDEP

#define NUM_K_IVDEP

Definition at line 72 of file raw.hpp.

◆ NUM_K_RESTRICT

#define NUM_K_RESTRICT

Definition at line 71 of file raw.hpp.