Parallel support is optional and selected at configuration time. Public calls remain available in non-parallel builds through fallback paths or stubs.
OpenMP
OpenMP is selected by passing Backend::omp to routines that document an OpenMP path.
real trapz(ScalarFn f, real a, real b, idx n=100, Backend backend=Backend::seq)
Approximate the integral of f over [a, b] using the composite trapezoidal rule with n panels.
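As an illustration of how an OpenMP path can sit next to a sequential fallback, here is a minimal sketch of a trapezoidal rule with a reduction. It is not the library's implementation: trapz_sketch, the two-value Backend enum, and the use of double and std::function in place of real and ScalarFn are assumptions made so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for the library's Backend enum.
enum class Backend { seq, omp };

// Composite trapezoidal rule on [a, b] with n panels.
// When compiled with OpenMP and called with Backend::omp, the interior
// sum is computed as a parallel reduction; otherwise the loop runs
// sequentially. f must be safe to call from several threads.
double trapz_sketch(const std::function<double(double)>& f,
                    double a, double b,
                    std::size_t n = 100,
                    Backend backend = Backend::seq) {
    assert(n > 0);
    (void)backend;  // unused when OpenMP is not enabled
    const double h = (b - a) / static_cast<double>(n);
    const long long panels = static_cast<long long>(n);
    double interior = 0.0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : interior) if (backend == Backend::omp)
#endif
    for (long long i = 1; i < panels; ++i) {
        interior += f(a + static_cast<double>(i) * h);
    }
    // Endpoints carry weight 1/2, interior nodes weight 1.
    return h * (0.5 * (f(a) + f(b)) + interior);
}
```

Calling trapz_sketch([](double x) { return x * x; }, 0.0, 1.0, 1000, Backend::omp) would use the parallel loop only in a build compiled with OpenMP; in any other build the same call takes the sequential path.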
SolverResult jacobi(const Matrix &A, const Vector &b, Vector &x, real tol=1e-10, idx max_iter=1000, Backend backend=default_backend)
Jacobi iterative solver for Ax = b.
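The iteration behind this entry can be sketched as follows. This is an illustration under stated assumptions, not the library's code: Matrix and Vector are replaced by std::vector stand-ins, JacobiOutcome is a hypothetical result type, and the stopping test (largest change between iterates below tol) is an assumption, since the source does not say which criterion jacobi uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the library's Matrix and Vector types.
using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Hypothetical result type; the library's SolverResult may differ.
struct JacobiOutcome {
    std::size_t iterations;
    double last_update;  // max-norm of the final change in x
    bool converged;
};

JacobiOutcome jacobi_sketch(const Matrix& A, const Vector& b, Vector& x,
                            double tol = 1e-10, std::size_t max_iter = 1000) {
    const std::size_t n = b.size();
    assert(A.size() == n && x.size() == n);
    Vector next(n);
    JacobiOutcome out{0, 0.0, false};
    for (std::size_t k = 0; k < max_iter; ++k) {
        // Each row reads only the previous iterate x, so the rows are
        // independent; this is the loop a parallel backend would split.
        for (std::size_t i = 0; i < n; ++i) {
            double sigma = 0.0;
            for (std::size_t j = 0; j < n; ++j) {
                if (j != i) sigma += A[i][j] * x[j];
            }
            next[i] = (b[i] - sigma) / A[i][i];
        }
        double change = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            change = std::max(change, std::abs(next[i] - x[i]));
        }
        x.swap(next);
        out.iterations = k + 1;
        out.last_update = change;
        if (change < tol) {
            out.converged = true;
            break;
        }
    }
    return out;
}
```

Jacobi converges for strictly diagonally dominant matrices, so a small system such as A = [[4, 1], [2, 3]] with b = [1, 2] is a reasonable first check.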
void matmul(const Matrix &A, const Matrix &B, Matrix &C, Backend b=default_backend)
C = A * B.
Implementation locations:
src/core/backends/omp/
src/analysis/quadrature.cpp
src/linalg/solvers/jacobi.cpp
CUDA
The CUDA paths of the vector and matrix entry points are selected by passing Backend::gpu.
void matvec(const Matrix &A, const Vector &x, Vector &y, Backend b=default_backend)
y = A * x.
real norm(const Vector &x, Backend b=default_backend)
Compute the norm of x.
void axpy(real alpha, const Vector &x, Vector &y, Backend b=default_backend)
y = alpha * x + y.
Implementation locations:
include/core/parallel/cuda_ops.hpp
src/core/parallel/cuda_ops.cu
src/core/parallel/cuda_stubs.cpp
src/core/backends/gpu/
When CUDA is not enabled, the stub implementation stands in for the CUDA sources, so downstream code that references these entry points still compiles and links.
GPU Banded Solve
BandedSolverResult banded_solve(const BandedMatrix &A, const Vector &b, Vector &x)
Factor and solve Ax = b.
The GPU path is intended for many structured systems or large banded problems. For small systems, kernel launch overhead can dominate the runtime.
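To make "factor and solve" concrete, here is a minimal CPU sketch for the simplest banded case, a tridiagonal system solved by elimination without pivoting (the Thomas algorithm). It is not the library's GPU implementation: solve_tridiagonal and its three-array storage are hypothetical, and BandedMatrix may store its bands differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solve a tridiagonal system A x = rhs.
//   lower[i] = A(i, i-1) for i >= 1   (lower[0] is unused)
//   diag[i]  = A(i, i)
//   upper[i] = A(i, i+1) for i < n-1  (upper[n-1] is unused)
// No pivoting is done, so the matrix should be, for example,
// diagonally dominant. diag and rhs are taken by value and
// overwritten with the factored diagonal and the solution.
std::vector<double> solve_tridiagonal(const std::vector<double>& lower,
                                      std::vector<double> diag,
                                      const std::vector<double>& upper,
                                      std::vector<double> rhs) {
    const std::size_t n = diag.size();
    assert(n > 0 && lower.size() == n && upper.size() == n && rhs.size() == n);
    // Forward elimination: remove the subdiagonal (the "factor" step).
    for (std::size_t i = 1; i < n; ++i) {
        const double m = lower[i] / diag[i - 1];
        diag[i] -= m * upper[i - 1];
        rhs[i] -= m * rhs[i - 1];
    }
    // Back substitution (the "solve" step).
    rhs[n - 1] /= diag[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) {
        rhs[i] = (rhs[i] - upper[i] * rhs[i + 1]) / diag[i];
    }
    return rhs;
}
```

Both loops carry a dependency from one row to the next, so a single small system offers little parallelism. That is consistent with the note above: the GPU pays off when many such systems are solved at once or when the problem is large.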
MPI
MPI helpers are exposed under num::mpi.
void allreduce_sum(real *data, idx n, MPI_Comm comm=MPI_COMM_WORLD)
Sum the n values in data element-wise across all ranks in comm (an allreduce).
int size(MPI_Comm comm=MPI_COMM_WORLD)
Get communicator size.
int rank(MPI_Comm comm=MPI_COMM_WORLD)
Get communicator rank.
Implementation locations:
include/core/parallel/mpi_ops.hpp
src/core/parallel/mpi_ops.cpp
src/core/parallel/mpi_stubs.cpp
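The split between mpi_ops.cpp and mpi_stubs.cpp suggests a pattern along the following lines. This is a sketch under assumptions, not the library's source: the NUM_HAS_MPI macro, the sketch_mpi namespace, the MPI_Comm stand-in for non-MPI builds, the use of double for real, and the in-place reduction are all hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

#ifdef NUM_HAS_MPI  // hypothetical configuration macro
#include <mpi.h>
#else
// Stand-ins so the signatures below compile without an MPI installation.
using MPI_Comm = int;
constexpr MPI_Comm MPI_COMM_WORLD = 0;
#endif

namespace sketch_mpi {

#ifdef NUM_HAS_MPI

inline int size(MPI_Comm comm = MPI_COMM_WORLD) {
    int s = 1;
    MPI_Comm_size(comm, &s);
    return s;
}

inline int rank(MPI_Comm comm = MPI_COMM_WORLD) {
    int r = 0;
    MPI_Comm_rank(comm, &r);
    return r;
}

// Element-wise sum across ranks; the result overwrites data on every rank.
inline void allreduce_sum(double* data, std::size_t n,
                          MPI_Comm comm = MPI_COMM_WORLD) {
    assert(n <= static_cast<std::size_t>(INT_MAX));  // MPI counts are int
    MPI_Allreduce(MPI_IN_PLACE, data, static_cast<int>(n),
                  MPI_DOUBLE, MPI_SUM, comm);
}

#else

// Stub build: behave like a single-rank communicator.
inline int size(MPI_Comm = MPI_COMM_WORLD) { return 1; }
inline int rank(MPI_Comm = MPI_COMM_WORLD) { return 0; }
// A sum over one rank leaves the data unchanged.
inline void allreduce_sum(double*, std::size_t, MPI_Comm = MPI_COMM_WORLD) {}

#endif

}  // namespace sketch_mpi
```

With stubs of this shape, caller code such as "if (sketch_mpi::rank() == 0) { ... }" compiles and behaves sensibly in both builds, which is the portability the stub files are meant to provide.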
Backend Boundaries
Raw kernels do not call CUDA, MPI, OpenMP, or BLAS. Backend-specific calls live under src/core/backends or src/core/parallel.
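One way to picture this boundary is a thin dispatcher that routes to either a raw kernel or a backend-specific variant. The sketch below uses axpy as the example; the namespaces raw_kernels and omp_backend, the function axpy_sketch, and the choice to fall back to the sequential kernel for Backend::gpu are all assumptions for illustration, not the library's layout or behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the library's Backend enum.
enum class Backend { seq, omp, gpu };

namespace raw_kernels {
// Raw kernel: a plain loop with no OpenMP, CUDA, MPI, or BLAS calls.
inline void axpy(double alpha, const double* x, double* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}
}  // namespace raw_kernels

namespace omp_backend {
// Backend-specific variant: the only place the OpenMP pragma appears.
inline void axpy(double alpha, const double* x, double* y, std::size_t n) {
    const long long count = static_cast<long long>(n);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long long i = 0; i < count; ++i) y[i] += alpha * x[i];
}
}  // namespace omp_backend

// Public entry point: validates sizes, then dispatches on the backend.
inline void axpy_sketch(double alpha, const std::vector<double>& x,
                        std::vector<double>& y, Backend b = Backend::seq) {
    assert(x.size() == y.size());
    switch (b) {
        case Backend::omp:
            omp_backend::axpy(alpha, x.data(), y.data(), x.size());
            break;
        case Backend::gpu:  // no GPU path in this sketch: use the raw kernel
        case Backend::seq:
            raw_kernels::axpy(alpha, x.data(), y.data(), x.size());
            break;
    }
}
```

Keeping the raw kernel free of backend calls means it can be compiled and tested in any configuration, while everything that depends on an optional toolchain stays behind the dispatcher.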