numerics 0.1.0
Parallel, GPU, and MPI Implementation Note

Parallel support is optional and selected at configuration time. Public calls remain available in non-parallel builds through fallback paths or stubs.

OpenMP

OpenMP is selected by passing Backend::omp to routines that document an OpenMP path.

double I = num::trapz(f, 0.0, 1.0, 1000000, num::Backend::omp);
num::jacobi(A, b, x, 1e-8, 1000, num::Backend::omp);

Signatures:

real trapz(ScalarFn f, real a, real b, idx n=100, Backend backend=Backend::seq)
    Trapezoidal rule with n panels.
SolverResult jacobi(const Matrix &A, const Vector &b, Vector &x, real tol=1e-10, idx max_iter=1000, Backend backend=default_backend)
    Jacobi iterative solver for Ax = b. Defined at jacobi.cpp:7.
void matmul(const Matrix &A, const Matrix &B, Matrix &C, Backend b=default_backend)
    C = A * B. Defined at matrix.cpp:20.

Implementation locations:

src/core/backends/omp/
src/analysis/quadrature.cpp
src/linalg/solvers/jacobi.cpp

CUDA

CUDA vector and matrix entry points use Backend::gpu.

Signatures:

void matvec(const Matrix &A, const Vector &x, Vector &y, Backend b=default_backend)
    y = A * x. Defined at matrix.cpp:45.
real norm(const Vector &x, Backend b=default_backend)
    Compute the norm of x. Defined at vector.cpp:83.
void axpy(real alpha, const Vector &x, Vector &y, Backend b=default_backend)
    Compute y = alpha * x + y. Defined at vector.cpp:44.

Implementation locations:

include/core/parallel/cuda_ops.hpp
src/core/parallel/cuda_ops.cu
src/core/parallel/cuda_stubs.cpp
src/core/backends/gpu/

When CUDA is not enabled, the stub implementation keeps downstream builds portable.
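
One way a stub layer can keep builds portable is sketched below. This is an assumption about the general pattern, not a description of cuda_stubs.cpp: the demo namespace, the DEMO_HAS_CUDA macro, the available() query, and the decision to fall back to a CPU loop are all hypothetical. The point is that the same symbols exist in both configurations, so callers compile and link without the CUDA toolkit.

```cpp
#include <cstddef>
#include <vector>

namespace demo {

enum class Backend { seq, gpu };  // hypothetical stand-in for num::Backend

namespace cuda {
#ifdef DEMO_HAS_CUDA
// A CUDA-enabled build would define these in a .cu translation unit.
bool available();
void axpy(double alpha, const double* x, double* y, std::size_t n);
#else
// Stub build: identical symbols, no CUDA headers or libraries required.
inline bool available() { return false; }
inline void axpy(double, const double*, double*, std::size_t) {}
#endif
}  // namespace cuda

// y = alpha * x + y. Uses the GPU path only when one is actually present;
// otherwise the request for Backend::gpu falls back to the CPU loop.
inline void axpy(double alpha, const std::vector<double>& x,
                 std::vector<double>& y, Backend backend = Backend::seq) {
    if (backend == Backend::gpu && cuda::available()) {
        cuda::axpy(alpha, x.data(), y.data(), x.size());
        return;
    }
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += alpha * x[i];
    }
}

}  // namespace demo
```

A stub could equally report an error instead of falling back; which behavior a given routine has is something to check in its own documentation.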

GPU Banded Solve

auto result = num::banded_solve(A, b, num::Backend::gpu);

Signature:

BandedSolverResult banded_solve(const BandedMatrix &A, const Vector &b, Vector &x)
    Factor and solve Ax = b. Defined at banded.cpp:281.

The GPU path is intended for many structured systems or large banded problems. For small systems, kernel launch overhead can dominate the solve time.

MPI

MPI helpers are exposed under num::mpi.

int rank = num::mpi::rank();
int size = num::mpi::size();
double total = num::mpi::allreduce_sum(local_value);

Signatures:

void allreduce_sum(real *data, idx n, MPI_Comm comm=MPI_COMM_WORLD)
    Allreduce sum. Defined at mpi_ops.cpp:37.
int size(MPI_Comm comm=MPI_COMM_WORLD)
    Get communicator size. Defined at mpi_ops.cpp:19.
int rank(MPI_Comm comm=MPI_COMM_WORLD)
    Get communicator rank. Defined at mpi_ops.cpp:13.

Implementation locations:

include/core/parallel/mpi_ops.hpp
src/core/parallel/mpi_ops.cpp
src/core/parallel/mpi_stubs.cpp
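
The sketch below shows how thin wrappers of this kind can sit over standard MPI calls while still compiling in a serial build. It is a hedged illustration rather than the contents of mpi_ops.cpp or mpi_stubs.cpp: the demo namespace, the DEMO_USE_MPI macro, and the distributed_sum_1_to_n helper are hypothetical. Only MPI_Comm_rank, MPI_Comm_size, and MPI_Allreduce are real MPI functions.

```cpp
#include <cstddef>

#ifdef DEMO_USE_MPI
#include <mpi.h>
#endif

namespace demo {
namespace mpi {

#ifdef DEMO_USE_MPI
// Real MPI path. The caller is responsible for MPI_Init and MPI_Finalize.
inline int rank() { int r = 0; MPI_Comm_rank(MPI_COMM_WORLD, &r); return r; }
inline int size() { int s = 1; MPI_Comm_size(MPI_COMM_WORLD, &s); return s; }
// In-place element-wise sum of data[0..n) across all ranks.
inline void allreduce_sum(double* data, std::size_t n) {
    MPI_Allreduce(MPI_IN_PLACE, data, static_cast<int>(n),
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
}
#else
// Serial stubs: behave like a communicator with exactly one process.
inline int rank() { return 0; }
inline int size() { return 1; }
inline void allreduce_sum(double*, std::size_t) {}  // sum over one rank: no-op
#endif

}  // namespace mpi

// Each rank adds up a strided share of 1..n, then the partial sums are
// combined so that every rank ends up holding the full total.
inline double distributed_sum_1_to_n(long long n) {
    const int r = mpi::rank();
    const int p = mpi::size();
    double local = 0.0;
    for (long long k = 1 + r; k <= n; k += p) {
        local += static_cast<double>(k);
    }
    mpi::allreduce_sum(&local, 1);
    return local;
}

}  // namespace demo
```

With the stubs active the loop covers the whole range on the single rank, so the same calling code produces the same answer with or without MPI.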

Backend Boundaries

Raw kernels do not call CUDA, MPI, OpenMP, or BLAS. Backend-specific calls live under src/core/backends or src/core/parallel.
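
The separation can be pictured as follows. This is a sketch under assumed names (the demo, kernel, and backends namespaces, dot_range, and the block size are all hypothetical), not the library's layout in code form. The raw kernel is plain C++ over an index range; the backend wrapper is the only place that mentions OpenMP and simply decides how to split the range.

```cpp
#include <cstddef>
#include <vector>

namespace demo {

namespace kernel {
// Raw kernel: dot product over the half-open range [begin, end).
// No OpenMP, CUDA, MPI, or BLAS calls appear here.
inline double dot_range(const double* x, const double* y,
                        std::size_t begin, std::size_t end) {
    double s = 0.0;
    for (std::size_t i = begin; i < end; ++i) s += x[i] * y[i];
    return s;
}
}  // namespace kernel

namespace backends {

// Sequential backend: one call over the full range.
inline double dot_seq(const std::vector<double>& x,
                      const std::vector<double>& y) {
    return kernel::dot_range(x.data(), y.data(), 0, x.size());
}

// OpenMP backend: splits the range into blocks and reduces the partial
// results. Without -fopenmp the pragma is ignored and this runs serially.
inline double dot_omp(const std::vector<double>& x,
                      const std::vector<double>& y) {
    const long long n = static_cast<long long>(x.size());
    const long long block = 1024;  // arbitrary block size for illustration
    const long long blocks = (n + block - 1) / block;
    double total = 0.0;

    #pragma omp parallel for reduction(+ : total)
    for (long long b = 0; b < blocks; ++b) {
        const long long lo = b * block;
        const long long hi = (lo + block < n) ? lo + block : n;
        total += kernel::dot_range(x.data(), y.data(),
                                   static_cast<std::size_t>(lo),
                                   static_cast<std::size_t>(hi));
    }
    return total;
}

}  // namespace backends
}  // namespace demo
```

Keeping the kernel free of backend calls means it can be tested on its own and reused unchanged by every backend wrapper.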