numerics 0.1.0
Loading...
Searching...
No Matches
dense.hpp
Go to the documentation of this file.
1/// @file kernel/dense.hpp
2/// @brief Dense matrix inner kernels (namespace num::kernel::dense)
3///
4/// These are the level-2 BLAS-equivalent operations exposed as first-class
5/// kernels. They are the inner loops that factorization solvers and iterative
6/// methods reduce to, but which are not currently accessible as standalone
7/// functions in the rest of numerics.
8///
9/// ger(alpha, x, y, A) -- A += alpha * x * y^T (rank-1 update)
10/// trsv_lower(L, b, x) -- solve Lx = b (L lower triangular)
11/// trsv_upper(U, b, x) -- solve Ux = b (U upper triangular)
12///
13/// ger has seq_t / par_t overloads: the outer loop over rows is independent
14/// and parallelizable.
15///
16/// trsv has no policy parameter: the outer loop has a serial dependency
17/// (each row depends on all previous rows). BLAS cblas_dtrsv is used
18/// automatically via raw:: when NUMERICS_HAS_BLAS is defined.
19///
20/// Include kernel/kernel.hpp to get all kernel sub-modules together.
21#pragma once
22
23#include "core/matrix.hpp"
24#include "core/types.hpp"
25#include "core/vector.hpp"
26#include "kernel/policy.hpp"
27
29
30// ---------------------------------------------------------------------------
31// ger: A[i,j] += alpha * x[i] * y[j]
32// ---------------------------------------------------------------------------
33
34/// @brief Sequential rank-1 update: calls raw::ger (routes to cblas_dger when
35/// BLAS available; otherwise vectorizable row-update loop).
36///
37/// A must be (x.size() x y.size()) row-major.
38void ger(real alpha, const Vector& x, const Vector& y, Matrix& A,
39 seq_t) noexcept;
40
41/// @brief Parallel rank-1 update: OMP parallel-for over rows.
42/// Each row update A[i,:] += alpha*x[i]*y is independent.
43void ger(real alpha, const Vector& x, const Vector& y, Matrix& A, par_t);
44
45/// @brief Default policy
46inline void ger(real alpha, const Vector& x, const Vector& y, Matrix& A) {
47 ger(alpha, x, y, A, default_policy{});
48}
49
50// ---------------------------------------------------------------------------
51// trsv_lower: solve Lx = b (L lower triangular)
52// ---------------------------------------------------------------------------
53
54/// @brief Forward substitution: solve Lx = b.
55///
56/// L must be square. x is allocated to L.rows() if x.size() != L.rows().
57/// No policy parameter: the outer loop has a serial dependency.
58/// Dispatches to cblas_dtrsv via raw:: when NUMERICS_HAS_BLAS.
59void trsv_lower(const Matrix& L, const Vector& b, Vector& x);
60
61// ---------------------------------------------------------------------------
62// trsv_upper: solve Ux = b (U upper triangular)
63// ---------------------------------------------------------------------------
64
65/// @brief Back substitution: solve Ux = b.
66///
67/// U must be square. x is allocated to U.rows() if x.size() != U.rows().
68/// No policy parameter: serial outer dependency.
69/// Dispatches to cblas_dtrsv via raw:: when NUMERICS_HAS_BLAS.
70void trsv_upper(const Matrix& U, const Vector& b, Vector& x);
71
72} // namespace num::kernel::dense
Dense row-major matrix with optional GPU storage.
Definition matrix.hpp:12
Core type definitions.
Compile-time dispatch policy tags for the kernel module.
Matrix operations.
void trsv_upper(const Matrix &U, const Vector &b, Vector &x)
Back substitution: solve Ux = b.
Definition dense.cpp:64
void trsv_lower(const Matrix &L, const Vector &b, Vector &x)
Forward substitution: solve Lx = b.
Definition dense.cpp:49
void ger(real alpha, const Vector &x, const Vector &y, Matrix &A, seq_t) noexcept
Sequential rank-1 update: calls raw::ger (routes to cblas_dger when BLAS available; otherwise vectori...
Definition dense.cpp:20
double real
Definition types.hpp:10
Parallel execution policy tag. Activates OMP parallel-for / reduction constructs when NUMERICS_HAS_OM...
Definition policy.hpp:43
Sequential execution policy tag. Guarantees no OMP parallel regions; safe to call inside an existing ...
Definition policy.hpp:38
Vector operations.