numerics
Functions
| Return type | Function | Description |
| --- | --- | --- |
| real * | alloc (idx n) | Allocate device memory. |
| void | free (real *ptr) | Free device memory. |
| void | to_device (real *dst, const real *src, idx n) | Copy host to device. |
| void | to_host (real *dst, const real *src, idx n) | Copy device to host. |
| void | scale (real *v, idx n, real alpha) | v = alpha * v |
| void | add (const real *x, const real *y, real *z, idx n) | z = x + y |
| void | axpy (real alpha, const real *x, real *y, idx n) | y = alpha*x + y |
| real | dot (const real *x, const real *y, idx n) | Dot product. |
| void | matvec (const real *A, const real *x, real *y, idx rows, idx cols) | y = A * x (row-major A) |
| void | matmul (const real *A, const real *B, real *C, idx m, idx k, idx n) | C = A * B |
| void | thomas_batched (const real *a, const real *b, const real *c, const real *d, real *x, idx n, idx batch_size) | Batched Thomas algorithm for tridiagonal systems. |
real * num::cuda::alloc (idx n)
Allocate device memory.
Definition at line 10 of file cuda_stubs.cpp.
Referenced by num::Matrix::to_gpu(), num::BasicVector< T >::to_gpu(), and num::BandedMatrix::to_gpu().
void num::cuda::axpy (real alpha, const real *x, real *y, idx n)
y = alpha*x + y
Definition at line 16 of file cuda_stubs.cpp.
Referenced by num::backends::gpu::axpy().
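The update `y = alpha*x + y` overwrites `y` in place. A minimal host-side sketch of that semantics follows; it is not the library's implementation. The name `axpy_ref` is hypothetical, and the `real` and `idx` aliases are assumed stand-ins for the library's own typedefs, which may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stand-ins for the library's typedefs; the real definitions may differ.
using real = double;
using idx  = std::size_t;

// Hypothetical host reference for y = alpha*x + y (y is updated in place).
void axpy_ref(real alpha, const real* x, real* y, idx n) {
    assert(x != nullptr && y != nullptr);
    for (idx i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

A reference like this can serve as a comparison baseline when checking a device result after copying it back to the host.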
real num::cuda::dot (const real *x, const real *y, idx n)
Dot product of x and y.
Definition at line 17 of file cuda_stubs.cpp.
Referenced by num::backends::gpu::dot() and num::backends::gpu::norm().
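Since `dot` is referenced by a `norm()` routine, one plausible relationship is that the Euclidean norm is computed as the square root of a vector's dot product with itself. The host-side sketch below illustrates that relationship only; that `norm()` actually works this way is an assumption, the names `dot_ref` and `norm_ref` are hypothetical, and `real` and `idx` are assumed stand-ins for the library's typedefs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed stand-ins for the library's typedefs; the real definitions may differ.
using real = double;
using idx  = std::size_t;

// Hypothetical host reference: sum of x[i] * y[i] over n elements.
real dot_ref(const real* x, const real* y, idx n) {
    assert(x != nullptr && y != nullptr);
    real sum = 0;
    for (idx i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Assumption: the Euclidean norm is sqrt(dot(x, x)).
real norm_ref(const real* x, idx n) {
    return std::sqrt(dot_ref(x, x, n));
}
```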
void num::cuda::free (real *ptr)
Free device memory.
Definition at line 11 of file cuda_stubs.cpp.
Referenced by num::BandedMatrix::operator=(), num::BasicVector< T >::operator=(), num::BandedMatrix::operator=(), num::Matrix::operator=(), num::Matrix::to_cpu(), num::BasicVector< T >::to_cpu(), num::BandedMatrix::~BandedMatrix(), num::BasicVector< T >::~BasicVector(), and num::Matrix::~Matrix().
void num::cuda::matmul (const real *A, const real *B, real *C, idx m, idx k, idx n)
C = A * B.
Definition at line 19 of file cuda_stubs.cpp.
Referenced by num::backends::gpu::matmul().
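The page does not state the shapes or storage order for `matmul`. The sketch below assumes, from the parameter order `(m, k, n)`, that A is m×k, B is k×n and C is m×n, and that all three are row-major as documented for `matvec`. Both points are assumptions, and `matmul_ref` is a hypothetical host-side reference rather than the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stand-ins for the library's typedefs; the real definitions may differ.
using real = double;
using idx  = std::size_t;

// Hypothetical host reference for C = A * B.
// Assumed layout: A is m x k, B is k x n, C is m x n, all row-major.
void matmul_ref(const real* A, const real* B, real* C, idx m, idx k, idx n) {
    assert(A != nullptr && B != nullptr && C != nullptr);
    for (idx i = 0; i < m; ++i) {
        for (idx j = 0; j < n; ++j) {
            real sum = 0;
            for (idx p = 0; p < k; ++p) {
                sum += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}
```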
void num::cuda::matvec (const real *A, const real *x, real *y, idx rows, idx cols)
y = A * x (row-major A)
Definition at line 18 of file cuda_stubs.cpp.
Referenced by num::backends::gpu::matvec().
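Row-major storage means element (i, j) of A sits at offset `i * cols + j`. The host-side sketch below shows that indexing for `y = A * x`. The name `matvec_ref` is hypothetical, and `real` and `idx` are assumed stand-ins for the library's typedefs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stand-ins for the library's typedefs; the real definitions may differ.
using real = double;
using idx  = std::size_t;

// Hypothetical host reference for y = A * x with row-major A:
// element (i, j) is read from A[i * cols + j].
void matvec_ref(const real* A, const real* x, real* y, idx rows, idx cols) {
    assert(A != nullptr && x != nullptr && y != nullptr);
    for (idx i = 0; i < rows; ++i) {
        real sum = 0;
        for (idx j = 0; j < cols; ++j) {
            sum += A[i * cols + j] * x[j];
        }
        y[i] = sum;
    }
}
```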
void num::cuda::scale (real *v, idx n, real alpha)
v = alpha * v
Definition at line 14 of file cuda_stubs.cpp.
Referenced by num::backends::gpu::scale().
void num::cuda::thomas_batched (const real *a, const real *b, const real *c, const real *d, real *x, idx n, idx batch_size)
Batched Thomas algorithm for tridiagonal systems.
| a | Lower diagonals (batch_size arrays of size n-1, packed consecutively) |
| b | Main diagonals (batch_size arrays of size n) |
| c | Upper diagonals (batch_size arrays of size n-1, packed consecutively) |
| d | Right-hand sides (batch_size arrays of size n) |
| x | Solution vectors (batch_size arrays of size n) |
| n | Size of each system |
| batch_size | Number of independent systems to solve |
Definition at line 20 of file cuda_stubs.cpp.
Referenced by num::thomas().
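The packed layout described in the parameter list can be illustrated with a sequential host-side reference. This is a sketch, not the library's kernel. It assumes that system `s` starts at offset `s * (n - 1)` in `a` and `c` and at `s * n` in `b`, `d` and `x`, that `a[i - 1]` is the coefficient of `x[i - 1]` in row `i`, and that no pivoting is needed (for example, diagonally dominant systems). The name `thomas_batched_ref` and the `real`/`idx` aliases are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stand-ins for the library's typedefs; the real definitions may differ.
using real = double;
using idx  = std::size_t;

// Hypothetical host reference: solve batch_size independent tridiagonal
// systems of size n with the Thomas algorithm (no pivoting).
void thomas_batched_ref(const real* a, const real* b, const real* c,
                        const real* d, real* x, idx n, idx batch_size) {
    assert(n >= 1);
    std::vector<real> cp(n), dp(n);  // scratch: modified upper diagonal and RHS
    for (idx s = 0; s < batch_size; ++s) {
        // Locate system s inside the packed arrays (assumed layout).
        const real* as = a + s * (n - 1);
        const real* bs = b + s * n;
        const real* cs = c + s * (n - 1);
        const real* ds = d + s * n;
        real* xs = x + s * n;

        // Forward elimination.
        real denom = bs[0];
        dp[0] = ds[0] / denom;
        for (idx i = 1; i < n; ++i) {
            cp[i - 1] = cs[i - 1] / denom;
            denom = bs[i] - as[i - 1] * cp[i - 1];
            dp[i] = (ds[i] - as[i - 1] * dp[i - 1]) / denom;
        }

        // Back substitution, from row n-2 down to row 0.
        xs[n - 1] = dp[n - 1];
        for (idx i = n - 1; i-- > 0;) {
            xs[i] = dp[i] - cp[i] * xs[i + 1];
        }
    }
}
```

Each system costs O(n) operations and the systems do not depend on one another, which is what makes the batched formulation suitable for parallel execution.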
void num::cuda::to_device (real *dst, const real *src, idx n)
Copy host to device.
Definition at line 12 of file cuda_stubs.cpp.
Referenced by num::cg(), num::Matrix::to_gpu(), num::BasicVector< T >::to_gpu(), and num::BandedMatrix::to_gpu().
void num::cuda::to_host (real *dst, const real *src, idx n)
Copy device to host.
Definition at line 13 of file cuda_stubs.cpp.
Referenced by num::Matrix::to_cpu(), num::BasicVector< T >::to_cpu(), and num::BandedMatrix::to_cpu().
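The memory and transfer functions are typically used in a fixed order: allocate, copy to the device, compute, copy back, free. The sketch below shows that call order using host-memory stand-ins so that it runs without a GPU. Everything in the `sketch` namespace is a hypothetical mock that mirrors the signatures listed on this page, not the library's implementation. It also assumes that `n` counts elements rather than bytes and that `real` and `idx` alias `double` and `std::size_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Assumed stand-ins for the library's typedefs; the real definitions may differ.
using real = double;
using idx  = std::size_t;

// Hypothetical host-memory mocks with the same signatures as num::cuda.
// Assumption: n is an element count, not a byte count.
namespace sketch {
real* alloc(idx n) { return static_cast<real*>(std::malloc(n * sizeof(real))); }
void free(real* ptr) { std::free(ptr); }
void to_device(real* dst, const real* src, idx n) { std::memcpy(dst, src, n * sizeof(real)); }
void to_host(real* dst, const real* src, idx n) { std::memcpy(dst, src, n * sizeof(real)); }
void scale(real* v, idx n, real alpha) {
    for (idx i = 0; i < n; ++i) v[i] *= alpha;
}
}  // namespace sketch

// Round trip: allocate, upload, scale on the "device", download, release.
std::vector<real> scale_roundtrip(const std::vector<real>& host, real alpha) {
    const idx n = host.size();
    std::vector<real> out(n);
    real* dev = sketch::alloc(n);
    assert(dev != nullptr);
    sketch::to_device(dev, host.data(), n);
    sketch::scale(dev, n, alpha);
    sketch::to_host(out.data(), dev, n);
    sketch::free(dev);
    return out;
}
```

Pairing every `alloc` with exactly one `free` matters here because the page lists destructors and assignment operators among the callers of `free`, which suggests the container classes own their device buffers.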