numerics 0.1.0
Register blocking is the scalar micro-kernel layer above cache blocking and below explicit SIMD.
The parameters are:
block_size: cache tile size.
reg_size: small scalar tile size used inside each cache tile.

The implementation accumulates a small \(r\times r\) block in scalar temporaries:
After the local accumulation, the temporary tile is stored back to C.
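The accumulate-then-store pattern described above can be sketched as follows. This is a minimal illustration, not the library's actual routine: the function name `matmul_reg_blocked`, the row-major square layout, the `std::vector<double>` storage, and the choice to pass the register tile size as a template parameter `R` (standing in for reg_size, so the temporaries have a compile-time size) are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch (names and layout are assumptions, not the library API).
// Computes C += A * B for row-major n x n matrices, using cache tiles of
// block_size and R x R register tiles inside each cache tile.
template <std::size_t R>
void matmul_reg_blocked(const std::vector<double>& A,
                        const std::vector<double>& B,
                        std::vector<double>& C,
                        std::size_t n, std::size_t block_size) {
    static_assert(R > 0, "register tile size must be positive");
    for (std::size_t ii = 0; ii < n; ii += block_size) {
        const std::size_t i_end = std::min(ii + block_size, n);
        for (std::size_t jj = 0; jj < n; jj += block_size) {
            const std::size_t j_end = std::min(jj + block_size, n);
            for (std::size_t kk = 0; kk < n; kk += block_size) {
                const std::size_t k_end = std::min(kk + block_size, n);
                // Walk the cache tile in R x R register tiles.
                for (std::size_t i = ii; i < i_end; i += R) {
                    const std::size_t ri = std::min(R, i_end - i);  // edge rows
                    for (std::size_t j = jj; j < j_end; j += R) {
                        const std::size_t rj = std::min(R, j_end - j);  // edge cols
                        double acc[R][R] = {};  // scalar temporaries, zeroed
                        // Local accumulation over this tile's k range.
                        for (std::size_t k = kk; k < k_end; ++k) {
                            for (std::size_t a = 0; a < ri; ++a) {
                                const double aik = A[(i + a) * n + k];
                                for (std::size_t b = 0; b < rj; ++b) {
                                    acc[a][b] += aik * B[k * n + (j + b)];
                                }
                            }
                        }
                        // Store the temporary tile back to C.
                        for (std::size_t a = 0; a < ri; ++a) {
                            for (std::size_t b = 0; b < rj; ++b) {
                                C[(i + a) * n + (j + b)] += acc[a][b];
                            }
                        }
                    }
                }
            }
        }
    }
}
```

A call such as `matmul_reg_blocked<4>(A, B, C, n, 64)` would use 4 x 4 register tiles inside 64 x 64 cache tiles. Fixing `R` at compile time is a design choice of this sketch: it lets the compiler fully unroll the inner loops and keep `acc` in registers, which is the point of the technique.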
This routine is kept as an implementation diagnostic. The normal public backend selection is still:
Register blocking is useful for explaining the transition from cache blocking to SIMD micro-kernels. Production code should usually select Backend::blas when a tuned BLAS is available.