Open
147 commits
39630e4
[rocm-libraries] ROCm/rocm-libraries#471 (commit 0a9f1f2)
TorreZuk Jul 2, 2025
87367f9
[rocm-libraries] ROCm/rocm-libraries#488 (commit 5325630)
amcamd Jul 8, 2025
cbb99fa
[rocm-libraries] ROCm/rocm-libraries#517 (commit c6baf77)
TorreZuk Jul 9, 2025
fbad0fa
[rocm-libraries] ROCm/rocm-libraries#508 (commit d7462c8)
ckastner Jul 10, 2025
c71b397
[rocm-libraries] ROCm/rocm-libraries#524 (commit 4c9b063)
amd-garydeng Jul 14, 2025
816ef3c
[rocm-libraries] ROCm/rocm-libraries#640 (commit 65af0fd)
TorreZuk Jul 15, 2025
4eba18a
[rocm-libraries] ROCm/rocm-libraries#451 (commit ed5a325)
neon60 Jul 16, 2025
000585e
[rocm-libraries] ROCm/rocm-libraries#452 (commit d0524be)
amd-mtrifuno Jul 16, 2025
8f24985
[rocm-libraries] ROCm/rocm-libraries#541 (commit 43f328c)
amd-jnovotny Jul 16, 2025
0e08e0d
[rocm-libraries] ROCm/rocm-libraries#717 (commit 8f685a3)
amd-jnovotny Jul 17, 2025
6a5765b
[rocm-libraries] ROCm/rocm-libraries#699 (commit 78d9061)
amd-mtrifuno Jul 17, 2025
400837c
[rocm-libraries] ROCm/rocm-libraries#727 (commit de631f8)
amcamd Jul 18, 2025
e05530d
[rocm-libraries] ROCm/rocm-libraries#750 (commit 15c7f03)
estewart08 Jul 21, 2025
78ad3d8
Reconciling changes missed from monorepo
jayhawk-commits Jul 24, 2025
936fa53
[rocm-libraries] ROCm/rocm-libraries#731 (commit 974bc58)
TorreZuk Jul 25, 2025
c72cb44
[rocm-libraries] ROCm/rocm-libraries#781 (commit 1e2698c)
TorreZuk Jul 30, 2025
2a08432
[rocm-libraries] ROCm/rocm-libraries#684 (commit dd7ea70)
s769 Aug 1, 2025
c44fc5c
[rocm-libraries] ROCm/rocm-libraries#1028 (commit 7a1e745)
TorreZuk Aug 4, 2025
fd3192c
[rocm-libraries] ROCm/rocm-libraries#1039 (commit 82b70e6)
NaveenElumalaiAMD Aug 5, 2025
cb2b8d5
[rocm-libraries] ROCm/rocm-libraries#1069 (commit f6d6ac0)
amd-jnovotny Aug 8, 2025
32c70af
[rocm-libraries] ROCm/rocm-libraries#1141 (commit 405febb)
TorreZuk Aug 12, 2025
dc36604
[rocm-libraries] ROCm/rocm-libraries#1178 (commit 153b1c0)
TorreZuk Aug 13, 2025
df4cb9b
[rocm-libraries] ROCm/rocm-libraries#1189 (commit 3fb865c)
TorreZuk Aug 13, 2025
3b9ac51
[rocm-libraries] ROCm/rocm-libraries#919 (commit bd33b54)
TorreZuk Aug 14, 2025
2110bf5
[rocm-libraries] ROCm/rocm-libraries#1220 (commit 510d895)
amd-mtrifuno Aug 18, 2025
1732a37
[rocm-libraries] ROCm/rocm-libraries#1006 (commit a1df843)
evedovelli Aug 18, 2025
18b6f7a
[rocm-libraries] ROCm/rocm-libraries#1226 (commit 533d481)
TorreZuk Aug 18, 2025
278b253
[rocm-libraries] ROCm/rocm-libraries#1263 (commit 3744a4c)
TorreZuk Aug 20, 2025
c1e173a
[rocm-libraries] ROCm/rocm-libraries#1340 (commit d344945)
amd-jnovotny Aug 25, 2025
307f9f2
[rocm-libraries] ROCm/rocm-libraries#1284 (commit 49122db)
TorreZuk Sep 4, 2025
44ea908
[rocm-libraries] ROCm/rocm-libraries#1439 (commit 9b5c29c)
rkamd Sep 5, 2025
b55a39d
[rocm-libraries] ROCm/rocm-libraries#1270 (commit c7ee067)
TorreZuk Sep 5, 2025
87bc560
[rocm-libraries] ROCm/rocm-libraries#1290 (commit 6b4e62d)
jonatluu Sep 9, 2025
6dbd0c3
[rocm-libraries] ROCm/rocm-libraries#455 (commit 2fcfe4a)
lucbruni-amd Sep 10, 2025
a8c5812
[rocm-libraries] ROCm/rocm-libraries#1249 (commit 9c3856f)
TorreZuk Sep 10, 2025
d0faad3
[rocm-libraries] ROCm/rocm-libraries#1537 (commit 8bb68f1)
stellaraccident Sep 11, 2025
af862fb
[rocm-libraries] ROCm/rocm-libraries#1320 (commit 4dee3e2)
HPC-Ken Sep 11, 2025
c169627
[rocm-libraries] ROCm/rocm-libraries#1454 (commit ce30a87)
TorreZuk Sep 11, 2025
a2d47c6
[rocm-libraries] ROCm/rocm-libraries#1580 (commit 65830fa)
amcamd Sep 16, 2025
ec1099c
[rocm-libraries] ROCm/rocm-libraries#1498 (commit 12486d2)
TorreZuk Sep 16, 2025
0d97db5
[rocm-libraries] ROCm/rocm-libraries#1528 (commit 46a0bf1)
marbre Sep 18, 2025
04fffda
[rocm-libraries] ROCm/rocm-libraries#1421 (commit 58061dd)
amd-mtrifuno Sep 22, 2025
7f54783
[rocm-libraries] ROCm/rocm-libraries#1729 (commit b808f1b)
amd-jnovotny Sep 22, 2025
55788f8
[rocm-libraries] ROCm/rocm-libraries#1738 (commit b85751f)
amcamd Sep 24, 2025
11c23d7
[rocm-libraries] ROCm/rocm-libraries#456 (commit da1761c)
wfjsw Sep 25, 2025
5481b78
[rocm-libraries] ROCm/rocm-libraries#1516 (commit a2553f8)
rkamd Sep 25, 2025
5b7dbee
[rocm-libraries] ROCm/rocm-libraries#1762 (commit 826c424)
TorreZuk Sep 26, 2025
34afa5f
[rocm-libraries] ROCm/rocm-libraries#2121 (commit 3e707d2)
rkamd Oct 21, 2025
96de232
[rocm-libraries] ROCm/rocm-libraries#1823 (commit b5de71d)
TorreZuk Oct 22, 2025
121dd3c
[rocm-libraries] ROCm/rocm-libraries#2145 (commit 611160d)
umarinkovic Oct 22, 2025
23c1082
[rocm-libraries] ROCm/rocm-libraries#1942 (commit c10e329)
TorreZuk Oct 23, 2025
fda1ba9
[rocm-libraries] ROCm/rocm-libraries#1810 (commit 7966015)
actinks Oct 24, 2025
93fe7bf
[rocm-libraries] ROCm/rocm-libraries#454 (commit a5b869a)
TorreZuk Oct 24, 2025
30d7d27
[rocm-libraries] ROCm/rocm-libraries#2186 (commit 81bec5a)
TorreZuk Oct 27, 2025
f3b5aaa
[rocm-libraries] ROCm/rocm-libraries#2101 (commit 6d69843)
TorreZuk Oct 28, 2025
ce496a3
[rocm-libraries] ROCm/rocm-libraries#2326 (commit 7ebbca8)
amd-jnovotny Oct 29, 2025
f10a76e
[rocm-libraries] ROCm/rocm-libraries#2351 (commit 4981114)
TorreZuk Oct 30, 2025
cc07405
[rocm-libraries] ROCm/rocm-libraries#2396 (commit 8dd0ecd)
amcamd Nov 3, 2025
4ce9756
[rocm-libraries] ROCm/rocm-libraries#2417 (commit b2fdf51)
TorreZuk Nov 5, 2025
40db839
[rocm-libraries] ROCm/rocm-libraries#2461 (commit 68b549e)
TorreZuk Nov 5, 2025
1b591da
[rocm-libraries] ROCm/rocm-libraries#2488 (commit 7af2f5c)
TorreZuk Nov 6, 2025
7e23696
[rocm-libraries] ROCm/rocm-libraries#2259 (commit 9493de8)
rkamd Nov 8, 2025
38ca796
[rocm-libraries] ROCm/rocm-libraries#2539 (commit 49f31f8)
bstefanuk Nov 10, 2025
bee6367
[rocm-libraries] ROCm/rocm-libraries#2551 (commit 4f1e2b0)
TorreZuk Nov 12, 2025
f540f52
[rocm-libraries] ROCm/rocm-libraries#2555 (commit 5a8a2d0)
actinks Nov 14, 2025
72ffa68
[rocm-libraries] ROCm/rocm-libraries#2668 (commit 7952129)
amcamd Nov 14, 2025
357ebc0
[rocm-libraries] ROCm/rocm-libraries#2851 (commit d471513)
TorreZuk Nov 24, 2025
e41f264
[rocm-libraries] ROCm/rocm-libraries#2887 (commit 6221075)
TorreZuk Nov 26, 2025
ae8e52d
[rocm-libraries] ROCm/rocm-libraries#2918 (commit 02c7a7d)
TorreZuk Nov 26, 2025
4e10903
[rocm-libraries] ROCm/rocm-libraries#2980 (commit d5814a3)
TorreZuk Dec 2, 2025
6e2ad15
[rocm-libraries] ROCm/rocm-libraries#1943 (commit 43a79e3)
umarinkovic Dec 3, 2025
c0da0e9
[rocm-libraries] ROCm/rocm-libraries#2126 (commit dd227cb)
umarinkovic Dec 5, 2025
1d5adb1
[rocm-libraries] ROCm/rocm-libraries#3086 (commit 3aca9e3)
amd-jnovotny Dec 5, 2025
b707f2d
[rocm-libraries] ROCm/rocm-libraries#2653 (commit 7bc1152)
amd-mtrifuno Dec 15, 2025
e027434
[rocm-libraries] ROCm/rocm-libraries#3213 (commit fbaef68)
dsaffars Dec 16, 2025
0b1563a
[rocm-libraries] ROCm/rocm-libraries#3355 (commit c2cc913)
TorreZuk Dec 16, 2025
935a1b9
[rocm-libraries] ROCm/rocm-libraries#3400 (commit b930dd2)
samjwu Dec 18, 2025
47746b9
[rocm-libraries] ROCm/rocm-libraries#3488 (commit b599f9a)
TorreZuk Dec 19, 2025
c578f5c
[rocm-libraries] ROCm/rocm-libraries#3423 (commit b3b20a3)
TorreZuk Dec 23, 2025
32eb0a1
[rocm-libraries] ROCm/rocm-libraries#3449 (commit 17ad375)
TorreZuk Dec 23, 2025
5d73ae6
[rocm-libraries] ROCm/rocm-libraries#3499 (commit 0f1a5af)
TorreZuk Dec 23, 2025
d9d763d
[rocm-libraries] ROCm/rocm-libraries#3547 (commit 33a8945)
TorreZuk Dec 24, 2025
57c6761
[rocm-libraries] ROCm/rocm-libraries#3561 (commit 3f85d7b)
TorreZuk Dec 31, 2025
ad74ab5
[rocm-libraries] ROCm/rocm-libraries#3402 (commit 8d6ed24)
TorreZuk Dec 31, 2025
bad6917
[rocm-libraries] ROCm/rocm-libraries#3597 (commit 2875488)
dependabot[bot] Jan 2, 2026
7242dc5
[rocm-libraries] ROCm/rocm-libraries#3570 (commit 101b342)
TorreZuk Jan 2, 2026
3ec3ab1
[rocm-libraries] ROCm/rocm-libraries#3590 (commit dae7c1a)
TorreZuk Jan 2, 2026
1027f15
[rocm-libraries] ROCm/rocm-libraries#3430 (commit 8560acb)
tony-davis Jan 7, 2026
3a5a73b
[rocm-libraries] ROCm/rocm-libraries#3672 (commit 2ab01d6)
tony-davis Jan 9, 2026
f5c03a6
[rocm-libraries] ROCm/rocm-libraries#3730 (commit bb950c4)
TorreZuk Jan 9, 2026
509c8f8
[rocm-libraries] ROCm/rocm-libraries#3489 (commit fa7b4bf)
tony-davis Jan 11, 2026
ceaa37b
[rocm-libraries] ROCm/rocm-libraries#3605 (commit 6d83ace)
tony-davis Jan 12, 2026
da059e8
[rocm-libraries] ROCm/rocm-libraries#3776 (commit c563897)
TorreZuk Jan 13, 2026
967744e
[rocm-libraries] ROCm/rocm-libraries#3573 (commit d6301fe)
TorreZuk Jan 15, 2026
ccb21f0
[rocm-libraries] ROCm/rocm-libraries#3928 (commit c9f8197)
TorreZuk Jan 19, 2026
624e12d
[rocm-libraries] ROCm/rocm-libraries#3951 (commit 279c7d6)
TorreZuk Jan 21, 2026
8a96e9e
[rocm-libraries] ROCm/rocm-libraries#3880 (commit e298dbd)
TorreZuk Jan 27, 2026
9549c27
[rocm-libraries] ROCm/rocm-libraries#4104 (commit 9d4f804)
amcamd Jan 27, 2026
a2a1b77
[rocm-libraries] ROCm/rocm-libraries#4096 (commit d9a826a)
tony-davis Jan 28, 2026
d1f3f68
[rocm-libraries] ROCm/rocm-libraries#4132 (commit ce28743)
TorreZuk Jan 30, 2026
ca00f1b
[rocm-libraries] ROCm/rocm-libraries#3889 (commit 72ca0a2)
TorreZuk Feb 2, 2026
811cc71
[rocm-libraries] ROCm/rocm-libraries#4198 (commit 1663ac0)
TorreZuk Feb 5, 2026
a900436
[rocm-libraries] ROCm/rocm-libraries#4353 (commit 16b3b18)
TorreZuk Feb 6, 2026
fa6dd33
[rocm-libraries] ROCm/rocm-libraries#4628 (commit 106635c)
TorreZuk Feb 20, 2026
8bb1ce2
[rocm-libraries] ROCm/rocm-libraries#4528 (commit ddfe3a8)
mahmoodw Feb 24, 2026
3abd463
[rocm-libraries] ROCm/rocm-libraries#4781 (commit 51c8fc4)
TorreZuk Mar 2, 2026
a0dd4cd
[rocm-libraries] ROCm/rocm-libraries#5024 (commit 35111f0)
TorreZuk Mar 3, 2026
ddbc377
[rocm-libraries] ROCm/rocm-libraries#5029 (commit cd4c348)
TorreZuk Mar 3, 2026
fa0d251
[rocm-libraries] ROCm/rocm-libraries#5098 (commit f4f2509)
amd-jnovotny Mar 4, 2026
11db6d5
[rocm-libraries] ROCm/rocm-libraries#4986 (commit 22840c0)
TorreZuk Mar 5, 2026
821ca39
[rocm-libraries] ROCm/rocm-libraries#4572 (commit 10e97bb)
davidd-amd Mar 6, 2026
db0ba02
[rocm-libraries] ROCm/rocm-libraries#5208 (commit e72f2cf)
linuxrocks123 Mar 13, 2026
36626be
[rocm-libraries] ROCm/rocm-libraries#5474 (commit 7157e61)
TorreZuk Mar 17, 2026
1d6586d
[rocm-libraries] ROCm/rocm-libraries#5412 (commit 1f1c69c)
TorreZuk Mar 17, 2026
3a31bc5
[rocm-libraries] ROCm/rocm-libraries#5167 (commit 1054ad1)
TorreZuk Mar 17, 2026
2a1448c
[rocm-libraries] ROCm/rocm-libraries#5554 (commit b5c7d1a)
TorreZuk Mar 19, 2026
1a89078
[rocm-libraries] ROCm/rocm-libraries#5585 (commit 406f575)
TorreZuk Mar 19, 2026
8b9132d
[rocm-libraries] ROCm/rocm-libraries#5606 (commit e836ee4)
amd-songpiao Mar 20, 2026
34f41ed
[rocm-libraries] ROCm/rocm-libraries#4659 (commit e0f642e)
tony-davis Mar 20, 2026
7c1d4b2
[rocm-libraries] ROCm/rocm-libraries#5831 (commit 1592523)
dependabot[bot] Mar 25, 2026
b7e0067
[rocm-libraries] ROCm/rocm-libraries#6008 (commit 051ffdc)
dependabot[bot] Mar 30, 2026
4cbe976
[rocm-libraries] ROCm/rocm-libraries#5869 (commit 456a2d5)
TorreZuk Mar 31, 2026
bc55688
[rocm-libraries] ROCm/rocm-libraries#4213 (commit 3ea8988)
actinks Apr 6, 2026
229e547
[rocm-libraries] ROCm/rocm-libraries#5976 (commit 4280825)
davidd-amd Apr 6, 2026
e8efba5
[rocm-libraries] ROCm/rocm-libraries#6144 (commit d478651)
TorreZuk Apr 7, 2026
5cc07af
[rocm-libraries] ROCm/rocm-libraries#5944 (commit ef00e8b)
tony-davis Apr 7, 2026
1e99dcf
[rocm-libraries] ROCm/rocm-libraries#6272 (commit d99b7b9)
evedovelli Apr 10, 2026
13ddab3
[rocm-libraries] ROCm/rocm-libraries#6353 (commit c470f5c)
tony-davis Apr 11, 2026
fda6e40
[rocm-libraries] ROCm/rocm-libraries#6398 (commit f2cdae4)
TorreZuk Apr 14, 2026
6cc0fb4
[rocm-libraries] ROCm/rocm-libraries#6391 (commit 4db13f7)
TorreZuk Apr 14, 2026
4fc1fcf
[rocm-libraries] ROCm/rocm-libraries#6422 (commit 771914c)
TorreZuk Apr 14, 2026
6b1c4c1
[rocm-libraries] ROCm/rocm-libraries#6361 (commit 4f4359f)
tony-davis Apr 20, 2026
f619ff4
[rocm-libraries] ROCm/rocm-libraries#5678 (commit 4a14fe2)
tony-davis Apr 21, 2026
bad852c
[rocm-libraries] ROCm/rocm-libraries#5282 (commit 8843cff)
harkgill-amd Apr 21, 2026
153fef5
[rocm-libraries] ROCm/rocm-libraries#6545 (commit 024fcb6)
TorreZuk Apr 21, 2026
f41d567
[rocm-libraries] ROCm/rocm-libraries#6742 (commit 96f552a)
TorreZuk Apr 24, 2026
c0c4aa1
[rocm-libraries] ROCm/rocm-libraries#6743 (commit 56b4451)
TorreZuk Apr 27, 2026
91e656c
[rocm-libraries] ROCm/rocm-libraries#6782 (commit d41d0df)
amd-jnovotny Apr 27, 2026
b4199c6
[rocm-libraries] ROCm/rocm-libraries#6869 (commit 0ee0ca5)
TorreZuk Apr 28, 2026
65824e2
[rocm-libraries] ROCm/rocm-libraries#6901 (commit 749afa0)
TorreZuk Apr 29, 2026
6a68b9e
[rocm-libraries] ROCm/rocm-libraries#7034 (commit 43d06c6)
evedovelli May 5, 2026
24a973a
[rocm-libraries] ROCm/rocm-libraries#7184 (commit 5f2edcb)
TorreZuk May 13, 2026
c150022
[rocm-libraries] ROCm/rocm-libraries#6854 (commit dab1ca3)
evedovelli May 15, 2026
20dc22a
[rocm-libraries] ROCm/rocm-libraries#7057 (commit 3ec7aa7)
TorreZuk May 15, 2026
07be072
rocBLAS: SWMMAC full precision family + new datatypes
May 17, 2026
4c06bd6
rocBLAS: routing + type enum for full SWMMAC precision family
May 17, 2026
ba46f19
rocBLAS: add UE8M0 datatype (e8m0_r=173) for OCP MX block-scale
May 17, 2026
425 changes: 425 additions & 0 deletions .cursor/rules/cpp-style.mdc

Large diffs are not rendered by default.

74 changes: 74 additions & 0 deletions .cursor/rules/rocblas-architecture.mdc
@@ -0,0 +1,74 @@
---
alwaysApply: true
---

# rocBLAS Project Architecture

> **Note to AI Agents:** If you notice these rules don't match the actual codebase or could be improved, please suggest updates. See `.cursorrules` for details.

rocBLAS is the AMD ROCm Basic Linear Algebra Subprograms (BLAS) library, implemented in HIP and optimized for AMD GPUs.

## Key Documentation

- **[README](./README.md)** - Project overview and requirements
- **[Linux Install Guide](./docs/install/Linux_Install_Guide.rst)** - Linux build and installation
- **[Windows Install Guide](./docs/install/Windows_Install_Guide.rst)** - Windows build and installation
- **[Programmer's Guide](./docs/how-to/Programmers_Guide.rst)** - API usage and programming guide
- **[Design Notes](./docs/conceptual/rocblas-design-notes.rst)** - Architecture and design decisions

## Component Architecture

| Component | Location | Purpose |
|-----------|----------|---------|
| **Library** (`library/`) | Core implementation | BLAS operations (Level 1, 2, 3, Extensions) |
| **Clients** (`clients/`) | Testing & benchmarking | Test suite (gtest), benchmarks, samples |
| **Tensile** (`library/src/blas3/`) | GEMM kernels | Optimized matrix multiplication kernels |
| **Dependencies** (`deps/`) | External deps | GTest, LAPACK dependencies |
| **Scripts** (`scripts/`) | Utilities | Performance testing, YAML generation |

## BLAS Operation Levels

### Level 1: Vector-Vector Operations
- Location: `library/src/blas1/`
- Examples: `axpy`, `dot`, `scal`, `nrm2`
- Characteristics: O(n) complexity, vector inputs/outputs

### Level 2: Matrix-Vector Operations
- Location: `library/src/blas2/`
- Examples: `gemv`, `ger`, `trmv`, `symv`
- Characteristics: O(n²) complexity, matrix-vector operations

### Level 3: Matrix-Matrix Operations
- Location: `library/src/blas3/`
- Examples: `gemm`, `trmm`, `symm`, `syrk`
- Characteristics: O(n³) complexity, highly optimized via Tensile
- **Tensile Integration:** Generates optimized GPU kernels for GEMM operations
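
To make the O(n³) cost concrete, here is a minimal CPU sketch of the computation a `gemm` call performs. `reference_gemm` is a hypothetical helper written for illustration, not rocBLAS or Tensile code, and it assumes column-major storage with no transposes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive column-major GEMM: C = alpha * A * B + beta * C (no transposes).
// A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
// Hypothetical illustration only; real kernels are tiled and run on the GPU.
template <typename T>
void reference_gemm(int m, int n, int k, T alpha,
                    const std::vector<T>& A, int lda,
                    const std::vector<T>& B, int ldb,
                    T beta, std::vector<T>& C, int ldc)
{
    for(int j = 0; j < n; ++j)
    {
        for(int i = 0; i < m; ++i)
        {
            T sum = T(0);
            // Inner product of row i of A with column j of B: k multiply-adds
            for(int p = 0; p < k; ++p)
                sum += A[i + std::size_t(p) * lda] * B[p + std::size_t(j) * ldb];

            std::size_t idx = i + std::size_t(j) * ldc;
            C[idx] = alpha * sum + beta * C[idx];
        }
    }
}
```

The three nested loops execute `m * n * k` multiply-adds, which is where the cubic complexity comes from.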

### BLAS Extensions
- Location: `library/src/blas_ex/`
- Examples: `gemm_ex`, `gemm_strided_batched_ex`
- Purpose: Extended precision, batched operations, mixed precision

## Tensile Integration

Tensile is a kernel generator that creates highly optimized GEMM (General Matrix Multiply) kernels:

- **Configuration:** YAML files in `library/src/blas3/Tensile/Logic/`
- **Build-Time Generation:** Kernels generated during build process
- **Architecture-Specific:** Optimized for specific GPU architectures (gfx90a, gfx942, etc.)
- **Performance Critical:** GEMM is the most performance-critical BLAS operation

## Data Type Support

rocBLAS supports multiple data types, identified by the following prefixes and abbreviations:

- **s** - Single precision float (float)
- **d** - Double precision float (double)
- **c** - Complex single precision (rocblas_float_complex)
- **z** - Complex double precision (rocblas_double_complex)
- **h** - Half precision (rocblas_half)
- **bf16** - Brain float 16 (rocblas_bfloat16)
- **i8** - 8-bit integer (int8_t)
- **i32** - 32-bit integer (int32_t)

Example: `rocblas_sgemm` = single precision GEMM, `rocblas_daxpy` = double precision AXPY
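
The prefix convention can be sketched as a small type-to-letter mapping. `precision_prefix` and `blas_name` are hypothetical names used only for this illustration, and `std::complex` stands in for `rocblas_float_complex` and `rocblas_double_complex`:

```cpp
#include <cassert>
#include <complex>
#include <string>

// Hypothetical trait mapping a C++ type to its BLAS-style prefix letter.
// Only the four classic precisions (s, d, c, z) are covered here.
template <typename T>
struct precision_prefix;

template <>
struct precision_prefix<float>
{
    static constexpr char value() { return 's'; }
};

template <>
struct precision_prefix<double>
{
    static constexpr char value() { return 'd'; }
};

template <>
struct precision_prefix<std::complex<float>>
{
    static constexpr char value() { return 'c'; }
};

template <>
struct precision_prefix<std::complex<double>>
{
    static constexpr char value() { return 'z'; }
};

// Builds a name such as "rocblas_sgemm" from an operation name such as "gemm".
template <typename T>
std::string blas_name(const std::string& op)
{
    return "rocblas_" + std::string(1, precision_prefix<T>::value()) + op;
}
```

For instance, `blas_name<float>("gemm")` yields the `rocblas_sgemm` name from the example above.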
297 changes: 297 additions & 0 deletions .cursor/rules/testing.mdc
@@ -0,0 +1,297 @@
---
globs: ["*.cpp", "*.hpp", "*.h", "clients/gtest/**", "clients/benchmarks/**"]
---

# Testing Guidelines

> **Note to AI Agents:** If testing patterns have evolved or examples are misleading, suggest updates to these guidelines.

## Framework

- Use Google Test (gtest) framework for all C/C++ tests
- Test files located in `clients/gtest/`
- YAML test configurations in `clients/gtest/*.yaml`
- Use `TEST()` for standalone tests and `TEST_P()` for parameterized tests
- Never generate a main() function in test files - gtest provides its own

## Test Organization

### Test File Structure

**Note on API Usage:** Tests use `rocblas_gemm<T>` which is a **test infrastructure wrapper** that internally dispatches to precision-specific functions (rocblas_sgemm, rocblas_dgemm, etc.). End users of the rocBLAS library should call the precision-specific functions directly (e.g., `rocblas_sgemm` for float, `rocblas_dgemm` for double).

```cpp
/* ************************************************************************
* Copyright (C) 2016-2024 Advanced Micro Devices, Inc. All rights reserved.
* ... (full copyright header)
* ************************************************************************ */

#include "rocblas_test.hpp"
#include <gtest/gtest.h>

// Template test function using test infrastructure
template <typename T>
void testing_gemm(const Arguments& arg)
{
    // Test infrastructure provides a templated wrapper around precision-specific APIs
    // rocblas_gemm<T> internally calls rocblas_sgemm, rocblas_dgemm, etc.
    auto rocblas_gemm_fn = arg.api & c_API_FORTRAN ? rocblas_gemm<T, true> : rocblas_gemm<T, false>;

    rocblas_local_handle handle{arg};
    rocblas_int          m   = arg.M;
    rocblas_int          n   = arg.N;
    rocblas_int          k   = arg.K;
    rocblas_int          lda = arg.lda;
    rocblas_int          ldb = arg.ldb;
    rocblas_int          ldc = arg.ldc;

    T h_alpha = arg.get_alpha<T>();
    T h_beta  = arg.get_beta<T>();

    rocblas_operation transA = rocblas_operation_none;
    rocblas_operation transB = rocblas_operation_none;

    // Allocate host and device memory using test infrastructure
    HOST_MEMCHECK(host_matrix<T>, hA, (m, k, lda));
    HOST_MEMCHECK(host_matrix<T>, hB, (k, n, ldb));
    HOST_MEMCHECK(host_matrix<T>, hC, (m, n, ldc));

    DEVICE_MEMCHECK(device_matrix<T>, dA, (m, k, lda));
    DEVICE_MEMCHECK(device_matrix<T>, dB, (k, n, ldb));
    DEVICE_MEMCHECK(device_matrix<T>, dC, (m, n, ldc));

    // Initialize data on host
    rocblas_init_matrix(hA, arg, rocblas_client_general_matrix);
    rocblas_init_matrix(hB, arg, rocblas_client_general_matrix);
    rocblas_init_matrix(hC, arg, rocblas_client_general_matrix);

    // Copy data from CPU to device
    CHECK_HIP_ERROR(dA.transfer_from(hA));
    CHECK_HIP_ERROR(dB.transfer_from(hB));
    CHECK_HIP_ERROR(dC.transfer_from(hC));

    // Execute operation via test wrapper
    CHECK_ROCBLAS_ERROR(rocblas_set_pointer_mode(handle, rocblas_pointer_mode_host));
    DAPI_CHECK(rocblas_gemm_fn,
               (handle, transA, transB, m, n, k, &h_alpha, dA, lda, dB, ldb, &h_beta, dC, ldc));

    // Verify results (unit_check, norm_check, etc.)
    if(arg.unit_check || arg.norm_check)
    {
        // Validation code here
    }
}

// Test instantiation via YAML or direct arguments
TEST(gemm_gtest, float)
{
    Arguments arg;
    arg.M   = 128;
    arg.N   = 128;
    arg.K   = 128;
    arg.lda = 128;
    arg.ldb = 128;
    arg.ldc = 128;
    testing_gemm<float>(arg);
}
```

## Test Naming Conventions

### Test Suite Naming

Format: `<operation>_<variant>_gtest`

Examples:
- `gemm_gtest` - Basic GEMM tests
- `gemm_strided_batched_gtest` - Strided batched GEMM tests
- `axpy_gtest` - AXPY tests

### Test Case Naming

Format: `<datatype>_<variant>`

Examples:
- `float` - Single precision test
- `double` - Double precision test
- `complex_float` - Complex single precision
- `half_precision` - Half precision (fp16)

### Test Categories (via gtest filters)

- `*quick*` - Fast tests for rapid iteration (< 1 minute)
- `*pre_checkin*` - Tests for PR validation (< 1 hour)
- `*nightly*` - Comprehensive regression tests (1-2 hours)
- `*known_bug*` - Tests for known issues (excluded from runs)

## YAML Test Configuration

YAML files define test parameters for comprehensive testing:

```yaml
---
include: rocblas_common.yaml

Tests:
- name: gemm_small
  category: quick
  function: gemm
  precision: *single_double_precisions
  transA: N
  transB: N
  M: [128, 256]
  N: [128, 256]
  K: [128, 256]
  alpha: 1.0
  beta: 0.0
  lda: [128, 256]
  ldb: [128, 256]
  ldc: [128, 256]
```
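
A sketch of what the list-valued fields above imply, assuming the test generator expands every list into all combinations (a Cartesian product). `GemmCase`, `expand_cases`, and `count_valid_nn` are hypothetical names for this illustration and are not part of the rocBLAS test infrastructure:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One concrete GEMM problem produced from the list-valued YAML fields.
struct GemmCase
{
    int M, N, K, lda, ldb, ldc;
};

// Expand six value lists into every combination (assumed generator behavior).
std::vector<GemmCase> expand_cases(const std::vector<int>& Ms,
                                   const std::vector<int>& Ns,
                                   const std::vector<int>& Ks,
                                   const std::vector<int>& ldas,
                                   const std::vector<int>& ldbs,
                                   const std::vector<int>& ldcs)
{
    std::vector<GemmCase> cases;
    for(int M : Ms)
        for(int N : Ns)
            for(int K : Ks)
                for(int lda : ldas)
                    for(int ldb : ldbs)
                        for(int ldc : ldcs)
                            cases.push_back({M, N, K, lda, ldb, ldc});
    return cases;
}

// For transA = transB = N in column-major storage, A is M x K, B is K x N and
// C is M x N, so a valid layout needs lda >= M, ldb >= K and ldc >= M.
std::size_t count_valid_nn(const std::vector<GemmCase>& cases)
{
    std::size_t valid = 0;
    for(const GemmCase& c : cases)
        if(c.lda >= c.M && c.ldb >= c.K && c.ldc >= c.M)
            ++valid;
    return valid;
}
```

Under that assumption, six two-element lists produce 64 combinations per precision, and only some of them describe a valid layout (for example, `M: 256` paired with `lda: 128` does not). Pairing each size with matching leading dimensions avoids spending test time on such cases unless the invalid-size path is the target.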

## Test Execution Patterns

### Using rtest.py

```bash
# Quick smoke tests
python3 rtest.py -t smoke

# Pre-submit tests (PR validation)
python3 rtest.py -t psdb

# Nightly regression
python3 rtest.py -t osdb

# Complete quality engineering
python3 rtest.py -t cqe
```

### Direct Test Execution

```bash
# Run all tests
./build/release/clients/staging/rocblas-test

# Run specific operation (quote the filter so the shell does not expand the globs)
./build/release/clients/staging/rocblas-test --gtest_filter='*gemm*'

# Run quick tests only
./build/release/clients/staging/rocblas-test --gtest_filter='*quick*'

# Run with YAML
./build/release/clients/staging/rocblas-test --yaml clients/gtest/rocblas_smoke.yaml

# Exclude known bugs
./build/release/clients/staging/rocblas-test --gtest_filter='*quick*-*known_bug*'
```
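
The filters above combine positive and negative glob patterns. The following is a minimal sketch of that matching logic, assuming GoogleTest's documented filter syntax (`:`-separated patterns, `*` and `?` wildcards, negative patterns after the first `-`). `filter_selects` and the helper names are hypothetical, and this is not GoogleTest's own implementation:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Glob match supporting '*' (any substring) and '?' (any single character).
bool glob_match(const char* pattern, const char* name)
{
    if(*pattern == '\0')
        return *name == '\0';
    if(*pattern == '*')
        return glob_match(pattern + 1, name)
               || (*name != '\0' && glob_match(pattern, name + 1));
    if(*name == '\0')
        return false;
    return (*pattern == '?' || *pattern == *name) && glob_match(pattern + 1, name + 1);
}

// Split on a separator, dropping empty pieces.
std::vector<std::string> split(const std::string& s, char sep)
{
    std::vector<std::string> out;
    std::string::size_type   start = 0;
    while(start <= s.size())
    {
        std::string::size_type end = s.find(sep, start);
        if(end == std::string::npos)
            end = s.size();
        if(end > start)
            out.push_back(s.substr(start, end - start));
        start = end + 1;
    }
    return out;
}

// True if the name matches at least one of the ':'-separated patterns.
bool matches_any(const std::string& patterns, const std::string& name)
{
    for(const std::string& p : split(patterns, ':'))
        if(glob_match(p.c_str(), name.c_str()))
            return true;
    return false;
}

// A test is selected if it matches a positive pattern and no negative pattern.
// An empty positive part (filter starting with '-') is treated as "*".
bool filter_selects(const std::string& filter, const std::string& name)
{
    std::string::size_type dash     = filter.find('-');
    std::string            positive = filter.substr(0, dash);
    std::string negative = dash == std::string::npos ? "" : filter.substr(dash + 1);
    if(positive.empty())
        positive = "*";
    return matches_any(positive, name) && !matches_any(negative, name);
}
```

With a made-up test name such as `gemm/quick_f32_r`, the filter `*quick*-*known_bug*` selects it, while a name that also contains `known_bug` is rejected by the negative part.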

## Verification Patterns

### Numerical Verification

```cpp
#include "rocblas_test.hpp" // assumed to provide rocblas_int and rocblas_is_complex
#include <cmath>
#include <complex>
#include <cstddef>
#include <gtest/gtest.h>
#include <limits>
#include <type_traits>

// Helper to get epsilon for real and complex types (from clients/include/unit.hpp)
template <typename T, std::enable_if_t<(!rocblas_is_complex<T>), int> = 0>
constexpr double get_epsilon()
{
    return std::numeric_limits<T>::epsilon();
}

template <typename T, std::enable_if_t<(rocblas_is_complex<T>), int> = 0>
constexpr auto get_epsilon()
{
    // For complex types, get epsilon of the underlying real type
    return get_epsilon<decltype(std::real(T{}))>();
}

// Compare against reference implementation (works for real and complex types)
template <typename T>
void verify_gemm_result(const T*    computed,
                        const T*    reference,
                        rocblas_int m,
                        rocblas_int n,
                        rocblas_int ldc)
{
    double tolerance = get_epsilon<T>() * 10;

    for(rocblas_int i = 0; i < m; i++)
    {
        for(rocblas_int j = 0; j < n; j++)
        {
            // Index in size_t so large matrices do not overflow 32-bit arithmetic
            size_t idx = i + size_t(j) * ldc;
            // std::abs returns magnitude for both real and complex types
            auto diff = std::abs(computed[idx] - reference[idx]);
            EXPECT_NEAR(diff, 0.0, tolerance);
        }
    }
}
```
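
The element-wise check above uses a fixed absolute tolerance. Because rounding error in GEMM accumulates over the K dimension, a relative norm-based check with a K-scaled tolerance is a common alternative. The sketch below shows one way to do that for real types; `relative_frobenius_error` and `gemm_close` are hypothetical helpers, and the `epsilon * k` tolerance is an assumption rather than the rule rocBLAS itself applies:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Relative Frobenius-norm error ||computed - reference|| / ||reference||
// over an m x n column-major matrix with leading dimension ldc (real types only).
template <typename T>
double relative_frobenius_error(const std::vector<T>& computed,
                                const std::vector<T>& reference,
                                int m, int n, int ldc)
{
    double diff_sq = 0.0;
    double ref_sq  = 0.0;
    for(int j = 0; j < n; ++j)
    {
        for(int i = 0; i < m; ++i)
        {
            std::size_t idx = i + std::size_t(j) * ldc;
            double      r   = static_cast<double>(reference[idx]);
            double      d   = static_cast<double>(computed[idx]) - r;
            diff_sq += d * d;
            ref_sq += r * r;
        }
    }
    // Fall back to the absolute error when the reference is all zeros
    if(ref_sq == 0.0)
        return std::sqrt(diff_sq);
    return std::sqrt(diff_sq / ref_sq);
}

// Accept the result if the relative error is within epsilon(T) * k (assumed tolerance).
template <typename T>
bool gemm_close(const std::vector<T>& computed,
                const std::vector<T>& reference,
                int m, int n, int k, int ldc)
{
    double tolerance = static_cast<double>(std::numeric_limits<T>::epsilon()) * k;
    return relative_frobenius_error(computed, reference, m, n, ldc) <= tolerance;
}
```

A single norm also produces one pass/fail result per matrix instead of one assertion per element, which keeps failure output readable for large sizes.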

### Error Handling Tests

```cpp
TEST(gemm_gtest, invalid_handle)
{
    rocblas_status status = rocblas_sgemm(nullptr, ...);
    EXPECT_EQ(status, rocblas_status_invalid_handle);
}

TEST(gemm_gtest, invalid_size)
{
    rocblas_handle handle;
    rocblas_create_handle(&handle);

    rocblas_status status = rocblas_sgemm(handle,
                                          rocblas_operation_none,
                                          rocblas_operation_none,
                                          -1, // invalid m
                                          128,
                                          128,
                                          ...);
    EXPECT_EQ(status, rocblas_status_invalid_size);

    rocblas_destroy_handle(handle);
}
```

## Benchmarking

### Using rocblas-bench

```bash
# GEMM benchmark
./build/release/clients/staging/rocblas-bench -f gemm -r f32_r -m 4096 -n 4096 -k 4096

# GEMV benchmark
./build/release/clients/staging/rocblas-bench -f gemv -r f32_r -m 8192 -n 8192

# Batched operations
./build/release/clients/staging/rocblas-bench -f gemm_strided_batched -r f32_r -m 256 -n 256 -k 256 --batch_count 100

# Load from YAML
./build/release/clients/staging/rocblas-bench --yaml scripts/performance/gemm_nn.yaml
```
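
When reading benchmark output, it helps to know how a throughput figure relates to problem size and time. The sketch below uses the conventional flop counts for real-valued GEMM (`2 * m * n * k`) and GEMV (`2 * m * n`). `gemm_gflops` and `gemv_gflops` are hypothetical helpers, not part of `rocblas-bench`:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// GFLOP/s for a real-valued GEMM: each of the m * n outputs needs k multiply-adds.
double gemm_gflops(std::int64_t m, std::int64_t n, std::int64_t k, double seconds)
{
    double flops = 2.0 * static_cast<double>(m) * static_cast<double>(n)
                   * static_cast<double>(k);
    return flops / seconds / 1e9;
}

// GFLOP/s for a real-valued GEMV: each of the m outputs needs n multiply-adds.
double gemv_gflops(std::int64_t m, std::int64_t n, double seconds)
{
    double flops = 2.0 * static_cast<double>(m) * static_cast<double>(n);
    return flops / seconds / 1e9;
}
```

For the 4096 x 4096 x 4096 GEMM above this is about 137.4 GFLOP of work per call, so a run time of 10 ms would correspond to roughly 13.7 TFLOP/s. Complex types need more real operations per element, so these counts apply to real precisions only.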

## Environment Variables for Testing

```bash
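# Enable trace logging (ROCBLAS_LAYER is a bitmask: 1 = trace, 2 = bench, 4 = profile)
export ROCBLAS_LAYER=1

# Enable numerical checks (bitmask: 1 = info, 2 = warn, 4 = fail)
export ROCBLAS_CHECK_NUMERICS=4

# Enable profile logging instead (this overrides the ROCBLAS_LAYER=1 set above)
export ROCBLAS_LAYER=4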
```
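
`ROCBLAS_LAYER` appears above with the values 1 and 4. Assuming it is interpreted as a bitmask (1 = trace, 2 = bench, 4 = profile, matching the rocBLAS logging layer modes), values can be combined, for example 5 for trace plus profile. The sketch below decodes such a mask; `layer_bit` and `describe_layer` are hypothetical names that mirror those assumed values locally and do not come from the rocBLAS headers:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Local mirror of the assumed ROCBLAS_LAYER bits (not taken from rocBLAS headers).
enum layer_bit : unsigned
{
    layer_trace   = 1u,
    layer_bench   = 2u,
    layer_profile = 4u
};

// Returns the names of the logging layers enabled by a mask, in bit order.
std::vector<std::string> describe_layer(unsigned mask)
{
    std::vector<std::string> enabled;
    if(mask & layer_trace)
        enabled.push_back("trace");
    if(mask & layer_bench)
        enabled.push_back("bench");
    if(mask & layer_profile)
        enabled.push_back("profile");
    return enabled;
}
```

In a shell session the value would typically come from the environment, for example by passing the parsed result of `std::getenv("ROCBLAS_LAYER")` to `describe_layer`.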

## Best Practices

1. **Test Coverage:** Cover normal cases, edge cases, and error conditions
2. **Data Types:** Test all supported precisions (s, d, c, z, h, bf16)
3. **Sizes:** Test small, medium, large, and edge case sizes
4. **Batching:** Test both single and batched operations where applicable
5. **Reference Implementation:** Use AOCL-BLAS or OpenBLAS for verification
6. **Performance:** Use `rocblas-bench` for performance regression testing