Investigation of NVIDIA Tensor Cores vs. AMD Matrix Cores
Questions to answer
- Which formats do Tensor Cores and Matrix Cores support for matrix multiplication?
- Is there a way or algorithm to translate wmma code in CUDA into Matrix Core operations?
- How should experiments be designed to test the numerical computation capability of Tensor Cores and Matrix Cores?
Related docs
notes/AMD matrix cores
notes/NVIDIA tensor cores
Matrix Multiplication Background
Programming Interface Comparison
| Methodology | Vendor | Description | Corresponding methodology (other vendor) |
|---|---|---|---|
| notes/Warp Matrix Functions | NVIDIA | | rocWMMA |
| rocWMMA | AMD | | Warp Matrix Functions |
| PTX wmma/mma instructions | NVIDIA | | MFMA compiler intrinsics |
| MFMA compiler intrinsics | AMD | | PTX wmma/mma instructions |
| cuBLAS | NVIDIA | | rocBLAS |
| rocBLAS | AMD | | cuBLAS |
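Because rocWMMA's API is modelled on nvcuda::wmma, one naive starting point for the translation question is a textual rename pass. The sketch below is hypothetical: the rename table and the function name `translate_wmma_source` are illustrative assumptions, and it covers only the header and the fully qualified namespace. Element types, tile shapes, and wavefront-size assumptions would still need manual review.

```python
import re

# Hypothetical first-pass rename table (illustrative, not exhaustive).
# Assumption: the CUDA source uses fully qualified nvcuda::wmma:: names.
RENAMES = [
    (r"#include\s*<mma\.h>", "#include <rocwmma/rocwmma.hpp>"),
    (r"\bnvcuda::wmma::", "rocwmma::"),
]


def translate_wmma_source(src: str) -> str:
    """Apply each rename in order and return the rewritten source text."""
    for pattern, replacement in RENAMES:
        src = re.sub(pattern, replacement, src)
    return src


if __name__ == "__main__":
    cuda_src = "#include <mma.h>\nnvcuda::wmma::mma_sync(c, a, b, c);"
    print(translate_wmma_source(cuda_src))
```

Code that relies on `using namespace nvcuda;` and then writes `wmma::...` is not handled by this table; a real port would need a proper source-to-source tool or manual edits.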
Supported formats
| | Inputs | Outputs |
|---|---|---|
| NVIDIA Tensor Cores | fp16/bf16/tf32/fp64 | fp32/fp16/bf16/fp64 |
| AMD Matrix Cores | fp16/bf16/fp32/fp64 | fp32/fp16/bf16/fp64 |
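To see why the output (accumulator) format in the table matters for numerical experiments, the following CPU-only sketch emulates a dot product with fp16 inputs and either an fp32 or an fp16 accumulator. It is an assumption-laden emulation, not a model of either vendor's hardware: the helper names are made up for illustration, and it rounds the accumulator after every addition, which real Tensor Cores or Matrix Cores need not do.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_fp16_inputs(a, b, round_acc):
    """Dot product with fp16-rounded inputs.

    round_acc is applied to the accumulator after each step, emulating
    an accumulator stored in that format (an assumption of this sketch).
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + to_fp16(x) * to_fp16(y))
    return acc


if __name__ == "__main__":
    ones = [1.0] * 4096
    print(dot_fp16_inputs(ones, ones, to_fp32))  # 4096.0
    print(dot_fp16_inputs(ones, ones, to_fp16))  # 2048.0
```

With an fp16 accumulator the sum stalls at 2048, because 2049 is not representable in fp16 and the tie rounds back to 2048. An experiment on real hardware could compare results against a reference of this kind to infer how the accumulator behaves.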
Level 2 or 3
UTK test cases, mixed precision testing on GPUs