Investigation of NVIDIA Tensor Cores vs. AMD Matrix Cores
Questions to answer
- Which data formats do Tensor Cores and Matrix Cores support for matrix multiplication?
- Is there a way or an algorithm to translate `wmma` operations in CUDA into Matrix Core operations?
- How should experiments be designed to test the numerical computation capability of Tensor Cores and Matrix Cores?
Related docs
notes/AMD matrix cores
notes/NVIDIA tensor cores
Matrix Multiplication Background
Programming Methodology Comparison
Methodology | Vendor | Description | Counterpart from the other vendor |
---|---|---|---|
notes/Warp Matrix Functions | NVIDIA | | rocWMMA |
rocWMMA | AMD | | Warp Matrix Functions |
PTX wmma/mma instruction | NVIDIA | | MFMA compiler intrinsic |
MFMA compiler intrinsic | AMD | | PTX wmma/mma instruction |
cuBLAS | NVIDIA | | rocBLAS |
rocBLAS | AMD | | cuBLAS |
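As a first step toward the translation question, the Warp Matrix Functions to rocWMMA row can be sketched as a source-level rename. The helper below is hypothetical (`port_wmma_source` is not part of any toolchain). It assumes that the rocWMMA entry points keep the same names as their `nvcuda::wmma` counterparts under the `rocwmma` namespace and that the main header is `<rocwmma/rocwmma.hpp>`. It only rewrites identifiers. Wavefront size, supported fragment shapes, and element types still need manual review.

```python
import re

# Assumed include mapping: CUDA's WMMA header -> rocWMMA's main header.
_INCLUDE = re.compile(r"#\s*include\s*<mma\.h>")
# `using namespace nvcuda;` has no rocWMMA equivalent here, so the line is dropped.
_USING = re.compile(r"^[ \t]*using\s+namespace\s+nvcuda\s*;[ \t]*\n?", re.MULTILINE)
# Matches `nvcuda::wmma::` or a bare `wmma::`, but not the `wmma::`
# that is already the tail of `rocwmma::`.
_NAMESPACE = re.compile(r"(?<![\w:])(?:nvcuda::)?wmma::")


def port_wmma_source(cuda_src: str) -> str:
    """Hypothetical helper: rename CUDA WMMA identifiers to rocWMMA ones."""
    out = _INCLUDE.sub("#include <rocwmma/rocwmma.hpp>", cuda_src)
    out = _USING.sub("", out)
    out = _NAMESPACE.sub("rocwmma::", out)
    return out


if __name__ == "__main__":
    snippet = (
        "#include <mma.h>\n"
        "using namespace nvcuda;\n"
        "wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;\n"
        "wmma::fill_fragment(c_frag, 0.0f);\n"
    )
    print(port_wmma_source(snippet))
```

A rename like this is only a starting point: the lower rows of the table (PTX vs. MFMA intrinsics) have no such one-to-one naming and would need a real instruction-level mapping.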
Supported formats
Hardware | Inputs | Outputs |
---|---|---|
NVIDIA Tensor Cores | fp16/bf16/tf32/fp64 | fp32/fp16/bf16/fp64 |
AMD Matrix Cores | fp16/bf16/fp32/fp64 | fp32/fp16/bf16/fp64 |
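To reason about what these input and output formats imply numerically, a small CPU-side reference can help when designing experiments. The sketch below uses only the Python standard library. It rounds values to fp16, bf16, or tf32 and accumulates a dot product in a chosen output format. The helper names are hypothetical. Round-to-nearest-even and strictly sequential accumulation are assumptions, and real hardware may round and order its additions differently. Only finite inputs within the target range are handled.

```python
import struct


def round_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_fp16(x: float) -> float:
    """Round to IEEE half precision using struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def _keep_mantissa_bits(x: float, bits: int) -> float:
    """Round an fp32 value to `bits` explicit mantissa bits (ties to even)."""
    u = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - bits                      # low mantissa bits to discard
    lsb = (u >> drop) & 1                 # last kept bit, used for ties-to-even
    u = (u + (1 << (drop - 1)) - 1 + lsb) & 0xFFFFFFFF
    u &= ~((1 << drop) - 1) & 0xFFFFFFFF  # clear the discarded bits
    return struct.unpack("<f", struct.pack("<I", u))[0]


def round_bf16(x: float) -> float:
    """bf16: fp32 exponent range with 7 explicit mantissa bits."""
    return _keep_mantissa_bits(x, 7)


def round_tf32(x: float) -> float:
    """tf32: fp32 exponent range with 10 explicit mantissa bits."""
    return _keep_mantissa_bits(x, 10)


def dot_mixed(a, b, round_input, round_acc=round_fp32):
    """Dot product with inputs rounded by `round_input` and the running
    sum rounded by `round_acc` after every addition (assumed order)."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = round_input(x) * round_input(y)  # product evaluated in fp64
        acc = round_acc(acc + prod)
    return acc


if __name__ == "__main__":
    a, b = [2048.0, 0.5], [1.0, 1.0]
    print(dot_mixed(a, b, round_fp16, round_acc=round_fp32))
    print(dot_mixed(a, b, round_fp16, round_acc=round_fp16))
```

Comparing the two calls at the end shows how the accumulator format alone changes the result for the same fp16 inputs, which is the kind of effect a hardware experiment would need to isolate.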
Level 2 or 3
UTK test cases, mixed precision testing on GPUs