Investigation of NVIDIA Tensor Cores vs. AMD Matrix Cores

Which data formats do Tensor Cores and Matrix Cores support for matrix multiplication?
Is there a method or algorithm for translating CUDA WMMA code into Matrix Core operations?
How should experiments be designed to test the numerical behavior of Tensor Cores and Matrix Cores?

Methodology	Vendor	Equivalent methodology (other vendor)
Warp Matrix Functions (WMMA)	NVIDIA	rocWMMA
rocWMMA	AMD	Warp Matrix Functions
PTX wmma/mma instruction	NVIDIA	MFMA compiler intrinsic
MFMA compiler intrinsic	AMD	PTX wmma/mma instruction
cuBLAS	NVIDIA	rocBLAS
rocBLAS	AMD	cuBLAS
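
As a rough illustration of the first mapping row (Warp Matrix Functions to rocWMMA), the sketch below rewrites CUDA WMMA source text into rocWMMA spelling. It assumes the rocWMMA names mirror the `nvcuda::wmma` names under the `rocwmma` namespace and that the header is `<rocwmma/rocwmma.hpp>`. The function name `translate_wmma_to_rocwmma` is hypothetical. It is a textual rename only: it does not handle `using namespace nvcuda;`, element type names, wavefront size (64 on some AMD GPUs versus a warp size of 32), or differences in supported tile shapes.

```python
import re

# Matches "nvcuda::wmma::" or a bare "wmma::" that is not already part of
# another qualified name such as "rocwmma::".
_WMMA_NAMESPACE = re.compile(r"(?<![\w:])(?:nvcuda::)?wmma::")


def translate_wmma_to_rocwmma(source: str) -> str:
    """Rename CUDA WMMA identifiers to their assumed rocWMMA spelling.

    Hypothetical helper for illustration; a textual rename, not a full port.
    """
    # Swap the CUDA header for the rocWMMA header.
    out = source.replace("#include <mma.h>", "#include <rocwmma/rocwmma.hpp>")
    # Swap the namespace qualifier on fragments, tags, and *_sync calls.
    return _WMMA_NAMESPACE.sub("rocwmma::", out)


if __name__ == "__main__":
    cuda_src = (
        "#include <mma.h>\n"
        "nvcuda::wmma::fragment<nvcuda::wmma::accumulator, 16, 16, 16, float> c;\n"
        "nvcuda::wmma::fill_fragment(c, 0.0f);\n"
    )
    print(translate_wmma_to_rocwmma(cuda_src))
```

The translated code still needs to be compiled and checked numerically on the AMD side, since a successful rename says nothing about equivalent rounding behavior.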

	Inputs	Outputs
NVIDIA Tensor Cores	fp16/bf16/tf32/fp64	fp32/fp16/bf16/fp64
AMD Matrix Cores	fp16/bf16/fp32/fp64	fp32/fp16/bf16/fp64
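
To compare the formats in the table numerically, the short sketch below derives the unit roundoff and the largest finite value of each format from its bit layout. The bit counts are the standard layouts (fp16: 5 exponent and 10 mantissa bits; bf16: 8 and 7; tf32: 8 and 10; fp32: 8 and 23; fp64: 11 and 52). The helper names are illustrative and not taken from any vendor library.

```python
# Explicitly stored mantissa bits (the implicit leading 1 is not counted).
MANTISSA_BITS = {"fp16": 10, "bf16": 7, "tf32": 10, "fp32": 23, "fp64": 52}
EXPONENT_BITS = {"fp16": 5, "bf16": 8, "tf32": 8, "fp32": 8, "fp64": 11}


def unit_roundoff(fmt: str) -> float:
    """Largest relative error of round-to-nearest: 2^-(mantissa bits + 1)."""
    return 2.0 ** -(MANTISSA_BITS[fmt] + 1)


def max_finite(fmt: str) -> float:
    """Largest finite value: (2 - 2^-mantissa) * 2^bias."""
    bias = 2 ** (EXPONENT_BITS[fmt] - 1) - 1
    return (2.0 - 2.0 ** -MANTISSA_BITS[fmt]) * 2.0 ** bias


if __name__ == "__main__":
    for fmt in MANTISSA_BITS:
        print(f"{fmt}: u = {unit_roundoff(fmt):.3e}, max = {max_finite(fmt):.3e}")
```

The numbers show the trade-off behind the input formats: bf16 and tf32 keep the fp32 exponent range but carry fewer mantissa bits, while fp16 has the same precision as tf32 with a much smaller range.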

Level 2 or 3
UTK test cases, mixed precision testing on GPUs
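
One way to prototype a mixed-precision test before running on hardware is to emulate the arithmetic on the CPU. The sketch below is a software model only, under stated assumptions: inputs are rounded to fp16, the accumulator is rounded to either fp16 or fp32 after every multiply-add using round-to-nearest, and the result is compared with an fp64 reference. Real Tensor Cores and Matrix Cores may use different internal rounding and summation order, so this is not a description of either vendor's hardware or of the UTK test cases. All function names are hypothetical.

```python
import math
import random
import struct


def round_fp16(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_emulated(a, b, round_acc):
    """Dot product whose accumulator is rounded after every multiply-add."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + x * y)
    return acc


def relative_errors(n: int = 2048, seed: int = 0):
    """Return (fp16-accumulate error, fp32-accumulate error) vs. fp64."""
    rng = random.Random(seed)
    # Inputs are fp16 values, as they would be when fed to the matrix units.
    a = [round_fp16(rng.random()) for _ in range(n)]
    b = [round_fp16(rng.random()) for _ in range(n)]
    # Products of fp16 values are exact in fp64; fsum adds them accurately.
    ref = math.fsum(x * y for x, y in zip(a, b))
    err16 = abs(dot_emulated(a, b, round_fp16) - ref) / ref
    err32 = abs(dot_emulated(a, b, round_fp32) - ref) / ref
    return err16, err32


if __name__ == "__main__":
    err16, err32 = relative_errors()
    print(f"fp16 accumulate: {err16:.3e}  fp32 accumulate: {err32:.3e}")
```

The same comparison (fixed fp16 inputs, varying accumulator precision, fp64 reference) can then be repeated on each GPU, which makes it possible to separate differences due to input format from differences due to accumulation.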