Summarize NaN issues

Define NaN

IEEE - Special values (NaN, INF, subnormal)
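NaN (Not a Number) is a floating-point encoding for a value that has no meaningful numeric result, such as 0/0 or inf - inf. Specifically, a NaN's exponent field has all bits set (the maximum value for the datatype) and its mantissa is nonzero. An all-ones exponent with a zero mantissa encodes ±INF instead.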
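The following sketch makes the bit-level definition concrete. It rounds a Python float to IEEE 754 binary32 (FP32) using the standard `struct` module, extracts the sign, exponent, and mantissa fields, and classifies the value. The helper names `float32_fields` and `classify_float32` are illustrative only and do not come from any library. `struct.pack` raises `OverflowError` for finite values outside the FP32 range, so the sketch assumes the inputs fit.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Return the (sign, exponent, mantissa) fields of x rounded to FP32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # 8-bit exponent field
    mantissa = bits & 0x7FFFFF      # 23-bit fraction (mantissa) field
    return sign, exponent, mantissa


def classify_float32(x: float) -> str:
    _, exponent, mantissa = float32_fields(x)
    if exponent == 0xFF:  # all-ones exponent: NaN or INF
        return "nan" if mantissa != 0 else "inf"
    if exponent == 0:     # all-zeros exponent: zero or subnormal
        return "zero" if mantissa == 0 else "subnormal"
    return "normal"


for value in (1.0, 1e-40, 0.0, float("inf"), float("nan")):
    print(value, classify_float32(value))  # normal, subnormal, zero, inf, nan
```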

NaN Generation

How NaN is generated
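The following sketch shows some common IEEE 754 "invalid operation" cases that produce NaN, using plain Python floats. It also shows that NaN propagates through later arithmetic. One caveat is that Python raises exceptions for `0.0 / 0.0` (`ZeroDivisionError`) and `math.sqrt(-1.0)` (`ValueError`). IEEE 754 default arithmetic, NumPy, and PyTorch return NaN for those cases instead.

```python
import math

inf = math.inf

# Invalid operations that Python floats evaluate to NaN (no exception raised).
invalid_results = {
    "inf - inf": inf - inf,
    "inf * 0": inf * 0.0,
    "inf / inf": inf / inf,
}

# NaN propagates: arithmetic with a NaN operand yields NaN.
propagated = math.nan + 1.0

for expr, value in invalid_results.items():
    print(f"{expr:10s} -> {value}")
print(f"nan + 1    -> {propagated}")

# NaN compares unequal to everything, including itself, so test it with math.isnan.
print(math.nan != math.nan, math.isnan(propagated))
```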

Example and analysis

Which NaN/INF issues matter
pytorch -- nan
- More mixed-precision issues related to NaN in APEX
Search NaN in deepstability
sru
cumf

Analyze NaN in matrix multiplication

Matrix multiplication (MM) is essentially implemented as blocked, repeated fused multiply-add (FMA) operations, so any input that makes an FMA return NaN will also make MM return NaN:

fma(±0, ±inf, z) = NaN
fma(±inf, ±0, z) = NaN
fma(x, y, -inf) = NaN if x*y = +inf
fma(x, y, +inf) = NaN if x*y = -inf
fma(x, y, z) = NaN if any of x, y, z is NaN
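To show how the FMA cases above carry over to MM, here is a minimal pure-Python sketch. Each output element is built as a chain of FMAs over the shared dimension, processed in chunks of `block`. The names `fma`, `matmul_fma`, and `block` are hypothetical, and real GPU kernels tile and order the accumulation differently. Python has no fused multiply-add before 3.13, so `fma` here is the unfused `x * y + z`. For these special-value cases (0 * inf and inf + (-inf)), the unfused version gives the same NaN results as a fused one.

```python
import math


def fma(x: float, y: float, z: float) -> float:
    # Unfused stand-in for fused multiply-add; same NaN behavior for these cases.
    return x * y + z


def matmul_fma(a: list[list[float]], b: list[list[float]], block: int = 2) -> list[list[float]]:
    """Naive blocked matmul: each c[i][j] is a chain of FMAs over k."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for k0 in range(0, k, block):  # walk the shared dimension in blocks
        for i in range(n):
            for j in range(m):
                acc = c[i][j]
                for kk in range(k0, min(k0 + block, k)):
                    acc = fma(a[i][kk], b[kk][j], acc)
                c[i][j] = acc
    return c


# inf * 0: multiplying by the identity still yields NaN where inf meets a 0.
c = matmul_fma([[1.0, math.inf], [2.0, 3.0]], [[1.0, 0.0], [0.0, 1.0]])
print(c)  # [[nan, inf], [2.0, 3.0]]

# inf + (-inf): the accumulator holds +inf, then the next FMA adds -inf.
print(matmul_fma([[math.inf, -math.inf]], [[1.0], [1.0]]))  # [[nan]]
```

The accumulation order does not change whether these NaNs appear. Once one FMA in the chain returns NaN, every later FMA propagates it.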

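This is why NaN can appear when you convert the inputs to a lower precision. Values too large for the target format (FP16's largest finite value is 65504) overflow to ±inf during conversion, and that inf then produces NaN in the following operations, for example inf * 0 or inf + (-inf) during accumulation.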
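The following sketch illustrates the conversion problem with a simplified FP16 cast and a two-element dot product. `cast_to_fp16_range` is a hypothetical helper, and it models only overflow. Under round-to-nearest-even, magnitudes of 65520 or more round to ±inf in FP16, while smaller finite values are returned unchanged here, so mantissa rounding is ignored. The activation and weight values are made-up examples.

```python
import math


def cast_to_fp16_range(x: float) -> float:
    """Hypothetical, overflow-only model of an FP16 cast.

    Magnitudes >= 65520 round to +-inf under round-to-nearest-even;
    everything else is returned as-is (mantissa rounding is not modeled).
    """
    if abs(x) >= 65520.0:
        return math.copysign(math.inf, x)
    return x


def dot(xs: list[float], ys: list[float]) -> float:
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = x * y + acc  # FMA-style accumulation (unfused here)
    return acc


activations = [70000.0, 1.0]  # 70000 is outside FP16's finite range
weights = [0.0, 2.0]

full = dot(activations, weights)                                   # 70000*0 + 1*2
half = dot([cast_to_fp16_range(v) for v in activations], weights)  # inf*0 -> NaN
print(full, half)  # 2.0 nan
```

In full precision the large activation is simply multiplied by zero. After the cast it becomes inf, and inf * 0 turns the whole dot product into NaN.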