Training Neural Networks with Mixed Precision

The training step itself is the standard full-precision PyTorch loop: compute the loss, clear stale gradients, backpropagate, and update the parameters.

```python
import torch.nn.functional as F

loss = F.mse_loss(y_pred, y)  # compute the loss
optimizer.zero_grad()         # clear gradients left over from the previous step
loss.backward()               # backpropagate to populate .grad on each parameter
optimizer.step()              # update the parameters
```

Why zero_grad()? PyTorch accumulates gradients into each parameter's .grad attribute across backward() calls, so they must be cleared before every backward pass: https://stackoverflow.com/questions/48001598/why-do-we-need-to-call-zero-grad-in-pytorch
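Since the section is about mixed precision, here is a minimal sketch of how the same step might look with PyTorch's automatic mixed precision (torch.cuda.amp). This is an illustration under stated assumptions, not code from the original: the toy model, optimizer, and dummy batch are placeholders, and only the loss/zero_grad/backward/step sequence comes from the snippet above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.cuda.amp import autocast, GradScaler

device = "cuda"                            # GradScaler targets CUDA devices
model = nn.Linear(10, 1).to(device)        # toy model, an assumption for this sketch
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()                      # maintains the dynamic loss-scale factor

x = torch.randn(32, 10, device=device)     # dummy batch, an assumption
y = torch.randn(32, 1, device=device)

optimizer.zero_grad()
with autocast():                           # forward pass runs in float16 where safe
    y_pred = model(x)
    loss = F.mse_loss(y_pred, y)
scaler.scale(loss).backward()              # scale the loss so fp16 grads don't underflow
scaler.step(optimizer)                     # unscales grads; skips the step on inf/NaN
scaler.update()                            # adapts the scale factor for the next step
```

The key design point is the scaler: multiplying the loss by a large factor before backward() keeps small float16 gradients from flushing to zero, and scaler.step() unscales them again before the optimizer update, skipping the update entirely if an overflow is detected.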