Gradient computation through `torch.cdist` is non-deterministic in some simple cases. It appears that some kind of overflow, or a read of uninitialized memory, is involved.
Steps to reproduce the behavior:
```python
import torch

def test_grad():
    x1 = torch.tensor([[0.]], requires_grad=True)
    x2 = torch.tensor([[0.5], [1.0]], requires_grad=True)
    res = torch.cdist(x1, x2)
    res[0, 0].backward()
    print(x1.grad, x2.grad)
```
This produces output like the following. Depending on the setup (PyTorch version, OS), the incorrect result occurs anywhere from almost every call to only occasionally:
```
>>> test_grad()
tensor([[-1.]]) tensor([[1.0000e+00],
        [1.4013e-45]])
>>> test_grad()
tensor([[-1.]]) tensor([[1.],
        [0.]])
>>> test_grad()
tensor([[-1.]]) tensor([[1.0000e+00],
        [1.1285e+07]])
```
The result should be deterministic and correct: since only `res[0, 0]` is backpropagated, `x2.grad` should always be `[[1.], [0.]]`, as in the second call above.
This happens on both PyTorch 1.2.0 and current master, on macOS as well as on Linux.
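As a point of comparison (not part of the original report), computing the same pairwise Euclidean distance by hand via broadcasting avoids the `cdist` backward kernel entirely and gives stable gradients in this example; this is a workaround sketch, not a fix for the underlying bug:

```python
import torch

def test_grad_manual():
    x1 = torch.tensor([[0.]], requires_grad=True)
    x2 = torch.tensor([[0.5], [1.0]], requires_grad=True)
    # Same (1, 2) distance matrix as torch.cdist(x1, x2), built from
    # ordinary broadcasting ops whose backward is deterministic.
    res = (x1.unsqueeze(1) - x2.unsqueeze(0)).pow(2).sum(-1).sqrt()
    res[0, 0].backward()
    return x1.grad, x2.grad
```

With this graph, repeated calls consistently yield `x1.grad == [[-1.]]` and `x2.grad == [[1.], [0.]]`.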