cdist gradient computation is broken

🐛 Bug

Gradient computation of torch.cdist produces non-deterministic results in some simple cases. It appears there is some kind of out-of-bounds memory access or similar going on.

To Reproduce

Steps to reproduce the behavior:

import torch

def test_grad():
    x1 = torch.tensor([[0.]], requires_grad=True)
    x2 = torch.tensor([[0.5], [1.0]], requires_grad=True)
    res = torch.cdist(x1, x2)
    res[0, 0].backward()
    print(x1.grad, x2.grad)

This produces output like the following (depending on the PyTorch version and OS, the incorrect values appear anywhere from almost every call to only occasionally):

>>> test_grad()
tensor([[-1.]]) tensor([[1.0000e+00], [1.4013e-45]])

>>> test_grad()
tensor([[-1.]]) tensor([[1.], [0.]])

>>> test_grad()
tensor([[-1.]]) tensor([[1.0000e+00], [1.1285e+07]])
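
A quick way to surface the flakiness is to repeat the reproduction and collect the distinct gradients of x2. The snippet below is only a sketch (the helper name grad_of_x2 and the repetition count are arbitrary; it mirrors the setup of test_grad above):

import torch

def grad_of_x2():
    # Same setup as test_grad, but return x2.grad instead of printing it.
    x1 = torch.tensor([[0.]], requires_grad=True)
    x2 = torch.tensor([[0.5], [1.0]], requires_grad=True)
    torch.cdist(x1, x2)[0, 0].backward()
    return tuple(x2.grad.flatten().tolist())

# More than one distinct value in this set indicates the non-determinism.
print({grad_of_x2() for _ in range(1000)})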

Expected behavior

The gradients should be deterministic and correct: x2.grad should be tensor([[1.], [0.]]) (only x2[0] contributes to res[0, 0]), as in the second output above.
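
For comparison, the same distance can be computed with elementary broadcasting ops, whose backward behaves deterministically. This is only a sketch of how the expected gradients can be reproduced (and a possible interim workaround), not the intended fix for cdist; the helper name pairwise_dist is made up:

import torch

def pairwise_dist(x1, x2):
    # Euclidean pairwise distances via broadcasting: (n, 1, d) - (1, m, d) -> (n, m).
    return (x1.unsqueeze(1) - x2.unsqueeze(0)).norm(dim=-1)

x1 = torch.tensor([[0.]], requires_grad=True)
x2 = torch.tensor([[0.5], [1.0]], requires_grad=True)
pairwise_dist(x1, x2)[0, 0].backward()
print(x1.grad, x2.grad)  # tensor([[-1.]]) tensor([[1.], [0.]]) on every run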

Environment

This happens on both PyTorch 1.2.0 and PyTorch master, on macOS as well as on Linux.

Additional context

There are other issues with cdist that could be related (in particular the second one):
#25799
#24345

cc @ezyang @gchanan @zou3519


Comments

  1. This happens when x1 has size 1 x n. The result and the incoming gradient then have size 1 x m (where m is the number of vectors in x2). grad.t() is reported as contiguous, but its last stride is still m, and computing offsets based on the last stride in this loop becomes incorrect:

    for (const scalar_t * t2_curr = t2; t2_curr != t2_end; t2_curr += m, grad_k += gs, dist_k += 1) {

    The problematic part is the grad_k += gs increment; a small sketch of the contiguity quirk follows below.
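
    To illustrate the contiguity point, here is a minimal sketch (the shapes are arbitrary): the transpose of a 1 x m tensor is reported as contiguous because size-1 dimensions are ignored by the check, yet its last stride is m rather than 1.

    import torch

    grad = torch.zeros(1, 3)      # incoming gradient of shape 1 x m, with m = 3
    gt = grad.t()                 # shape m x 1
    print(gt.is_contiguous())     # True  -- size-1 dims are skipped by the contiguity check
    print(gt.stride())            # (1, 3) -- the last stride is still m, not 1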

