The CUDA version of torch.det is much slower than the CPU version. Why?

I tested the torch.det function with the following script:

```python
import time

import torch

# A batch of 4 * 784 * 4 = 12,544 small 3x3 matrices
t = torch.randn(4, 784, 4, 3, 3)

tic_start = time.time()
torch.det(t)
print("cpu: ", time.time() - tic_start)

# Same batch, moved to the GPU
t = t.cuda()
tic_start = time.time()
torch.det(t)
print("cuda: ", time.time() - tic_start)
```

and got the following output (times in seconds):
cpu: 0.01205134391784668
cuda: 1.553579330444336

I am using a TITAN X GPU.
Why is the det calculation so much slower on CUDA than on the CPU?

Answer:

  1. It's correct that the first CUDA run is slower, because it includes MAGMA library initialization. The script in the second comment does not perform the correct synchronization either. Please use the Timer utility (torch.utils.benchmark) to benchmark code properly; see its README and the simple_timeit example. You'll need a recent PyTorch build for that. With proper timing, CUDA det is much faster than CPU det.
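A minimal sketch of the benchmarking advice in the answer, assuming a PyTorch version that ships torch.utils.benchmark (the "recent build" mentioned). It times the question's workload two ways: manually, with a warm-up call to absorb one-time initialization and torch.cuda.synchronize() around the timed region (CUDA kernel launches are asynchronous), and with torch.utils.benchmark.Timer. The helper names time_det_manual and time_det_timer are hypothetical, and the CUDA path only runs if a GPU is available.

```python
import time

import torch
from torch.utils import benchmark


def time_det_manual(x: torch.Tensor, repeats: int = 10) -> float:
    """Hypothetical helper: average seconds per torch.det call, with warm-up and syncs."""
    torch.det(x)  # warm-up: absorbs one-time costs such as library initialization
    if x.is_cuda:
        torch.cuda.synchronize()  # make sure warm-up work is done before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        torch.det(x)
    if x.is_cuda:
        torch.cuda.synchronize()  # kernels are launched asynchronously; wait for them
    return (time.perf_counter() - start) / repeats


def time_det_timer(x: torch.Tensor) -> float:
    """Hypothetical helper: median seconds per torch.det call via benchmark.Timer."""
    timer = benchmark.Timer(stmt="torch.det(x)", globals={"x": x, "torch": torch})
    # blocked_autorange chooses the number of runs itself; Timer's default clock
    # synchronizes CUDA before reading the time when CUDA is available.
    return timer.blocked_autorange(min_run_time=0.2).median


t = torch.randn(4, 784, 4, 3, 3)
devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
results = {}
for device in devices:
    x = t.to(device)
    manual, timed = time_det_manual(x), time_det_timer(x)
    results[device] = (manual, timed)
    print(f"{device}: manual {manual:.6f} s, Timer {timed:.6f} s")
```

Note that Timer defaults to num_threads=1, so its CPU figure may be higher than the manual one, which uses PyTorch's default thread count; passing num_threads=torch.get_num_threads() to benchmark.Timer compares like with like.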