CUDA error when using torch.mm() on the GPU in PyTorch 1.8.0

🐛 Bug

When calling torch.mm(mat1, mat2) in PyTorch 1.8.0 with mat1 and mat2 both on the GPU, a RuntimeError is raised:
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`

To Reproduce

import torch

# Two small matrices created on the CPU and moved to GPU 0;
# the mm call below raises the cuBLAS error.
mat1 = torch.randn(2, 3).to(0)
mat2 = torch.randn(3, 3).to(0)

torch.mm(mat1, mat2)
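
For reference, a slightly expanded repro (a sketch, not part of the original report) that also prints the CUDA toolkit the wheel was built against and the device's compute capability, to correlate the failure with the missing cubins described in the answer below:

import torch

# CUDA toolkit version this PyTorch build was compiled against (e.g. '10.2')
print(torch.version.cuda)
# Compute capability of GPU 0; an RTX 2080 reports (7, 5), i.e. sm_75
print(torch.cuda.get_device_capability(0))

mat1 = torch.randn(2, 3, device="cuda")
mat2 = torch.randn(3, 3, device="cuda")
# On an affected build this raises CUBLAS_STATUS_INTERNAL_ERROR
print(torch.mm(mat1, mat2))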

Environment

  • PyTorch Version: 1.8.0
  • OS (e.g., Linux): CentOS 7
  • How you installed PyTorch (conda, pip, source): pip install
  • Python version: 3.7.6
  • CUDA/cuDNN version: CUDA 10.2
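
For completeness, PyTorch ships a helper that collects all of this information in one go, which is the standard way to fill out this section of a bug report:

python -m torch.utils.collect_env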

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @ngimel @aocsa @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk

Possible answer

  1. I can reproduce the problem using torch-1.8 on an RTX 2080 (sm_75) with the following trivial case:

    $ python -c "import torch;x=torch.eye(3, 3, device='cuda');print(torch.mm(x,x))"
    

    torch-1.8 (unlike 1.7) ships without sm_75 cubins due to size considerations:

    Analyzing /home/nshulga/.local/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so
    .nv_fatbin size 705.5MiB
      sm_37: 72.5MiB
      sm_50: 171.1MiB
      sm_60: 179.7MiB
      sm_70: 193.0MiB
      sm_35: 39.0MiB
      sm_61: 50.2MiB
    __nv_relfatbin size 35.4MiB
      sm_35: 5.3MiB
      sm_50: 7.4MiB
      sm_60: 7.8MiB
      sm_70: 14.9MiB
      sm_37: 54.5KiB
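
    A quick way to confirm this from Python (a sketch; torch.cuda.get_arch_list() is available in recent builds and lists the architectures the binary was built for): on an affected setup, sm_75 is absent from the list even though the device itself reports compute capability (7, 5).

    $ python -c "import torch; print(torch.cuda.get_arch_list())"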