SIGSEGV in torch.linalg.inv

🐛 Bug

Several inputs to test_inverse_cpu_* (in test_linalg.py) result in segfaults. This happens for every dtype tested. I only see the segfaults when using torch.linalg.inv; torch.inverse is fine.

I’m also seeing seg faults in the following tests in test_ops.py:

  • test_out_linalg_inv_cpu_*
  • test_variant_consistency_eager_linalg_inv_cpu_*
  • test_variant_consistency_jit_linalg_inv_cpu_*
  • test_fn_grad_linalg_inv_cpu_*
  • test_fn_gradgrad_linalg_inv_cpu_*
  • test_supported_dtypes_linalg_inv_cpu_*

I haven’t fully tested these, but the linalg_inv in their names makes me suspect that they are caused by the same issue.

To Reproduce

Steps to reproduce the behavior:

import torch
n = 0
batches = []
a = random_fullrank_matrix_distinct_singular_value(n, *batches, dtype=torch.float32).to('cpu')
torch.linalg.inv(a)

where random_fullrank_matrix_distinct_singular_value is taken from:

def random_fullrank_matrix_distinct_singular_value(matrix_size, *batch_dims,

Fails with n = 0 when batches is one of [], [1], [4], [2, 3].
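A segfault kills the whole Python process, and the test runner with it. One way to check each reported shape without that happening is to run every call in a child interpreter and inspect its exit status. The sketch below is an illustration, not part of the original report. It assumes that an empty tensor from torch.empty(*batches, 0, 0) exercises the same code path as the helper's output for n = 0, since both have shape (*batches, 0, 0) and no elements. The names run_isolated, describe, repro_snippet and BATCH_SHAPES are hypothetical. On POSIX, a child killed by a signal reports a negative return code, so a segfault appears as -signal.SIGSEGV.

```python
import signal
import subprocess
import sys

# Batch shapes reported to crash when n = 0.
BATCH_SHAPES = [[], [1], [4], [2, 3]]


def run_isolated(code: str, timeout: float = 60.0) -> int:
    """Run `code` in a fresh Python interpreter and return its exit status.

    A negative value -N means the child was killed by signal N (POSIX only),
    so a segfault is reported instead of crashing the calling process.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        timeout=timeout,
    )
    return proc.returncode


def describe(returncode: int) -> str:
    """Turn an exit status into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            return signal.Signals(-returncode).name  # e.g. "SIGSEGV"
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def repro_snippet(fn: str, batches: list) -> str:
    """Build a one-line script that inverts an empty (*batches, 0, 0) tensor.

    Assumption: torch.empty(*batches, 0, 0) hits the same path as the
    helper's n = 0 output, because both are empty with that shape.
    """
    shape = ", ".join(str(d) for d in [*batches, 0, 0])
    return f"import torch; {fn}(torch.empty({shape}))"


if __name__ == "__main__":
    # Compare torch.inverse (reported fine) with torch.linalg.inv (reported crashing).
    for fn in ("torch.inverse", "torch.linalg.inv"):
        for batches in BATCH_SHAPES:
            status = run_isolated(repro_snippet(fn, batches))
            print(f"{fn:18} batches={batches!s:8} -> {describe(status)}")
```

On an affected build, the torch.linalg.inv rows would be expected to show SIGSEGV while the torch.inverse rows show ok.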

Expected behavior

No seg fault.

Environment

PyTorch 1.8.0 release – built from source.

Collecting environment information...
PyTorch version: 1.8.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: CentOS Linux release 8.2.2004 (Core)  (x86_64)
GCC version: (GCC) 10.2.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.8.0
[conda] Could not collect

Build info

PYTORCH_BUILD_VERSION=1.8.0 PYTORCH_BUILD_NUMBER=1 MAX_JOBS=40 BLAS=Eigen USE_FFMPEG=1 BUILD_CUSTOM_PROTOBUF=0 USE_IBVERBS=1 USE_CUDA=0 USE_METAL=0   /rds/bear-apps/devel/eb-sjb-up/EL8/EL8-cas/software/Python/3.8.6-GCCcore-10.2.0/bin/python setup.py build
  • GCC: 10.2.0
  • OpenBLAS: 0.3.12
  • FFTW: 3.3.8
  • CMake: 3.18.4
  • Extra C/CXX flags: -O2 -ftree-vectorize -march=native -fno-math-errno

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk @VitalyFedyunin

Answer

  1. The problem was that one of the arguments passed to the LAPACK call was incorrect for 0x0 matrices. I submitted a fix for that.
    Even though our tests fail because of the 0x0 test cases, PyTorch v1.8.0 compiled with OpenBLAS is expected to work correctly for non-empty inputs.

    The BLAS=Eigen option is misleading: it affects only the Caffe2 code and is not used in ATen.

    if(BLAS STREQUAL "Eigen")
      # Eigen is header-only and we do not have any dependent libraries
      set(CAFFE2_USE_EIGEN_FOR_BLAS ON)

    The following message is printed with BLAS=Eigen:

    -- Trying to find preferred BLAS backend of choice: Eigen
    CMake Warning at cmake/Dependencies.cmake:175 (message):
      Preferred BLAS (Eigen) cannot be found, now searching for a general BLAS
      library
    

    CMake then searches for a general BLAS library with find_package(BLAS):

    if(NOT (ATLAS_FOUND OR OpenBLAS_FOUND OR MKL_FOUND OR VECLIB_FOUND OR GENERIC_BLAS_FOUND))
      message(WARNING "Preferred BLAS (" ${BLAS} ") cannot be found, now searching for a general BLAS library")
      find_package(BLAS)

    The CMake log is needed to know which BLAS and LAPACK libraries were actually picked. Alternatively, the output of print(torch.__config__.show()) should contain the line Build settings: BLAS_INFO=....

    I assume OpenBLAS was picked up. I compiled with OpenBLAS 0.3.12 and see the segfault for 0x0 inputs. torch.inverse is fine because it explicitly returns early for this case instead of relying on the LAPACK library to handle it.
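The two practical points in this answer can be sketched in code. The first is reading the BLAS and LAPACK backends from torch.__config__.show(). The second is a user-side workaround for affected builds that mirrors torch.inverse's early return for empty inputs. This is a minimal sketch, not PyTorch's own fix. The helper names build_settings and inv_with_empty_guard are hypothetical. The parser assumes the "Build settings:" line is a ", "-separated list of KEY=VALUE entries.

```python
import torch


def build_settings(config: str) -> dict:
    """Parse the 'Build settings:' line of torch.__config__.show() into a dict.

    Assumes a line of the form 'Build settings: KEY=VALUE, KEY=VALUE, ...'.
    Returns an empty dict if no such line is present.
    """
    marker = "Build settings:"
    for line in config.splitlines():
        if marker in line:
            body = line.split(marker, 1)[1]
            settings = {}
            for item in body.split(", "):
                key, sep, value = item.strip().partition("=")
                if sep:  # ignore fragments without '='
                    settings[key] = value
            return settings
    return {}


def inv_with_empty_guard(a: torch.Tensor) -> torch.Tensor:
    """Invert a (batch of) square matrices, skipping the backend for empty inputs.

    Similar in spirit to torch.inverse's early return: a square input with no
    elements, e.g. shape (*batches, 0, 0), has an empty inverse of the same
    shape, so no LAPACK call is needed. Everything else (including malformed
    shapes) is passed to torch.linalg.inv, which does its own validation.
    """
    if a.dim() >= 2 and a.shape[-1] == a.shape[-2] and a.numel() == 0:
        return torch.empty_like(a)
    return torch.linalg.inv(a)


if __name__ == "__main__":
    info = build_settings(torch.__config__.show())
    print("BLAS_INFO =", info.get("BLAS_INFO"), "| LAPACK_INFO =", info.get("LAPACK_INFO"))
    print(inv_with_empty_guard(torch.empty(2, 3, 0, 0)).shape)
```

The guard only avoids the empty case. Non-empty inputs still go through torch.linalg.inv, which the answer expects to work correctly on v1.8.0 with OpenBLAS.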