Build failure with “TORCH_CUDA_API” is undefined and more

🐛 Bug

Failing to build from source. I built successfully some months ago (pytorch-20200514_bbfd0ef), but the build fails now. For that earlier successful build, the OS packages, gcc, the NVIDIA stack, and PyTorch itself were all older.

To Reproduce

Steps to reproduce the behavior:

  1. git clone the source
  2. git submodule sync
  3. git submodule update --init --recursive
  4. Set env vars
  5. python3 setup.py install --root=/usr/local/src/pytorch/pkg/new
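For reference, the steps above can be collected into one script. This is a minimal sketch, not the reporter's exact procedure. The clone URL, the `pytorch-git` directory name, the `DRY_RUN` switch, and the `run`/`reproduce` function names are illustrative assumptions. The environment variables and install root come from the build script under "Additional context".

```shell
#!/usr/bin/env bash
# Sketch of reproduction steps 1-5. DRY_RUN, run() and reproduce() are
# illustrative names (assumptions), not part of PyTorch.

# Print a command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# The ( ) body runs in a subshell, so cd, set -e and the exports stay local.
reproduce() (
  set -euo pipefail
  dest="${1:-/usr/local/src/pytorch/pkg/new}"

  # Step 1 (assumed upstream URL and target directory name)
  run git clone https://github.com/pytorch/pytorch.git pytorch-git
  run cd pytorch-git
  # Step 2
  run git submodule sync
  # Step 3: ASCII double hyphens, not en dashes
  run git submodule update --init --recursive
  # Step 4: environment from the reporter's build script
  export TORCH_CUDA_ARCH_LIST="6.1;7.0;7.5;8.0;8.6"
  export NCCL_INCLUDE_DIR="/opt/nvidia/nccl/include"
  export NCCL_ROOT_DIR="/opt/nvidia/nccl"
  export USE_SYSTEM_NCCL=1
  # Step 5: stage the install under a separate root
  run python3 setup.py install --root="$dest"
)
```

Source the file, then call `DRY_RUN=1 reproduce` to print the commands, or call `reproduce` to run them. Because `run` is a shell function, `run cd pytorch-git` changes directory inside the subshell as expected.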

The build failure appears to start at this section of the build output:

[4860/5986] Building NVCC (Device) object ...src/THC/torch_cuda_generated_THCSleep.cu.
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/torch_cuda_generated_THCSleep.cu.o 
cd /usr/local/src/pytorch/src/pytorch-git/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC && /usr/bin/cmake -E make_directory /usr/local/src/pytorch/src/pytorch-git/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/. && /usr/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Release -D generated_file:STRING=/usr/local/src/pytorch/src/pytorch-git/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/./torch_cuda_generated_THCSleep.cu.o -D generated_cubin_file:STRING=/usr/local/src/pytorch/src/pytorch-git/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/./torch_cuda_generated_THCSleep.cu.o.cubin.txt -P /usr/local/src/pytorch/src/pytorch-git/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/torch_cuda_generated_THCSleep.cu.o.Release.cmake
/usr/local/src/pytorch/src/pytorch-git/torch/include/THC/THCGeneral.h(39): error: identifier "TORCH_CUDA_API" is undefined

/usr/local/src/pytorch/src/pytorch-git/torch/include/THC/THCGeneral.h(39): error: "THCState" has already been declared in the current scope

/usr/local/src/pytorch/src/pytorch-git/torch/include/THC/THCGeneral.h(39): error: expected a ";"

Complete build output messages:
https://gist.githubusercontent.com/edrozenberg/6e2a25c76d7c62533204974bd4499a47/raw/4c7d3875063f186a8044afc15958c723e3f87732/pytorch%%2520build%%2520log%%25202021-02-16.txt

Expected behavior

Successful build into the target directory

Environment

Please copy and paste the output from our environment collection script (or fill out the checklist below manually).

  • PyTorch Version (e.g., 1.0): 2021-02-16 52af23b
  • OS (e.g., Linux): Slackware Linux 64 -current (pre-15.0)
  • How you installed PyTorch (conda, pip, source): source
  • Build command you used (if compiling from source): python3 setup.py install --root=/usr/local/src/pytorch/pkg/new
  • Python version: 3.9.1
  • CUDA/cuDNN version: cuda-11.2.1 / cudnn-8.1.0.77_11.2
  • GPU models and configuration: TITAN X (Pascal) (12GB), GeForce GT 630 (2GB)
  • Any other relevant information: magma-2.5.4, nvidia-driver-460.39, nvidia-nccl-2.8.4.1_11.2

Additional context

Using the following build approach:

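#!/usr/bin/bash

# GPU compute capabilities to build kernels for
export TORCH_CUDA_ARCH_LIST="6.1;7.0;7.5;8.0;8.6"
# Use the system NCCL under /opt/nvidia/nccl instead of the bundled copy
export NCCL_INCLUDE_DIR="/opt/nvidia/nccl/include"
export NCCL_ROOT_DIR="/opt/nvidia/nccl"
export USE_SYSTEM_NCCL=1

cd pytorch-git

# Stage the install under a separate root for packaging
python3 setup.py install --root=/usr/local/src/pytorch/pkg/new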

I built and installed magma from source.

Linux 5.10.15 #1 SMP Wed Feb 10 14:06:55 CST 2021 x86_64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz GenuineIntel GNU/Linux

magma-2.5.4

gcc-10.2.0
gcc-brig-10.2.0
gcc-g++-10.2.0
gcc-gdc-10.2.0
gcc-gfortran-10.2.0
gcc-gnat-10.2.0
gcc-go-10.2.0
gcc-objc-10.2.0
gccmakedep-1.0.3

automake-1.16.2
cmake-3.19.4
gccmakedep-1.0.3
imake-1.0.8
make-4.3
makedepend-1.0.6
pmake-1.111

nvidia-cuda-11.2.1
nvidia-cudnn-8.1.0.77_11.2
nvidia-driver-460.39
nvidia-kernel-460.39_5.10.15
nvidia-ml-py3-7.352.0
nvidia-nccl-2.8.4.1_11.2
nvidia-tensorrt-7.2.2.3_11.1

cc @malfet @seemethere @walterddr @ngimel

1 possible answer on "Build failure with “TORCH_CUDA_API” is undefined and more"

  1. Hi,

    This may be related. I once ran into this issue when I had a different PyTorch version on my PATH:

    PATH=/third_party/libtorch/include 1.XXX <= From my install dir
    pytorch1.YYY/build$ make <= Current compile

    -DCMAKE_INSTALL_PREFIX=/usr/local/src/pytorch/src/pytorch-git/torch
    /usr/local/src/pytorch/src/pytorch-git/torch is your install dir.
    /usr/local/src/pytorch/src/pytorch-git/torch/include/THC/THCGeneral.h(39): error: identifier “TORCH_CUDA_API” is undefined

    Run make clean first, or remove the previous install.

    Pascal
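If the answer's explanation applies, meaning stale headers from an earlier build are left in the `torch/include` install dir, one quick check is whether the installed headers use `TORCH_CUDA_API` without any header there defining it. The sketch below is a hypothetical helper, not part of PyTorch. It only runs grep, assumes GNU grep (`-r`, `--include`), and treats a "used but never defined" result as a hint, not as proof.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of PyTorch. Usage: check_macro SRC_ROOT MACRO
# Reports whether headers under SRC_ROOT/torch/include use MACRO without any
# header there #define-ing it. Assumes GNU grep (-r, --include).
check_macro() (
  root="$1"
  macro="$2"
  inc="$root/torch/include"

  if [ ! -d "$inc" ]; then
    echo "no installed headers under $inc"
    exit 0   # exits only the subshell body, not the caller
  fi

  # Every header that mentions the macro at all (empty if none)
  users=$(grep -rl --include='*.h' -e "$macro" "$inc")

  if [ -z "$users" ]; then
    echo "$macro: not used under $inc"
  elif grep -rqE --include='*.h' \
      -e "#[[:space:]]*define[[:space:]]+${macro}([^A-Za-z0-9_]|\$)" "$inc"; then
    echo "$macro: defined under $inc"
  else
    echo "$macro: used but never defined under $inc (possibly stale headers):"
    echo "$users"
  fi
)
```

For the paths in this report, run `check_macro /usr/local/src/pytorch/src/pytorch-git TORCH_CUDA_API`. If it reports the macro as used but never defined, clean the build or remove the previous install, as suggested above, before rebuilding.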