The issue of compling from source code: undeclared inclusion(s) in rule ‘@nccl_archive//:nccl’

Hey guys,

I can compile the cpu version of tensorflow 1.2.0-rc2 without any problem, however, i am blocked when i try to compile the gpu version. I spend almost the whole day to install tensorflow 1.2.0-rc2 gpu version in our cluster, however there is no lucky.

Here is the error from the terminal:

ERROR: /home/hpc-xin/.cache/bazel/_bazel_hpc-xin/c5e302e32d7acb9c3e4fce8142d1cd66/external/nccl_archive/BUILD:33:1: undeclared inclusion(s) in rule ‘@nccl_archive//:nccl’:
this rule is missing dependency declarations for the following files included by ‘external/nccl_archive/src/libwrap.cu.cc’:
‘/gpfs/gpfs1/apps2/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include-fixed/limits.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include-fixed/syslimits.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/stddef.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/new’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/x86_64-unknown-linux-gnu/bits/c++config.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/x86_64-unknown-linux-gnu/bits/os_defines.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/x86_64-unknown-linux-gnu/bits/cpu_defines.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/exception’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/bits/atomic_lockfree_defines.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/bits/exception_ptr.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/bits/exception_defines.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/bits/nested_exception.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/stdarg.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/cmath’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/bits/cpp_type_traits.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/ext/type_traits.h’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/cstdlib’
‘/gpfs/gpfs1/apps2/gcc/5.4.0/include/c++/5.4.0/cstdio’.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use –verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 22.226s, Critical Path: 5.20s

OS Information

Red Hat Enterprise Linux Workstation release 6.7 (Santiago)

Modules I loaded

gcc/5.4.0, cuda/8.0.61, cudnn/6.0, java/1.8.0_31, sqlite/3.18.0, tcl/8.6.6.8606, python/3.6.1

Here is how i make the configuration

gcc (v5.4.0): /apps2/gcc/5.4.0 (customized path)
enabled cuda (v8.0): /apps2/cuda/8.0.61 (customized path)
enabled cudnn: /apps2/cudnn/6.0 (customized path)
python (v3.6.1, which is compiled with gcc 5.4.0): /apps2/python/3.6.1 (customized path)
Other options are disabled (such as hadoop, google cloud).

Here are some files i modified manually before compiling

(1) /tensorflow-1.2.0-rc2/third_party/gpus/crosstool/CROSSTOOL_clang.tpl
/tensorflow-1.2.0-rc2/third_party/gpus/crosstool/CROSSTOOL_nvcc.tpl
Add the following contents in all of “toolchain”:
cxx_builtin_include_directory: “/apps2/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include”
cxx_builtin_include_directory: “/apps2/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include-fixed”
cxx_builtin_include_directory: “/apps2/gcc/5.4.0/include/c++/5.4.0”
cxx_builtin_include_directory: “/apps2/gcc/5.4.0/include”
cxx_builtin_include_directory: “/apps2/cuda/8.0.61/include”
cxx_builtin_include_directory: “/apps2/cudnn/6.0/include”

(2) /home/hpc-xin/.cache/bazel/_bazel_hpc-xin/c5e302e32d7acb9c3e4fce8142d1cd66/external/protobuf/protobuf.bzl
Add the following content in “ctx.action”
env=ctx.configuration.default_shell_env

(3) /home/hpc-xin/.cache/bazel/_bazel_hpc-xin/c5e302e32d7acb9c3e4fce8142d1cd66/external/nccl_archive/Makefile
Some modification as below:
CUDA_HOME ?= /usr/local/cuda –> CUDA_HOME ?= /apps2/cuda/8.0.61

I noticed that some of people said this issue could be bypassed by delete the nccl dependence in “/tensorflow-1.2.0-rc2/tensorflow/contrib/BUILD”. However, i do not think this is the correct way to solve this issue. I wanna keep this dependence.

Thanks so much for your help. Any comments are appreciated.
Best Regards
Xin

2 thoughts on “The issue of compling from source code: undeclared inclusion(s) in rule ‘@nccl_archive//:nccl’

  1. I had this exact problem on CentOS 7.1, with gcc 4.8.3, bazel 0.4.5, CUDA 8.0, cuDNN 5.1. CUDA and cuDNN are installed in the default path (/usr/local/cuda).

    The error messages are as follows:

    ERROR: /home/user/.cache/bazel/_bazel_sankuai/a124f28b2f66957c03d4d409b9638361/external/nccl_archive/BUILD:33:1: undeclared inclusion(s) in rule '@nccl_archive//:nccl':
    this rule is missing dependency declarations for the following files included by 'external/nccl_archive/src/libwrap.cu.cc':
      '/usr/lib/gcc/x86_64-redhat-linux/4.8.3/include/limits.h'
      '/usr/lib/gcc/x86_64-redhat-linux/4.8.3/include/syslimits.h'
      '/usr/lib/gcc/x86_64-redhat-linux/4.8.3/include/stddef.h'
      '/usr/lib/gcc/x86_64-redhat-linux/4.8.3/include/stdarg.h'.
    [...]
    Target //tensorflow/tools/pip_package:build_pip_package failed to build
    Use --verbose_failures to see the command lines of failed build steps.
    INFO: Elapsed time: 38.362s, Critical Path: 28.04s
    

    Adding

    cxx_builtin_include_directory: "/usr/lib/gcc/x86_64-redhat-linux/4.8.3/include"
    

    to

    third_party/gpus/crosstool/CROSSTOOL_nvcc.tpl
    

    seems fixed the problem.

    @lxwgcool I noticed that the file path in your errors starts with /gpfs/gpfs1/, while your added path in CROSSTOOL_nvcc.tpl starts with /apps2, could this be the problem?

  2. you have to run bazel with this option: –cxxopt=”-I/usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/*.h”

Comments are closed.