no kernel image is available for execution on the device

I’ve installed TensorFlow 2.3.0 on Windows 10, but I can’t run any Python script that contains TensorFlow code.
GPU: NVIDIA GeForce GTX 960M
OS: Windows 10
CUDA: 10.1
cuDNN: 7.6.5.32
Python: 3.7.7

This is the log output when I run the script:

py main.py
2020-08-18 21:19:56.898389: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-08-18 21:19:59.658108: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-08-18 21:20:00.194181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 74.65GiB/s
2020-08-18 21:20:00.207551: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-08-18 21:20:00.213582: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-08-18 21:20:00.218522: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-08-18 21:20:00.224225: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-08-18 21:20:00.236176: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-08-18 21:20:00.244819: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-08-18 21:20:00.251416: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-08-18 21:20:00.257794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-08-18 21:20:00.263340: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-18 21:20:00.286389: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2e0eb7cb660 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-18 21:20:00.297402: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-18 21:20:00.302999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 74.65GiB/s
2020-08-18 21:20:00.317067: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-08-18 21:20:00.323311: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-08-18 21:20:00.329936: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-08-18 21:20:00.335930: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-08-18 21:20:00.341258: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-08-18 21:20:00.348285: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-08-18 21:20:00.354092: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-08-18 21:20:00.360523: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-08-18 21:20:00.439805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-18 21:20:00.447079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2020-08-18 21:20:00.452262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2020-08-18 21:20:00.456464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3121 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
2020-08-18 21:20:00.480959: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2e0eb7ca5e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-18 21:20:00.493404: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 960M, Compute Capability 5.0
2020-08-18 21:20:00.797182: F .\tensorflow/core/kernels/random_op_gpu.h:232] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: no kernel image is available for execution on the device

I’ve had this problem ever since I tried to install TensorFlow 2.x. If I install TensorFlow 1.15.3 instead, I don’t have this problem.
But since TensorFlow 1.x won’t be supported next year, I want to install TensorFlow 2.x.

Possible answer:

  1. Just wanted to mention that we get the same “no kernel image is available for execution on the device” error with TensorFlow 2.3.0 on our NVIDIA 940MX cards (compute capability 5.0, like the author’s 960M), with both the precompiled Python package and the C API library. However, the same code runs fine on NVIDIA 1080 Ti (compute capability 6.1) and 2080 Ti (compute capability 7.5) cards.

    We figured that it might be related to this line in the 2.3.0 release notes:

    TF 2.3 includes PTX kernels only for compute capability 7.0 to reduce the TF pip binary size. Earlier releases included PTX for a variety of older compute capabilities.
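Why the release note matters for a compute capability 5.0 card follows from CUDA’s compatibility rules. A compiled cubin (SASS) for sm_XY runs only on GPUs with the same major version X and a minor version of at least Y. PTX for compute_XY can be JIT-compiled by the driver for any GPU with compute capability X.Y or higher, but never for an older one. PTX built only for 7.0 therefore cannot help a 5.0 GPU, and if no 5.x SASS is embedded either, CUDA reports “no kernel image is available for execution on the device”. The sketch below encodes these two rules in Python. The function names are hypothetical, and the SASS target list in the example is an assumption chosen to match the cards reported above, not the actual contents of the TensorFlow 2.3.0 binaries.

```python
from typing import Iterable, Tuple

# A compute capability such as 5.0 is represented as (major, minor).
Capability = Tuple[int, int]


def sass_runs_on(sass: Capability, device: Capability) -> bool:
    """A cubin for sm_XY runs on devices with the same major version X
    and a minor version >= Y (binary compatibility stays within one major)."""
    return sass[0] == device[0] and sass[1] <= device[1]


def ptx_runs_on(ptx: Capability, device: Capability) -> bool:
    """PTX for compute_XY can be JIT-compiled for any device whose compute
    capability is >= X.Y; it cannot target older devices."""
    return ptx <= device  # tuple comparison: major first, then minor


def has_kernel_image(device: Capability,
                     sass_targets: Iterable[Capability],
                     ptx_targets: Iterable[Capability]) -> bool:
    """True if some embedded SASS or PTX is usable on `device`. False is the
    case CUDA reports as "no kernel image is available for execution on the device"."""
    return (any(sass_runs_on(s, device) for s in sass_targets)
            or any(ptx_runs_on(p, device) for p in ptx_targets))


if __name__ == "__main__":
    # Hypothetical build: PTX only for 7.0 (per the release note) and,
    # as an assumption for illustration, SASS only for 6.0 and 7.0.
    sass = [(6, 0), (7, 0)]
    ptx = [(7, 0)]
    print(has_kernel_image((5, 0), sass, ptx))  # False: GTX 960M / 940MX
    print(has_kernel_image((6, 1), sass, ptx))  # True: 1080 Ti via sm_60 SASS
    print(has_kernel_image((7, 5), sass, ptx))  # True: 2080 Ti via sm_70 SASS
```

With these assumed targets, the 6.1 and 7.5 cards find a usable kernel image while the 5.0 card does not, which matches the behaviour described in the answer. The architectures actually embedded in a given library can be listed with NVIDIA’s `cuobjdump` tool (`--list-elf` for SASS, `--list-ptx` for PTX).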