If TensorFlow doesn’t detect your GPU, it falls back to the CPU, and heavy training jobs will take far longer to complete. The most likely cause is that the NVIDIA driver or the CUDA and cuDNN libraries are not being detected correctly on your system.
I am assuming that you have already installed TensorFlow with GPU support. If you haven’t, check this article:
To check that GPU support is enabled, run the following from a terminal:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
If your GPU is detected you should see something similar to this output:
But if you are unlucky, then you will instead get the following output:
Or you might get an obscure error like the one below:
2022-05-24 20:29:24.352218: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: UNKNOWN ERROR (100)
2022-05-24 20:29:24.352261: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (c37259b3e9a1): /proc/driver/nvidia/version does not exist
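If you prefer a script to the one-liner, the same check can be wrapped in a small Python file. This is only a sketch: the helper names `describe_gpus` and `check_gpu` are made up for illustration, and the only TensorFlow call used is the `tf.config.list_physical_devices('GPU')` from the command above.

```python
def describe_gpus(devices):
    """Summarise the list returned by tf.config.list_physical_devices('GPU')."""
    if not devices:
        return "No GPU detected: TensorFlow will fall back to the CPU."
    # PhysicalDevice objects expose a .name attribute; fall back to str() otherwise.
    names = ", ".join(getattr(d, "name", str(d)) for d in devices)
    return f"Detected {len(devices)} GPU(s): {names}"


def check_gpu():
    """Run the detection check, tolerating a missing TensorFlow install."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed in this Python environment."
    return describe_gpus(tf.config.list_physical_devices("GPU"))


if __name__ == "__main__":
    print(check_gpu())
```

An empty list is the "unlucky" case: TensorFlow imported fine but found no usable GPU.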
In both cases, TensorFlow is not detecting your NVIDIA GPU. This can happen for a variety of reasons:
- NVIDIA driver not installed
- CUDA not installed, or an incompatible version installed
- cuDNN not installed, or an incompatible version installed
- TensorFlow running in Docker, but without the NVIDIA driver installed on the host, or without NVIDIA Docker installed
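To narrow down which of these reasons applies, a rough diagnostic along the following lines may help. It is a sketch built on several assumptions rather than a definitive test: it assumes a Linux host, the function name `diagnose` and the result keys are made up for illustration, `/.dockerenv` is only a common heuristic for "running inside Docker", and `ctypes.util.find_library` may miss CUDA or cuDNN libraries installed inside a conda or pip environment. It also does not check version compatibility. The `/proc/driver/nvidia/version` path is the one mentioned in the error log above.

```python
import ctypes.util
import os
import shutil


def diagnose():
    """Return a dict of rough yes/no checks for the common causes listed above."""
    return {
        # The NVIDIA driver package normally ships the nvidia-smi tool.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # This file exists only when the NVIDIA kernel driver is loaded.
        "kernel_driver_loaded": os.path.exists("/proc/driver/nvidia/version"),
        # Look for the CUDA runtime and cuDNN on the system library path.
        "cudart_found": ctypes.util.find_library("cudart") is not None,
        "cudnn_found": ctypes.util.find_library("cudnn") is not None,
        # Heuristic only: Docker usually creates this file in containers.
        "inside_docker": os.path.exists("/.dockerenv"),
    }


if __name__ == "__main__":
    for check, result in diagnose().items():
        print(f"{check}: {result}")
```

If `inside_docker` is true and `kernel_driver_loaded` is false, the last reason in the list (driver missing on the host, or NVIDIA Docker not installed) is a likely candidate.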
Now that you are sure that TensorFlow is not detecting your GPU, it’s time to install TensorFlow correctly. Check the article below: