How to set up Tensorflow on Ubuntu Linux with multiple GPUs using Docker

For this tutorial, I will be setting up the latest version of Tensorflow, currently 2.9, with GPU support on a workstation kindly provided by SabrePC with 4 x RTX 3080 GPUs.


Setting up Tensorflow typically requires a lot of hard work, as you need to get the combination of NVIDIA CUDA drivers, cuDNN, Python version, and Tensorflow in perfect alignment.

You might be tempted to deviate in one of the versions, for instance CUDA, but you will pay for that mistake dearly. It might not be today, but down the line you will get some really random error in Tensorflow that will cost you hours of debugging.

Then you will remember that you thought it was OK to use CUDA 11.3 instead of 11.2. Or perhaps you were feeling lucky and tried an even later version of CUDA. Newer is better, right?

Setting up Tensorflow with Docker

I have gone through the whole manual setup of Tensorflow once before, not for Linux but for Windows. If you are curious, I recommend reading that article to get a sense of the level of effort required.

With Docker, setting up Tensorflow with GPU support on a Linux environment is extremely easy. It also has the added advantage that you can switch from one version of Tensorflow to another at the drop of a hat.

Prerequisites

The first thing you need to do is make sure that your machine has the latest Nvidia drivers installed.
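A quick way to confirm the driver is working is to run nvidia-smi, which prints the driver version, the highest CUDA version the driver supports, and one row per detected GPU:

nvidia-smi

If the command is missing or reports no devices, install the driver first and reboot (on Ubuntu, sudo ubuntu-drivers autoinstall is one common route, though your preferred method may differ).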

Installing Docker

To install Docker on Ubuntu Linux, follow the instructions here.
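If you just want to get going quickly on a personal workstation, one common shortcut is Docker's official convenience script (a sketch; for production machines, prefer the full apt repository setup from the linked instructions):

# Download and run Docker's convenience install script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Optional: let your user run docker without sudo (log out and back in afterwards)
sudo usermod -aG docker $USER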

Install Nvidia Docker

You need to install Nvidia Docker, also known as the NVIDIA Container Toolkit, to allow Docker to leverage your Nvidia GPUs.

Installation instructions can be found here.
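Once installed, it is worth verifying that Docker can actually reach the GPUs before going any further. A quick smoke test along these lines works (the exact nvidia/cuda tag below is just an example; pick any base tag currently available on Docker Hub):

# nvidia-smi run from inside a throwaway CUDA container should list all 4 GPUs
docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi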

Which Tensorflow Docker image to choose

There is more than one Tensorflow Docker image to choose from; it really depends on what you are looking for.

If you require Tensorflow to have GPU support, you should pick an image with -gpu in the tag, and if you would also like a Jupyter notebook, look for -jupyter in the tag as well.
So, for instance, if I want a Tensorflow image with GPU support and a Jupyter notebook, I can run:

docker run --gpus all -it -p 8888:8888 -v my_tf_notebooks:/tf/my_tf_notebooks tensorflow/tensorflow:latest-gpu-jupyter
[I 16:31:45.875 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
jupyter_http_over_ws extension initialized. Listening on /http_over_websocket
[I 16:31:45.969 NotebookApp] Serving notebooks from local directory: /tf
[I 16:31:45.969 NotebookApp] Jupyter Notebook 6.4.11 is running at:
[I 16:31:45.969 NotebookApp] http://1ed623a5c161:8888/?token=2d3293719cd7f1f430fe00e0cc05bc0609d255ffcd495f9c
[I 16:31:45.969 NotebookApp]  or http://127.0.0.1:8888/?token=2d3293719cd7f1f430fe00e0cc05bc0609d255ffcd495f9c
[I 16:31:45.969 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:31:45.970 NotebookApp]

To access the notebook, open this file in a browser:
        file:////.local/share/jupyter/runtime/nbserver-1-open.html
    Or copy and paste one of these URLs:
        http://1ed623a5c161:8888/?token=2d3293719cd7f1f430fe00e0cc05bc0609d255ffcd495f9c
     or http://127.0.0.1:8888/?token=2d3293719cd7f1f430fe00e0cc05bc0609d255ffcd495f9c
  • --gpus all exposes all of the host's GPUs to the container; without this flag, Tensorflow inside the container will not see any GPU.
  • -v maps a volume (here the named volume my_tf_notebooks) to a folder inside the Docker container. You will almost certainly want this, as anything saved outside the mapped location is lost as soon as the container exits. If you would rather map a plain folder from your machine, see the bind-mount variant after this list.
  • -p maps port 8888 on your workstation to port 8888 inside the container, which is what lets you access Jupyter Notebook via a local URL.
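As written above, my_tf_notebooks (no leading slash) is a named volume that Docker manages for you. To bind-mount a folder from your home directory instead, pass an absolute path, for example:

# Bind-mount a host folder instead of using a named Docker volume
docker run --gpus all -it -p 8888:8888 \
    -v "$(pwd)/my_tf_notebooks":/tf/my_tf_notebooks \
    tensorflow/tensorflow:latest-gpu-jupyter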

Checking that Tensorflow is able to access the GPUs

To quickly check that Tensorflow is able to detect our GPUs, let’s create a Python 3 Jupyter Notebook and run the following in the first cell:

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]

It looks good: Tensorflow is able to detect all four GPUs!
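Bear in mind that detecting four GPUs is not the same as using four GPUs: by default, Tensorflow places operations on a single device. Here is a minimal sketch of tf.distribute.MirroredStrategy, which replicates the model on every visible GPU and splits each batch between them (the toy model and random data are placeholders, not part of this tutorial's setup):

import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model on all visible GPUs and
# averages the gradients across replicas after every step.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)  # 4 on this workstation

# Variables (the model, the optimizer) must be created inside the strategy scope
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# fit() now transparently splits each batch of 256 across the GPUs
x = np.random.rand(1024, 10).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
model.fit(x, y, batch_size=256, epochs=1)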

Extending the Tensorflow Docker Image

But what if you want to install additional tools that you need for your particular project?

In that case, you should extend the default Tensorflow image by creating your own Dockerfile with the following content:

FROM tensorflow/tensorflow:latest-gpu-jupyter

RUN pip install jupyterlab

You can build it locally by doing:

docker build -t codemental/tensorflow .
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM tensorflow/tensorflow:latest-gpu-jupyter
---> 7053c34a3929
Step 2/2 : RUN pip install jupyterlab
---> Running in 1265f6c35854
Collecting jupyterlab
Downloading jupyterlab-3.4.2-py3-none-any.whl (8.8 MB)
..
Successfully installed anyio-3.6.1 babel-2.10.1 json5-0.9.8 jupyter-server-1.17.0 jupyterlab-3.4.2 jupyterlab-server-2.14.0 nbclassic-0.3.7 nbformat-5.4.0 notebook-shim-0.1.0 pytz-2022.1 sniffio-1.2.0 websocket-client-1.3.2

And to run it we can do:

docker run --gpus all -it -p 8888:8888 -v my_tf_notebooks:/tf/my_tf_notebooks codemental/tensorflow
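To double-check that the extra package actually made it into the image, you can ask pip from inside a throwaway container:

# Should print the jupyterlab version installed in the image
docker run --rm codemental/tensorflow pip show jupyterlab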

In the example above, we are always pulling the latest version of Tensorflow, which changes whenever a new release is published.

If you are working on a real project, that is a bad idea, as you will be working with a constantly changing version of Tensorflow. Instead, we should pin a stable version of Tensorflow by picking a specific tag.

As of today, the latest stable version of Tensorflow is 2.9, which means we can change the Dockerfile to:

FROM tensorflow/tensorflow:2.9.0-gpu-jupyter
RUN pip install jupyterlab
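One convenient habit (my own convention, nothing Docker requires) is to mirror the pinned Tensorflow version in your own image tag, so several versions can live side by side:

# Tag the custom image with the Tensorflow version it wraps
docker build -t codemental/tensorflow:2.9.0 .

# Run the pinned version explicitly
docker run --gpus all -it -p 8888:8888 -v my_tf_notebooks:/tf/my_tf_notebooks codemental/tensorflow:2.9.0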

The good thing about pinning a specific tag is that upgrading Tensorflow is now under our control.

To conclude this tutorial: with Docker, setting up Tensorflow with GPU support is so easy that you may be wondering why anyone would want to install all the Tensorflow packages by hand.
