TensorFlow: setup
Current configuration
- Graphics card: NVIDIA GeForce GTX 1050 Ti
- Ubuntu 18.04
- Java: not installed
- NVIDIA drivers: not installed
- gcc 7.5
Steps
I assumed that the CUDA 11 requirements also apply to CUDA 10.1.
- Check the Ubuntu version
~$ lsb_release -d
Description: Ubuntu 18.04.4 LTS
- Check the Ubuntu kernel version
~$ uname -r
4.15.0-112-generic
- Check that the graphics card is suitable for CUDA
~$ sudo lshw -C display
product: GP107 GeForce GTX 1050 Ti
- Check Java (no output)
~$ java --version
- Java is not installed, so I install it
The two main versions are 8 and 11; I install the more recent one.
~$ sudo apt update
~$ sudo apt install openjdk-11-jdk
- Check Java (I don't know whether using 11 instead of 8 will be a problem)
~$ java --version
openjdk 11.0.7 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04, mixed mode, sharing)
- Check that gcc is installed (I don't know whether having 7.5 instead of the required 7.4 will be a problem)
~$ gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
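The checks above can also be scripted. The following is only a sketch and not part of the original setup: the helper names `first_version` and `tool_version` are hypothetical, and the code just reads version strings without deciding whether a version is acceptable.

```python
import re
import shutil
import subprocess


def first_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    return tuple(int(part) for part in match.group().split(".")) if match else None


def tool_version(command):
    """Run e.g. ['gcc', '--version'] and parse its first output line.

    Returns None when the tool is not on PATH or prints nothing.
    """
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True)
    lines = (result.stdout or result.stderr).splitlines()
    return first_version(lines[0]) if lines else None


# Parsing the outputs recorded in the steps above
print(first_version("gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"))  # (7, 5, 0)
print(first_version("openjdk 11.0.7 2020-04-14"))                # (11, 0, 7)
print(first_version("4.15.0-112-generic"))                       # (4, 15, 0)
```

On a real machine, `tool_version(["gcc", "--version"])` would return the installed version as a tuple, which can then be compared against a requirement such as `(7, 4)`.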
Nvidia Drivers
- Add the graphics drivers repository
~$ sudo add-apt-repository ppa:graphics-drivers/ppa
~$ sudo apt update
~$ sudo apt upgrade
- Available drivers
~$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:03.1/0000:1c:00.0 ==
modalias : pci:v000010DEd00001C82sv00001458sd00003732bc03sc00i00
vendor : NVIDIA Corporation
model : GP107 [GeForce GTX 1050 Ti]
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-440 - distro non-free
driver : nvidia-driver-435 - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-415 - third-party free
driver : nvidia-driver-450 - third-party free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
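To see which package `ubuntu-drivers autoinstall` is likely to pick, the line tagged `recommended` can be extracted from the listing. This is a small sketch; `recommended_driver` is a hypothetical helper, and it assumes the line layout shown in the output above.

```python
def recommended_driver(devices_output):
    """Return the package name on the 'driver' line marked 'recommended', or None."""
    for line in devices_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("driver") and stripped.endswith("recommended"):
            # Line shape: "driver : nvidia-driver-450 - third-party free recommended"
            return stripped.split(":", 1)[1].split()[0]
    return None


sample = """\
vendor : NVIDIA Corporation
model : GP107 [GeForce GTX 1050 Ti]
driver : nvidia-driver-440 - distro non-free
driver : nvidia-driver-450 - third-party free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
"""
print(recommended_driver(sample))  # nvidia-driver-450
```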
- Install the latest version (version 450, 931 MB)
~$ sudo ubuntu-drivers autoinstall
- Reboot the PC
~$ sudo reboot
- Check the installed NVIDIA drivers (also useful for monitoring GPU resources)
~$ nvidia-smi
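For monitoring from a script, `nvidia-smi` can print selected fields as CSV through its `--query-gpu` and `--format` options. The sketch below assumes three fields (name, driver version, total memory); the helper names are hypothetical, and it returns an empty list when `nvidia-smi` is missing or fails.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_rows(csv_text):
    """Turn 'name, driver, memory' CSV lines into a list of dictionaries."""
    rows = []
    for line in csv_text.strip().splitlines():
        name, driver, memory = [field.strip() for field in line.split(",")]
        rows.append({"name": name, "driver": driver, "memory": memory})
    return rows


def query_gpus():
    """Return parsed GPU rows, or [] when nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    return parse_gpu_rows(result.stdout) if result.returncode == 0 else []


print(query_gpus())
```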
CUDA
- Install the CUDA dependencies
I had forgotten these and installed them afterwards; in fact, the CUDA installation summary warned about 'missing recommended libraries'.
~$ sudo apt install freeglut3-dev libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
~$ sudo apt install g++ build-essential # I did not install these
- Install CUDA
TensorFlow 2.2 supports CUDA 10.1 and nothing newer; the installer is about 2.4 GB.
Download it from the NVIDIA site; registration on the developer portal is required.
A message warns that the NVIDIA drivers are already installed. It is enough to continue, but afterwards the NVIDIA driver (e.g. 418.87.00) must be unmarked in the list of components the installer offers.
~$ cd Downloads
~$ wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
~$ sudo sh cuda_10.1.243_418.87.00_linux.run
Existing package manager installation of the driver found. It is strongly
recommended that you remove this before continuing.
Abort
Continue
..Continue
..Accept
..unmark Driver
..Install
Summary
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-10.1/
Samples: Installed in /home/user/, but missing recommended libraries
Please make sure that PATH includes /usr/local/cuda-10.1/bin
LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.1/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.1/doc/pdf for detailed information on setting up CUDA.
WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 418.00 is required for CUDA 10.1 functionality to work.
To install the driver using this installer, run the following command, replacing CudaInstaller with the name of this run file:
sudo CudaInstaller.run --silent --driver
Logfile is /var/log/cuda-installer.log
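The installer warning states that CUDA 10.1 needs a driver of at least version 418.00. As a rough illustration, the driver branches listed earlier by `ubuntu-drivers devices` can be compared against that minimum. The helpers `version_key` and `meets_minimum` are hypothetical, and the package suffix (for example `450`) is only the driver branch, so this is a coarse check.

```python
MINIMUM_DRIVER = "418.00"  # taken from the installer warning above


def version_key(version, width=3):
    """Convert '418.00' or '450' into a zero-padded tuple of ints for comparison."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def meets_minimum(installed, minimum=MINIMUM_DRIVER):
    """Return True when the installed driver version is at least the minimum."""
    return version_key(installed) >= version_key(minimum)


# Driver branches offered by ubuntu-drivers in the listing above
candidates = ["410", "440", "435", "390", "415", "450"]
print([v for v in candidates if meets_minimum(v)])  # ['440', '435', '450']
```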
- CUDA path
I suspect this step is of little use, because variables exported this way only last for the current shell session. Setting environment variables on Ubuntu has always annoyed me: there seem to be three different files where they could be set, and their names and paths have often changed across Ubuntu releases.
I make sure the directory where CUDA lives is the correct one.
~$ ls /usr/local/cuda-10.1/
~$ export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
~$ export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
~$ echo $PATH
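The `${VAR:+:${VAR}}` expansion in the export lines appends a colon and the old value only when the variable is already set and non-empty, which avoids a stray trailing colon (an empty entry is read as the current directory). The sketch below mirrors that rule in Python; `prepend_path` and `cuda_environment` are hypothetical helpers that build an environment for a child process without touching the shell configuration.

```python
import os


def prepend_path(directory, current):
    """Mirror DIR${VAR:+:${VAR}}: add ':' plus the old value only if it is non-empty."""
    return directory + (":" + current if current else "")


def cuda_environment(cuda_home="/usr/local/cuda-10.1", base=None):
    """Return a copy of the environment with the CUDA directories prepended."""
    env = dict(os.environ if base is None else base)
    env["PATH"] = prepend_path(cuda_home + "/bin", env.get("PATH", ""))
    env["LD_LIBRARY_PATH"] = prepend_path(
        cuda_home + "/lib64", env.get("LD_LIBRARY_PATH", "")
    )
    return env


env = cuda_environment(base={"PATH": "/usr/bin:/bin"})
print(env["PATH"])             # /usr/local/cuda-10.1/bin:/usr/bin:/bin
print(env["LD_LIBRARY_PATH"])  # /usr/local/cuda-10.1/lib64
```

The returned dictionary can be passed as `env=` to `subprocess.run`, for instance to call `nvcc --version` from a script.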
- CUDA path, permanent
I have not tried this and would personally avoid it. Breaking .bashrc is a real nuisance; one day I will learn how to handle it... maybe.
~$ echo 'export PATH=/usr/local/cuda-10.1/bin:$PATH' >> ~/.bashrc
~$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
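One way to make this less fragile is to append the export lines only when they are not already in the file, so repeated runs do not pile up duplicates. The sketch below is an assumption-laden illustration: `append_missing_lines` is a hypothetical helper, the lines are stored literally so the shell expands them at startup, and the demo writes to a temporary file rather than the real `~/.bashrc`.

```python
import tempfile
from pathlib import Path

EXPORT_LINES = [
    "export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}",
    "export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}",
]


def append_missing_lines(rc_file, lines):
    """Append each line to rc_file unless it is already there; return the lines added."""
    path = Path(rc_file)
    text = path.read_text() if path.exists() else ""
    existing = text.splitlines()
    added = [line for line in lines if line not in existing]
    if added:
        # Start on a fresh line if the file does not end with a newline
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a") as handle:
            handle.write(prefix + "\n".join(added) + "\n")
    return added


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "bashrc_demo"
    print(len(append_missing_lines(demo, EXPORT_LINES)))  # 2
    print(len(append_missing_lines(demo, EXPORT_LINES)))  # 0
```

Making a backup copy of `~/.bashrc` before pointing the helper at it would keep a broken edit easy to undo.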
- CUDA test example
~$ cd ~/NVIDIA_CUDA-10.1_Samples/5_Simulations/nbody
~$ make
~$ ./nbody
- CUDA version
Note: if this gives an error, the environment variables need to be exported again.
~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
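Since TensorFlow 2.2 expects CUDA 10.1, the release number can be pulled out of the `nvcc --version` output and compared. A minimal sketch; `cuda_release` is a hypothetical helper and the sample text is the last line of the output above.

```python
import re


def cuda_release(nvcc_output):
    """Return (major, minor) from the 'release X.Y' part of nvcc's output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


sample = "Cuda compilation tools, release 10.1, V10.1.243"
print(cuda_release(sample))             # (10, 1)
print(cuda_release(sample) == (10, 1))  # True
```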
cuDNN
- Install cuDNN
Download cuDNN from the NVIDIA site; registration on the developer portal is required.
Download cuDNN v7.6.5 (November 5th, 2019), for CUDA 10.1
libcudnn7_7.6.5.32-1%2Bcuda10.1_amd64.deb (Runtime Library)
libcudnn7-dev_7.6.5.32-1%2Bcuda10.1_amd64.deb (Developer Library)
libcudnn7-doc_7.6.5.32-1%2Bcuda10.1_amd64.deb (Code Samples)
~$ cd Downloads/
~$ sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
~$ sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
~$ sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.1_amd64.deb
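The file names in the download list contain `%2B` while the `dpkg` commands use `+`: `%2B` is simply the URL-encoded form of the plus sign. The short sketch below shows the decoding with the standard library; the list name `downloads` is made up for the example.

```python
from urllib.parse import unquote

# File names as they appear in the download links above
downloads = [
    "libcudnn7_7.6.5.32-1%2Bcuda10.1_amd64.deb",
    "libcudnn7-dev_7.6.5.32-1%2Bcuda10.1_amd64.deb",
    "libcudnn7-doc_7.6.5.32-1%2Bcuda10.1_amd64.deb",
]

# Decode the percent-escapes to get the names used on disk
local_names = [unquote(name) for name in downloads]
print(local_names[0])  # libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
```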
- Reboot the PC
~$ sudo reboot
- Verify the CUDA installation
~$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
~$ sudo make
~$ ./deviceQuery

~$ cd /usr/local/cuda/samples/1_Utilities/bandwidthTest
~$ sudo make
~$ ./bandwidthTest

~$ cd /usr/src/cudnn_samples_v7/mnistCUDNN/
~$ sudo make clean && sudo make
~$ ./mnistCUDNN

~$ cd /usr/src/cudnn_samples_v7/conv_sample/
~$ sudo make clean && sudo make
~$ ./conv_sample
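The samples can also be run from a script that looks for a success marker in their output. This is a sketch under assumptions: the marker strings `Result = PASS` and `Test passed!` are what these samples are expected to print on success but are not shown in these notes, and `sample_passed` and `run_sample` are hypothetical helpers.

```python
import subprocess
from pathlib import Path

# Assumed success strings; adjust them if a sample prints something different
PASS_MARKERS = ("Result = PASS", "Test passed!")


def sample_passed(output):
    """Return True when the output contains one of the assumed success markers."""
    return any(marker in output for marker in PASS_MARKERS)


def run_sample(binary):
    """Run a compiled sample from its own directory.

    Returns True or False, or None when the binary has not been built.
    """
    path = Path(binary)
    if not path.is_file():
        return None
    result = subprocess.run(
        [str(path)], capture_output=True, text=True, cwd=str(path.parent)
    )
    return result.returncode == 0 and sample_passed(result.stdout)


print(run_sample("/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery"))
```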
Python
- Create a dedicated conda environment for TensorFlow
~$ conda-env list
base * /home/user/miniconda3
py3 /home/user/miniconda3/envs/py3
~$ conda create -n py3_tf --clone py3
~$ conda activate py3_tf
- Install TensorFlow
~$ pip install --upgrade pip
~$ pip install --upgrade tensorflow
Downloading tensorflow-2.2.0-cp37-cp37m-manylinux2010_x86_64.whl (516.2 MB)
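The wheel name that pip downloads encodes the interpreter it was built for (`cp37` means CPython 3.7), so it indicates which Python the cloned environment is running. The sketch below splits the name into its tags; `wheel_tags` is a hypothetical helper and assumes a wheel name without the optional build tag.

```python
import sys


def wheel_tags(filename):
    """Split 'name-version-python-abi-platform.whl' into a dictionary of tags."""
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


wheel = "tensorflow-2.2.0-cp37-cp37m-manylinux2010_x86_64.whl"
tags = wheel_tags(wheel)
running = "cp{}{}".format(*sys.version_info[:2])  # tag of the current interpreter
print(tags["version"], tags["python"])            # 2.2.0 cp37
print("matches this interpreter:", tags["python"] == running)
```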
- Pre-installation check (1/2) (I ran this before doing any setup at all)
~$ python -c "import tensorflow as tf; x = [[2.]]; print('Tensorflow Version ', tf.__version__); print('hello TF world, {}'.format(tf.matmul(x, x)))"
Tensorflow Version 2.2.0
2020-07-23 00:23:46.566744: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-07-23 00:23:46.566765: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-07-23 00:23:46.566786: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (unknown): /proc/driver/nvidia/version does not exist
2020-07-23 00:23:46.567045: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-23 00:23:46.591079: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3199620000 Hz
2020-07-23 00:23:46.591771: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe094000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-23 00:23:46.591789: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
hello TF world, [[4.]]
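Each TensorFlow log line carries a severity letter after the timestamp (`I` info, `W` warning, `E` error), so the lines that matter can be filtered out of a long log. The sketch below does this with a regular expression; `problems` is a hypothetical helper and the sample text is an excerpt of the output above.

```python
import re

# timestamp, severity letter, "file.cc:line]", then the message
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) (\S+)\] (.*)$"
)


def problems(log_text):
    """Return (severity, source, message) for warning, error and fatal lines."""
    found = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group(1) in "WEF":
            found.append(match.groups())
    return found


log = """\
2020-07-23 00:23:46.566744: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-07-23 00:23:46.566765: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-07-23 00:23:46.591079: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3199620000 Hz
"""
for severity, source, message in problems(log):
    print(severity, source, message[:40])
```

To reduce the noise at the source, the `TF_CPP_MIN_LOG_LEVEL` environment variable can be set before importing TensorFlow (for example `1` to hide info lines).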
- Pre-installation check (2/2)
import tensorflow as tf
if tf.test.gpu_device_name():
    print('Default GPU Device:{}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")

Please install GPU version of TF
- Post-installation check (1/2)
~$ python -c "import tensorflow as tf; x = [[2.]]; print('Tensorflow Version ', tf.__version__); print('hello TF world, {}'.format(tf.matmul(x, x)))"
Tensorflow Version 2.2.0
2020-07-23 23:51:18.168952: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-23 23:51:18.223363: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-23 23:51:18.223705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:1c:00.0 name: GeForce GTX 1050 Ti computeCapability: 6.1
coreClock: 1.43GHz coreCount: 6 deviceMemorySize: 3.94GiB deviceMemoryBandwidth: 104.43GiB/s
2020-07-23 23:51:18.226765: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-23 23:51:18.291938: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-23 23:51:18.323948: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-07-23 23:51:18.334741: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-07-23 23:51:18.409372: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-07-23 23:51:18.418972: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-07-23 23:51:18.521344: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-23 23:51:18.521648: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
... (some output removed)
2020-07-23 23:51:18.629002: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-23 23:51:18.629643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3349 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:1c:00.0, compute capability: 6.1)
2020-07-23 23:51:18.637862: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
hello TF world, [[4.]]
- Post-installation check (2/2)
import tensorflow as tf
if tf.test.gpu_device_name():
    print('Default GPU Device:{}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")

Default GPU Device:/device:GPU:0
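The same check can be written with `tf.config.list_physical_devices`, which lists the GPUs TensorFlow detects without creating a device. This is an alternative sketch, not the check used above; `visible_gpus` is a hypothetical helper that returns `None` when TensorFlow is not installed in the active environment.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("TensorFlow is installed but sees no GPU")
```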
Useful links
Install-cuda-10-and-cudnn-on-ubuntu-18
How-To-Install-CUDA-10-1-on-Ubuntu-19-04
Open questions
- Does having to forcibly limit the CPU memory cause a loss of performance?
- Can using OpenJDK 11 instead of 8 cause problems?
- Can using the gcc 7.5 compiler instead of 7.4 cause problems?
- Why does 'nvidia-smi' report CUDA Version: 11.0 if I installed CUDA 10.1?