Eosc/easybuild/benchmark
Background
In the context of the EOSC-Nordic EasyBuild pilot, this page describes how to benchmark different software configurations on typical problems relevant to NLPL users. The relevant contrast is usually between pre-compiled binary installations (e.g. `pip` wheels) and locally compiled modules, which enable architecture-specific optimizations and use optimized libraries (e.g. Intel MKL).
NumPy
We use a Python script (`tests/numpy/numpy_test.py`) that runs multiple random matrix multiplications and singular value decompositions (SVD). Only the CPU is used (NumPy 1.18.1).
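The test script itself is not reproduced on this page. As a hedged illustration of the kind of workload it times, the following minimal sketch times repeated products and SVDs of random matrices. The function name `time_numpy_ops`, the matrix size and the repetition count are our own assumptions, not the settings used by `numpy_test.py`:

```python
import time

import numpy as np


def time_numpy_ops(size=1000, repeats=5, seed=0):
    """Hypothetical benchmark: time `repeats` random matrix products and SVDs.

    Returns (multiplication_seconds, svd_seconds). Both operations are
    dispatched to the BLAS/LAPACK library NumPy was built against
    (e.g. OpenBLAS or Intel MKL), which is what the comparison measures.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((size, size))
    b = rng.standard_normal((size, size))

    # Time the matrix multiplications.
    start = time.perf_counter()
    for _ in range(repeats):
        np.dot(a, b)
    mult_seconds = time.perf_counter() - start

    # Time the singular value decompositions.
    start = time.perf_counter()
    for _ in range(repeats):
        np.linalg.svd(a)
    svd_seconds = time.perf_counter() - start

    return mult_seconds, svd_seconds
```

Calling `time_numpy_ops()` once under each NLPL-numpy module gives a rough like-for-like comparison, since only the underlying BLAS/LAPACK build differs.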
To reproduce: after installing the NLPL software stack, load the NLPL-numpy module and run the commands below:
OpenBLAS on Saga
$ module load NLPL-numpy/1.18.1-foss-2019b-Python-3.7.4
$ python3 tests/numpy/numpy_test.py
Multiplication took 74 seconds.
SVD took 50 seconds.
Intel MKL (imkl) on Saga
$ module load NLPL-numpy/1.18.1-gomkl-2019b-Python-3.7.4
$ python3 tests/numpy/numpy_test.py
Multiplication took 52 seconds.
SVD took 40 seconds.
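The two NumPy runs can be compared as speedups, i.e. the OpenBLAS time divided by the MKL time. A small helper (the name `speedup` is ours) applied to the timings reported above:

```python
def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate run is than the baseline."""
    return baseline_seconds / candidate_seconds


# Timings reported above: OpenBLAS (foss-2019b) vs. Intel MKL (gomkl-2019b).
print(f"Multiplication: {speedup(74, 52):.2f}x")  # prints "Multiplication: 1.42x"
print(f"SVD: {speedup(50, 40):.2f}x")             # prints "SVD: 1.25x"
```

On these single runs, the MKL build is roughly 1.4 times faster for multiplication and 1.25 times faster for SVD.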
TensorFlow on CPU
As a benchmark, we train a convolutional neural network sentiment classifier on the CPU (TensorFlow 1.15.2, 20 epochs, SLURM job script).
To reproduce: after installing the NLPL software stack, load the NLPL-TensorFlow module (see below) and run the following command:
$ python3 tensorflow_text_cnn.py -t TRAIN -d DEV
where TRAIN is a path to a training dataset, and DEV is a path to a development dataset.
Ready-to-use toy data adapted from the SST dataset can be downloaded here.
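Single timings can vary from run to run on a shared cluster. As a hedged sketch (the helper `time_command` and the default repeat count are our own, not part of the NLPL stack), the benchmark command can be wrapped so that its wall-clock time is measured several times and the median is reported:

```python
import statistics
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run `cmd` (a list of arguments) `repeats` times; return the median wall-clock seconds.

    Raises subprocess.CalledProcessError if any run exits with a non-zero status.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Example (TRAIN and DEV are placeholders, as in the command above):
# median = time_command(["python3", "tensorflow_text_cnn.py", "-t", "TRAIN", "-d", "DEV"])
```

This measures the whole process, including interpreter start-up and data loading, so it will typically be somewhat larger than the training time the script itself prints.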
TF with OpenBLAS on Saga
$ module load NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4
$ python3 tensorflow_text_cnn.py -t SST/stanford_sentiment_binary_train.tsv.gz -d SST/stanford_sentiment_binary_dev.tsv.gz
Training time: 55 seconds
TF with Intel MKL on Saga
$ module load NLPL-TensorFlow/1.15.2-gomkl-2019b-Python-3.7.4
$ python3 tensorflow_text_cnn.py -t SST/stanford_sentiment_binary_train.tsv.gz -d SST/stanford_sentiment_binary_dev.tsv.gz
Training time: 55 seconds
TensorFlow on GPU
As a benchmark, we train a toy BERT model on 4 GPUs (TensorFlow 1.15.2, SLURM job script).
To reproduce: after installing the NLPL software stack, load the NLPL-nvidia_BERT module (see below) and run the following command:
$ train_bert.sh CORPUS VOCAB CONFIG
where CORPUS is the path to a directory of text files, VOCAB is the path to a WordPiece vocabulary, and CONFIG is the path to a BERT configuration JSON file defining the model hyperparameters.
Ready-to-use toy data for Norwegian can be downloaded here, but in principle any plain text corpus can be fed to this code.
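The CONFIG file is not reproduced on this page. Assuming it follows the key names of the original Google BERT `bert_config.json` format, a toy configuration could be written as below. The hyperparameter values and the helper `write_toy_bert_config` are illustrative assumptions, not the contents of `norbert_config.json`. Here `vocab_size` is taken from the number of lines in the WordPiece vocabulary file (one token per line), so that the two stay consistent:

```python
import json


def write_toy_bert_config(vocab_path, config_path):
    """Write a small BERT-style config whose vocab_size matches the vocab file.

    Key names follow the original Google BERT config format (assumption);
    the values are illustrative toy settings only.
    """
    with open(vocab_path, encoding="utf-8") as f:
        vocab_size = sum(1 for _ in f)  # one WordPiece token per line

    config = {
        "attention_probs_dropout_prob": 0.1,
        "hidden_act": "gelu",
        "hidden_dropout_prob": 0.1,
        "hidden_size": 256,  # must be divisible by num_attention_heads
        "initializer_range": 0.02,
        "intermediate_size": 1024,
        "max_position_embeddings": 512,
        "num_attention_heads": 4,
        "num_hidden_layers": 4,
        "type_vocab_size": 2,
        "vocab_size": vocab_size,
    }
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return config
```

For example, `write_toy_bert_config("norwegian_wordpiece_vocab_20k.txt", "toy_config.json")` would produce a config file that can be passed as CONFIG.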
TF with OpenBLAS on Saga
$ module load NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4
$ train_bert.sh no_wiki/ norwegian_wordpiece_vocab_20k.txt norbert_config.json
Training time: 00:46:27
TF with Intel MKL on Saga
$ module load NLPL-nvidia_BERT/20.06.8-gomkl-2019b-TensorFlow-1.15.2-Python-3.7.4
$ train_bert.sh no_wiki/ norwegian_wordpiece_vocab_20k.txt norbert_config.json
Training time: 00:46:19
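The BERT timings are reported as HH:MM:SS. Converting them to seconds makes the gap between the two builds explicit (the helper name `to_seconds` is ours):

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


openblas = to_seconds("00:46:27")  # foss-2019b build
mkl = to_seconds("00:46:19")       # gomkl-2019b build
diff = openblas - mkl
print(openblas, mkl, diff)              # prints "2787 2779 8"
print(f"{100 * diff / openblas:.1f}%")  # prints "0.3%"
```

A difference of 8 seconds (about 0.3%) on a single run is likely within normal run-to-run variation.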