Eosc/easybuild/benchmark

Background

In the context of the EOSC-Nordic EasyBuild pilot, this page provides instructions for benchmarking different software configurations on typical problems that are likely to affect NLPL users. The relevant variation typically contrasts pre-compiled binary installations (e.g. `pip` wheels) with locally compiled modules, where architecture-specific optimizations are enabled and optimized libraries (e.g. Intel MKL) are used.

NumPy

We use this Python script, which runs multiple random matrix multiplications and singular value decompositions (SVD). Only the CPU is used (NumPy 1.18.1).
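
For orientation, the sketch below illustrates the kind of measurement the linked script performs; the actual matrix sizes and repetition counts in numpy_test.py may differ, so the values here are assumptions.

import time
import numpy as np

n = 4000                                  # assumed matrix size, not the script's value
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.time()
for _ in range(10):                       # repeated dense matrix multiplications
    np.dot(a, b)
print("Multiplication took %d seconds." % (time.time() - start))

start = time.time()
for _ in range(3):                        # repeated singular value decompositions
    np.linalg.svd(a)
print("SVD took %d seconds." % (time.time() - start))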

To reproduce: after installing the NLPL software stack, load the NLPL-numpy module and run the commands below:

OpenBLAS on Saga

$ module load NLPL-numpy/1.18.1-foss-2019b-Python-3.7.4
$ python3 tests/numpy/numpy_test.py
Multiplication took 74 seconds.
SVD took 50 seconds.

SLURM job script

Detailed SLURM log file

IMKL on Saga

$ module load NLPL-numpy/1.18.1-gomkl-2019b-Python-3.7.4
$ python3 tests/numpy/numpy_test.py
Multiplication took 52 seconds.
SVD took 40 seconds.

SLURM job script

Detailed SLURM log file
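
A quick way to confirm which BLAS/LAPACK backend the loaded module actually uses (OpenBLAS for the foss toolchain, MKL for gomkl) is to inspect NumPy's build configuration:

import numpy as np

# Prints the BLAS/LAPACK libraries NumPy was linked against at build time
# (look for "openblas" or "mkl" in the output).
np.show_config()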

TensorFlow on CPU

As a CPU benchmark, we train a sentiment classifier with convolutional neural networks (TensorFlow 1.15.2, 20 epochs, SLURM job script).

To reproduce: after installing the NLPL software stack, load the NLPL-TensorFlow module (see below) and run the following command:

$ python3 tensorflow_text_cnn.py -t TRAIN -d DEV 

where TRAIN is a path to a training dataset, and DEV is a path to a development dataset.

Ready-to-use toy data adapted from the SST dataset can be downloaded here.
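
For orientation, the following is a minimal sketch of a comparable text CNN in TensorFlow 1.15 (tf.keras); the actual tensorflow_text_cnn.py differs in its preprocessing and hyperparameters, and all sizes below (vocabulary, sequence length, filters) are assumptions.

import numpy as np
import tensorflow as tf

VOCAB_SIZE, MAX_LEN, EMB_DIM = 10000, 100, 128   # assumed values

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMB_DIM, input_length=MAX_LEN),
    tf.keras.layers.Conv1D(filters=128, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary sentiment label
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random integer-encoded "sentences" stand in for the real SST data here.
x_train = np.random.randint(0, VOCAB_SIZE, size=(1000, MAX_LEN))
y_train = np.random.randint(0, 2, size=(1000,))
model.fit(x_train, y_train, epochs=20, batch_size=32)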

TF with OpenBLAS on Saga

$ module load NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4
$ python3 tensorflow_text_cnn.py -t SST/stanford_sentiment_binary_train.tsv.gz -d SST/stanford_sentiment_binary_dev.tsv.gz 

Training time: 55 seconds

Detailed SLURM log file

TF with OpenMKL on Saga

$ module load NLPL-TensorFlow/1.15.2-gomkl-2019b-Python-3.7.4
$ python3 tensorflow_text_cnn.py -t SST/stanford_sentiment_binary_train.tsv.gz -d SST/stanford_sentiment_binary_dev.tsv.gz

Training time: 55 seconds

Detailed SLURM log file

TensorFlow on GPU

As a GPU benchmark, we train a toy BERT model on 4 GPUs (TensorFlow 1.15.2, SLURM job script).

To reproduce: after installing the NLPL software stack, load the NLPL-nvidia_BERT module (see below) and run the following command:

$ train_bert.sh CORPUS VOCAB CONFIG 

where CORPUS is a path to a directory with text files, VOCAB is a path to a WordPiece vocabulary, and CONFIG is a path to a BERT configuration JSON file (defining the model hyperparameters).
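
For illustration, a toy configuration in the standard bert_config.json format could look as follows; the field names are the usual ones from the original BERT implementation, but these particular (deliberately small) values are assumptions and not the settings of the benchmark's norbert_config.json.

import json

toy_config = {
    "hidden_size": 256,                  # deliberately small toy model
    "num_hidden_layers": 4,
    "num_attention_heads": 4,
    "intermediate_size": 1024,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "attention_probs_dropout_prob": 0.1,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "initializer_range": 0.02,
    "vocab_size": 20000,                 # must match the WordPiece vocabulary size
}

with open("toy_bert_config.json", "w") as f:
    json.dump(toy_config, f, indent=2)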

Ready-to-use toy data for Norwegian can be downloaded here, but in principle any plain text corpus can be fed to this code.
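
Before launching a run, it can be useful to verify inside the SLURM allocation that TensorFlow actually sees the requested GPUs; a minimal check in TensorFlow 1.15 is:

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.is_gpu_available())        # True if at least one GPU is usable

# List the GPU devices TensorFlow can see (should show four on a 4-GPU allocation).
print([d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"])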

TF with OpenBLAS on Saga

$ module load NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4
$ train_bert.sh no_wiki/ norwegian_wordpiece_vocab_20k.txt norbert_config.json

Training time: 00:46:27

Detailed SLURM log file

TF with OpenMKL on Saga

$ module load NLPL-nvidia_BERT/20.06.8-gomkl-2019b-TensorFlow-1.15.2-Python-3.7.4
$ train_bert.sh no_wiki/ norwegian_wordpiece_vocab_20k.txt norbert_config.json

Training time: 00:46:19

Detailed SLURM log file