# Difference between revisions of "Eosc/easybuild/benchmark"


## Revision as of 16:13, 27 November 2020


# Background

In the context of the EOSC-Nordic EasyBuild pilot, this page provides instructions for benchmarking different software configurations on typical problems that are likely to affect NLPL users. The relevant comparison is typically between pre-compiled binary installations (e.g. `pip` wheels) and locally compiled modules, in which architecture-specific optimizations are enabled and optimized libraries (e.g. Intel MKL) are used.

# NumPy

We use [this Python script](https://source.coderefinery.org/nlpl/easybuild/-/blob/ak-dev/tests/numpy/numpy_test.py), which runs multiple random matrix multiplications and singular value decompositions (SVD). Only the CPU is used (NumPy 1.18.1).
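To illustrate what this kind of benchmark measures, here is a minimal sketch in the same spirit. It is not the `numpy_test.py` script linked above: the function names, the matrix size (1000) and the number of repeats (3) are assumptions chosen for illustration. Only the linear-algebra call is timed, so the result mostly reflects the BLAS/LAPACK library that the loaded NumPy module was built against.

```python
"""Hypothetical CPU benchmark in the spirit of numpy_test.py (not that script)."""
import time

import numpy as np


def time_matmul(n: int, repeats: int, rng: np.random.Generator) -> float:
    """Total seconds spent multiplying `repeats` pairs of random n x n matrices."""
    total = 0.0
    for _ in range(repeats):
        a = rng.random((n, n))
        b = rng.random((n, n))
        start = time.perf_counter()
        _ = a @ b  # float64 matrix product, handled by the BLAS library NumPy links to
        total += time.perf_counter() - start
    return total


def time_svd(n: int, repeats: int, rng: np.random.Generator) -> float:
    """Total seconds spent on SVDs of `repeats` random n x n matrices."""
    total = 0.0
    for _ in range(repeats):
        m = rng.random((n, n))
        start = time.perf_counter()
        np.linalg.svd(m)  # computed by the LAPACK routine gesdd
        total += time.perf_counter() - start
    return total


def main(n: int = 1000, repeats: int = 3) -> None:
    # Assumed size and repeat count; the real script's parameters may differ.
    rng = np.random.default_rng(seed=0)
    print(f"Multiplication took {time_matmul(n, repeats, rng):.0f} seconds.")
    print(f"SVD took {time_svd(n, repeats, rng):.0f} seconds.")


if __name__ == "__main__":
    main()
```

Running the sketch once under each NLPL-numpy module (the `foss` and the `gomkl` build) gives a rough comparison of the two BLAS backends; the timings reported on this page come from the real script, not from this sketch.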

To reproduce: after installing the NLPL software stack, load the **NLPL-numpy** module and run the commands below:

## OpenBLAS on Saga

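```
$ module load NLPL-numpy/1.18.1-foss-2019b-Python-3.7.4
$ python3 tests/numpy/numpy_test.py
Multiplication took 78 seconds.
SVD took 60 seconds.
```

[SLURM job script](https://source.coderefinery.org/nlpl/easybuild/-/blob/ak-dev/tests/numpy/test_numpy_openblas.slurm)

[Detailed SLURM log file](https://source.coderefinery.org/nlpl/easybuild/-/blob/ak-dev/tests/results/numpy/numpy-1-18-1-openblas.out)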

## IMKL on Saga

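```
$ module load NLPL-numpy/1.18.1-gomkl-2019b-Python-3.7.4
$ python3 tests/numpy/numpy_test.py
Multiplication took 55 seconds.
SVD took 49 seconds.
```

[SLURM job script](https://source.coderefinery.org/nlpl/easybuild/-/blob/ak-dev/tests/numpy/test_numpy_mkl.slurm)

[Detailed SLURM log file](https://source.coderefinery.org/nlpl/easybuild/-/blob/ak-dev/tests/results/numpy/numpy-1.18.1-mkl.out)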
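The two NumPy runs on Saga can be summarized as a speed-up ratio (OpenBLAS time divided by IMKL time). The snippet below only does this arithmetic on the numbers reported above; since each configuration was timed once, the ratios should be read as indicative rather than definitive.

```python
"""Speed-up of the IMKL build over the OpenBLAS build, from the Saga timings above."""

# operation -> (OpenBLAS seconds, IMKL seconds), copied from the two runs above
TIMINGS = {
    "Multiplication": (78, 55),
    "SVD": (60, 49),
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Ratio baseline/candidate; values above 1 mean the candidate is faster."""
    return baseline_s / candidate_s


for op, (openblas_s, mkl_s) in TIMINGS.items():
    print(f"{op}: {speedup(openblas_s, mkl_s):.2f}x faster with IMKL")
# Multiplication: 1.42x faster with IMKL
# SVD: 1.22x faster with IMKL
```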

# TensorFlow

As a benchmark, we train a toy BERT model on 4 GPUs (TensorFlow 1.15.2; SLURM job script).

To reproduce: after installing the NLPL software stack, load the **NLPL-nvidia_BERT** module (see below) and run the following command:

```
$ train_bert.sh CORPUS VOCAB CONFIG
```

where `CORPUS` is the path to a directory of text files, `VOCAB` is the path to a WordPiece vocabulary, and `CONFIG` is the path to a BERT configuration JSON file defining the model hyperparameters.
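Because a long GPU job can fail late on a bad input path, it may help to sanity-check the three arguments before submitting. The helper below is a hypothetical sketch, not part of the NLPL module, and it does not claim to reflect what `train_bert.sh` itself checks. It assumes the configuration follows the original BERT JSON format with a `vocab_size` key, and that the WordPiece vocabulary stores one token per line, so the two counts should agree.

```python
"""Hypothetical pre-flight check for the CORPUS, VOCAB and CONFIG arguments of train_bert.sh."""
import json
from pathlib import Path
from typing import List, Optional


def count_vocab_entries(vocab: Path) -> int:
    """Assumes one WordPiece token per line, so the entry count is the line count."""
    with vocab.open(encoding="utf-8") as f:
        return sum(1 for _ in f)


def check_inputs(corpus: str, vocab: str, config: str) -> List[str]:
    """Return human-readable problems; an empty list means the inputs look sane."""
    problems: List[str] = []

    corpus_dir = Path(corpus)
    if not corpus_dir.is_dir():
        problems.append(f"CORPUS {corpus!r} is not a directory")
    elif not any(p.is_file() for p in corpus_dir.iterdir()):
        problems.append(f"CORPUS {corpus!r} contains no files")

    vocab_path = Path(vocab)
    n_tokens: Optional[int] = None
    if vocab_path.is_file():
        n_tokens = count_vocab_entries(vocab_path)
    else:
        problems.append(f"VOCAB {vocab!r} is not a file")

    try:
        cfg = json.loads(Path(config).read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:  # json.JSONDecodeError subclasses ValueError
        problems.append(f"CONFIG {config!r} is not readable JSON: {exc}")
        return problems
    if not isinstance(cfg, dict):
        problems.append(f"CONFIG {config!r} is not a JSON object")
        return problems

    # Assumption: original BERT config format, where "vocab_size" must match VOCAB.
    vocab_size = cfg.get("vocab_size")
    if n_tokens is not None and vocab_size is not None and vocab_size != n_tokens:
        problems.append(f"CONFIG vocab_size={vocab_size} but VOCAB has {n_tokens} entries")
    return problems
```

For the Norwegian toy data this would be called as `check_inputs("no_wiki/", "norwegian_wordpiece_vocab_20k.txt", "norbert_config.json")`, and an empty list means nothing obviously wrong was found.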

Ready-to-use toy data for Norwegian can be downloaded here, but in principle any plain text corpus can be fed to this code.

## TF with OpenBLAS on Saga

```
$ module load NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4
$ train_bert.sh no_wiki/ norwegian_wordpiece_vocab_20k.txt norbert_config.json
```

Training time: **00:46:27**

## TF with IMKL on Saga

```
$ module load NLPL-nvidia_BERT/20.06.8-gomkl-2019b-TensorFlow-1.15.2-Python-3.7.4
$ train_bert.sh no_wiki/ norwegian_wordpiece_vocab_20k.txt norbert_config.json
```

Training time: **00:46:19**
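The two TensorFlow training times are given as HH:MM:SS. Converting them to seconds makes the comparison explicit; the snippet below only does this arithmetic on the reported values.

```python
"""Compare the two TensorFlow training times reported above (HH:MM:SS)."""


def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


openblas = to_seconds("00:46:27")  # TF with OpenBLAS on Saga
mkl = to_seconds("00:46:19")       # TF with IMKL on Saga
diff = openblas - mkl
print(f"OpenBLAS: {openblas} s, IMKL: {mkl} s, difference: {diff} s ({diff / openblas:.1%})")
# OpenBLAS: 2787 s, IMKL: 2779 s, difference: 8 s (0.3%)
```

With one run per configuration, an 8-second gap on a roughly 46-minute job cannot be distinguished from run-to-run variation.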