Difference between revisions of "Eosc/easybuild/modules"
Revision as of 01:45, 26 January 2022
NLPL virtual laboratory
The laboratory is a reproducible custom-built set of NLP software. It is currently installed on the Saga, Fox, and Puhti HPC clusters.
To use it on Saga, run the following command (it can be put in the ~/.bashrc file to be run automatically at login):
module use -a /cluster/shared/nlpl/software/eb/etc/all/
To use it on Fox, run the following command (it can be put in the ~/.bashrc file to be run automatically at login):
module use -a /fp/projects01/ec30/software/easybuild/modules/all/
After that, the "nlpl"-branded modules will be available via module avail, module load, etc.
It is highly recommended to use these modules instead of installing your own copies of the software in your home directory.
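The two commands above can be combined into one ~/.bashrc fragment that works on either cluster. This is only a sketch: the helper name nlpl_module_dir and the variable nlpl_tree are made up for illustration, and the fragment assumes the module command is already defined by the time ~/.bashrc runs.

```shell
# Hypothetical helper: print the first argument that is an existing directory.
nlpl_module_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Register whichever NLPL module tree exists on this machine
# (the Saga path first, then the Fox path, both taken from this page).
# Nothing happens if neither tree exists or "module" is not available.
if nlpl_tree=$(nlpl_module_dir \
        /cluster/shared/nlpl/software/eb/etc/all/ \
        /fp/projects01/ec30/software/easybuild/modules/all/) \
        && command -v module >/dev/null 2>&1; then
    module use -a "$nlpl_tree"
fi
```

After logging in again, module avail should list the "nlpl"-branded modules on both Saga and Fox.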
List of modules
From time to time, updated modules with newer software versions will be added, but the older modules will never be removed (for reproducibility).
Note that the modules which have "gomkl" in their names are built with the Intel Math Kernel Library (MKL), making them significantly faster in CPU tasks on Intel processors (for example, on Saga).
Those with "foss" instead of "gomkl" are CPU-agnostic and will also run on machines with AMD CPUs (for example, on Fox).
Below, we use the placeholder ARCH; replace it with either "gomkl" or "foss", depending on which machine you are working on.
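As an illustration of the ARCH placeholder, the substitution can be done by hand or with a small shell function. The function name nlpl_module_name below is hypothetical and not part of the laboratory; it only rewrites the string and does not check that the resulting module exists on the cluster.

```shell
# Hypothetical helper: expand the ARCH placeholder in a module name.
#   $1: module name as written on this page, containing "ARCH"
#   $2: toolchain, "gomkl" (e.g. Saga) or "foss" (e.g. Fox)
nlpl_module_name() {
    case "$2" in
        gomkl|foss) ;;
        *) echo "unknown toolchain: $2 (expected gomkl or foss)" >&2
           return 1 ;;
    esac
    # Replace the first occurrence of ARCH with the chosen toolchain.
    printf '%s\n' "$1" | sed "s/ARCH/$2/"
}
```

For example, on Fox one could then run module load "$(nlpl_module_name nlpl-gensim/3.8.3-ARCH-2019b-Python-3.7.4 foss)", which is equivalent to typing nlpl-gensim/3.8.3-foss-2019b-Python-3.7.4 directly.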
"Bundle" modules
These are the modules with the most cryptic names. Each of them contains a bunch of software pieces (Python packages, as a rule). Many of these modules will be loaded automatically as dependencies of the regular modules, but sometimes they can be useful themselves. Here are the details:
- nlpl-python-candy/2021.01-ARCH-2019b-Python-3.7.4: various utility packages not directly related to NLP
  - tqdm 4.62.3
  - pydot 1.4.2
  - smart_open 5.2.1
  - cached-property 1.5.2
  - filelock 3.0.12
  - regex 2021.10.23
  - sacremoses 0.0.46
  - mpi4py 3.1.1
  - jsonlines 2.0.0
  - typing_extensions 3.7.4.3
- nlpl-nlptools/2021.01-ARCH-2019b-Python-3.7.4: various utility packages related to NLP
  - conllu 4.4.1
  - seqeval 1.2.2
  - langdetect 1.0.9
  - leven 1.0.4
- nlpl-scipy-ecosystem/2021.01-ARCH-2019b-Python-3.7.4: everything that constitutes the SciPy ecosystem. There are too many packages to enumerate them all, but the most important are:
  - scipy 1.5.4
  - pandas 1.2.1
  - matplotlib 3.1.2
  - ipython 7.11.1
  - jupyter_core 4.6.1
  - jupyter_client 5.3.4
  - networkx 2.5.1
  - sympy 1.7.1
"Regular" modules
These modules are more self-explanatory; each one provides a single piece of software:
- nlpl-cython/0.29.21-ARCH-2019b-Python-3.7.4: Cython 0.29.21
- nlpl-dllogger/0.1.0-ARCH-2019b-Python-3.7.4: DLLogger 0.1.0
- nlpl-gensim/3.8.3-ARCH-2019b-Python-3.7.4: Gensim 3.8.3
- nlpl-horovod/0.20.3-ARCH-2019b-tensorflow-1.15.2-Python-3.7.4: Horovod 0.20.3
- nlpl-nltk/3.5-ARCH-2019b-Python-3.7.4: NLTK 3.5, together with all the corpora and datasets (no need to download them separately!)
- nlpl-numpy/1.18.1-ARCH-2019b-Python-3.7.4: NumPy 1.18.1
- nlpl-nvidia-bert/20.06.8-ARCH-2019b-tensorflow-1.15.2-Python-3.7.4: NVIDIA's BERT implementation for TensorFlow 1
- nlpl-pytorch/1.6.0-ARCH-2019b-cuda-10.1.243-Python-3.7.4: PyTorch 1.6.0
- nlpl-pytorch/1.7.1-ARCH-2019b-cuda-10.1.243-Python-3.7.4: PyTorch 1.7.1
- nlpl-scikit-bundle/0.22.2.post1-ARCH-2019b-Python-3.7.4: Scikit-Learn 0.22.2
- nlpl-simple_elmo/0.9.0-ARCH-2019b-Python-3.7.4: Simple_elmo 0.9.0
- nlpl-stanza/1.1.1-ARCH-2019b-Python-3.7.4: Stanza 1.1.1
- nlpl-tensorflow/1.15.2-ARCH-2019b-cuda-10.1.243-Python-3.7.4: TensorFlow 1.15.2
- nlpl-tensorflow/2.3.2-ARCH-2019b-cuda-10.1.243-Python-3.7.4: TensorFlow 2.3.2
- nlpl-tokenizers/0.10.2-ARCH-2019b-Python-3.7.4: HuggingFace Tokenizers 0.10.2
- nlpl-transformers/4.5.1-ARCH-2019b-Python-3.7.4: HuggingFace Transformers 4.5.1
- nlpl-transformers/4.14.1-ARCH-2019b-Python-3.7.4: HuggingFace Transformers 4.14.1
- nlpl-wandb/0.12.6-ARCH-2019b-Python-3.7.4: Weights and Biases (wandb) 0.12.6
- sentencepiece/0.1.94-ARCH-2019b-Python-3.7.4: SentencePiece 0.1.94
- sentencepiece/0.1.96-ARCH-2019b-Python-3.7.4: SentencePiece 0.1.96
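Several packages in the list above are available in more than one version (for example, nlpl-transformers 4.5.1 and 4.14.1). When picking the newest one from a list of names, a plain alphabetical sort puts 4.14.1 before 4.5.1, so a version-aware sort is needed. The sketch below uses a hypothetical helper, latest_module, and assumes a sort that supports the -V option (as GNU coreutils does).

```shell
# Hypothetical helper: read module names on stdin (one per line)
# and print the one with the highest version number.
latest_module() {
    # sort -V compares runs of digits numerically, so 4.14.1 > 4.5.1.
    sort -V | tail -n 1
}

printf '%s\n' \
    nlpl-transformers/4.14.1-ARCH-2019b-Python-3.7.4 \
    nlpl-transformers/4.5.1-ARCH-2019b-Python-3.7.4 \
    | latest_module
```

For reproducible experiments it is still safer to write the exact module name, version included, into job scripts rather than always picking the latest one.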
Source
Currently, the virtual laboratory is generated using EasyBuild; all the code and easyconfigs are available at https://source.coderefinery.org/nlpl/easybuild.