Eosc/easybuild/modules
Latest revision as of 12:43, 30 September 2024
NLPL virtual laboratory
The laboratory is a reproducible custom-built set of NLP software. It is currently installed on Saga, Fox, and Puhti HPC clusters.
- To use on Saga: run the following command (can be put in the ~/.bashrc file to be run automatically at login):
module use -a /cluster/shared/nlpl/software/eb/etc/all/
- To use on Fox: run the following command (can be put in the ~/.bashrc file to be run automatically at login):
module use -a /fp/projects01/ec30/software/easybuild/modules/all/
After that, the "nlpl"-branded modules will be available via module avail, module load, etc.
It is highly recommended to use them, instead of installing a copy in one's own home directory.
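A minimal sketch of a ~/.bashrc fragment that works unchanged on both Saga and Fox: it picks whichever of the two module roots quoted above exists on the current machine and registers it. The helper name first_existing_dir and the variable NLPL_ROOT are illustrative assumptions, not part of the laboratory itself.

```shell
# Print the first existing directory among the arguments.
# (first_existing_dir is a hypothetical helper name.)
first_existing_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# The two module roots quoted above (Saga first, then Fox).
NLPL_ROOT=$(first_existing_dir \
    /cluster/shared/nlpl/software/eb/etc/all/ \
    /fp/projects01/ec30/software/easybuild/modules/all/) || NLPL_ROOT=""

# Register the root only if one was found and the `module` command is defined.
if [ -n "$NLPL_ROOT" ] && command -v module >/dev/null 2>&1; then
    module use -a "$NLPL_ROOT"
fi
```

On a machine where neither directory exists, the fragment does nothing, so it should be safe to keep in a ~/.bashrc shared between clusters.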
List of modules
From time to time, updated modules with newer software versions will be added, but the older modules will never be removed (for reproducibility).
Note that the modules which have "gomkl" or "intel" in their names are built using the Intel Math Kernel Library, which makes them (somewhat) faster in CPU tasks on Intel processors (for example, on Saga, except for the a100 partition, which uses AMD CPUs).
Those with "foss" in their names are CPU-agnostic and will run on machines with all kinds of CPUs (for example, on Fox, but also on all Saga partitions).
The next element in the module name after "foss", "gomkl" or "intel" is the virtual laboratory (stack) version: for example, "2019b", "2021a" or "2022b". Modules from different stack versions are incompatible with each other: you cannot load a module from "foss-2019b" and a module from "foss-2022b" simultaneously. Currently, the "2022b" version is the recommended one, with the most up-to-date software built with the most up-to-date compilers.
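The incompatibility rule can be checked before loading anything. The following is only a sketch: the function names stack_of and same_stack are hypothetical, and the pattern assumes that the stack version always directly follows "foss-", "gomkl-" or "intel-" in the module name.

```shell
# Extract the stack version (e.g. "2022b") from a module name.
# Assumes it directly follows "foss-", "gomkl-" or "intel-".
stack_of() {
    printf '%s\n' "$1" | sed -n -E 's/.*(foss|gomkl|intel)-(20[0-9]{2}[ab]).*/\2/p'
}

# Return 0 if all given module names share one stack version, 1 otherwise.
same_stack() {
    first=""
    for mod in "$@"; do
        current=$(stack_of "$mod")
        if [ -z "$current" ]; then
            echo "cannot find a stack version in: $mod" >&2
            return 1
        fi
        if [ -z "$first" ]; then
            first=$current
        elif [ "$current" != "$first" ]; then
            echo "stack mismatch: $first vs $current ($mod)" >&2
            return 1
        fi
    done
    return 0
}
```

For example, same_stack called with two "foss-2022b" module names succeeds, while mixing a "foss-2019b" name with a "foss-2022b" name makes it fail with a message on standard error.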
From here on, we use the placeholder ARCH: replace it with "gomkl", "intel" or "foss" followed by "2021a" or "2022b" (for example, "foss-2022b"), depending on which machine you are working on and which stack version you want to use. Some modules also have the Python version specified in their names (for example, "nlpl-numpy-1.24.4-foss-2022b-Python-3.10.8"). For stack version "2021a" it is usually Python 3.9.5; for stack version "2022b" it is usually Python 3.10.8. Check the output of the module avail command for the exact module names.
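As an illustration of how the ARCH placeholder is meant to be filled in, here is a small sketch. The function names fill_arch and python_for_stack are made up for this example, and the resulting name should still be confirmed with module avail, since not every toolchain and stack combination necessarily exists for every module.

```shell
# Replace the ARCH placeholder with "<toolchain>-<stack version>".
# $1: module name template, $2: toolchain, $3: stack version
fill_arch() {
    printf '%s\n' "$1" | sed "s/ARCH/$2-$3/"
}

# Usual Python version for a stack version, as stated in the text above.
python_for_stack() {
    case "$1" in
        2021a) echo "3.9.5" ;;
        2022b) echo "3.10.8" ;;
        *)     echo "unknown" ;;
    esac
}

# Template taken from the module list on this page.
MODULE_NAME=$(fill_arch "nlpl-accelerate/0.27.2-ARCH-Python-3.10.8" foss 2022b)
echo "$MODULE_NAME"
# prints: nlpl-accelerate/0.27.2-foss-2022b-Python-3.10.8

# On a cluster, after checking the name with `module avail`:
#   module load "$MODULE_NAME"
```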
"Bundle" modules
These are the modules with the most cryptic names. Each of them contains a bunch of software pieces (Python packages, as a rule). Many of these modules will be loaded automatically as dependencies of the regular modules, but sometimes they can be useful on their own. They have their own bundle versions: "2022.01" or simply "01", etc. (referred to below as VERS). Here are the details:
- nlpl-python-candy/VERS-ARCH: various utility packages not directly related to NLP
- tqdm
- pydot
- smart_open
- cached-property
- filelock
- termcolor
- regex
- sacremoses
- mpi4py
- jsonlines
- jsonschema
- typing_extensions
- packaging
- pyhocon
- blis
- pathspec
- hatchling
- multidict
- yarl
- black
- click
- plotly
- toolz
- ...
- nlpl-nlptools/VERS-ARCH: various utility packages related to NLP
- evaluate
- conllu
- seqeval
- langdetect
- leven
- lxml
- portalocker
- rouge_score
- sacrebleu
- udapi
- word2number
- nlpl-scipy-ecosystem/VERS-ARCH: everything that constitutes the SciPy ecosystem. Too many packages to enumerate them all, but the most important are:
- scipy
- pandas
- matplotlib
- ipython
- jupyter_core
- jupyter_client
- networkx
- sympy
- beautifulsoup4
- numexpr
- nlpl-llmtools/VERS-ARCH: various utility packages for working with large language models (LLMs)
- peft (HuggingFace PEFT: State-of-the-art Parameter-Efficient Fine-Tuning)
- promptsource (Toolkit for creating, sharing and using natural language prompts)
- lm-evaluation-harness (EleutherAI Language Model Evaluation Harness)
- bert_score (BERTScore, for evaluating NLG tasks)
- ...
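Since a bundle module provides many packages at once, it can be handy to verify what is actually importable after loading one. The sketch below assumes that the interpreter provided by the loaded modules is called python3 (overridable through the PYTHON variable) and that the import names match the package names listed above; check_imports is a hypothetical helper name.

```shell
# Interpreter to test with; python3 is an assumption, override if needed.
PYTHON=${PYTHON:-python3}

# Report which of the given Python packages can be imported.
# Returns 0 if all of them import, 1 if at least one is missing.
check_imports() {
    status=0
    for pkg in "$@"; do
        if "$PYTHON" -c "import $pkg" >/dev/null 2>&1; then
            echo "OK      $pkg"
        else
            echo "MISSING $pkg"
            status=1
        fi
    done
    return $status
}

# Typical use on a cluster (fill in VERS and ARCH first):
#   module load nlpl-nlptools/VERS-ARCH
#   check_imports conllu seqeval sacrebleu
```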
"Regular" modules
These modules are more straightforward: each one provides a single piece of software.
Most important
- nlpl-pytorch/1.6.0-ARCH-cuda-10.1.243-Python-3.7.4: PyTorch 1.6.0 (for CUDA 10)
- nlpl-pytorch/1.7.1-ARCH-cuda-10.1.243-Python-3.7.4: PyTorch 1.7.1 (for CUDA 10)
- nlpl-pytorch/1.7.1-ARCH-cuda-11.1.1-Python-3.7.4: PyTorch 1.7.1 (for CUDA 11)
- nlpl-pytorch/1.11.0-ARCH-cuda-11.3.1-Python-3.9.5: PyTorch 1.11.0 (for CUDA 11)
- nlpl-pytorch/2.1.2-ARCH-cuda-12.0.0-Python-3.10.8: PyTorch 2.1.2 (for CUDA 12)
- nlpl-tensorflow/1.15.2-ARCH-cuda-10.1.243-Python-3.7.4: TensorFlow 1.15.2 (for CUDA 10)
- nlpl-tensorflow/2.3.2-ARCH-cuda-10.1.243-Python-3.7.4: TensorFlow 2.3.2 (for CUDA 10)
- nlpl-tensorflow/2.6.2-ARCH-cuda-11.1.1-Python-3.7.4: TensorFlow 2.6.2 (for CUDA 11)
- nlpl-tensorflow/2.6.5-ARCH-cuda-11.3.1-Python-3.9.5: TensorFlow 2.6.5 (for CUDA 11)
- nlpl-tensorflow/2.15.0-ARCH-cuda-12.0.0-Python-3.10.8: TensorFlow 2.15.0 (for CUDA 12)
- nlpl-accelerate/0.13.2-ARCH-Python-3.9.5: Accelerate 0.13.2
- nlpl-accelerate/0.27.2-ARCH-Python-3.10.8: Accelerate 0.27.2
Others
- nlpl-bitsandbytes/VERS-ARCH: BitsAndBytes
- nlpl-cython/VERS-ARCH: Cython
- nlpl-datasets/VERS-ARCH: HuggingFace Datasets
- nlpl-gensim/VERS-ARCH: Gensim
- nlpl-horovod/VERS-ARCH: Horovod
- nlpl-huggingface-hub/VERS-ARCH-2019b-Python-3.7.4: HuggingFace Hub
- nlpl-nltk/VERS-ARCH: NLTK, together with all the corpora and datasets (no need to download them separately!)
- nlpl-numpy/VERS-ARCH: NumPy
- nlpl-pytorch-lightning/VERS-ARCH-cuda-11.3.1: PyTorch Lightning
- nlpl-scikit-bundle/VERS-ARCH: Scikit-Learn
- nlpl-sentencepiece/VERS-ARCH: SentencePiece
- nlpl-sentence-transformers/VERS-ARCH: SentenceTransformers
- nlpl-simple_elmo/VERS-ARCH: Simple_elmo
- nlpl-stanza/VERS-ARCH: Stanza
- nlpl-tensorboard/VERS-ARCH: TensorBoard
- nlpl-tokenizers/VERS-ARCH: HuggingFace Tokenizers
- nlpl-torch-geometric/VERS-ARCH: PyTorch Geometric
- nlpl-torchmetrics/VERS-ARCH: TorchMetrics
- nlpl-torchtext/VERS-ARCH: TorchText
- nlpl-transformers/VERS-ARCH: HuggingFace Transformers
- nlpl-wandb/VERS-ARCH: Weights and Biases (wandb)
- nlpl-warc2text/VERS-ARCH: warc2text
- nlpl-dllogger/0.1.0-ARCH-2019b-Python-3.7.4: DLLogger 0.1.0 (status unclear)
- nlpl-nvidia-bert/20.06.8-ARCH-2019b-tensorflow-1.15.2-Python-3.7.4: NVIDIA's BERT implementation for TensorFlow 1 (status unclear)
Source
Currently, the virtual laboratory is generated using EasyBuild; all the code and easyconfigs are available in this repository.