Latest revision as of 23:24, 21 October 2025

NLPL virtual laboratory

The laboratory is a reproducible, custom-built set of NLP software. It is currently installed on the Saga and Fox HPC clusters.

- To use it on Saga, run the following command (you can add it to your ~/.bashrc file so that it runs automatically at login):

module use -a /cluster/shared/nlpl/software/eb/etc/all/

- To use it on Fox, run the following command (you can add it to your ~/.bashrc file so that it runs automatically at login):

module use -a /fp/projects01/ec30/software/easybuild/modules/all/

After that, the "nlpl"-branded modules are available via module avail, module load, etc. To list all the NLPL modules, run the module avail nlpl command.

It is highly recommended to use these modules instead of installing your own copy of the same software in your home directory.
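Both module use commands above can go into ~/.bashrc. If you keep one ~/.bashrc that you copy to both clusters, a minimal sketch like the following adds whichever of the two module paths exists on the current machine and does nothing elsewhere. The loop and the nlpl_dir variable are illustrative assumptions, not part of the laboratory itself.

```shell
# Illustrative ~/.bashrc fragment: add the NLPL module path that exists
# on the current cluster (the Saga path is tried first, then the Fox path).
# On machines where neither directory exists, nothing happens.
for nlpl_dir in /cluster/shared/nlpl/software/eb/etc/all/ \
                /fp/projects01/ec30/software/easybuild/modules/all/; do
    if [ -d "$nlpl_dir" ]; then
        module use -a "$nlpl_dir"
        break
    fi
done
unset nlpl_dir   # do not leave the helper variable in the login shell
```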

List of modules

From time to time, updated modules with newer software versions will be added, but the older modules will never be removed (for reproducibility).

Note that modules with "gomkl" or "intel" in their names are built with the Intel Math Kernel Library (MKL), which makes them (somewhat) faster for CPU tasks on Intel processors (for example, on Saga, except the a100 partition, which uses AMD CPUs).

Modules with "foss" in their names are CPU-agnostic and run on machines with any kind of CPU (for example, on Fox, but also on all Saga partitions). These are the default ones.

The next element in the module name after "foss", "gomkl" or "intel" is the virtual laboratory (stack) version: for example, "2021a", "2022b" or "2024a". Modules from different stack versions are incompatible with each other: you cannot load a module from "foss-2021a" and a module from "foss-2024a" at the same time. Currently, the "2024a" version is the recommended one: it has the most up-to-date software, built with the most up-to-date compilers.

Below, we use the placeholder ARCH: replace it with "gomkl", "intel" or "foss", followed by a hyphen and the stack version "2022b" or "2024a" (for example, "foss-2024a"), depending on which machine you are working on and which stack version you want to use. Some modules also have the Python version in their names (for example, "nlpl-transformers/4.55.4-foss-2024a-Python-3.12.3"). For stack version "2022b" this is usually Python 3.10.8; for stack version "2024a" it is usually Python 3.12.3. Check the output of the module avail nlpl command for the exact module names.
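As an illustration, here is a minimal sketch of starting a session on the recommended stack, assuming the CPU-agnostic "foss" variant and stack version "2024a", so that ARCH becomes "foss-2024a". The PyTorch name is expanded from the template listed further down, and the Transformers name is the example given above; confirm both with module avail nlpl. module purge unloads anything from another stack version first, since stacks cannot be mixed. The shell variables are only a convenience (an assumption, not a laboratory convention) so that switching stacks means editing one line, and the load commands run only where the module command exists.

```shell
# Assumption: CPU-agnostic "foss" modules on the recommended 2024a stack.
variant=foss
stack=2024a
ARCH="${variant}-${stack}"

# Expand the ARCH placeholder used in the module lists on this page.
pytorch_module="nlpl-pytorch/2.6.0-${ARCH}-cuda-12.6.0-Python-3.12.3"
transformers_module="nlpl-transformers/4.55.4-${ARCH}-Python-3.12.3"

# Only load where the module command is available (i.e. on Saga or Fox).
if command -v module >/dev/null 2>&1; then
    module purge    # modules from different stack versions cannot be mixed
    module load "$pytorch_module" "$transformers_module"
    module list     # show what ended up loaded, including dependencies
fi
```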

"Bundle" modules

These are the modules with the most cryptic names. Each of them contains a number of software pieces (as a rule, Python packages). Many of these modules are loaded automatically as dependencies of the regular modules, but sometimes they are useful on their own. They have their own bundle versions, such as "2022.01" or simply "01" (denoted as VERS below). Here are the details:

nlpl-python-candy/VERS-ARCH: various utility packages not directly related to NLP
- tqdm
- pydot
- smart_open
- cached-property
- filelock
- termcolor
- regex
- sacremoses
- mpi4py
- jsonlines
- jsonschema
- typing_extensions
- packaging
- pyhocon
- blis
- pathspec
- hatchling
- multidict
- yarl
- black
- click
- plotly
- toolz
- msgspec
- ...
nlpl-nlptools/VERS-ARCH: various utility packages related to NLP
- evaluate
- conllu
- seqeval
- langdetect
- levenshtein
- rouge_score
- sacrebleu
- udapi
- word2number
- ufal.chu-liu-edmonds
- gensim
nlpl-scipy-ecosystem/VERS-ARCH: everything that constitutes the SciPy ecosystem. There are too many packages to list them all, but the most important are:
- scipy
- pandas
- matplotlib
- ipython
- jupyter_core
- jupyter_client
- networkx
- sympy
- beautifulsoup4
- numexpr
- einops
- ...
nlpl-llmtools/VERS-ARCH: various utility packages for working with large language models (LLMs)
- peft (HuggingFace PEFT: State-of-the-art Parameter-Efficient Fine-Tuning)
- promptsource (Toolkit for creating, sharing and using natural language prompts)
- lm-evaluation-harness (EleutherAI Language Model Evaluation Harness)
- bert_score (BERTScore, for evaluating NLG tasks)
- llguidance (Low-level Guidance: constrained decoding for LLMs)
- mistral_common (Mistral-common: common utilities for Mistral AI)
- ...
nlpl-torch-audio-vision/VERS-ARCH: multimodal extensions for PyTorch
- torch-vision (torchvision: image and video datasets and models for PyTorch deep learning)
- torch-audio (torchaudio: an audio library for PyTorch)
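Since the bundles are mostly collections of Python packages, one way to confirm what a loaded bundle actually provides is to try importing its packages. The check_py_imports function below is a hypothetical helper, not part of the laboratory. Note that import names can differ from the package names listed above (for example, beautifulsoup4 is imported as bs4).

```shell
# Hypothetical helper: report which Python modules can be imported with the
# currently loaded modules; returns non-zero if any import fails.
check_py_imports() {
    local pkg status=0
    for pkg in "$@"; do
        if python3 -c "import ${pkg}" >/dev/null 2>&1; then
            echo "ok:      ${pkg}"
        else
            echo "missing: ${pkg}"
            status=1
        fi
    done
    return "$status"
}
```

For example, after loading the NLP tools bundle (take the exact nlpl-nlptools name from module avail nlpl), running check_py_imports conllu seqeval sacrebleu udapi gensim should report each of these packages as "ok".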

"Regular" modules

These modules have more obvious names: each one provides a single piece of software.

Most important

nlpl-pytorch/1.6.0-ARCH-cuda-10.1.243-Python-3.7.4: PyTorch 1.6.0 (for CUDA 10)
nlpl-pytorch/1.7.1-ARCH-cuda-10.1.243-Python-3.7.4: PyTorch 1.7.1 (for CUDA 10)
nlpl-pytorch/1.7.1-ARCH-cuda-11.1.1-Python-3.7.4: PyTorch 1.7.1 (for CUDA 11)
nlpl-pytorch/1.11.0-ARCH-cuda-11.3.1-Python-3.9.5: PyTorch 1.11.0 (for CUDA 11)
nlpl-pytorch/2.1.2-ARCH-cuda-12.0.0-Python-3.10.8: PyTorch 2.1.2 (for CUDA 12)
nlpl-pytorch/2.6.0-ARCH-cuda-12.6.0-Python-3.12.3: PyTorch 2.6.0 (for CUDA 12.6)

nlpl-tensorflow/1.15.2-ARCH-cuda-10.1.243-Python-3.7.4: TensorFlow 1.15.2 (for CUDA 10)
nlpl-tensorflow/2.3.2-ARCH-cuda-10.1.243-Python-3.7.4: TensorFlow 2.3.2 (for CUDA 10)
nlpl-tensorflow/2.6.2-ARCH-cuda-11.1.1-Python-3.7.4: TensorFlow 2.6.2 (for CUDA 11)
nlpl-tensorflow/2.6.5-ARCH-cuda-11.3.1-Python-3.9.5: TensorFlow 2.6.5 (for CUDA 11)
nlpl-tensorflow/2.15.0-ARCH-cuda-12.0.0-Python-3.10.8: TensorFlow 2.15.0 (for CUDA 12)
nlpl-tensorflow/2.18.1-ARCH-cuda-12.6.0-Python-3.12.3: TensorFlow 2.18.1 (for CUDA 12.6)

nlpl-accelerate/0.13.2-ARCH-Python-3.9.5: Accelerate 0.13.2
nlpl-accelerate/0.27.2-ARCH-Python-3.10.8: Accelerate 0.27.2
nlpl-accelerate/1.9.0-ARCH-Python-3.12.3: Accelerate 1.9.0

nlpl-transformers/VERS-ARCH: HuggingFace Transformers
nlpl-vllm/VERS-ARCH: vLLM (also includes flash-attention, xformers, openai, Ray)

Others

nlpl-bitsandbytes/VERS-ARCH: BitsAndBytes
nlpl-bm25s/VERS-ARCH: BM25S
nlpl-cython/VERS-ARCH: Cython
nlpl-datasets/VERS-ARCH: HuggingFace Datasets
nlpl-gensim/VERS-ARCH: Gensim
nlpl-horovod/VERS-ARCH: Horovod
nlpl-huggingface-hub/VERS-ARCH: HuggingFace Hub
nlpl-nltk/VERS-ARCH: NLTK, together with all the corpora and datasets (no need to download them separately!)
nlpl-numpy/VERS-ARCH: NumPy
nlpl-pytorch-lightning/VERS-ARCH: PyTorch Lightning
nlpl-scikit-bundle/VERS-ARCH: Scikit-Learn
nlpl-sentencepiece/VERS-ARCH: SentencePiece
nlpl-sentence-transformers/VERS-ARCH: SentenceTransformers
nlpl-simple_elmo/VERS-ARCH: Simple_elmo
nlpl-stanza/VERS-ARCH: Stanza
nlpl-tensorboard/VERS-ARCH: TensorBoard
nlpl-tokenizers/VERS-ARCH: HuggingFace Tokenizers
nlpl-torch-geometric/VERS-ARCH: PyTorch Geometric
nlpl-torchmetrics/VERS-ARCH: TorchMetrics
nlpl-torchtext/VERS-ARCH: TorchText
nlpl-trl/VERS-ARCH: HuggingFace Transformer Reinforcement Learning
nlpl-wandb/VERS-ARCH: Weights and Biases (wandb)
nlpl-warc2text/VERS-ARCH: warc2text

nlpl-dllogger/0.1.0-ARCH-2019b-Python-3.7.4: DLLogger 0.1.0 (status unclear)
nlpl-nvidia-bert/20.06.8-ARCH-2019b-tensorflow-1.15.2-Python-3.7.4: NVIDIA's BERT implementation for TensorFlow 1 (status unclear)

Source

Currently, the virtual laboratory is generated using EasyBuild; all the code and easyconfigs are available in this repository.


Difference between revisions of "Eosc/easybuild/modules"


