Infrastructure/software/catalogue

From Nordic Language Processing Laboratory

Latest revision as of 23:34, 14 January 2020

= Background =

This page provides a high-level summary of NLPL-specific software installed on the systems used by the project (Abel, Taito, Puhti, and Saga). As a rule of thumb, NLPL aims to build on generic software installations provided by the system maintainers (e.g. development tools and libraries that are not discipline-specific), using the modules infrastructure, and to maintain only discipline-specific software itself. For example, an environment like OpenNMT is unlikely to be used by other disciplines, and NLPL stands to gain from the in-house, shared expertise that comes with maintaining a project-specific installation. On the other hand, the CUDA libraries are general extensions to the operating system that most users of deep learning frameworks on GPUs will want to use; hence, CUDA is most appropriately installed by the core system maintainers. Frameworks like PyTorch and TensorFlow arguably present a middle ground: in principle, they are not discipline-specific, but as of mid-2018 at least, demand for installations of these frameworks is strong within NLPL, and the project will likely benefit from growing its competencies in this area.

= Module Catalogue =

The discipline-specific modules maintained by NLPL are not activated by default. To make available the NLPL community directory of software modules, on top of the pre-configured, system-wide modules, one needs to execute the following (on Abel, Puhti, or Taito):

<pre>
module use -a /proj*/nlpl/software/modules/etc
</pre>

For Saga, the NLPL community directory is in a different location:

<pre>
module use -a /cluster/shared/nlpl/software/modules/etc
</pre>

We will at times assume a shell variable <tt>$NLPLROOT</tt> that points to the top-level project directory, i.e. <tt>/projects/nlpl/</tt> (on Abel), <tt>/proj/nlpl/</tt> (on Taito), <tt>/projappl/nlpl/</tt> (on Puhti), and <tt>/cluster/shared/nlpl/</tt> (on Saga).

We recommend that NLPL users add the above <tt>module use</tt> command to their shell start-up script, e.g. <tt>.bashrc</tt> in the user home directory.
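
As a rough sketch of such a start-up fragment, assuming the directory locations listed above, the following picks whichever NLPL project directory exists on the current system, exports <tt>$NLPLROOT</tt>, and activates the module directory. The helper name <tt>nlpl_find_root</tt> is hypothetical (it is not part of any NLPL installation), and the path below <tt>$NLPLROOT</tt> simply mirrors the <tt>module use</tt> commands shown above.

```shell
# Hypothetical helper: print the first candidate directory that exists
# (without a trailing slash); return non-zero if none of them exists.
nlpl_find_root() {
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "${dir%/}"
      return 0
    fi
  done
  return 1
}

# Candidate locations as listed above (Abel, Taito, Puhti, Saga).
if NLPLROOT=$(nlpl_find_root /projects/nlpl /proj/nlpl /projappl/nlpl /cluster/shared/nlpl); then
  export NLPLROOT
  # Only call 'module' where the modules infrastructure is initialised.
  if command -v module >/dev/null 2>&1; then
    module use -a "$NLPLROOT/software/modules/etc"
  fi
else
  # Not on a known NLPL system: leave the variable undefined.
  unset NLPLROOT
fi
```

Because the fragment does nothing on machines where none of the directories exist, the same <tt>.bashrc</tt> could in principle be shared across systems.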

To inspect what is available, one can use the avail sub-command (on Abel), e.g.

<pre>
module avail 2>&1 | grep nlpl
</pre>
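
The redirection <tt>2>&1</tt> in the command above is there because <tt>module avail</tt> commonly writes its listing to standard error rather than standard output. As a small sketch building on that command, the following reduces the listing to one NLPL module name per line; the helper name <tt>nlpl_modules</tt> is hypothetical, and the sketch assumes that all NLPL module names start with <tt>nlpl-</tt>, as in the tables on this page.

```shell
# Hypothetical helper: read a module listing on stdin and print one
# NLPL module name per line, sorted and without duplicates.
nlpl_modules() {
  grep -o 'nlpl-[^[:space:]]*' | sort -u
}

# Only query the modules infrastructure where it is available.
if command -v module >/dev/null 2>&1; then
  module avail 2>&1 | nlpl_modules
fi
```

A module from the resulting list can then be activated with the <tt>load</tt> sub-command, for instance <tt>module load nlpl-nltk/3.3</tt> on a system where that module is installed.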

= User-Installed Software =

Although NLPL strives to make available a comprehensive set of ready-to-run software modules, users will at times want to install their own add-on components. For Python add-on components, some [http://wiki.nlpl.eu/index.php/Infrastructure/software/user emerging instructions] are available.

= Activity A: Basic Infrastructure =

Interoperability of NLPL installations with each other, as well as with system-wide software that is maintained by the core operations teams for Abel and Taito, is no small challenge; neither is parallelism across the two systems, for example in available software (and versions) and techniques for ‘mixing and matching’. These challenges are discussed in some more detail with regard to the Python programming environment and with regard to common Deep Learning frameworks.

{| class="wikitable"
|-
! Module Name/Version !! Description !! System !! Install Date !! Maintainer
|-
| nlpl-cupy/5.4.0 || Matrix Library Accelerated by CUDA || Abel (3.7) || May 2018 || Stephan Oepen
|-
| nlpl-cython/0.29.3 || C Extensions for Python || Abel (3.5, 3.7) || December 2018 || Stephan Oepen
|-
| nlpl-dynet/2.1 || DyNet Dynamic Neural Network Toolkit (CPU) || Abel (3.5, 3.7) || February 2019 || Stephan Oepen
|-
| nlpl-nltk/3.3 || Natural Language Toolkit (NLTK) || Abel, Taito || September 2018 || Stephan Oepen
|-
| nlpl-pytorch/0.4.1 || PyTorch Deep Learning Framework (CPU and GPU) || Abel, Taito || September 2018 || Stephan Oepen
|-
| nlpl-pytorch/1.0.0 || PyTorch Deep Learning Framework (CPU and GPU) || Abel (3.5, 3.7) || January 2019 || Stephan Oepen
|-
| nlpl-pytorch/1.1.0 || PyTorch Deep Learning Framework (CPU and GPU) || Abel (3.5, 3.7) || May 2019 || Stephan Oepen
|-
| nlpl-spacy/2.0.12 || spaCy: Natural Language Processing in Python || Abel, Taito || October 2018 || Stephan Oepen
|-
| nlpl-scipy/201901 || SciPy Ecosystem of Python Add-Ons || Abel (3.5, 3.7) || January 2019 || Stephan Oepen
|-
| nlpl-tensorflow/1.11 || TensorFlow Deep Learning Framework (CPU and GPU) || Abel, Taito || September 2018 || Stephan Oepen
|}

= Activity B: Statistical and Neural Machine Translation =

=== On Saga and Puhti ===

{| class="wikitable"
|-
! Module Name/Version !! Description !! System !! Install Date !! Maintainer
|-
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Moses_module nlpl-moses/4.0-a89691f] || Moses SMT system, including GIZA++, MGIZA, fast_align || Puhti, Saga || December 2019 || Yves Scherrer
|-
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Efmaral_module nlpl-efmaral/0.1_20191218] || efmaral and eflomal word alignment tools || Puhti, Saga || December 2019 || Yves Scherrer
|-
| [http://wiki.nlpl.eu/index.php/Translation/mttools nlpl-mttools/20191218] || A collection of preprocessing and evaluation scripts for machine translation || Puhti, Saga || December 2019 || Yves Scherrer
|-
| nlpl-opennmt-py/1.0.0rc2/3.7 || OpenNMT Python Library || Saga || October 2019 || Stephan Oepen
|-
| nlpl-opennmt-py/1.0.0 || OpenNMT Python Library || Puhti || December 2019 || Yves Scherrer
|-
| nlpl-marian-nmt/1.8.0-eba7aed || Marian neural machine translation system || Puhti, Saga || December 2019 || Jörg Tiedemann
|}

=== On Abel and Taito ===

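{| class="wikitable"
|-
! Module Name/Version !! Description !! System !! Install Date !! Maintainer
|-
| [http://wiki.nlpl.eu/index.php/Translation/taito_abel#Using_the_Moses_module nlpl-moses/mmt-mvp-v0.12.1-2739-gdc42bcb] || Moses SMT system, including GIZA++, MGIZA, fast_align || Taito || July 2017 || Yves Scherrer
|-
| [http://wiki.nlpl.eu/index.php/Translation/taito_abel#Using_the_Moses_module nlpl-moses/4.0-65c75ff] || Moses SMT System Release 4.0, including GIZA++, MGIZA, fast_align, SALM<br/>Some minor fixes added to existing install 2/2018.<br/>Should not break compatibility except when using tokenizer.perl for Finnish or Swedish. || Taito, Abel || November 2017 || Yves Scherrer
|-
| [http://wiki.nlpl.eu/index.php/Translation/taito_abel#Using_the_Efmaral_module nlpl-efmaral/0.1_2017_07_20] || efmaral and eflomal word alignment tools || Taito || July 2017 || Yves Scherrer
|-
| [http://wiki.nlpl.eu/index.php/Translation/taito_abel#Using_the_Efmaral_module nlpl-efmaral/0.1_2017_11_24] || efmaral and eflomal word alignment tools || Taito, Abel || November 2017 || Yves Scherrer
|-
| [http://wiki.nlpl.eu/index.php/Translation/taito_abel#Using_the_Efmaral_module nlpl-efmaral/0.1_2018_12_13/17] || efmaral and eflomal word alignment tools || Taito, Abel || December 2018 || Yves Scherrer
|-
| [http://wiki.nlpl.eu/index.php/Translation/taito_abel#Using_the_HNMT_module nlpl-hnmt/1.0.1] || HNMT neural machine translation system || Taito || March 2018 || Yves Scherrer
|-
| [http://wiki.nlpl.eu/index.php/Translation/opennmt-py nlpl-opennmt-py/0.2.1] || OpenNMT Python Library || Abel, Taito || September 2018 || Stephan Oepen
|-
| [http://wiki.nlpl.eu/index.php/Translation/taito_abel#Using_the_Marian_module nlpl-marian/1.2.0] || Marian neural machine translation system || Taito || March 2018 || Yves Scherrer
|-
| marian/1.5 || Marian neural machine translation system || Taito || June 2018 || CSC staff
|-
| [http://wiki.nlpl.eu/index.php/Translation/taito_abel#Using_the_mttools_module nlpl-mttools/2018_12_23] || A collection of preprocessing and evaluation scripts for machine translation || Taito, Abel || December 2018 || Yves Scherrer
|}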

= Activity C: Data-Driven Parsing =

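{| class="wikitable"
|-
! Module Name/Version !! Description !! System !! Install Date !! Maintainer
|-
| nlpl-corenlp/3.9.2 || Stanford CoreNLP Suite (Including All Models) || Abel || May 2019 || Stephan Oepen
|-
| [http://wiki.nlpl.eu/index.php/Parsing/dozat nlpl-dozat/201812] || Stanford Graph-Based Parser by Tim Dozat (v3) || Abel || December 2018 || Stephan Oepen
|-
| [http://wiki.nlpl.eu/index.php/Parsing/stanfordnlp nlpl-stanfordnlp/0.1.1] || Stanford NLP Neural Pipeline || Abel || February 2019 || Stephan Oepen
|-
| [http://wiki.nlpl.eu/index.php/Parsing/stanfordnlp nlpl-stanfordnlp/0.2.0] || Stanford NLP Neural Pipeline || Saga || ? || Stephan Oepen
|-
| [http://wiki.nlpl.eu/index.php/Parsing/uuparser nlpl-uuparser/2.3.1] || Uppsala Parser || Saga, Abel || December 2019 || Sara Stymne
|-
| [http://wiki.nlpl.eu/index.php/Parsing/turboparser nlpl-turboparser/2.3.0] || TurboParser || Saga || January 2020 || Sara Stymne
|-
| [http://wiki.nlpl.eu/index.php/Parsing/udpipe nlpl-udpipe/1.2.1-devel] || UDPipe 1.2 with Pre-Trained Models || Saga, Puhti, Taito, Abel || November 2017 || Jörg Tiedemann
|-
| [http://wiki.nlpl.eu/index.php/Parsing/udpipe nlpl-udpipe_future/3.7] || UDPipe Future || Abel || June 2019 || Andrey Kutuzov
|-
| [http://wiki.nlpl.eu/index.php/Parsing/repp nlpl-repp/201812] || REPP Tokenizer (and Sentence Splitter) || Abel || December 2018 || Stephan Oepen
|}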

= Activity E: Pre-Trained Word Embeddings =

{| class="wikitable"
|-
! Module Name/Version !! Description !! System !! Install Date !! Maintainer
|-
| nlpl-gensim/3.6.0 || Topic Modeling and Word Vectors Library || Taito, Abel || October 2018 || Stephan Oepen
|-
| nlpl-gensim/3.7.0 || Topic Modeling and Word Vectors Library || Abel (3.5, 3.7) || December 2018 || Stephan Oepen
|-
| nlpl-gensim/3.7.3 || Topic Modeling and Word Vectors Library || Abel (3.5, 3.7) || May 2018 || Stephan Oepen
|}

= Activity G: OPUS Parallel Corpus =

{| class="wikitable"
|-
! Module Name/Version !! Description !! System !! Install Date !! Maintainer
|-
| nlpl-cwb/3.4.12 || Corpus Work Bench (CWB) || Taito, Abel || November 2017 || Jörg Tiedemann
|-
| nlpl-opus/0.1 || Various OPUS Tools || Taito, Abel || November 2017 || Jörg Tiedemann
|-
| nlpl-opus/0.2 || Various OPUS Tools || Taito, Abel || 2018 || Jörg Tiedemann
|-
| nlpl-opus/201901 || Various OPUS Tools || Taito, Abel || January 2019 || Jörg Tiedemann
|-
| nlpl-uplug/0.3.8dev || UPlug Parallel Corpus Tools || Taito, Abel || November 2017 || Jörg Tiedemann
|}