Infrastructure/software/catalogue

From Nordic Language Processing Laboratory

Revision as of 22:48, 30 September 2018

Background

This page provides a high-level summary of NLPL-specific software installed on either of our two systems, Abel and Taito. As a rule of thumb, NLPL aims to build on generic software installations provided by the system maintainers (e.g. development tools and libraries that are not discipline-specific), using the modules infrastructure, and to maintain project-specific installations only for discipline-specific software. For example, an environment like OpenNMT is unlikely to be used by other disciplines, and NLPL stands to gain from the in-house, shared expertise that comes with maintaining a project-specific installation. On the other hand, the CUDA libraries are general extensions to the operating system that most users of deep learning frameworks on GPUs will want to use; hence, CUDA is most appropriately installed by the core system maintainers. Frameworks like PyTorch and TensorFlow arguably occupy a middle ground: in principle, they are not discipline-specific, but as of mid-2018 at least, the demand for installations of these frameworks is strong within NLPL, and the project will likely benefit from growing its competencies in this area.

Module Catalogue

The discipline-specific modules maintained by NLPL are not activated by default. To make the NLPL directory of module configurations available, on top of the pre-configured, system-wide modules, one needs to run:

  • Abel:
module use -a /projects/nlpl/software/modulefiles/
  • Taito:
module use -a /proj/nlpl/software/modulefiles/

We will at times assume a shell variable $NLPLROOT that points to the top-level project directory, i.e. /projects/nlpl/ (on Abel) or /proj/nlpl/ (on Taito). We recommend that NLPL users add the above module use command to their shell start-up script, e.g. .bashrc in the user home directory.
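As a minimal sketch of such a start-up fragment, the following sets $NLPLROOT and registers the NLPL module directory on whichever of the two systems it runs on. The helper name nlpl_find_root is hypothetical (it is not part of any NLPL installation), and the check for the module command is an assumption added so that the fragment does nothing on machines without the modules infrastructure.

```shell
# Sketch of a ~/.bashrc fragment; nlpl_find_root is a hypothetical helper name.
nlpl_find_root() {
  # Print the first candidate directory that contains software/modulefiles.
  local root
  for root in "$@"; do
    if [ -d "${root}/software/modulefiles" ]; then
      printf '%s\n' "${root}"
      return 0
    fi
  done
  return 1
}

# /projects/nlpl is the project directory on Abel, /proj/nlpl on Taito.
if NLPLROOT="$(nlpl_find_root /projects/nlpl /proj/nlpl)"; then
  export NLPLROOT
  # Assumption: only call 'module' where the modules infrastructure exists.
  if command -v module >/dev/null 2>&1; then
    module use -a "${NLPLROOT}/software/modulefiles/"
  fi
fi
```

Probing for the directory, rather than testing the host name, keeps the same fragment usable on both systems without further edits.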

Activity A: Basic Infrastructure

Interoperability of NLPL installations with each other, as well as with system-wide software that is maintained by the core operations teams for Abel and Taito, is no small challenge; neither is parallelism across the two systems, for example in available software (and versions) and techniques for ‘mixing and matching’. These challenges with regard to common Deep Learning frameworks are discussed on a separate page (http://wiki.nlpl.eu/index.php/Infrastructure/software/frameworks).

{| class="wikitable"
! Module Name/Version !! Description !! System !! Install Date !! Maintainer
|-
| nlpl-nltk/3.3 || Natural Language Toolkit (NLTK) || Abel, Taito || September 2018 || Stephan Oepen
|-
| nlpl-pytorch/0.4.1 || PyTorch Deep Learning Framework (CPU and GPU) || Abel, Taito || September 2018 || Stephan Oepen
|-
| nlpl-tensorflow/1.11 || TensorFlow Deep Learning Framework (CPU and GPU) || Abel, Taito || September 2018 || Stephan Oepen
|}
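Once the NLPL module directory has been registered with module use, an entry from the catalogue can be loaded by its Module Name/Version. The following is a small sketch only: nlpl_load is a hypothetical wrapper name, and the guard around module load is an assumption added so that the function fails with a message where the modules infrastructure is missing.

```shell
# Sketch: load one catalogue entry by its Module Name/Version.
# nlpl_load is a hypothetical wrapper, not part of the NLPL installation.
nlpl_load() {
  if ! command -v module >/dev/null 2>&1; then
    echo "modules infrastructure not available; cannot load $1" >&2
    return 1
  fi
  # 'module load' is the standard command for activating a module.
  module load "$1"
}
```

For instance, nlpl_load nlpl-pytorch/0.4.1 would activate the PyTorch installation listed above; spelling out the version keeps a job script tied to the installation it was developed against.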

Activity B: Statistical and Neural Machine Translation

{| class="wikitable"
! Module Name/Version !! Description !! System !! Install Date !! Maintainer
|-
| moses/mmt-mvp-v0.12.1-2739-gdc42bcb || Moses SMT system, including GIZA++, MGIZA, fast_align || Taito || November 2017 ||
|-
| moses/4.0-65c75ff || Moses SMT System Release 4.0, including GIZA++, MGIZA, fast_align, SALM. Some minor fixes added to existing install 2/2018. Should not break compatibility except when using tokenizer.perl for Finnish or Swedish. || Taito, Abel || November 2017 ||
|-
| efmaral/0.1_2017_07_20 || efmaral and eflomal word alignment tools || Taito || July 2017 ||
|-
| efmaral/0.1_2017_11_24 || efmaral and eflomal word alignment tools || Taito, Abel || November 2017 ||
|-
| nlpl-opennmt-py/0.2.1 || OpenNMT Python Library || Abel, Taito || September 2018 || Stephan Oepen
|}

Activity C: Data-Driven Parsing

{| class="wikitable"
! Module Name/Version !! Description !! System !! Install Date !! Maintainer
|-
| nlpl-uuparser/2.1 || Uppsala Parser || Abel || December 2017 ||
|-
| nlpl-udpipe/1.2.1-devel || UDPipe 1.2 with Pre-Trained Models || Taito, Abel || November 2017 ||
|}

Activity G: OPUS Parallel Corpus

{| class="wikitable"
! Module Name/Version !! Description !! System !! Install Date !! Maintainer
|-
| nlpl-cwb/3.4.12 || Corpus Work Bench (CWB) || Taito, Abel || November 2017 ||
|-
| nlpl-opus/0.1 || Various OPUS Tools || Taito, Abel || November 2017 ||
|-
| nlpl-uplug/0.3.8dev || UPlug Parallel Corpus Tools || Taito, Abel || November 2017 ||
|}