Translation/mttools

From Nordic Language Processing Laboratory
Revision as of 14:22, 18 December 2019

Using the mttools module

  • Activate the NLPL software repository and load the module:
    module use -a /projappl/nlpl/software/modules/etc         # Puhti
    module use -a /cluster/shared/nlpl/software/modules/etc   # Saga
    module load nlpl-mttools
  • Module-specific help is available by typing:
    module help nlpl-mttools/20191218
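  • As a quick check that the module is loaded, the executables listed below should resolve from PATH (an illustrative sketch, not an official check):
    which sacrebleu subword-nmt spm_encode multeval.sh compare-mt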

The following tools are part of this module:

  • moses-scripts
    • Tokenization, casing, corpus cleaning and evaluation scripts from Moses
    • Source: https://github.com/moses-smt/mosesdecoder (scripts directory)
    • Installed revision: a89691f
    • The subfolders generic, recaser, tokenizer, training are in PATH
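    • Example (a minimal sketch; input.en is a hypothetical raw-text file):
      tokenizer.perl -l en < input.en > input.tok.en   # tokenize English text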
  • sacremoses
    • Python port of Moses tokenizer and truecaser
    • Source: https://github.com/alvations/sacremoses
    • Installed version: 0.0.35
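    • Example (a minimal sketch; the input sentence is arbitrary):
      python -c "from sacremoses import MosesTokenizer; print(MosesTokenizer(lang='en').tokenize('Hello World!', return_str=True))"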
  • subword-nmt
    • Unsupervised Word Segmentation (a.k.a. Byte Pair Encoding) for Machine Translation and Text Generation
    • Source: https://github.com/rsennrich/subword-nmt
    • Installed version: 0.3.7
    • The subword-nmt executable is in PATH
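    • Example (a minimal sketch; input.tok.en is a hypothetical tokenized file and 10000 an arbitrary number of merge operations):
      subword-nmt learn-bpe -s 10000 < input.tok.en > bpe.codes          # learn BPE merges
      subword-nmt apply-bpe -c bpe.codes < input.tok.en > input.bpe.en   # apply them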
  • sentencepiece
    • Unsupervised text tokenizer for Neural Network-based text generation
    • Source: https://github.com/google/sentencepiece
    • Installed version: 0.1.85
    • The spm_* executables are in PATH
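    • Example (a minimal sketch; input.en and the vocabulary size are hypothetical):
      spm_train --input=input.en --model_prefix=spm.en --vocab_size=8000   # train a segmentation model
      spm_encode --model=spm.en.model < input.en > input.sp.en             # segment text with it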
  • sacreBLEU
    • Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
    • Source: https://github.com/mjpost/sacreBLEU
    • Installed version: 1.4.3
    • The sacrebleu executable is in PATH
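    • Example (a minimal sketch; hyp.de is a hypothetical detokenized system output for the WMT17 English-German test set, which sacrebleu downloads on first use):
      sacrebleu -t wmt17 -l en-de < hyp.de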
  • multeval
    • Tool to evaluate machine translation with various scores (BLEU, TER, METEOR) and to perform statistical significance testing with bootstrap resampling
    • Source: https://github.com/jhclark/multeval
    • Installed version: 0.5.1 with METEOR 1.5
    • The multeval.sh script is in PATH
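    • Example (a minimal sketch following the multeval README; refs.* and hyps-baseline.* are hypothetical reference and hypothesis files):
      multeval.sh eval --refs refs.* --hyps-baseline hyps-baseline.* --meteor.language en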
  • compare-mt
    • Compares the output of multiple systems for language generation tasks such as machine translation, summarization, and dialog response generation; computes common evaluation scores and runs analyses to find salient differences between the systems.
    • To run METEOR, consult the module help (module spider nlpl-mttools) for the exact path.
    • Source: https://github.com/neulab/compare-mt
    • Installed version: 0.2.7
    • The compare-mt executable is in PATH
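    • Example (a minimal sketch; ref.en, sys1.en and sys2.en are hypothetical reference and system-output files):
      compare-mt ref.en sys1.en sys2.en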


Contact: Yves Scherrer, University of Helsinki, firstname.lastname@helsinki.fi