Difference between revisions of "Translation/mttools"
Revision as of 14:22, 18 December 2019
Using the mttools module
- Activate the NLPL software repository and load the module:
module use -a /projappl/nlpl/software/modules/etc # Puhti
module use -a /cluster/shared/nlpl/software/modules/etc # Saga
module load nlpl-mttools/
- Module-specific help is available by typing:
module help nlpl-mttools/20191218
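For scripts that should run on either cluster, the two steps above can be sketched as follows. This is only an illustration under stated assumptions: the helper name first_existing_dir is hypothetical, and the version string is assumed to be the same one shown in the help command above.

```shell
#!/bin/sh
# Hypothetical helper: print the first argument that is an existing
# directory, or return non-zero if none of them exists.
first_existing_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Assumption: the module version matches the one used in the help command.
MTTOOLS_VERSION=20191218

# Module roots as listed above (Puhti first, then Saga).
if root=$(first_existing_dir \
        /projappl/nlpl/software/modules/etc \
        /cluster/shared/nlpl/software/modules/etc) \
    && command -v module >/dev/null 2>&1; then
    module use -a "$root"
    module load "nlpl-mttools/$MTTOOLS_VERSION"
fi
```

On a machine where neither directory exists, or where the module command is not defined, the block does nothing.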
The following scripts are part of this module:
- moses-scripts
- Tokenization, casing, corpus cleaning and evaluation scripts from Moses
- Source: https://github.com/moses-smt/mosesdecoder (scripts directory)
- Installed revision: a89691f
- The subfolders generic, recaser, tokenizer, training are in PATH
- sacremoses
- Python port of Moses tokenizer and truecaser
- Source: https://github.com/alvations/sacremoses
- Installed version: 0.0.35
- subword-nmt
- Unsupervised Word Segmentation (a.k.a. Byte Pair Encoding) for Machine Translation and Text Generation
- Source: https://github.com/rsennrich/subword-nmt
- Installed version: 0.3.7
- The subword-nmt executable is in PATH
- sentencepiece
- Unsupervised text tokenizer for Neural Network-based text generation
- Source: https://github.com/google/sentencepiece
- Installed version: 0.1.85
- The spm_* executables are in PATH
- sacreBLEU
- Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
- Source: https://github.com/mjpost/sacreBLEU
- Installed version: 1.4.3
- The sacrebleu executable is in PATH
- multeval
- Tool to evaluate machine translation with various scores (BLEU, TER, METEOR) and to perform statistical significance testing with bootstrap resampling
- Source: https://github.com/jhclark/multeval
- Installed version: 0.5.1 with METEOR 1.5
- The multeval.sh script is in PATH
- compare-mt
- Compare the output of multiple systems for language generation, including machine translation, summarization, dialog response generation. Computes common evaluation scores and runs analyses to find salient differences between the systems.
- To run METEOR, consult the help (module spider nlpl-mttools) for the exact path.
- Source: https://github.com/neulab/compare-mt
- Installed version: 0.2.7
- The compare-mt executable is in PATH
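After loading the module, one way to confirm that the executables listed above are reachable is a small PATH check. The sketch below is illustrative: check_tools and mt_demo are hypothetical helper names, all file names in mt_demo are placeholders, and calling tokenizer.perl directly assumes that the Moses tokenizer subfolder in PATH exposes it as an executable.

```shell
#!/bin/sh
# Hypothetical helper: print "ok NAME" or "missing NAME" for each argument
# and return non-zero if at least one executable is not found in PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok $tool"
        else
            echo "missing $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Sketch of a preprocessing and scoring run; every file name is a placeholder.
# The function is only defined here, not called.
mt_demo() {
    # Tokenize English training text with the Moses tokenizer.
    tokenizer.perl -l en < train.en > train.tok.en
    # Learn 10000 BPE merge operations, then segment the training text.
    subword-nmt learn-bpe -s 10000 < train.tok.en > codes.bpe
    subword-nmt apply-bpe -c codes.bpe < train.tok.en > train.bpe.en
    # Score detokenized system output against a detokenized reference.
    sacrebleu ref.detok.en -i hyp.detok.en
}

# Executable names taken from the list above (spm_train and spm_encode are
# two of the spm_* programs shipped with sentencepiece).
check_tools subword-nmt spm_train spm_encode sacrebleu multeval.sh compare-mt \
    || echo "some tools are missing; is the module loaded?" >&2
```

If any line starts with "missing", the module is probably not loaded in the current shell.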
Contact:
Yves Scherrer, University of Helsinki, firstname.lastname@helsinki.fi