Translation/mttools
Latest revision as of 11:55, 21 October 2022
Using the mttools module
- Activate the NLPL software repository (use the module use line for your cluster, Puhti or Saga) and load the module:
module use -a /projappl/nlpl/software/modules/etc          # Puhti
module use -a /cluster/shared/nlpl/software/modules/etc    # Saga
module load nlpl-mttools/
- Module-specific help is available by typing:
module help nlpl-mttools
The following tools are part of this module:
- moses-scripts
- Tokenization, casing, corpus cleaning and evaluation scripts from Moses
- Source: https://github.com/moses-smt/mosesdecoder (scripts directory)
- Installed revision: 3990724
- The subfolders generic, recaser, tokenizer, training are in PATH
- sacremoses
- Python port of Moses tokenizer and truecaser
- Source: https://github.com/alvations/sacremoses
- Installed version: 0.0.35
- subword-nmt
- Unsupervised Word Segmentation (a.k.a. Byte Pair Encoding) for Machine Translation and Text Generation
- Source: https://github.com/rsennrich/subword-nmt
- Installed version: 0.3.8
- The subword-nmt executable is in PATH
- sentencepiece
- Unsupervised text tokenizer for Neural Network-based text generation
- Source: https://github.com/google/sentencepiece
- Installed version: 0.1.97
- The spm_* executables are in PATH
- sacreBLEU
- Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
- Source: https://github.com/mjpost/sacreBLEU
- Installed version: 2.2.1
- The sacrebleu executable is in PATH
- multeval
- Tool to evaluate machine translation with various scores (BLEU, TER, METEOR) and to perform statistical significance testing with bootstrap resampling
- Source: https://github.com/jhclark/multeval
- Installed version: 0.5.1 with METEOR 1.5
- The multeval.sh script is in PATH
- compare-mt
- Compare the output of multiple systems for language generation, including machine translation, summarization, dialog response generation. Computes common evaluation scores and runs analyses to find salient differences between the systems.
- To run METEOR from compare-mt, consult the module-specific help page (module help nlpl-mttools) for the exact path.
- Source: https://github.com/neulab/compare-mt
- Installed version: 0.2.10
- The compare-mt executable is in PATH
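The moses-scripts entry above mentions corpus cleaning. The kind of filtering that training/clean-corpus-n.perl applies to a parallel corpus is based on minimum and maximum sentence length and on the source/target length ratio. The minimal Python sketch below illustrates that idea. The function name clean_pairs and its default thresholds are hypothetical choices for this example. They are not the Perl script's interface or defaults.

```python
def clean_pairs(pairs, min_len=1, max_len=80, max_ratio=9.0):
    """Filter (source, target) sentence pairs by token length and length ratio.

    Hypothetical helper: the thresholds are illustrative, not the defaults
    of clean-corpus-n.perl. Tokens are whitespace-separated words.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    kept = []
    for src, tgt in pairs:
        src_len, tgt_len = len(src.split()), len(tgt.split())
        # Drop pairs where either side is too short (e.g. empty) or too long.
        if not (min_len <= src_len <= max_len and min_len <= tgt_len <= max_len):
            continue
        # Drop pairs whose length ratio suggests a misaligned sentence pair.
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            continue
        kept.append((src, tgt))
    return kept


corpus = [
    ("the house is small", "das Haus ist klein"),
    ("", "leere Quelle"),     # empty source side
    ("yes", "ja " * 20),      # length ratio 20:1
]
print(clean_pairs(corpus))  # [('the house is small', 'das Haus ist klein')]
```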
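The sacremoses entry above lists a truecaser, which restores the most likely casing of each word. The sketch below shows the core idea in a simplified form. It counts the casing variants of every word while skipping sentence-initial tokens, where capitalization is uninformative. It then maps each known word to its most frequent variant. This is not sacremoses' API. The real Moses-style truecasers handle sentence starts and unknown words in more detail, and the function names here are hypothetical.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Learn the most frequent casing of each word, ignoring sentence-initial tokens."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        # Skip token 0: sentence-initial capitalization says nothing about the word.
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}


def truecase(sentence, model):
    """Replace every known token with its most frequent casing; keep unknown tokens."""
    return " ".join(model.get(tok.lower(), tok) for tok in sentence.split())


training = [
    "The cat sat on the mat .",
    "Yesterday the cat saw Paris .",
    "We love Paris in the spring .",
]
model = train_truecaser(training)
print(truecase("The cat visited Paris .", model))  # the cat visited Paris .
```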
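The subword-nmt entry above describes Byte Pair Encoding. BPE starts from characters and repeatedly merges the most frequent adjacent symbol pair, weighting each pair by word frequency. The sketch below implements that merge-learning loop on a toy word-frequency dictionary, using an end-of-word marker "</w>". It applies the learned merges in order, which is a simplification of how subword-nmt applies codes. The function names are hypothetical, and the actual tool is used through the subword-nmt executable.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every occurrence of the adjacent `pair` in `symbols` by one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts, num_merges):
    """Learn BPE merge operations from a {word: frequency} dictionary."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair seen wins
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=5)
print(merges)
print(apply_bpe("lowest", merges))  # ('low', 'est</w>')
```

The unseen word "lowest" is split into the known subwords "low" and "est</w>". This is what lets a BPE vocabulary cover rare and unseen words.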
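The sentencepiece entry above refers to a tokenizer that works directly on raw text. SentencePiece replaces spaces with the meta symbol ▁ (U+2581), so that segmentation is reversible. Its default unigram model chooses the segmentation with the highest total piece log-probability. The sketch below shows that decoding step as a Viterbi search over a toy vocabulary. The pieces and scores are invented for illustration and do not come from a trained model. The function names are hypothetical and are not the spm_* tools or the sentencepiece Python API.

```python
import math

SP = "\u2581"  # SentencePiece's whitespace meta symbol


def viterbi_segment(text, piece_logprobs):
    """Return the highest-scoring segmentation of `text` under a unigram model."""
    # Whitespace becomes the meta symbol, with a leading one marking the first word.
    text = SP + text.replace(" ", SP)
    n = len(text)
    # best[i] = (score, start index of the last piece) for the prefix text[:i].
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in piece_logprobs and best[start][0] > -math.inf:
                score = best[start][0] + piece_logprobs[piece]
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Follow the stored start positions backwards to recover the pieces.
    pieces, end = [], n
    while end > 0:
        start = best[end][1]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]


def decode(pieces):
    """Concatenate pieces, turn the meta symbol back into spaces, drop the leading one."""
    text = "".join(pieces).replace(SP, " ")
    return text[1:] if text.startswith(" ") else text


# Toy vocabulary: piece -> log-probability (invented values).
vocab = {SP + "hello": -2.0, SP + "he": -3.0, "llo": -3.0, SP + "world": -2.5, SP: -4.0}
vocab.update({ch: -5.0 for ch in "helowrd"})

print(viterbi_segment("hello world", vocab))  # ['▁hello', '▁world']
print(decode(viterbi_segment("hello word", vocab)))  # hello word
```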
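The sacreBLEU entry above refers to BLEU. For reference, corpus-level BLEU combines clipped n-gram precisions (n = 1..4) through a geometric mean and multiplies the result by a brevity penalty. The sketch below computes that statistic for single references. It makes two simplifying assumptions: tokens are whitespace-separated and no smoothing is applied. sacreBLEU itself applies its own standardized tokenization to detokenized text, so its scores will generally differ from this sketch. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per sentence and no smoothing."""
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_toks, ref_toks = hyp.split(), ref.split()
        hyp_len += len(hyp_toks)
        ref_len += len(ref_toks)
        for n in range(1, max_n + 1):
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            overlap = ngrams(hyp_toks, n) & ngrams(ref_toks, n)
            matches[n - 1] += sum(overlap.values())
            totals[n - 1] += max(len(hyp_toks) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 100.0
print(corpus_bleu(["the cat sat on mat"], ["the cat sat on the mat"]))
```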
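The multeval entry above mentions significance testing with bootstrap resampling. The generic paired bootstrap idea works as follows. You repeatedly draw a test set of the same size with replacement and count how often system B fails to outscore system A. The sketch below shows this idea and is not necessarily the exact procedure multeval implements. It assumes per-sentence scores whose sum serves as the system score. This is a simplification, because a corpus-level metric such as BLEU would be recomputed from n-gram statistics on each resampled set. The function name is hypothetical.

```python
import random


def paired_bootstrap(scores_a, scores_b, num_samples=1000, seed=0):
    """Fraction of bootstrap samples in which system B does not beat system A.

    scores_a and scores_b are per-sentence scores of two systems on the same
    test sentences. A small result suggests that B's advantage is unlikely to be
    an artifact of the particular test set.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    n = len(scores_a)
    not_better = 0
    for _ in range(num_samples):
        # Resample sentence indices with replacement; both systems share the sample.
        idx = [rng.randrange(n) for _ in range(n)]
        total_a = sum(scores_a[i] for i in idx)
        total_b = sum(scores_b[i] for i in idx)
        if total_b <= total_a:
            not_better += 1
    return not_better / num_samples


print(paired_bootstrap([1, 2, 3, 4], [2, 3, 4, 5]))  # 0.0 (B better on every sentence)
print(paired_bootstrap([3, 1, 4, 1, 5], [2, 7, 1, 8, 2]))
```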
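The compare-mt entry above describes analyses that find salient differences between systems. One simple analysis of this kind ranks test sentences by how much one system outscores the other. The sketch below does this using unigram F1 as a stand-in sentence-level score. This is an assumption made to keep the example short and does not reproduce compare-mt's own metrics or output. The function names are hypothetical.

```python
from collections import Counter


def unigram_f1(hyp, ref):
    """Harmonic mean of clipped unigram precision and recall (whitespace tokens)."""
    hyp_counts, ref_counts = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((hyp_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def largest_differences(refs, sys1, sys2, k=3):
    """Return (index, score difference) for the k sentences where sys1 most outscores sys2."""
    diffs = [
        (unigram_f1(h1, r) - unigram_f1(h2, r), i)
        for i, (r, h1, h2) in enumerate(zip(refs, sys1, sys2))
    ]
    diffs.sort(reverse=True)
    return [(i, round(d, 3)) for d, i in diffs[:k]]


refs = ["the cat sat on the mat", "a dog barked", "it is raining"]
sys1 = ["the cat sat on the mat", "a dog cried", "it rains"]
sys2 = ["a cat sits on a mat", "a dog barked", "it is raining"]
print(largest_differences(refs, sys1, sys2, k=1))  # [(0, 0.5)]
```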
Contact:
Yves Scherrer, University of Helsinki, firstname.lastname@helsinki.fi