Translation/mttools
Using the mttools module
- Log in to Taito or Abel
- Activate the NLPL software repository and load the module:
module use -a /proj*/nlpl/software/modulefiles/
module load nlpl-mttools
- Module-specific help is available by typing:
module help nlpl-mttools
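In a batch job it can help to wrap the two activation commands so that the script stops early when the module system is not available. The following is only a minimal sketch: the function name `load_mttools` is hypothetical and not part of the module, and the error message is illustrative.

```shell
#!/bin/bash
# Sketch: activate the NLPL repository and load nlpl-mttools,
# returning non-zero if any step fails.
# `load_mttools` is a hypothetical helper name, not provided by the module.
load_mttools() {
    # `module` is normally a shell function provided by the cluster environment.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; is this Taito or Abel?" >&2
        return 1
    fi
    # Same two commands as in the instructions above.
    module use -a /proj*/nlpl/software/modulefiles/ || return 1
    module load nlpl-mttools || return 1
}
```

A job script could then start with `load_mttools || exit 1` before calling any of the tools.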
The following scripts are part of this module:
- moses-scripts
- Tokenization, casing, corpus cleaning and evaluation scripts from Moses
- Source: https://github.com/moses-smt/mosesdecoder (scripts directory)
- Installed revision: 413ba6b
- The subfolders generic, recaser, tokenizer, and training are in PATH
- sacremoses
- Python port of Moses tokenizer and truecaser
- Source: https://github.com/alvations/sacremoses
- Installed version: 0.0.5
- subword-nmt
- Unsupervised Word Segmentation (a.k.a. Byte Pair Encoding) for Machine Translation and Text Generation
- Source: https://github.com/rsennrich/subword-nmt
- Installed version: 0.3.6
- The subword-nmt executable is in PATH
- sentencepiece
- Unsupervised text tokenizer for Neural Network-based text generation
- Source: https://github.com/google/sentencepiece
- Installed version: 0.1.6
- The spm_* executables are in PATH
- sacreBLEU
- Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
- Source: https://github.com/mjpost/sacreBLEU
- Installed version: 1.2.12
- The sacrebleu executable is in PATH
- scoring
- Script by Ken Heafield that scores machine translation output with NIST's BLEU and NIST metrics, TER, and METEOR
- Source: https://kheafield.com/code/scoring.tar.gz
- Installed version: Sept 19, 2012
- The score.rb script is in PATH
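After loading the module, the PATH entries listed above can be verified with a short check. This is a hedged sketch rather than part of the module: `check_tools` is a hypothetical helper, and `tokenizer.perl` and `spm_train` are picked here as representative names for the Moses tokenizer subfolder and the spm_* executables.

```shell
#!/bin/bash
# Sketch: report which of the given executables can be found on PATH.
# `check_tools` is a hypothetical helper name, not provided by the module.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'ok       %s\n' "$tool"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every tool was found.
    [ "$missing" -eq 0 ]
}

# Assumed representative names: tokenizer.perl (Moses tokenizer subfolder)
# and spm_train (one of the spm_* executables).
check_tools tokenizer.perl subword-nmt spm_train sacrebleu score.rb \
    || echo "Some tools are missing; is nlpl-mttools loaded?" >&2
```

If any line reports MISSING, the module is probably not loaded in the current shell.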
Contact:
Yves Scherrer, University of Helsinki, firstname.lastname@helsinki.fi