Translation/mttools

From Nordic Language Processing Laboratory
Revision as of 13:51, 3 February 2019 by Yvessche (talk | contribs)

Using the mttools module

  • Log in to Taito or Abel
  • Activate the NLPL software repository and load the module:
    module use -a /proj*/nlpl/software/modulefiles/
    module load nlpl-mttools
  • Module-specific help is available by typing:
    module help nlpl-mttools
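
After loading the module, it can be useful to confirm that the executables are actually on PATH. The following is a minimal sketch, not part of the module itself: check_mttools is a hypothetical helper name, and the tool names are passed as arguments rather than hard-coded.

```shell
# Hypothetical helper (not shipped with nlpl-mttools): report where each
# named executable is found on PATH, or that it is missing.
# Returns the number of missing tools as its exit status.
check_mttools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s: %s\n' "$tool" "$(command -v "$tool")"
        else
            printf '%s: not found\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

For example, in a session where the module has been loaded, `check_mttools subword-nmt sacrebleu score.rb` should print one line per executable and return 0 if all three are found.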

The module provides the following tools:

  • moses-scripts
    • Tokenization, casing, corpus cleaning and evaluation scripts from Moses
    • Source: https://github.com/moses-smt/mosesdecoder (scripts directory)
    • Installed revision: 413ba6b
    • The subfolders generic, recaser, tokenizer, and training are in PATH
  • sacremoses
  • subword-nmt
    • Unsupervised Word Segmentation (a.k.a. Byte Pair Encoding) for Machine Translation and Text Generation
    • Source: https://github.com/rsennrich/subword-nmt
    • Installed version: 0.3.6
    • The subword-nmt executable is in PATH
  • sentencepiece
  • sacreBLEU
    • Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
    • Source: https://github.com/mjpost/sacreBLEU
    • Installed version: 1.2.12
    • The sacrebleu executable is in PATH
  • scoring
    • Script by Ken Heafield that scores machine translation output with NIST's implementations of the BLEU and NIST metrics, as well as TER and METEOR
    • Source: https://kheafield.com/code/scoring.tar.gz
    • Installed version: Sept 19, 2012
    • The score.rb script is in PATH
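
The tools above are often chained into a preprocessing and evaluation pipeline. The following sketch shows one possible arrangement, assuming the module is loaded so that tokenizer.perl, detokenizer.perl, subword-nmt and sacrebleu are on PATH. The function names, file names, language codes and merge count are illustrative assumptions, not settings prescribed by the module.

```shell
# Sketch only: function names and argument conventions are assumptions.

# Tokenize with the Moses tokenizer, then learn and apply BPE.
# $1 = language code, $2 = raw training text, $3 = number of merge operations
preprocess() {
    lang=$1; raw=$2; merges=$3
    tokenizer.perl -l "$lang" < "$raw" > "$raw.tok"
    subword-nmt learn-bpe -s "$merges" < "$raw.tok" > "$raw.codes"
    subword-nmt apply-bpe -c "$raw.codes" < "$raw.tok" > "$raw.bpe"
}

# Remove the "@@ " continuation markers that subword-nmt inserts.
undo_bpe() {
    sed -E 's/(@@ )|(@@ ?$)//g'
}

# Undo BPE, detokenize, and score against a detokenized reference.
# $1 = language code, $2 = system output (BPE-segmented), $3 = reference file
score_bleu() {
    lang=$1; hyp=$2; ref=$3
    undo_bpe < "$hyp" | detokenizer.perl -l "$lang" | sacrebleu "$ref"
}
```

A call such as `preprocess en train.en 10000` would write train.en.tok, train.en.codes and train.en.bpe next to the input, and `score_bleu en output.bpe.en ref.en` would print the sacreBLEU score line for a hypothetical system output. Scoring detokenized text keeps the result comparable across labs, which is the purpose of sacreBLEU's version string.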


Contact: Yves Scherrer, University of Helsinki, firstname.lastname@helsinki.fi