Difference between revisions of "Parsing/udpipe"

From Nordic Language Processing Laboratory
Revision as of 20:51, 14 January 2020

UDPipe

UDPipe is an end-to-end system for morphosyntactic parsing in the UD framework, developed by Milan Straka. It was used as the baseline system in the CoNLL shared tasks on universal dependency parsing in 2017 and 2018. This page contains only a brief introduction; for the full capabilities of UDPipe, please see the official UDPipe web page, including the UDPipe User's Manual.

Using UDPipe on Puhti and Saga

UDPipe is available as a module on Puhti and Saga. It was installed as part of the OPUS activity.

How to use UDPipe on Saga:

  • Log into Saga
  • Activate the NLPL module repository:
module use -a /cluster/shared/nlpl/software/modules/etc
  • Load the most recent version of the udpipe module:
module load nlpl-udpipe/1.2.1-devel

How to use UDPipe on Puhti:

  • Log into Puhti
  • Activate the NLPL module repository:
module use -a /projappl/nlpl/software/modules/etc
  • Load the most recent version of the udpipe module:
module load nlpl-udpipe/1.2.1-devel

Pre-trained models

There are pre-trained models available for all languages in Universal Dependencies 2.4:

/cluster/shared/nlpl/software/modules/udpipe/latest/models (Saga)
/projappl/nlpl/software/modules/udpipe/latest/models (Puhti)
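
To find the model file for a given language, it can help to list the directory and filter by name. The following is a minimal sketch: `list_models` is a hypothetical helper (not part of UDPipe or the NLPL module), and it assumes that the model files end in `.udpipe` and carry the language name in lower case.

```shell
# list_models DIR [PATTERN]
# Hypothetical helper: print the names of all *.udpipe files in DIR
# whose file name contains PATTERN (all model files if PATTERN is omitted).
list_models() {
    dir=$1
    pattern=${2:-}
    for f in "$dir"/*"$pattern"*.udpipe; do
        # Skip the unexpanded glob when nothing matches.
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}

# Example on Saga (use the Puhti path on Puhti):
list_models /cluster/shared/nlpl/software/modules/udpipe/latest/models swedish
```

If nothing is printed, either the directory is not visible from the current machine or no file name matches the pattern.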

Running UDPipe

To run UDPipe on raw text (in TESTFILE), run the following command:

udpipe --tokenize --tag --parse MODEL_DIR/MODEL TESTFILE

where MODEL_DIR is one of the directories listed above, and MODEL is the model file for the language (treebank) in question, e.g. swedish-talbanken-ud-2.4-190531.udpipe for Swedish, trained on Talbanken.
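
As a concrete illustration, the command can be wrapped in a few checks so that a missing module or model is reported clearly. This is only a sketch under stated assumptions: the input and output file names are hypothetical, the model file name should be verified against the contents of the model directory, and the output is captured by redirecting standard output.

```shell
# Saga path from this page; use the Puhti path on Puhti.
MODEL_DIR=/cluster/shared/nlpl/software/modules/udpipe/latest/models
# Assumed file name; verify it with: ls "$MODEL_DIR"
MODEL=swedish-talbanken-ud-2.4-190531.udpipe
TESTFILE=input.txt      # hypothetical raw-text input file
OUTFILE=input.conllu    # hypothetical output file

if ! command -v udpipe >/dev/null 2>&1; then
    status="udpipe not on PATH (load the nlpl-udpipe module first)"
elif [ ! -r "$MODEL_DIR/$MODEL" ]; then
    status="model not readable: $MODEL_DIR/$MODEL"
elif [ ! -r "$TESTFILE" ]; then
    status="input not readable: $TESTFILE"
elif udpipe --tokenize --tag --parse "$MODEL_DIR/$MODEL" "$TESTFILE" > "$OUTFILE"; then
    status="wrote $OUTFILE"
else
    status="udpipe failed"
fi
echo "$status"
```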

The above command performs segmentation, tokenization, POS-tagging and parsing. If only a subset of these tasks is needed, remove the corresponding flags. The output is in CoNLL-U format by default.
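
The effect of dropping flags can be sketched with a small dry-run helper that only prints the command line for a chosen subset of steps. `build_udpipe_cmd` is a hypothetical name introduced here for illustration; it does not run UDPipe and only knows the three flags used on this page.

```shell
# build_udpipe_cmd MODEL INPUT STEP...
# Hypothetical dry-run helper: print the udpipe command line for the
# given steps, where each STEP is one of: tokenize, tag, parse.
build_udpipe_cmd() {
    model=$1
    input=$2
    shift 2
    cmd="udpipe"
    for step in "$@"; do
        case $step in
            tokenize|tag|parse) cmd="$cmd --$step" ;;
            *) echo "unknown step: $step" >&2; return 1 ;;
        esac
    done
    echo "$cmd $model $input"
}

# Tokenization and tagging only, without parsing:
build_udpipe_cmd MODEL_DIR/MODEL TESTFILE tokenize tag
# prints: udpipe --tokenize --tag MODEL_DIR/MODEL TESTFILE
```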

UDPipe can handle several input and output formats and other variations. It is also possible to train new models. See the official manual for more details.