Parsing/udpipe

From Nordic Language Processing Laboratory

Latest revision as of 23:03, 14 January 2020


UDPipe

UDPipe, developed by Milan Straka, is an end-to-end system for morphosyntactic parsing in the Universal Dependencies (UD) framework. It was used as the baseline system in the CoNLL 2017 and 2018 shared tasks on Universal Dependencies parsing. This page gives only a brief introduction; for the full capabilities of UDPipe, see the official UDPipe web page, including the UDPipe User's Manual and

Using UDPipe on Puhti and Saga

UDPipe is available as a module on Puhti and Saga. It was installed as part of the OPUS activity.

How to use UDPipe on Saga:

  • Log into Saga
  • Activate the NLPL module repository:
module use -a /cluster/shared/nlpl/software/modules/etc
  • Load the most recent version of the udpipe module:
module load nlpl-udpipe/1.2.1-devel

How to use UDPipe on Puhti:

  • Log into Puhti
  • Activate the NLPL module repository:
module use -a /projappl/nlpl/software/modules/etc
  • Load the most recent version of the udpipe module:
module load nlpl-udpipe/1.2.1-devel
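As a minimal sketch, the setup steps for both clusters can be wrapped in a small shell function. The function names nlpl_module_dir and load_udpipe are hypothetical; the repository paths and the module name are the ones listed above, and the module command itself only exists when you are logged into Saga or Puhti.

```shell
# Hypothetical helper: map a cluster name to the NLPL module repository
# path given on this page.
nlpl_module_dir() {
  case "$1" in
    saga)  echo /cluster/shared/nlpl/software/modules/etc ;;
    puhti) echo /projappl/nlpl/software/modules/etc ;;
    *)     echo "unknown cluster: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: activate the NLPL repository for the given cluster
# and load the udpipe module (requires the cluster's `module` command).
load_udpipe() {
  udpipe_modules=$(nlpl_module_dir "$1") || return 1
  module use -a "$udpipe_modules"
  module load nlpl-udpipe/1.2.1-devel
}
```

After logging in, run load_udpipe saga or load_udpipe puhti.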

Pre-trained models

Pre-trained models are available for all languages in Universal Dependencies 2.4:

/cluster/shared/nlpl/software/modules/udpipe/latest/models (Saga)
/projappl/nlpl/software/modules/udpipe/latest/models (Puhti)
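To see which models exist for a given language, you can list the .udpipe files in the models directory. The sketch below assumes that model file names start with the language name, as in the Swedish example in the next section; the function name find_models is hypothetical.

```shell
# Hypothetical helper: print the names of the model files in a models
# directory ($1) whose names start with a language name ($2).
find_models() {
  for f in "$1"/"$2"*.udpipe; do
    # If nothing matches, the glob stays unexpanded; skip it.
    if [ -e "$f" ]; then
      basename "$f"
    fi
  done
}
```

For example: find_models /cluster/shared/nlpl/software/modules/udpipe/latest/models swedish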

Running UDPipe

To run UDPipe on raw text (in TESTFILE), run the following command:

udpipe --tokenize --tag --parse MODEL_DIR/MODEL TESTFILE

where MODEL_DIR is the models directory given above, and MODEL is the model file for the language (treebank) in question, e.g. swedish-talbanken-ud-2.4-190531.udpipe for Swedish, trained on the Talbanken treebank.
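The command above can be wrapped in a function that takes MODEL_DIR, MODEL and TESTFILE as arguments. This is only a sketch: the name udpipe_parse is hypothetical, and the result is assumed to be printed to standard output, which is why the usage example redirects it to a file.

```shell
# Hypothetical wrapper around the full pipeline (segmentation, tokenization,
# tagging and parsing) for one raw-text file and one model.
# Arguments: $1 = MODEL_DIR, $2 = MODEL (file name), $3 = TESTFILE.
udpipe_parse() {
  udpipe --tokenize --tag --parse "$1/$2" "$3"
}
```

For example, assuming the Swedish model file is named swedish-talbanken-ud-2.4-190531.udpipe and input.txt is your (hypothetical) raw-text file:
udpipe_parse /cluster/shared/nlpl/software/modules/udpipe/latest/models swedish-talbanken-ud-2.4-190531.udpipe input.txt > input.conllu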

Running udpipe with --tokenize --tag --parse performs sentence segmentation, tokenization, POS tagging and parsing. If only some of these steps are needed, omit the corresponding flags. By default, the output is in CoNLL-U format.
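CoNLL-U has one word per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with #, and a blank line after each sentence. For a quick look at a parsed file, the hypothetical function below uses awk to print the form, POS tag, head and dependency relation of each word, skipping multiword-token ranges (IDs like 3-4) and empty nodes (IDs like 5.1).

```shell
# Hypothetical helper: print FORM, UPOS, HEAD and DEPREL (tab-separated)
# for every syntactic word in one or more CoNLL-U files.
conllu_summary() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }   # comments and sentence-separating blank lines
    $1 ~ /[-.]/     { next }   # multiword-token ranges and empty nodes
    { print $2 "\t" $4 "\t" $7 "\t" $8 }
  ' "$@"
}
```

For example: conllu_summary input.conllu | less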

If you want to parse with uuparser instead, use UDPipe only to segment and tokenize the text with the following command (--tag is optional; include it if you want POS tags), and then run uuparser on the resulting file.

udpipe --tokenize --tag MODEL_DIR/MODEL TESTFILE
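As a sketch, this preprocessing step can be wrapped in a function that adds --tag only when requested. The name udpipe_preprocess and its optional third argument are hypothetical; the uuparser step is not shown, since its options are documented separately.

```shell
# Hypothetical helper: segment and tokenize raw text for uuparser.
# Arguments: $1 = path to the model, $2 = raw-text file,
#            $3 = "tag" to also add POS tags (optional).
udpipe_preprocess() {
  if [ "${3:-}" = tag ]; then
    udpipe --tokenize --tag "$1" "$2"
  else
    udpipe --tokenize "$1" "$2"
  fi
}
```

For example, udpipe_preprocess "$MODEL_DIR/$MODEL" input.txt tag > input.conllu, after which uuparser can be run on input.conllu.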

UDPipe supports several input and output formats and other options, and it can also be used to train new models. See the official manual for details.
