Parsing/udpipe

From Nordic Language Processing Laboratory
Revision as of 22:03, 14 January 2020 by Sara (Talk | contribs)

UDPipe

UDPipe is an end-to-end system for morphosyntactic parsing in the UD framework, developed by Milan Straka. It was used as the baseline system in the CoNLL shared tasks on universal dependency parsing in 2017 and 2018. This page contains only a brief introduction; for the full capabilities of UDPipe, please see the official UDPipe Web page, including the UDPipe User's Manual and

Using UDPipe on Puhti and Saga

UDPipe is available as a module on Puhti and Saga. It was installed as part of the OPUS activity.

How to use UDPipe on Saga:

  • Log into Saga
  • Activate the NLPL module repository:
module use -a /cluster/shared/nlpl/software/modules/etc
  • Load the most recent version of the udpipe module:
module load nlpl-udpipe/1.2.1-devel

How to use UDPipe on Puhti:

  • Log into Puhti
  • Activate the NLPL module repository:
module use -a /projappl/nlpl/software/modules/etc
  • Load the most recent version of the udpipe module:
module load nlpl-udpipe/1.2.1-devel
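
The two setup procedures above differ only in the repository path, so they can be wrapped in a small helper. The following is a minimal sketch, not part of the NLPL installation: the function names nlpl_module_repo and load_udpipe are hypothetical, and load_udpipe assumes that the `module` command is available, which is only the case on the clusters themselves.

```shell
# Hypothetical helper: print the NLPL module repository path for a cluster.
# The two paths are the ones given on this page.
nlpl_module_repo() {
  case "$1" in
    saga)  echo "/cluster/shared/nlpl/software/modules/etc" ;;
    puhti) echo "/projappl/nlpl/software/modules/etc" ;;
    *)     echo "unknown cluster: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: activate the repository and load the UDPipe module.
# Needs the `module` command, so call it only after logging into Saga or Puhti.
load_udpipe() {
  repo=$(nlpl_module_repo "$1") || return 1
  module use -a "$repo"
  module load nlpl-udpipe/1.2.1-devel
}
```

After sourcing these definitions, `load_udpipe saga` or `load_udpipe puhti` would perform the steps listed above in one call.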

Pre-trained models

There are pre-trained models available for all languages in Universal Dependencies 2.4:

/cluster/shared/nlpl/software/modules/udpipe/latest/models (Saga)
/projappl/nlpl/software/modules/udpipe/latest/models (Puhti)
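
To see which models exist for a given language, it may help to filter the model directory by file name. The sketch below assumes that model files follow the pattern LANGUAGE-TREEBANK-ud-2.4-DATE.udpipe, as in the example file name used on this page; the function name find_models is hypothetical, and the default MODEL_DIR is the Saga path (use the Puhti path there).

```shell
# Default to the Saga model directory from this page; override on Puhti.
MODEL_DIR=${MODEL_DIR:-/cluster/shared/nlpl/software/modules/udpipe/latest/models}

# Hypothetical helper: list the model files in a directory whose names
# start with the given language, e.g. "swedish".
# Assumes the naming pattern LANGUAGE-TREEBANK-ud-2.4-DATE.udpipe.
find_models() {  # usage: find_models DIR LANGUAGE
  for f in "$1"/"$2"-*.udpipe; do
    # When nothing matches, the unexpanded pattern is skipped here.
    if [ -e "$f" ]; then basename "$f"; fi
  done
}
```

For example, `find_models "$MODEL_DIR" swedish` would print one file name per Swedish treebank model, or nothing if no file matches.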

Running UDPipe

To run UDPipe on raw text (in TESTFILE), run the following command:

udpipe --tokenize --tag --parse MODEL_DIR/MODEL TESTFILE

where MODEL_DIR is one of the directories given above, and MODEL is the model for the language (treebank) in question, e.g. swedish-talbanken-ud-2.4-190531.udpipe for Swedish, trained on Talbanken.

The above command performs sentence segmentation, tokenization, POS tagging and parsing. If only a subset of these tasks is needed, remove the corresponding flags. The output is in CoNLL-U format by default.
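
CoNLL-U output has one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with #, and a blank line after each sentence. As a quick way to inspect a parse, the sketch below prints only the word form, POS tag, head index and dependency relation. The function name conllu_summary and the file name output.conllu are hypothetical.

```shell
# Hypothetical helper: print FORM, UPOS, HEAD and DEPREL for every token
# line of CoNLL-U input (files given as arguments, or standard input).
# Comment lines (starting with #) and blank lines are skipped.
conllu_summary() {
  awk -F '\t' '!/^#/ && NF == 10 { print $2 "\t" $4 "\t" $7 "\t" $8 }' "$@"
}

# Intended use on the cluster, after saving the UDPipe output to a file:
#   udpipe --tokenize --tag --parse MODEL_DIR/MODEL TESTFILE > output.conllu
#   conllu_summary output.conllu
```

The filter only reads the file, so it can be run repeatedly on the same output without invoking UDPipe again.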

If you want to parse with uuparser instead, use UDPipe only for segmentation and tokenization by running the following command (--tag is optional, depending on whether you want POS tags). You can then run uuparser on the resulting file.

udpipe --tokenize --tag MODEL_DIR/MODEL TESTFILE

UDPipe can handle several input and output formats and other variations. It is also possible to train new models. See the official manual for more details.
