Difference between revisions of "Parsing/udpipe"

From Nordic Language Processing Laboratory
Latest revision as of 21:03, 14 January 2020

UDPipe

UDPipe, developed by Milan Straka, is an end-to-end system for morphosyntactic parsing in the Universal Dependencies (UD) framework. It was used as the baseline system in the CoNLL 2017 and 2018 shared tasks on Universal Dependencies parsing. This page gives only a brief introduction; for the full capabilities of UDPipe, see the official UDPipe Web page (http://ufal.mff.cuni.cz/udpipe), including the UDPipe User's Manual (http://ufal.mff.cuni.cz/udpipe/users-manual).

Using UDPipe on Puhti and Saga

UDPipe is available as a module on Puhti and Saga. It was installed as part of the OPUS activity.

How to use UDPipe on Saga:

  • Log into Saga
  • Activate the NLPL module repository:
    module use -a /cluster/shared/nlpl/software/modules/etc
  • Load the most recent version of the udpipe module (currently 1.2.1-devel):
    module load nlpl-udpipe/1.2.1-devel

How to use UDPipe on Puhti:

  • Log into Puhti
  • Activate the NLPL module repository:
    module use -a /projappl/nlpl/software/modules/etc
  • Load the most recent version of the udpipe module (currently 1.2.1-devel):
    module load nlpl-udpipe/1.2.1-devel
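The two sets of steps differ only in the module repository path. As a minimal sketch, a small shell function can pick that path by cluster name. The function name nlpl_module_repo is hypothetical (it is not provided by NLPL), and it assumes the Saga repository lives under /cluster/shared, the same prefix as the Saga model directory given below.

```shell
# Hypothetical helper (not part of NLPL): print the NLPL module repository
# for a cluster name. Assumes the Saga path uses the /cluster/shared prefix.
# Usage: nlpl_module_repo saga|puhti
nlpl_module_repo() {
  case "$1" in
    saga)  echo /cluster/shared/nlpl/software/modules/etc ;;
    puhti) echo /projappl/nlpl/software/modules/etc ;;
    *)     echo "unknown cluster: $1" >&2; return 1 ;;
  esac
}
```

On Saga, for example, one would then run module use -a "$(nlpl_module_repo saga)" followed by module load nlpl-udpipe/1.2.1-devel.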

Pre-trained models

There are pre-trained models available for all languages in Universal Dependencies 2.4:

/cluster/shared/nlpl/software/modules/udpipe/latest/models (Saga)
/projappl/nlpl/software/modules/udpipe/latest/models (Puhti)
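To find the model for a particular language, one can simply list the model directory. The sketch below uses a hypothetical function, list_udpipe_models, and assumes that model file names start with the lowercase language name and end in .udpipe, as in the Swedish example further down; check the actual directory contents with ls before relying on it.

```shell
# Hypothetical helper: list the model files in MODEL_DIR whose names start
# with LANGUAGE (e.g. "swedish") and end in ".udpipe".
# Usage: list_udpipe_models MODEL_DIR LANGUAGE
list_udpipe_models() {
  local dir=$1 lang=$2 f
  for f in "$dir"/"$lang"*.udpipe; do
    # With no match the pattern stays unexpanded, so skip names that do not exist.
    [ -e "$f" ] || continue
    basename "$f"
  done
}
```

For example, list_udpipe_models /cluster/shared/nlpl/software/modules/udpipe/latest/models swedish would list the Swedish models on Saga.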

Running UDPipe

To run UDPipe on raw text stored in TESTFILE, use the following command:

udpipe --tokenize --tag --parse MODEL_DIR/MODEL TESTFILE

where MODEL_DIR is the model directory listed above for your cluster, and MODEL is the model file for the language (treebank) in question, e.g. swedish_talbanken-ud-2.4-190531.udpipe for Swedish, trained on the Talbanken treebank.
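The command above can be wrapped in a small function that checks that the model file exists and saves the CoNLL-U output to a file. This is only a sketch: the name udpipe_parse is hypothetical, and it relies on UDPipe printing its result on standard output, which is captured here with an ordinary shell redirect.

```shell
# Hypothetical wrapper around the full pipeline: segmentation, tokenization,
# POS tagging and parsing. UDPipe writes CoNLL-U to standard output, which is
# redirected into OUTFILE here.
# Usage: udpipe_parse MODEL_DIR MODEL TESTFILE OUTFILE
udpipe_parse() {
  local model_dir=$1 model=$2 testfile=$3 outfile=$4
  # Fail early with a readable message if the model name is wrong.
  if [ ! -f "$model_dir/$model" ]; then
    echo "model not found: $model_dir/$model" >&2
    return 1
  fi
  udpipe --tokenize --tag --parse "$model_dir/$model" "$testfile" > "$outfile"
}
```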

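This command performs sentence segmentation, tokenization, POS tagging and dependency parsing. If only a subset of these steps is needed, leave out the corresponding flags (--tokenize covers both sentence segmentation and tokenization). By default, the output is printed to standard output in CoNLL-U format.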
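CoNLL-U is a tab-separated format with ten columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), so plain awk is enough to inspect the output. The hypothetical helper conllu_columns below prints the word form, universal POS tag, head index and dependency relation of each token, skipping comment lines, blank lines between sentences, multiword-token ranges and empty nodes.

```shell
# Hypothetical helper: print FORM, UPOS, HEAD and DEPREL (CoNLL-U columns
# 2, 4, 7 and 8) for each basic token in a CoNLL-U file.
# Usage: conllu_columns FILE
conllu_columns() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }   # comment lines and sentence breaks
    $1 ~ /[-.]/     { next }   # multiword ranges (1-2) and empty nodes (8.1)
    { print $2 "\t" $4 "\t" $7 "\t" $8 }
  ' "$1"
}
```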

To parse with uuparser instead, use UDPipe only for preprocessing and then run uuparser on the resulting file. The following command segments, tokenizes and POS-tags the text (--tag is optional, depending on whether you want POS tags):

udpipe --tokenize --tag MODEL_DIR/MODEL TESTFILE
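This preprocessing step can be sketched the same way. In the hypothetical wrapper below, the optional fourth argument notag is an invention of the wrapper (it is not a UDPipe flag) and simply drops --tag. The uuparser call itself is not shown, since its options are documented separately.

```shell
# Hypothetical wrapper: segment, tokenize and (by default) POS-tag raw text
# without parsing, writing CoNLL-U for a separate parser such as uuparser.
# Usage: udpipe_preprocess MODEL_PATH TESTFILE OUTFILE [notag]
udpipe_preprocess() {
  local model=$1 testfile=$2 outfile=$3
  if [ "${4:-}" = notag ]; then
    # Tokenization (including sentence segmentation) only.
    udpipe --tokenize "$model" "$testfile" > "$outfile"
  else
    # Tokenization plus POS tagging.
    udpipe --tokenize --tag "$model" "$testfile" > "$outfile"
  fi
}
```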

UDPipe supports several input and output formats and further running options, and it can also train new models. See the official User's Manual for details.
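Two such uses are sketched below, assuming the options behave as described in the UDPipe 1 User's Manual: --input=horizontal reads text that is already tokenized (one sentence per line, tokens separated by spaces), so no --tokenize step is needed, and --train builds a new model from CoNLL-U training data. Both function names are hypothetical, and training accepts many more options than shown here; consult the manual before using it.

```shell
# Hypothetical wrapper: tag and parse pre-tokenized text in the "horizontal"
# input format (one sentence per line, space-separated tokens).
# Usage: udpipe_parse_pretokenized MODEL_PATH TOKENIZED_FILE OUTFILE
udpipe_parse_pretokenized() {
  udpipe --input=horizontal --tag --parse "$1" "$2" > "$3"
}

# Hypothetical wrapper: train a new model from CoNLL-U data with default
# settings, writing the model to MODEL_OUT.
# Usage: udpipe_train MODEL_OUT TRAIN_CONLLU
udpipe_train() {
  udpipe --train "$1" "$2"
}
```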