Difference between revisions of "Parsing/udpipe"
Latest revision as of 21:03, 14 January 2020
UDPipe
UDPipe is an end-to-end system for morphosyntactic parsing in the UD framework, developed by Milan Straka. It was used as the baseline system in the CoNLL shared tasks on universal dependency parsing in 2017 and 2018. This page contains only a brief introduction; for the full capabilities of UDPipe, see the official UDPipe web page (http://ufal.mff.cuni.cz/udpipe), including the UDPipe User's Manual (http://ufal.mff.cuni.cz/udpipe/users-manual).
Using UDPipe on Puhti and Saga
UDPipe is available as a module on Puhti and Saga. It was installed as part of the OPUS activity.
How to use UDPipe on Saga:
- Log into Saga
- Activate the NLPL module repository:
module use -a /cluster/shared/nlpl/software/modules/etc
- Load the most recent version of the udpipe module:
module load nlpl-udpipe/1.2.1-devel
How to use UDPipe on Puhti:
- Log into Puhti
- Activate the NLPL module repository:
module use -a /projappl/nlpl/software/modules/etc
- Load the most recent version of the udpipe module:
module load nlpl-udpipe/1.2.1-devel
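The two setup recipes above differ only in the module directory, so they can be folded into one small helper. This is a minimal sketch, not part of the NLPL installation: the function name `nlpl_module_dir` and the lower-case cluster keys are hypothetical, and the Saga path is written with `/cluster`, matching the model directory listed on this page.

```shell
#!/bin/sh
# Print the NLPL module directory for a given cluster name.
# Hypothetical helper: the function name and the keys "saga"/"puhti"
# are illustrative; the paths are the ones listed on this page.
nlpl_module_dir() {
    case "$1" in
        saga)  echo "/cluster/shared/nlpl/software/modules/etc" ;;
        puhti) echo "/projappl/nlpl/software/modules/etc" ;;
        *)     echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Usage on a login node (requires the cluster's `module` command):
#   module use -a "$(nlpl_module_dir saga)"
#   module load nlpl-udpipe/1.2.1-devel
```

The helper only prints a path, so it can be sourced from a shell profile without side effects.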
Pre-trained models
There are pre-trained models available for all languages in Universal Dependencies 2.4:
/cluster/shared/nlpl/software/modules/udpipe/latest/models (Saga)
/projappl/nlpl/software/modules/udpipe/latest/models (Puhti)
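To find the model for a particular treebank, it may help to list what is in the model directory. The sketch below assumes that model files carry the `.udpipe` extension, as in the example file name on this page; the function name `list_udpipe_models` is hypothetical.

```shell
#!/bin/sh
# Print the file names of all *.udpipe files in the given directory.
# Assumption: models use the .udpipe extension; other files are ignored.
list_udpipe_models() {
    for f in "$1"/*.udpipe; do
        # Skip the unexpanded pattern when the directory has no models.
        [ -e "$f" ] || continue
        basename "$f"
    done
}

# Usage (Saga path from this page), filtering for one language:
#   MODEL_DIR=/cluster/shared/nlpl/software/modules/udpipe/latest/models
#   list_udpipe_models "$MODEL_DIR" | grep -i swedish
```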
Running UDPipe
To run UDPipe on raw text (in TESTFILE), run the following command:
udpipe --tokenize --tag --parse MODEL_DIR/MODEL TESTFILE
where MODEL_DIR is the model directory given above, and MODEL is the model for the language (treebank) in question, e.g. swedish-talbanken-ud-2.4-190531.udpipe for Swedish, trained on Talbanken.
The above command performs segmentation, tokenization, POS-tagging and parsing. If only a subset of these tasks is needed, remove the corresponding flags. The output is in CoNLL-U format by default.
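Choosing a subset of stages amounts to choosing which of the three flags to pass. The following sketch builds the command line from a list of stage names and prints it without running it, so the result can be inspected first. The function name `udpipe_args` is hypothetical, and the sketch assumes that the model and input paths contain no spaces.

```shell
#!/bin/sh
# Print a udpipe command line for the given model, input file and stages.
# Hypothetical helper; only the three flags shown on this page are accepted.
udpipe_args() {
    model=$1
    input=$2
    shift 2
    args=""
    for stage in "$@"; do
        case "$stage" in
            tokenize|tag|parse) args="$args --$stage" ;;
            *) echo "unknown stage: $stage" >&2; return 1 ;;
        esac
    done
    echo "udpipe$args $model $input"
}

# Usage: print the command for tokenization and tagging only.
#   udpipe_args "$MODEL_DIR/$MODEL" TESTFILE tokenize tag
```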
If you want to parse with uuparser instead, use UDPipe only for tokenization and, optionally, tagging (omit --tag if you do not need POS tags), then run uuparser on the resulting file:
udpipe --tokenize --tag MODEL_DIR/MODEL TESTFILE
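Before handing the result to another parser, a quick sanity check of the CoNLL-U output can catch an empty or truncated file. The sketch below assumes the output was saved to a file (for example by redirecting standard output) and counts sentences and word lines; the function name `conllu_counts` is hypothetical. Multiword-token ranges such as `1-2` are deliberately not counted as tokens.

```shell
#!/bin/sh
# Count sentences and word lines in a CoNLL-U file.
# Comment lines start with '#', sentences are separated by blank lines,
# and word lines have an integer ID in the first tab-separated column.
conllu_counts() {
    awk -F '\t' '
        /^#/            { next }
        /^[[:space:]]*$/ { if (in_sent) { sents++; in_sent = 0 }; next }
        $1 ~ /^[0-9]+$/ { tokens++; in_sent = 1 }
        END {
            if (in_sent) sents++
            printf "%d sentences, %d tokens\n", sents, tokens
        }
    ' "$1"
}

# Usage, assuming the udpipe output was redirected to out.conllu:
#   conllu_counts out.conllu
```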
UDPipe can handle several input and output formats and other variations. It is also possible to train new models. See the official manual for more details.