Latest revision as of 12:15, 14 January 2020

TurboParser

TurboParser is a fast and accurate pre-neural dependency parser based on linear programming. The package also contains a POS tagger, a semantic role labeler, an entity tagger, a coreference resolver, and a constituent (phrase-based) parser. For full documentation, see http://www.cs.cmu.edu/~ark/TurboParser/ and https://github.com/andre-martins/TurboParser. This document only describes how to use the tagger and parser.

Using TurboParser on Saga

Log into Saga
Activate the NLPL module repository:

module use -a /cluster/shared/nlpl/software/modules/etc

Load the most recent version of the TurboParser module:

module load nlpl-turboparser/2.3.0

Data formats and conversion

TurboParser takes CoNLL-X files as input. Most current data, including the Universal Dependencies data available on Saga, is in CoNLL-U format. To use such data, you first need to convert it to CoNLL-X with the provided script (it comes from the Universal Dependencies tools by Dan Zeman and is included in the TurboParser module):

conllu_to_conllx.pl < INPUT_FILE.conllu > OUTPUT_FILE.conll
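The conversion is needed because CoNLL-U allows line types that CoNLL-X does not: comment lines starting with #, multiword token ranges (IDs such as 1-2), and empty nodes (IDs such as 1.1). As a rough check of how much of this a file contains, the sketch below counts those lines. The function name count_conllu_extras is hypothetical and not part of the module; it only inspects the file and does not convert it.

```shell
# Count CoNLL-U-specific lines that have no place in CoNLL-X:
# comments, multiword token ranges (ID "1-2"), empty nodes (ID "1.1").
# count_conllu_extras is a hypothetical helper, not part of TurboParser.
count_conllu_extras() {
    awk -F'\t' '
        /^#/                    { comments++; next }
        $1 ~ /^[0-9]+-[0-9]+$/  { ranges++; next }
        $1 ~ /^[0-9]+\.[0-9]+$/ { empty++; next }
        END { printf "comments=%d ranges=%d empty_nodes=%d\n", comments, ranges, empty }
    ' "$1"
}
```

It can be run as count_conllu_extras INPUT_FILE.conllu before the conversion.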

TurboTagger accepts data in its native format: one word per line in two columns, the word and its tag. A script is provided for converting CoNLL-X files to this format:

create_tagging_corpus.sh INPUT_FILE.conll

This creates the file "INPUT_FILE.conll.tagging".
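To illustrate the two-column layout, here is a minimal sketch that derives it from a CoNLL-X file with awk. It is not the bundled script: the function name conllx_to_tagging is hypothetical, and taking the tag from column 4 (CPOSTAG) by default is an assumption, so the provided create_tagging_corpus.sh may pick a different column.

```shell
# Print FORM (column 2) and one tag column of a CoNLL-X file, keeping
# blank lines as sentence separators.
# Hypothetical helper; the default tag column 4 (CPOSTAG) is an assumption.
conllx_to_tagging() {
    awk -F'\t' -v col="${2:-4}" '
        NF > 0  { print $2 "\t" $col }
        NF == 0 { print "" }
    ' "$1"
}
```

For example, conllx_to_tagging INPUT_FILE.conll 5 would use column 5 (POSTAG) instead.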

Using the parser

To train a parsing model on a treebank:

TurboParser --train --file_train=TRAINING_DATA.conll --file_model=MODEL --logtostderr

To predict using the model trained above:

TurboParser --test --evaluate --file_model=MODEL --file_test=TEST_INPUT_FILE --file_prediction=RESULT_FILE --logtostderr

This command writes the parsed data to RESULT_FILE and prints the accuracy (since the --evaluate flag is given).
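If you want to check the result independently, the unlabeled attachment score can be approximated by comparing the HEAD column (column 7 in CoNLL-X) of the gold file and RESULT_FILE. The sketch below assumes both files contain the same tokens in the same order; the function name uas is hypothetical, and the figure may differ from the one TurboParser prints if its evaluation treats some tokens (for example punctuation) differently.

```shell
# Unlabeled attachment score: percentage of tokens whose HEAD (column 7
# in CoNLL-X) is identical in the gold file ($1) and the predicted file ($2).
# Hypothetical helper; assumes both files are token-aligned line by line.
uas() {
    paste "$1" "$2" | awk -F'\t' '
        NF >= 20 { total++; if ($7 == $17) correct++ }
        END { if (total > 0) printf "%.2f\n", 100 * correct / total }
    '
}
```

It can be called as uas TEST_INPUT_FILE RESULT_FILE, provided the test file contains gold heads.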

The parser has numerous options for fine-tuning its behaviour. For a full list, run:

TurboParser --help

Using the tagger

To train a tagging model on a treebank:

TurboTagger --train --file_train=TRAINING_DATA.conll.tagging --file_model=MODEL --form_cutoff=1 --logtostderr

To predict using the model trained above:

TurboTagger --test --evaluate --file_model=MODEL --file_test=TEST_INPUT_FILE --file_prediction=RESULT_FILE --logtostderr

This command writes the tagged data to RESULT_FILE and prints the accuracy (since the --evaluate flag is given).

The tagger has numerous options for fine-tuning its behaviour. For a full list, run:

TurboTagger --help

Segmentation

The examples above assume pre-segmented input data that is already in the appropriate format. If your input is raw text, we recommend segmenting it with UDPipe first. The UDPipe module can be loaded with module load nlpl-udpipe and then run by typing udpipe at the command line; see the UDPipe page.
