Parsing/turboparser - Revision history (http://wiki.nlpl.eu/index.php?title=Parsing/turboparser). Page created by Sara, 2020-01-14T12:15:57Z.
<div>= TurboParser =<br />
<br />
TurboParser is a fast and accurate pre-neural dependency parser based on linear programming. The package also contains a POS tagger, a semantic role labeler, an entity tagger, a coreference resolver, and a constituent (phrase-based) parser. For full documentation, see [http://www.cs.cmu.edu/~ark/TurboParser/ http://www.cs.cmu.edu/~ark/TurboParser/] and [https://github.com/andre-martins/TurboParser https://github.com/andre-martins/TurboParser]. This document only describes how to use the tagger and the parser.<br />
<br />
== Using TurboParser on Saga ==<br />
<br />
* Log into Saga<br />
* Activate the NLPL module repository:<br />
module use -a /cluster/shared/nlpl/software/modules/etc<br />
* Load the most recent version of the TurboParser module:<br />
module load nlpl-turboparser/2.3.0<br />
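<br />
After loading the module, it can be useful to confirm that the tools used on this page are actually on your <code>PATH</code>. The following is a minimal sketch, not part of the module: the helper name <code>require_tools</code> is hypothetical, and it assumes that the module exposes the binaries and scripts under the names used on this page.<br />

```shell
#!/bin/sh
# Report whether each named command is on PATH.
# Returns non-zero if at least one of them is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call after "module load" (commented out so the sketch runs anywhere):
# require_tools TurboParser TurboTagger conllu_to_conllx.pl create_tagging_corpus.sh
```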
<br />
<br />
== Data formats and conversion ==<br />
<br />
TurboParser takes CoNLL-X files as input. Most current data, including the Universal Dependencies data available on Saga, is in CoNLL-U format. To use such data, first convert it to CoNLL-X with the provided script (it comes from the Universal Dependencies tools by Dan Zeman and is included in the TurboParser module):<br />
<br />
conllu_to_conllx.pl < INPUT_FILE.conllu > OUTPUT_FILE.conll<br />
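<br />
A quick sanity check on the converted file can catch leftovers from CoNLL-U, such as comment lines, multiword token ranges (IDs like 1-2), or empty nodes (IDs like 1.1). The sketch below is an assumption-laden illustration, not part of the module: <code>check_conllx</code> is a hypothetical helper that only checks for 10 tab-separated fields and an integer ID on every token line.<br />

```shell
#!/bin/sh
# Read CoNLL-X from stdin. Exit 0 if every non-blank line has exactly
# 10 tab-separated fields and an integer ID; otherwise print the first
# problem found and exit 1.
check_conllx() {
    awk -F '\t' '
        NF == 0 { next }   # blank line = sentence boundary
        /^#/ { printf "line %d: comment line (CoNLL-U only)\n", NR; bad = 1; exit }
        NF != 10 { printf "line %d: expected 10 fields, found %d\n", NR, NF; bad = 1; exit }
        $1 !~ /^[0-9]+$/ { printf "line %d: non-integer ID %s\n", NR, $1; bad = 1; exit }
        END { exit bad }
    '
}

# Example call (commented out; OUTPUT_FILE.conll is the placeholder used above):
# check_conllx < OUTPUT_FILE.conll && echo "looks like CoNLL-X"
```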
<br />
TurboTagger accepts data in its native format: one word per line in two columns, the word and its tag. A script converts CoNLL-X to this format:<br />
<br />
create_tagging_corpus.sh INPUT_FILE.conll<br />
<br />
This creates the file "INPUT_FILE.conll.tagging".<br />
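<br />
To make the two-column format concrete, here is a rough illustration of such a conversion written with <code>awk</code>. It is not the script shipped with the module: the function name <code>conllx_to_tagging</code> is hypothetical, and both the choice of tag column (default 5, POSTAG; pass 4 for CPOSTAG) and the blank line kept between sentences are assumptions.<br />

```shell
#!/bin/sh
# Read CoNLL-X from stdin and print "word<TAB>tag" lines.
# $1 is the 1-based CoNLL-X column to use as the tag
# (assumption: default 5 = POSTAG; 4 would be CPOSTAG).
conllx_to_tagging() {
    awk -F '\t' -v col="${1:-5}" '
        NF == 0 { print ""; next }   # keep sentence boundaries as blank lines
        { print $2 "\t" $col }       # FORM and the chosen tag column
    '
}

# Example call (commented out; file names are the placeholders used above):
# conllx_to_tagging < INPUT_FILE.conll > INPUT_FILE.conll.tagging
```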
<br />
<br />
== Using the parser ==<br />
<br />
To train a parsing model on a treebank:<br />
<br />
TurboParser --train --file_train=TRAINING_DATA.conll --file_model=MODEL --logtostderr<br />
<br />
<br />
To predict using the model trained above:<br />
<br />
TurboParser --test --evaluate --file_model=MODEL --file_test=TEST_INPUT_FILE --file_prediction=RESULT_FILE --logtostderr <br />
<br />
This command writes the parsed data to RESULT_FILE and prints the accuracy (because the --evaluate flag is given).<br />
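<br />
The training and prediction steps above can be chained in a small wrapper, for example inside a batch job. This is only a sketch: the helper names <code>run</code> and <code>train_and_test</code> and the <code>DRY_RUN</code> variable are hypothetical, and it uses only the flags shown on this page. With <code>DRY_RUN=1</code> the commands are printed but not executed, which is handy for checking paths before submitting a job.<br />

```shell
#!/bin/sh
# Print a command; also execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Train on one CoNLL-X file, then parse and evaluate another.
# Arguments: training file, test file, model path, prediction path.
train_and_test() {
    run TurboParser --train --file_train="$1" --file_model="$3" --logtostderr &&
    run TurboParser --test --evaluate --file_model="$3" --file_test="$2" \
        --file_prediction="$4" --logtostderr
}

# Example call (commented out; all file names are placeholders):
# DRY_RUN=1 train_and_test train.conll test.conll parser.model predicted.conll
```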
<br />
The parser has numerous options for fine-tuning its behaviour. For a full list, run:<br />
<br />
TurboParser --help <br />
<br />
<br />
== Using the tagger ==<br />
<br />
To train a tagging model on a treebank:<br />
<br />
TurboTagger --train --file_train=TRAINING_DATA.conll.tagging --file_model=MODEL --form_cutoff=1 --logtostderr<br />
<br />
To predict using the model trained above:<br />
<br />
TurboTagger --test --evaluate --file_model=MODEL --file_test=TEST_INPUT_FILE --file_prediction=RESULT_FILE --logtostderr <br />
<br />
This command writes the tagged data to RESULT_FILE and prints the accuracy (because the --evaluate flag is given).<br />
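<br />
If you want to cross-check the reported accuracy, token-level accuracy can also be computed directly from a gold file and a prediction file. The sketch below assumes that both files use the two-column word/tag format with identical tokenization, line by line; that the prediction file has this layout is an assumption, and <code>tagging_accuracy</code> is a hypothetical helper name.<br />

```shell
#!/bin/sh
# Token-level accuracy: the share of token lines whose tag (second column)
# is identical in the gold file ($1) and the predicted file ($2).
tagging_accuracy() {
    paste "$1" "$2" | awk -F '\t' '
        NF < 4 { next }                          # skip sentence boundaries
        { total++; if ($2 == $4) correct++ }     # compare gold and predicted tags
        END { if (total) printf "%d/%d = %.4f\n", correct, total, correct / total }
    '
}

# Example call (commented out; file names are placeholders):
# tagging_accuracy TEST_INPUT_FILE RESULT_FILE
```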
<br />
The tagger has numerous options for fine-tuning its behaviour. For a full list, run:<br />
<br />
TurboTagger --help <br />
<br />
== Segmentation ==<br />
<br />
The examples above assume pre-segmented input data in the appropriate format (see "Data formats and conversion"). If your input is raw text, we recommend segmenting it with UDPipe first. Load the UDPipe module with <code>module load nlpl-udpipe</code>, then run it by typing <code>udpipe</code> at the command line; see [http://wiki.nlpl.eu/index.php/Parsing/udpipe UDPipe].</div>