= Background =
 
An experimentation environment for data-driven dependency parsing is maintained for NLPL under the coordination of Uppsala University (UU). The data is available on the Norwegian Saga cluster and on the Finnish Puhti cluster; the software is available on Saga.
  
Initially, software and data were commissioned on the Norwegian Abel supercluster; see [http://wiki.nlpl.eu/index.php/Parsing/abel the Abel page] for legacy information.

= Using the Uppsala Parser =

* Log into Abel
* Activate the NLPL module repository:
 
module use -a /projects/nlpl/software/modulefiles/
 
* Load the most recent version of the uuparser module:
 
module load uuparser
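
Taken together, a minimal first session might look like the sketch below. It assumes the module path shown above; the module avail and module list calls are standard environment-modules commands, used here only to check what is installed and what has been loaded.

# make the NLPL software repository visible to the module system
module use -a /projects/nlpl/software/modulefiles/
# list the available uuparser versions (optional)
module avail uuparser
# load the default (most recent) version and confirm it is loaded
module load uuparser
module list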
 
  
'''Train a parsing model'''
  
To train a set of parsing models on treebanks from Universal Dependencies (v2.1):
  
uuparser --include [languages to include denoted by their ISO id] --outdir my-output-directory
  
for example
  
uuparser --include "sv en ru" --outdir ~/experiments
 
  
will train separate models for UD Swedish, English and Russian and store the results in the ''experiments'' folder in your home directory.
Model selection is included in the training process by default; that is, at each epoch the current model is evaluated on the UD dev data, and at the end of training the best performing model for each language is selected.  
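
As a further, purely illustrative example (the output directory name is hypothetical and only the --include and --outdir options documented above are used), a single-language experiment can be kept in its own folder:

# train a model for UD Swedish only, in a dedicated output directory
mkdir -p ~/experiments/swedish
uuparser --include "sv" --outdir ~/experiments/swedish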
 
  
'''Predicting with a pre-trained parsing model'''
  
To predict on UD test data with the models trained above:

uuparser --include "sv en ru" --outdir ~/experiments --predict
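
Longer training or parsing runs are normally submitted through the cluster's batch system rather than executed on a login node. The following is a hypothetical SLURM job script for the prediction step; the account name, resource requests, and file name are placeholders, not values taken from the parser documentation.

#!/bin/bash
#SBATCH --job-name=uuparser-predict
#SBATCH --account=<your-project-account>
#SBATCH --time=02:00:00
#SBATCH --mem-per-cpu=4G

# make the NLPL modules visible and load the parser inside the job
module use -a /projects/nlpl/software/modulefiles/
module load uuparser

# parse the UD test data with the previously trained models
uuparser --include "sv en ru" --outdir ~/experiments --predict

A script along these lines would be submitted with sbatch, for example sbatch predict.slurm.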
 
 
 
'''Contact:'''
 
Aaron Smith, Uppsala University, firstname.lastname@lingfil.uu.se
 

= Preprocessing Tools =

* [http://wiki.nlpl.eu/index.php/Parsing/udpipe UDPipe]

Additionally, a variety of tools for sentence splitting, tokenization, lemmatization, and so on are available through the NLPL installations of the [http://nltk.org Natural Language Toolkit (NLTK)] and [https://spacy.io spaCy] libraries.

= Parsing Systems =

* [http://wiki.nlpl.eu/index.php/Parsing/uuparser The Uppsala Parser]
* [http://wiki.nlpl.eu/index.php/Parsing/udpipe UDPipe]
* [http://wiki.nlpl.eu/index.php/Parsing/turboparser TurboParser]

Additionally, parsers are available in several toolkits installed by NLPL: [http://wiki.nlpl.eu/index.php/Parsing/stanfordnlp StanfordNLP], [https://www.nltk.org/ NLTK], and [https://spacy.io/ spaCy].

= Training and Evaluation Data =

* [http://wiki.nlpl.eu/index.php/Parsing/ud Universal Dependencies v2.0–2.5]
* [http://wiki.nlpl.eu/index.php/Parsing/sdp Semantic Dependency Parsing]