http://wiki.nlpl.eu/api.php?action=feedcontributions&user=Nivre&feedformat=atomNordic Language Processing Laboratory - User contributions [en]2024-03-28T19:45:00ZUser contributionsMediaWiki 1.31.10http://wiki.nlpl.eu/index.php?title=Community/workshop/2019/program&diff=731Community/workshop/2019/program2019-09-13T10:43:41Z<p>Nivre: /* The First NLPL Workshop on Deep Learning for Natural Language Processing */</p>
<hr />
<div>= The First NLPL Workshop on Deep Learning for Natural Language Processing =<br />
<br />
<br />
{| class="wikitable"<br />
|-<br />
!colspan=3|Monday, September 30, 2019<br />
|-<br />
| 09:00 || 09:20 || '''Opening''' <br />
|-<br />
| || || '''Session 1''' Chair: Leon Derczynski<br />
|-<br />
| 09:20 || 09:40 || Timothee Mickus, Denis Paperno and Matthieu Constant: <i>Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling</i><br />
|-<br />
| 09:40 || 10:00 || Robin Kurtz, Daniel Roxbo and Marco Kuhlmann: <i>Improving Semantic Dependency Parsing with Syntactic Features</i><br />
|-<br />
| 10:00 || 10:30 || '''Coffee Break'''<br />
|-<br />
| || || '''Keynote 1''' Chair: Jörg Tiedemann <br />
|-<br />
| 10:30 || 11:30 || Barbara Plank: <i>Deep Transfer Learning: Learning across Languages, Modalities and Tasks</i><br />
|-<br />
| || || '''Session 2''' Chair: Sara Stymne <br />
|-<br />
| 11:30 || 11:50 || Andrey Kutuzov and Elizaveta Kuzmenko: <i>To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation</i><br />
|-<br />
| 11:50 || 12:10 || Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski and Filip Ginter: <i>Is Multilingual BERT Fluent in Language Generation?</i><br />
|-<br />
| 12:10 || 12:30 || Vinit Ravishankar, Memduh Gökırmak, Lilja Øvrelid and Erik Velldal: <i>Multilingual Probing of Contextual Sentence Encoders</i><br />
|-<br />
| 12:30 || 14:00 || '''Lunch Break'''<br />
|-<br />
| || || '''Keynote 2''' Chair: Filip Ginter <br />
|-<br />
| 14:00 || 15:00 || Jussi Karlgren: <i>High-Dimensional Semantic Spaces and the Squinting Linguist</i><br />
|-<br />
| 15:00 || 15:30 || '''Coffee Break'''<br />
|-<br />
| || || '''Session 3''' Chair: Lilja Øvrelid <br />
|-<br />
| 15.30 || 15:50 || Nicolaj Filrup Rasmussen, Kristian Nørgaard Jensen, Marco Placenti and Thai Wang: <i>Cross-Domain Sentiment Classification using Vector Embedded Domain Representations</i><br />
|-<br />
| 15:50 || 16:10 || Matthias Damaschk, Tillmann Dönicke and Florian Lux: <i>Multiclass Text Classification on Unbalanced, Sparse and Noisy Data</i><br />
|-<br />
| 16:10 || 16:30 || '''Discussion and Closing'''<br />
|}</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop/2019/program&diff=730Community/workshop/2019/program2019-09-09T12:15:22Z<p>Nivre: /* The First NLPL Workshop on Deep Learning for Natural Language Processing */</p>
<hr />
<div>= The First NLPL Workshop on Deep Learning for Natural Language Processing =<br />
<br />
<br />
{| class="wikitable"<br />
|-<br />
!colspan=3|Monday, September 30, 2019<br />
|-<br />
| 09:00 || 09:20 || '''Opening''' <br />
|-<br />
| || || '''Session 1''' Chair: Leon Derczynski<br />
|-<br />
| 09:20 || 09:40 || Timothee Mickus, Denis Paperno and Matthieu Constant: <i>Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling</i><br />
|-<br />
| 09:40 || 10:00 || Robin Kurtz, Daniel Roxbo and Marco Kuhlmann: <i>Improving Semantic Dependency Parsing with Syntactic Features</i><br />
|-<br />
| 10:00 || 10:30 || '''Coffee Break'''<br />
|-<br />
| || || '''Keynote 1''' Chair: Jörg Tiedemann <br />
|-<br />
| 10:30 || 11:30 || Barbara Plank: <i>Multi-Task and Continual Learning for Multi-Modal NLP</i><br />
|-<br />
| || || '''Session 2''' Chair: Sara Stymne <br />
|-<br />
| 11:30 || 11:50 || Andrey Kutuzov and Elizaveta Kuzmenko: <i>To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation</i><br />
|-<br />
| 11:50 || 12:10 || Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski and Filip Ginter: <i>Is Multilingual BERT Fluent in Language Generation?</i><br />
|-<br />
| 12:10 || 12:30 || Vinit Ravishankar, Memduh Gökırmak, Lilja Øvrelid and Erik Velldal: <i>Multilingual Probing of Contextual Sentence Encoders</i><br />
|-<br />
| 12:30 || 14:00 || '''Lunch Break'''<br />
|-<br />
| || || '''Keynote 2''' Chair: Filip Ginter <br />
|-<br />
| 14:00 || 15:00 || Jussi Karlgren: <i>High-Dimensional Semantic Spaces and the Squinting Linguist</i><br />
|-<br />
| 15:00 || 15:30 || '''Coffee Break'''<br />
|-<br />
| || || '''Session 3''' Chair: Lilja Øvrelid <br />
|-<br />
| 15.30 || 15:50 || Nicolaj Filrup Rasmussen, Kristian Nørgaard Jensen, Marco Placenti and Thai Wang: <i>Cross-Domain Sentiment Classification using Vector Embedded Domain Representations</i><br />
|-<br />
| 15:50 || 16:10 || Matthias Damaschk, Tillmann Dönicke and Florian Lux: <i>Multiclass Text Classification on Unbalanced, Sparse and Noisy Data</i><br />
|-<br />
| 16:10 || 16:30 || '''Discussion and Closing'''<br />
|}</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop/2019/program&diff=729Community/workshop/2019/program2019-09-09T08:00:51Z<p>Nivre: /* The First NLPL Workshop on Deep Learning for Natural Language Processing */</p>
<hr />
<div>= The First NLPL Workshop on Deep Learning for Natural Language Processing =<br />
<br />
<br />
{| class="wikitable"<br />
|-<br />
!colspan=3|Monday, September 30, 2019<br />
|-<br />
| 09:00 || 09:20 || '''Opening''' <br />
|-<br />
| || || '''Session 1''' Chair: Leon Derczynski<br />
|-<br />
| 09:20 || 09:40 || Timothee Mickus, Denis Paperno and Matthieu Constant: <i>Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling</i><br />
|-<br />
| 09:40 || 10:00 || Robin Kurtz, Daniel Roxbo and Marco Kuhlmann: <i>Improving Semantic Dependency Parsing with Syntactic Features</i><br />
|-<br />
| 10:00 || 10:30 || '''Coffee Break'''<br />
|-<br />
| || || '''Keynote 1''' Chair: TBA <br />
|-<br />
| 10:30 || 11:30 || Barbara Plank: <i>Multi-Task and Continual Learning for Multi-Modal NLP</i><br />
|-<br />
| || || '''Session 2''' Chair: Sara Stymne <br />
|-<br />
| 11:30 || 11:50 || Andrey Kutuzov and Elizaveta Kuzmenko: <i>To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation</i><br />
|-<br />
| 11:50 || 12:10 || Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski and Filip Ginter: <i>Is Multilingual BERT Fluent in Language Generation?</i><br />
|-<br />
| 12:10 || 12:30 || Vinit Ravishankar, Memduh Gökırmak, Lilja Øvrelid and Erik Velldal: <i>Multilingual Probing of Contextual Sentence Encoders</i><br />
|-<br />
| 12:30 || 14:00 || '''Lunch Break'''<br />
|-<br />
| || || '''Keynote 2''' Chair: Filip Ginter <br />
|-<br />
| 14:00 || 15:00 || Jussi Karlgren: <i>High-Dimensional Semantic Spaces and the Squinting Linguist</i><br />
|-<br />
| 15:00 || 15:30 || '''Coffee Break'''<br />
|-<br />
| || || '''Session 3''' Chair: Lilja Øvrelid <br />
|-<br />
| 15.30 || 15:50 || Nicolaj Filrup Rasmussen, Kristian Nørgaard Jensen, Marco Placenti and Thai Wang: <i>Cross-Domain Sentiment Classification using Vector Embedded Domain Representations</i><br />
|-<br />
| 15:50 || 16:10 || Matthias Damaschk, Tillmann Dönicke and Florian Lux: <i>Multiclass Text Classification on Unbalanced, Sparse and Noisy Data</i><br />
|-<br />
| 16:10 || 16:30 || '''Discussion and Closing'''<br />
|}</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop/2019/program&diff=728Community/workshop/2019/program2019-09-09T07:24:19Z<p>Nivre: /* The First NLPL Workshop on Deep Learning for Natural Language Processing */</p>
<hr />
<div>= The First NLPL Workshop on Deep Learning for Natural Language Processing =<br />
<br />
<br />
{| class="wikitable"<br />
|-<br />
!colspan=3|Monday, September 30, 2019<br />
|-<br />
| 09:00 || 09:20 || '''Opening''' <br />
|-<br />
| || || '''Session 1''' Chair: Leon Derczynski<br />
|-<br />
| 09:20 || 09:40 || Timothee Mickus, Denis Paperno and Matthieu Constant: <i>Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling</i><br />
|-<br />
| 09:40 || 10:00 || Robin Kurtz, Daniel Roxbo and Marco Kuhlmann: <i>Improving Semantic Dependency Parsing with Syntactic Features</i><br />
|-<br />
| 10:00 || 10:30 || '''Coffee Break'''<br />
|-<br />
| || || '''Keynote 1''' Chair: TBA <br />
|-<br />
| 10:30 || 11:30 || Barbara Plank: <i>Multi-Task and Continual Learning for Multi-Modal NLP</i><br />
|-<br />
| || || '''Session 2''' Chair: Sara Stymne <br />
|-<br />
| 11:30 || 11:50 || Andrey Kutuzov and Elizaveta Kuzmenko: <i>To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation</i><br />
|-<br />
| 11:50 || 12:10 || Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski and Filip Ginter: <i>Is Multilingual BERT Fluent in Language Generation?</i><br />
|-<br />
| 12:10 || 12:30 || Vinit Ravishankar, Memduh Gökırmak, Lilja Øvrelid and Erik Velldal: <i>Multilingual Probing of Contextual Sentence Encoders</i><br />
|-<br />
| 12:30 || 14:00 || '''Lunch Break'''<br />
|-<br />
| || || '''Keynote 2''' Chair: Filip Ginter <br />
|-<br />
| 14:00 || 15:00 || Jussi Karlgren: <i>High-Dimensional Semantic Spaces and the Squinting Linguist</i><br />
|-<br />
| 15:00 || 15:30 || '''Coffee Break'''<br />
|-<br />
| || || '''Session 3''' Chair: TBA <br />
|-<br />
| 15.30 || 15:50 || Nicolaj Filrup Rasmussen, Kristian Nørgaard Jensen, Marco Placenti and Thai Wang: <i>Cross-Domain Sentiment Classification using Vector Embedded Domain Representations</i><br />
|-<br />
| 15:50 || 16:10 || Matthias Damaschk, Tillmann Dönicke and Florian Lux: <i>Multiclass Text Classification on Unbalanced, Sparse and Noisy Data</i><br />
|-<br />
| 16:10 || 16:30 || '''Discussion and Closing'''<br />
|}</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop/2019/program&diff=727Community/workshop/2019/program2019-09-09T07:00:18Z<p>Nivre: /* The First NLPL Workshop on Deep Learning for Natural Language Processing */</p>
<hr />
<div>= The First NLPL Workshop on Deep Learning for Natural Language Processing =<br />
<br />
<br />
{| class="wikitable"<br />
|-<br />
!colspan=3|Monday, September 30, 2019<br />
|-<br />
| 09:00 || 09:20 || '''Opening''' <br />
|-<br />
| || || '''Session 1''' Chair: Leon Derczynski<br />
|-<br />
| 09:20 || 09:40 || Timothee Mickus, Denis Paperno and Matthieu Constant: <i>Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling</i><br />
|-<br />
| 09:40 || 10:00 || Robin Kurtz, Daniel Roxbo and Marco Kuhlmann: <i>Improving Semantic Dependency Parsing with Syntactic Features</i><br />
|-<br />
| 10:00 || 10:30 || '''Coffee Break'''<br />
|-<br />
| || || '''Keynote 1''' Chair: TBA <br />
|-<br />
| 10:30 || 11:30 || Barbara Plank: <i>Multi-Task and Continual Learning for Multi-Modal NLP</i><br />
|-<br />
| || || '''Session 2''' Chair: Sara Stymne <br />
|-<br />
| 11:30 || 11:50 || Andrey Kutuzov and Elizaveta Kuzmenko: <i>To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation</i><br />
|-<br />
| 11:50 || 12:10 || Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski and Filip Ginter: <i>Is Multilingual BERT Fluent in Language Generation?</i><br />
|-<br />
| 12:10 || 12:30 || Vinit Ravishankar, Memduh Gökırmak, Lilja Øvrelid and Erik Velldal: <i>Multilingual Probing of Contextual Sentence Encoders</i><br />
|-<br />
| 12:30 || 14:00 || '''Lunch Break'''<br />
|-<br />
| || || '''Keynote 2''' Chair: TBA <br />
|-<br />
| 14:00 || 15:00 || Jussi Karlgren: <i>High-Dimensional Semantic Spaces and the Squinting Linguist</i><br />
|-<br />
| 15:00 || 15:30 || '''Coffee Break'''<br />
|-<br />
| || || '''Session 3''' Chair: Lilja Øvrelid <br />
|-<br />
| 15.30 || 15:50 || Nicolaj Filrup Rasmussen, Kristian Nørgaard Jensen, Marco Placenti and Thai Wang: <i>Cross-Domain Sentiment Classification using Vector Embedded Domain Representations</i><br />
|-<br />
| 15:50 || 16:10 || Matthias Damaschk, Tillmann Dönicke and Florian Lux: <i>Multiclass Text Classification on Unbalanced, Sparse and Noisy Data</i><br />
|-<br />
| 16:10 || 16:30 || '''Discussion and Closing'''<br />
|}</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop/2019/program&diff=726Community/workshop/2019/program2019-09-09T06:58:45Z<p>Nivre: /* The First NLPL Workshop on Deep Learning for Natural Language Processing */</p>
<hr />
<div>= The First NLPL Workshop on Deep Learning for Natural Language Processing =<br />
<br />
<br />
{| class="wikitable"<br />
|-<br />
!colspan=3|Monday, September 30, 2019<br />
|-<br />
| 09:00 || 09:20 || '''Opening''' <br />
|-<br />
| || || '''Session 1''' Chair: TBA <br />
|-<br />
| 09:20 || 09:40 || Timothee Mickus, Denis Paperno and Matthieu Constant: <i>Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling</i><br />
|-<br />
| 09:40 || 10:00 || Robin Kurtz, Daniel Roxbo and Marco Kuhlmann: <i>Improving Semantic Dependency Parsing with Syntactic Features</i><br />
|-<br />
| 10:00 || 10:30 || '''Coffee Break'''<br />
|-<br />
| || || '''Keynote 1''' Chair: TBA <br />
|-<br />
| 10:30 || 11:30 || Barbara Plank: <i>Multi-Task and Continual Learning for Multi-Modal NLP</i><br />
|-<br />
| || || '''Session 2''' Chair: TBA <br />
|-<br />
| 11:30 || 11:50 || Andrey Kutuzov and Elizaveta Kuzmenko: <i>To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation</i><br />
|-<br />
| 11:50 || 12:10 || Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski and Filip Ginter: <i>Is Multilingual BERT Fluent in Language Generation?</i><br />
|-<br />
| 12:10 || 12:30 || Vinit Ravishankar, Memduh Gökırmak, Lilja Øvrelid and Erik Velldal: <i>Multilingual Probing of Contextual Sentence Encoders</i><br />
|-<br />
| 12:30 || 14:00 || '''Lunch Break'''<br />
|-<br />
| || || '''Keynote 2''' Chair: TBA <br />
|-<br />
| 14:00 || 15:00 || Jussi Karlgren: <i>High-Dimensional Semantic Spaces and the Squinting Linguist</i><br />
|-<br />
| 15:00 || 15:30 || '''Coffee Break'''<br />
|-<br />
| || || '''Session 3''' Chair: TBA <br />
|-<br />
| 15.30 || 15:50 || Nicolaj Filrup Rasmussen, Kristian Nørgaard Jensen, Marco Placenti and Thai Wang: <i>Cross-Domain Sentiment Classification using Vector Embedded Domain Representations</i><br />
|-<br />
| 15:50 || 16:10 || Matthias Damaschk, Tillmann Dönicke and Florian Lux: <i>Multiclass Text Classification on Unbalanced, Sparse and Noisy Data</i><br />
|-<br />
| 16:10 || 16:30 || '''Discussion and Closing'''<br />
|}</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop/2019/program&diff=722Community/workshop/2019/program2019-09-07T14:50:59Z<p>Nivre: /* The First NLPL Workshop on Deep Learning for Natural Language Processing */</p>
<hr />
<div>= The First NLPL Workshop on Deep Learning for Natural Language Processing =<br />
<br />
<br />
{| class="wikitable"<br />
|-<br />
!colspan=3|Monday, September 30, 2019<br />
|-<br />
| 09:00 || 09:20 || '''Opening''' <br />
|-<br />
| 09:20 || 09:40 || Timothee Mickus, Denis Paperno and Matthieu Constant: <i>Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling</i><br />
<br />
|-<br />
| 09:40 || 10:00 || Robin Kurtz, Daniel Roxbo and Marco Kuhlmann: <i>Improving Semantic Dependency Parsing with Syntactic Features</i><br />
|-<br />
| 10:00 || 10:30 || '''Coffee Break'''<br />
|-<br />
| 10:30 || 11:30 || '''Keynote 1''' Barbara Plank: <i>Multi-Task and Continual Learning for Multi-Modal NLP</i><br />
|-<br />
| 11:30 || 11:50 || Andrey Kutuzov and Elizaveta Kuzmenko: <i>To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation</i><br />
|-<br />
| 11:50 || 12:10 || Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski and Filip Ginter: <i>Is Multilingual BERT Fluent in Language Generation?</i><br />
|-<br />
| 12:10 || 12:30 || Vinit Ravishankar, Memduh Gökırmak, Lilja Øvrelid and Erik Velldal: <i>Multilingual Probing of Contextual Sentence Encoders</i><br />
|-<br />
| 12:30 || 14:00 || '''Lunch Break'''<br />
|-<br />
| 14:00 || 15:00 || '''Keynote 2''' Jussi Karlgren: <i>High-Dimensional Semantic Spaces and the Squinting Linguist</i><br />
|-<br />
| 15:00 || 15:30 || '''Coffee Break'''<br />
|-<br />
| 15.30 || 15:50 || Nicolaj Filrup Rasmussen, Kristian Nørgaard Jensen, Marco Placenti and Thai Wang: <i>Cross-Domain Sentiment Classification using Vector Embedded Domain Representations</i><br />
|-<br />
| 15:50 || 16:10 || Matthias Damaschk, Tillmann Dönicke and Florian Lux: <i>Multiclass Text Classification on Unbalanced, Sparse and Noisy Data</i><br />
|-<br />
| 16:10 || 16:30 || '''Discussion and Closing'''<br />
|}</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop&diff=721Community/workshop2019-09-07T14:42:12Z<p>Nivre: </p>
<hr />
<div>The First NLPL Workshop on ''Deep Learning for Natural Language Processing''<br />
will be held on September 30, 2019 in Turku, Finland<br />
(co-located with the [https://nodalida2019.org/ NoDaLiDa Conference]).<br />
<br />
'''New:''' [http://wiki.nlpl.eu/index.php/Community/workshop/2019/program Workshop Program].<br />
<br />
= Topic and Goals =<br />
<br />
The use of deep neural networks and related techniques has led to major improvements on many NLP tasks and has in a short time profoundly changed the research landscape in our field. Making adequate use of these techniques, however, not only requires new technical knowledge on the part of researchers but also presupposes access to large-scale computing resources often with dedicated, specialized hardware. Keeping up with the development is therefore a challenge, especially for smaller research groups in academic environments, which are common in the Nordic countries.<br />
<br />
The Nordic Language Processing Laboratory (NLPL) is a collaboration of university research groups in the Nordic countries, with support from the Nordic e-Infrastructure Collaboration (NeIC). The goal of NLPL is to create a virtual laboratory for data- and compute-intensive NLP research based on a common software, data and service stack in multiple Nordic HPC centers, to pool competencies within the user community and among expert support teams, and thereby to enable internationally competitive, data-intensive research and experimentation on a scale that would be difficult to sustain for individual research groups on commodity computing resources.<br />
<br />
The First NLPL Workshop on Deep Learning for Natural Language Processing seeks to facilitate platform building and knowledge exchange among NLP researchers in the Nordic countries. The goal of the workshop is to support NLP research based on deep learning or other techniques that require high-performance computing and to provide a forum for researchers in the Nordic countries to exchange ideas and discuss ongoing research in the area. A special focus of the workshop will be how to enable competitive research given the limitations of available resources and how to ease entry into the field for new researchers. Consequently, the workshop will not be limited solely to methodological and resource description papers, but will welcome all kinds of contributions addressing the abovementioned focus.<br />
<br />
We invite papers on all relevant topics, including but not limited to:<br />
<br />
* Using deep learning to solve NLP problems<br />
* Making deep learning more usable and accessible for NLP researchers<br />
* Improving our understanding of deep learning models in the context of NLP<br />
* Representation learning for natural language<br />
* Interpretation of neural representations<br />
* Intrinsic and extrinsic evaluation of deep learning models<br />
* Contextualized and multimodal representations<br />
* Multitask learning with deep neural networks<br />
* Deep learning with scarce resources and noisy data<br />
* Transfer learning and multilinguality with deep models<br />
<br />
= Invited Presentations =<br />
<br />
The workshop will feature two keynote speakers:<br />
<br />
* [https://bplank.github.io/ Barbara Plank], IT University of Copenhagen (Denmark)<br />
* [https://www.kth.se/profile/jussi Jussi Karlgren], Gavagai and KTH (Sweden)<br />
<br />
= Important Dates =<br />
<br />
* Paper submission: August 16, 2019<br />
* Notification of acceptance: August 26, 2019<br />
* Final workshop papers due: September 6, 2019<br />
* Workshop: September 30, 2019<br />
<br />
= Paper Submission =<br />
<br />
All submissions are expected to have a maximum of 8 pages plus a list of references not restricted in length. Note that we do not have separate categories for long and short papers. All submissions should follow the official Nodalida 2019 format templates. The submissions are expected to be blind and follow the [https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines ACL Author Guidelines]. Parallel submission to another forum is permitted providing that the organizers are informed without delay, should the author choose to present the work at the other venue and withdraw it from this workshop. At least one author of each accepted paper must register to attend the workshop.<br />
<br />
The submission site is: https://easychair.org/conferences/?conf=dl4nlp<br />
<br />
= Programme Committee =<br />
<br />
The workshop is co-organized by members of the NLPL team,<br />
with<br />
[https://cl.lingfil.uu.se/~nivre/ Joakim Nivre] serving as the chair of the<br />
[http://wiki.nlpl.eu/index.php/Community/workshop/2019/committee Programme Committee].</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop/2019/program&diff=720Community/workshop/2019/program2019-09-07T14:38:55Z<p>Nivre: /* Program */</p>
<hr />
<div>= The First NLPL Workshop on Deep Learning for Natural Language Processing =<br />
<br />
<br />
{| class="wikitable"<br />
|-<br />
!colspan=3|Monday, September 30, 2019<br />
|-<br />
| 09:00 || 09:20 || '''Opening''' <br />
|-<br />
| 09:20 || 09:40 || Timothee Mickus, Denis Paperno and Matthieu Constant: <i>Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling</i><br />
<br />
|-<br />
| 09:40 || 10:00 || Robin Kurtz, Daniel Roxbo and Marco Kuhlmann: <i>Improving Semantic Dependency Parsing with Syntactic Features</i><br />
|-<br />
| 10:00 || 10:30 || '''Coffee Break'''<br />
|-<br />
| 10:30 || 11:30 || '''Keynote 1''' Barbara Plank<br />
|-<br />
| 11:30 || 11:50 || Andrey Kutuzov and Elizaveta Kuzmenko: <i>To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation</i><br />
|-<br />
| 11:50 || 12:10 || Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski and Filip Ginter: <i>Is Multilingual BERT Fluent in Language Generation?</i><br />
|-<br />
| 12:10 || 12:30 || Vinit Ravishankar, Memduh Gökırmak, Lilja Øvrelid and Erik Velldal: <i>Multilingual Probing of Contextual Sentence Encoders</i><br />
|-<br />
| 12:30 || 14:00 || '''Lunch Break'''<br />
|-<br />
| 14:00 || 15:00 || '''Keynote 2''' Jussi Karlgren<br />
|-<br />
| 15:00 || 15:30 || '''Coffee Break'''<br />
|-<br />
| 15.30 || 15:50 || Nicolaj Filrup Rasmussen, Kristian Nørgaard Jensen, Marco Placenti and Thai Wang: <i>Cross-Domain Sentiment Classification using Vector Embedded Domain Representations</i><br />
|-<br />
| 15:50 || 16:10 || Matthias Damaschk, Tillmann Dönicke and Florian Lux: <i>Multiclass Text Classification on Unbalanced, Sparse and Noisy Data</i><br />
|-<br />
| 16:10 || 16:30 || '''Discussion and Closing'''<br />
|}</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop/2019/program&diff=719Community/workshop/2019/program2019-09-07T14:37:04Z<p>Nivre: /* Program */</p>
<hr />
<div>= The First NLPL Workshop on Deep Learning for Natural Language Processing =<br />
<br />
== Program ==<br />
<br />
{| class="wikitable"<br />
|-<br />
!colspan=3|Monday, September 30, 2019<br />
|-<br />
| 09:00 || 09:20 || '''Opening''' <br />
|-<br />
| 09:20 || 09:40 || Timothee Mickus, Denis Paperno and Matthieu Constant: <i>Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling</i><br />
<br />
|-<br />
| 09:40 || 10:00 || Robin Kurtz, Daniel Roxbo and Marco Kuhlmann: Improving Semantic Dependency Parsing with Syntactic Features<br />
|-<br />
| 10:00 || 10:30 || '''Coffee Break'''<br />
|-<br />
| 10:30 || 11:30 || '''Keynote Lecture''' Barbara Plank<br />
|-<br />
| 11:30 || 11:50 || Andrey Kutuzov and Elizaveta Kuzmenko: To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation<br />
|-<br />
| 11:50 || 12:10 || Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski and Filip Ginter: Is Multilingual BERT Fluent in Language Generation?<br />
|-<br />
| 12:10 || 12:30 || Vinit Ravishankar, Memduh Gökırmak, Lilja Øvrelid and Erik Velldal: Multilingual Probing of Contextual Sentence Encoders<br />
|-<br />
| 12:30 || 14:00 || '''Lunch Break'''<br />
|-<br />
| 14:00 || 15:00 || '''Keynote Lecture''' Jussi Karlgren<br />
|-<br />
| 15:00 || 15:30 || '''Coffee Break'''<br />
|-<br />
| 15.30 || 15:50 || Nicolaj Filrup Rasmussen, Kristian Nørgaard Jensen, Marco Placenti and Thai Wang: Cross-Domain Sentiment Classification using Vector Embedded Domain Representations <br />
|-<br />
| 15:50 || 16:10 || Matthias Damaschk, Tillmann Dönicke and Florian Lux: Multiclass Text Classification on Unbalanced, Sparse and Noisy Data<br />
|-<br />
| 16:10 || 16:30 || '''Concluding Discussion'''<br />
|}</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop/2019/program&diff=718Community/workshop/2019/program2019-09-07T14:30:20Z<p>Nivre: </p>
<hr />
<div>= The First NLPL Workshop on Deep Learning for Natural Language Processing =<br />
<br />
== Program ==<br />
<br />
{| class="wikitable"<br />
|-<br />
!colspan=3|Monday, September 30, 2019<br />
|-<br />
| 09:00 || 09:20 || '''Opening''' <br />
|-<br />
| 09:20 || 09:40 || Matthias Damaschk, Tillmann Dönicke and Florian Lux Multiclass Text Classification on Unbalanced, Sparse and Noisy Data<br />
|-<br />
| 09:40 || 10:00 || Nicolaj Filrup Rasmussen, Kristian Nørgaard Jensen, Marco Placenti and Thai Wang Cross-Domain Sentiment Classification using Vector Embedded Domain Representations <br />
|-<br />
| 10:00 || 10:30 || '''Coffee Break'''<br />
|-<br />
| 10:30 || 11:30 || '''Keynote Lecture''' Barbara Plank<br />
|-<br />
| 11:30 || 11:50 || Andrey Kutuzov and Elizaveta Kuzmenko To lemmatize or not to lemmatize: how word normalisation affects ELMo performance in word sense disambiguation<br />
|-<br />
| 11:50 || 12:10 || Robin Kurtz, Daniel Roxbo and Marco Kuhlmann Improving Semantic Dependency Parsing with Syntactic Features<br />
|-<br />
| 12:10 || 12:30 || Timothee Mickus, Denis Paperno and Matthieu Constant Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling<br />
|-<br />
| 12:30 || 14:00 || '''Lunch Break'''<br />
|-<br />
| 14:00 || 15:00 || '''Keynote Lecture''' Jussi Karlgren<br />
|-<br />
| 15:00 || 15:30 || '''Coffee Break'''<br />
|-<br />
| 15.30 || 15:50 || Vinit Ravishankar, Memduh Gökırmak, Lilja Øvrelid and Erik Velldal Multilingual Probing of Contextual Sentence Encoders<br />
|-<br />
| 15:50 || 16:10 || Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski and Filip Ginter Is Multilingual BERT Fluent in Language Generation?<br />
|-<br />
| 16:10 || 16:30 || '''Concluding Discussion'''<br />
|}</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop/2019/program&diff=717Community/workshop/2019/program2019-09-07T14:20:21Z<p>Nivre: Created page with "= The First NLPL Workshop on Deep Learning for Natural Language Processing = Turku, September 30, 2019 == Program == 09:00"</p>
<hr />
<div>= The First NLPL Workshop on Deep Learning for Natural Language Processing =<br />
<br />
Turku, September 30, 2019<br />
<br />
== Program ==<br />
<br />
09:00</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop&diff=716Community/workshop2019-09-07T14:16:18Z<p>Nivre: /* Invited Presentations */</p>
<hr />
<div>The First NLPL Workshop on ''Deep Learning for Natural Language Processing''<br />
will be held on September 30, 2019 in Turku, Finland<br />
(co-located with the [https://nodalida2019.org/ NoDaLiDa Conference]).<br />
<br />
= Topic and Goals =<br />
<br />
The use of deep neural networks and related techniques has led to major improvements on many NLP tasks and has in a short time profoundly changed the research landscape in our field. Making adequate use of these techniques, however, not only requires new technical knowledge on the part of researchers but also presupposes access to large-scale computing resources often with dedicated, specialized hardware. Keeping up with the development is therefore a challenge, especially for smaller research groups in academic environments, which are common in the Nordic countries.<br />
<br />
The Nordic Language Processing Laboratory (NLPL) is a collaboration of university research groups in the Nordic countries, with support from the Nordic e-Infrastructure Collaboration (NeIC). The goal of NLPL is to create a virtual laboratory for data- and compute-intensive NLP research based on a common software, data and service stack in multiple Nordic HPC centers, to pool competencies within the user community and among expert support teams, and thereby to enable internationally competitive, data-intensive research and experimentation on a scale that would be difficult to sustain for individual research groups on commodity computing resources.<br />
<br />
The First NLPL Workshop on Deep Learning for Natural Language Processing seeks to facilitate platform building and knowledge exchange among NLP researchers in the Nordic countries. The goal of the workshop is to support NLP research based on deep learning or other techniques that require high-performance computing and to provide a forum for researchers in the Nordic countries to exchange ideas and discuss ongoing research in the area. A special focus of the workshop will be how to enable competitive research given the limitations of available resources and how to ease entry into the field for new researchers. Consequently, the workshop will not be limited solely to methodological and resource description papers, but will welcome all kinds of contributions addressing the abovementioned focus.<br />
<br />
We invite papers on all relevant topics, including but not limited to:<br />
<br />
* Using deep learning to solve NLP problems<br />
* Making deep learning more usable and accessible for NLP researchers<br />
* Improving our understanding of deep learning models in the context of NLP<br />
* Representation learning for natural language<br />
* Interpretation of neural representations<br />
* Intrinsic and extrinsic evaluation of deep learning models<br />
* Contextualized and multimodal representations<br />
* Multitask learning with deep neural networks<br />
* Deep learning with scarce resources and noisy data<br />
* Transfer learning and multilinguality with deep models<br />
<br />
= Invited Presentations =<br />
<br />
The workshop will feature two keynote speakers:<br />
<br />
* [https://bplank.github.io/ Barbara Plank], IT University of Copenhagen (Denmark)<br />
* [https://www.kth.se/profile/jussi Jussi Karlgren], Gavagai and KTH (Sweden)<br />
<br />
= Important Dates =<br />
<br />
* Paper submission: August 16, 2019<br />
* Notification of acceptance: August 26, 2019<br />
* Final workshop papers due: September 6, 2019<br />
* Workshop: September 30, 2019<br />
<br />
= Paper Submission =<br />
<br />
All submissions are expected to have a maximum of 8 pages plus a list of references not restricted in length. Note that we do not have separate categories for long and short papers. All submissions should follow the official Nodalida 2019 format templates. The submissions are expected to be blind and follow the [https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines ACL Author Guidelines]. Parallel submission to another forum is permitted providing that the organizers are informed without delay, should the author choose to present the work at the other venue and withdraw it from this workshop. At least one author of each accepted paper must register to attend the workshop.<br />
<br />
The submission site is: https://easychair.org/conferences/?conf=dl4nlp<br />
<br />
= Programme Committee =<br />
<br />
The workshop is co-organized by members of the NLPL team,<br />
with<br />
[https://cl.lingfil.uu.se/~nivre/ Joakim Nivre] serving as the chair of the<br />
[http://wiki.nlpl.eu/index.php/Community/workshop/2019/committee Programme Committee].</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop&diff=715Community/workshop2019-09-07T14:13:52Z<p>Nivre: /* Extended Deadline */</p>
<hr />
<div>The First NLPL Workshop on ''Deep Learning for Natural Language Processing''<br />
will be held on September 30, 2019 in Turku, Finland<br />
(co-located with the [https://nodalida2019.org/ NoDaLiDa Conference]).<br />
<br />
= Topic and Goals =<br />
<br />
The use of deep neural networks and related techniques has led to major improvements on many NLP tasks and has in a short time profoundly changed the research landscape in our field. Making adequate use of these techniques, however, not only requires new technical knowledge on the part of researchers but also presupposes access to large-scale computing resources often with dedicated, specialized hardware. Keeping up with the development is therefore a challenge, especially for smaller research groups in academic environments, which are common in the Nordic countries.<br />
<br />
The Nordic Language Processing Laboratory (NLPL) is a collaboration of university research groups in the Nordic countries, with support from the Nordic e-Infrastructure Collaboration (NeIC). The goal of NLPL is to create a virtual laboratory for data- and compute-intensive NLP research based on a common software, data and service stack in multiple Nordic HPC centers, to pool competencies within the user community and among expert support teams, and thereby to enable internationally competitive, data-intensive research and experimentation on a scale that would be difficult to sustain for individual research groups on commodity computing resources.<br />
<br />
The First NLPL Workshop on Deep Learning for Natural Language Processing seeks to facilitate platform building and knowledge exchange among NLP researchers in the Nordic countries. The goal of the workshop is to support NLP research based on deep learning or other techniques that require high-performance computing and to provide a forum for researchers in the Nordic countries to exchange ideas and discuss ongoing research in the area. A special focus of the workshop will be how to enable competitive research given the limitations of available resources and how to ease entry into the field for new researchers. Consequently, the workshop will not be limited solely to methodological and resource description papers, but will welcome all kinds of contributions addressing the abovementioned focus.<br />
<br />
We invite papers on all relevant topics, including but not limited to:<br />
<br />
* Using deep learning to solve NLP problems<br />
* Making deep learning more usable and accessible for NLP researchers<br />
* Improving our understanding of deep learning models in the context of NLP<br />
* Representation learning for natural language<br />
* Interpretation of neural representations<br />
* Intrinsic and extrinsic evaluation of deep learning models<br />
* Contextualized and multimodal representations<br />
* Multitask learning with deep neural networks<br />
* Deep learning with scarce resources and noisy data<br />
* Transfer learning and multilinguality with deep models<br />
<br />
= Invited Presentations =<br />
<br />
The workshop will feature three keynote presentations by experts in our field,<br />
including:<br />
<br />
* [https://bplank.github.io/ Barbara Plank], IT University of Copenhagen (Denmark)<br />
<br />
= Important Dates =<br />
<br />
* Paper submission: August 16, 2019<br />
* Notification of acceptance: August 26, 2019<br />
* Final workshop papers due: September 6, 2019<br />
* Workshop: September 30, 2019<br />
<br />
= Paper Submission =<br />
<br />
All submissions are expected to have a maximum of 8 pages plus a list of references not restricted in length. Note that we do not have separate categories for long and short papers. All submissions should follow the official Nodalida 2019 format templates. The submissions are expected to be blind and follow the [https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines ACL Author Guidelines]. Parallel submission to another forum is permitted providing that the organizers are informed without delay, should the author choose to present the work at the other venue and withdraw it from this workshop. At least one author of each accepted paper must register to attend the workshop.<br />
<br />
The submission site is: https://easychair.org/conferences/?conf=dl4nlp<br />
<br />
= Programme Committee =<br />
<br />
The workshop is co-organized by members of the NLPL team,<br />
with<br />
[https://cl.lingfil.uu.se/~nivre/ Joakim Nivre] serving as the chair of the<br />
[http://wiki.nlpl.eu/index.php/Community/workshop/2019/committee Programme Committee].</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop&diff=706Community/workshop2019-08-02T11:30:46Z<p>Nivre: </p>
<hr />
<div>The First NLPL Workshop on ''Deep Learning for Natural Language Processing''<br />
will be held on September 30, 2019 in Turku, Finland<br />
(co-located with the [https://nodalida2019.org/ NoDaLiDa Conference]).<br />
<br />
= Extended Deadline =<br />
<br />
The deadline for submission has been extended by two weeks. The new deadline is August 16, 2019 (midnight anywhere in the world).<br />
<br />
= Topic and Goals =<br />
<br />
The use of deep neural networks and related techniques has led to major improvements on many NLP tasks and has in a short time profoundly changed the research landscape in our field. Making adequate use of these techniques, however, not only requires new technical knowledge on the part of researchers but also presupposes access to large-scale computing resources often with dedicated, specialized hardware. Keeping up with the development is therefore a challenge, especially for smaller research groups in academic environments, which are common in the Nordic countries.<br />
<br />
The Nordic Language Processing Laboratory (NLPL) is a collaboration of university research groups in the Nordic countries, with support from the Nordic e-Infrastructure Collaboration (NeIC). The goal of NLPL is to create a virtual laboratory for data- and compute-intensive NLP research based on a common software, data and service stack in multiple Nordic HPC centers, to pool competencies within the user community and among expert support teams, and thereby to enable internationally competitive, data-intensive research and experimentation on a scale that would be difficult to sustain for individual research groups on commodity computing resources.<br />
<br />
The First NLPL Workshop on Deep Learning for Natural Language Processing seeks to facilitate platform building and knowledge exchange among NLP researchers in the Nordic countries. The goal of the workshop is to support NLP research based on deep learning or other techniques that require high-performance computing and to provide a forum for researchers in the Nordic countries to exchange ideas and discuss ongoing research in the area. A special focus of the workshop will be how to enable competitive research given the limitations of available resources and how to ease entry into the field for new researchers. Consequently, the workshop will not be limited solely to methodological and resource description papers, but will welcome all kinds of contributions addressing the abovementioned focus.<br />
<br />
We invite papers on all relevant topics, including but not limited to:<br />
<br />
* Using deep learning to solve NLP problems<br />
* Making deep learning more usable and accessible for NLP researchers<br />
* Improving our understanding of deep learning models in the context of NLP<br />
* Representation learning for natural language<br />
* Interpretation of neural representations<br />
* Intrinsic and extrinsic evaluation of deep learning models<br />
* Contextualized and multimodal representations<br />
* Multitask learning with deep neural networks<br />
* Deep learning with scarce resources and noisy data<br />
* Transfer learning and multilinguality with deep models<br />
<br />
= Invited Presentations =<br />
<br />
The workshop will feature three keynote presentations by experts in our field,<br />
including:<br />
<br />
* [https://bplank.github.io/ Barbara Plank], IT University of Copenhagen (Denmark)<br />
<br />
= Important Dates =<br />
<br />
* Paper submission: August 16, 2019<br />
* Notification of acceptance: August 26, 2019<br />
* Final workshop papers due: September 6, 2019<br />
* Workshop: September 30, 2019<br />
<br />
= Paper Submission =<br />
<br />
All submissions are expected to have a maximum of 8 pages plus a list of references not restricted in length. Note that we do not have separate categories for long and short papers. All submissions should follow the official Nodalida 2019 format templates. The submissions are expected to be blind and follow the [https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines ACL Author Guidelines]. Parallel submission to another forum is permitted providing that the organizers are informed without delay, should the author choose to present the work at the other venue and withdraw it from this workshop. At least one author of each accepted paper must register to attend the workshop.<br />
<br />
The submission site is: https://easychair.org/conferences/?conf=dl4nlp<br />
<br />
= Programme Committee =<br />
<br />
The workshop is co-organized by members of the NLPL team,<br />
with<br />
[https://cl.lingfil.uu.se/~nivre/ Joakim Nivre] serving as the chair of the<br />
[http://wiki.nlpl.eu/index.php/Community/workshop/2019/committee Programme Committee].</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop&diff=704Community/workshop2019-07-31T08:49:52Z<p>Nivre: /* Paper Submission */</p>
<hr />
<div>The First NLPL Workshop on ''Deep Learning for Natural Language Processing''<br />
will be held on September 30, 2019 in Turku, Finland<br />
(co-located with the [https://nodalida2019.org/ NoDaLiDa Conference]).<br />
<br />
= Topic and Goals =<br />
<br />
The use of deep neural networks and related techniques has led to major improvements on many NLP tasks and has in a short time profoundly changed the research landscape in our field. Making adequate use of these techniques, however, not only requires new technical knowledge on the part of researchers but also presupposes access to large-scale computing resources often with dedicated, specialized hardware. Keeping up with the development is therefore a challenge, especially for smaller research groups in academic environments, which are common in the Nordic countries.<br />
<br />
The Nordic Language Processing Laboratory (NLPL) is a collaboration of university research groups in the Nordic countries, with support from the Nordic e-Infrastructure Collaboration (NeIC). The goal of NLPL is to create a virtual laboratory for data- and compute-intensive NLP research based on a common software, data and service stack in multiple Nordic HPC centers, to pool competencies within the user community and among expert support teams, and thereby to enable internationally competitive, data-intensive research and experimentation on a scale that would be difficult to sustain for individual research groups on commodity computing resources.<br />
<br />
The First NLPL Workshop on Deep Learning for Natural Language Processing seeks to facilitate platform building and knowledge exchange among NLP researchers in the Nordic countries. The goal of the workshop is to support NLP research based on deep learning or other techniques that require high-performance computing and to provide a forum for researchers in the Nordic countries to exchange ideas and discuss ongoing research in the area. A special focus of the workshop will be how to enable competitive research given the limitations of available resources and how to ease entry into the field for new researchers. Consequently, the workshop will not be limited solely to methodological and resource description papers, but will welcome all kinds of contributions addressing the abovementioned focus.<br />
<br />
We invite papers on all relevant topics, including but not limited to:<br />
<br />
* Using deep learning to solve NLP problems<br />
* Making deep learning more usable and accessible for NLP researchers<br />
* Improving our understanding of deep learning models in the context of NLP<br />
* Representation learning for natural language<br />
* Interpretation of neural representations<br />
* Intrinsic and extrinsic evaluation of deep learning models<br />
* Contextualized and multimodal representations<br />
* Multitask learning with deep neural networks<br />
* Deep learning with scarce resources and noisy data<br />
* Transfer learning and multilinguality with deep models<br />
<br />
= Invited Presentations =<br />
<br />
The workshop will feature three keynote presentations by experts in our field,<br />
including:<br />
<br />
* [https://bplank.github.io/ Barbara Plank], IT University of Copenhagen (Denmark)<br />
<br />
= Important Dates =<br />
<br />
* Paper submission: August 2, 2019<br />
* Notification of acceptance: August 18, 2019<br />
* Final workshop papers due: September 6, 2019<br />
* Workshop: September 30, 2019<br />
<br />
= Paper Submission =<br />
<br />
All submissions are expected to have a maximum of 8 pages plus a list of references not restricted in length. Note that we do not have separate categories for long and short papers. All submissions should follow the official Nodalida 2019 format templates. The submissions are expected to be blind and follow the [https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines ACL Author Guidelines]. Parallel submission to another forum is permitted providing that the organizers are informed without delay, should the author choose to present the work at the other venue and withdraw it from this workshop. At least one author of each accepted paper must register to attend the workshop.<br />
<br />
The submission site is: https://easychair.org/conferences/?conf=dl4nlp<br />
<br />
= Programme Committee =<br />
<br />
The workshop is co-organized by members of the NLPL team,<br />
with<br />
[https://cl.lingfil.uu.se/~nivre/ Joakim Nivre] serving as the chair of the<br />
[http://wiki.nlpl.eu/index.php/Community/workshop/2019/committee Programme Committee].</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop/2019/committee&diff=656Community/workshop/2019/committee2019-04-08T07:39:27Z<p>Nivre: /* Programme Committee */</p>
<hr />
<div><br />
This page acknowledges the members of the NLPL team who participate<br />
in the planning and organization of the <br />
[http://wiki.nlpl.eu/index.php/Community/workshop First NLPL Workshop on Deep Learning for Natural Language Processing].<br />
<br />
= Programme Committee =<br />
<br />
* Joakim Nivre, Uppsala University (chair)<br />
* Leon Derczynski, IT University Copenhagen<br />
* Filip Ginter, University of Turku<br />
* Bjørn Lindi, Nordic e-Infrastructure Collaboration (NeIC)<br />
* Tomasz Malkiewicz, Nordic e-Infrastructure Collaboration (NeIC)<br />
* Martin Matthiessen, CSC-IT Center for Science Ltd<br />
* Stephan Oepen, University of Oslo<br />
* Anders Søgaard, University of Copenhagen<br />
* Jörg Tiedemann, University of Helsinki<br />
<br />
= Organizing Committee =</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop&diff=655Community/workshop2019-04-08T07:38:31Z<p>Nivre: /* Paper Submission */</p>
<hr />
<div>The First NLPL Workshop on ''Deep Learning for Natural Language Processing''<br />
will be held on September 30, 2019 in Turku, Finland<br />
(co-located with the [https://nodalida2019.org/ NoDaLiDa Conference]).<br />
<br />
= Topic and Goals =<br />
<br />
The use of deep neural networks and related techniques has led to major improvements on many NLP tasks and has in a short time profoundly changed the research landscape in our field. Making adequate use of these techniques, however, not only requires new technical knowledge on the part of researchers but also presupposes access to large-scale computing resources often with dedicated, specialized hardware. Keeping up with the development is therefore a challenge, especially for smaller research groups in academic environments, which are common in the Nordic countries.<br />
<br />
The Nordic Language Processing Laboratory (NLPL) is a collaboration of university research groups in the Nordic countries, with support from the Nordic e-Infrastructure Collaboration (NeIC). The goal of NLPL is to create a virtual laboratory for data- and compute-intensive NLP research based on a common software, data and service stack in multiple Nordic HPC centers, to pool competencies within the user community and among expert support teams, and thereby to enable internationally competitive, data-intensive research and experimentation on a scale that would be difficult to sustain for individual research groups on commodity computing resources.<br />
<br />
The First NLPL Workshop on Deep Learning for Natural Language Processing seeks to facilitate platform building and knowledge exchange among NLP researchers in the Nordic countries. The goal of the workshop is to support NLP research based on deep learning or other techniques that require high-performance computing and to provide a forum for researchers in the Nordic countries to exchange ideas and discuss ongoing research in the area. A special focus of the workshop will be how to enable competitive research given the limitations of available resources and how to ease entry into the field for new researchers. Consequently, the workshop will not be limited solely to methodological and resource description papers, but will welcome all kinds of contributions addressing the abovementioned focus.<br />
<br />
We invite papers on all relevant topics, including but not limited to:<br />
<br />
* Using deep learning to solve NLP problems<br />
* Making deep learning more usable and accessible for NLP researchers<br />
* Improving our understanding of deep learning models in the context of NLP<br />
* Representation learning for natural language<br />
* Interpretation of neural representations<br />
* Intrinsic and extrinsic evaluation of deep learning models<br />
* Contextualized and multimodal representations<br />
* Multitask learning with deep neural networks<br />
* Deep learning with scarce resources and noisy data<br />
* Transfer learning and multilinguality with deep models<br />
<br />
= Important Dates =<br />
<br />
* Paper submission: August 2, 2019<br />
* Notification of acceptance: August 18, 2019<br />
* Final workshop papers due: September 6, 2019<br />
* Workshop: September 30, 2019<br />
<br />
= Paper Submission =<br />
<br />
All submissions are expected to have a maximum of 8 pages plus a list of references not restricted in length. Note that we do not have separate categories for long and short papers. All submissions should follow the official Nodalida 2019 format templates. Please see the [http://wiki.nlpl.eu/index.php/Community/workshop/2019/submissions detailed instructions for authors] for additional information.<br />
<br />
The submissions are expected to be blind and follow the [https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines ACL Author Guidelines]. The submission site will be announced on this site well in advance of the submission deadline. Parallel submission to another forum is permitted providing that the organizers are informed without delay, should the author choose to present the work at the other venue and withdraw it from this workshop. At least one author of each accepted paper must register to attend the workshop.<br />
<br />
= Programme Committee =<br />
<br />
The workshop is co-organized by members of the NLPL team,<br />
with<br />
[https://cl.lingfil.uu.se/~nivre/ Joakim Nivre] serving as the chair of the<br />
[http://wiki.nlpl.eu/index.php/Community/workshop/2019/committee Programme Committee].</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop&diff=654Community/workshop2019-04-08T07:35:23Z<p>Nivre: /* Important Dates */</p>
<hr />
<div>The First NLPL Workshop on ''Deep Learning for Natural Language Processing''<br />
will be held on September 30, 2019 in Turku, Finland<br />
(co-located with the [https://nodalida2019.org/ NoDaLiDa Conference]).<br />
<br />
= Topic and Goals =<br />
<br />
The use of deep neural networks and related techniques has led to major improvements on many NLP tasks and has in a short time profoundly changed the research landscape in our field. Making adequate use of these techniques, however, not only requires new technical knowledge on the part of researchers but also presupposes access to large-scale computing resources often with dedicated, specialized hardware. Keeping up with the development is therefore a challenge, especially for smaller research groups in academic environments, which are common in the Nordic countries.<br />
<br />
The Nordic Language Processing Laboratory (NLPL) is a collaboration of university research groups in the Nordic countries, with support from the Nordic e-Infrastructure Collaboration (NeIC). The goal of NLPL is to create a virtual laboratory for data- and compute-intensive NLP research based on a common software, data and service stack in multiple Nordic HPC centers, to pool competencies within the user community and among expert support teams, and thereby to enable internationally competitive, data-intensive research and experimentation on a scale that would be difficult to sustain for individual research groups on commodity computing resources.<br />
<br />
The First NLPL Workshop on Deep Learning for Natural Language Processing seeks to facilitate platform building and knowledge exchange among NLP researchers in the Nordic countries. The goal of the workshop is to support NLP research based on deep learning or other techniques that require high-performance computing and to provide a forum for researchers in the Nordic countries to exchange ideas and discuss ongoing research in the area. A special focus of the workshop will be how to enable competitive research given the limitations of available resources and how to ease entry into the field for new researchers. Consequently, the workshop will not be limited solely to methodological and resource description papers, but will welcome all kinds of contributions addressing the abovementioned focus.<br />
<br />
We invite papers on all relevant topics, including but not limited to:<br />
<br />
* Using deep learning to solve NLP problems<br />
* Making deep learning more usable and accessible for NLP researchers<br />
* Improving our understanding of deep learning models in the context of NLP<br />
* Representation learning for natural language<br />
* Interpretation of neural representations<br />
* Intrinsic and extrinsic evaluation of deep learning models<br />
* Contextualized and multimodal representations<br />
* Multitask learning with deep neural networks<br />
* Deep learning with scarce resources and noisy data<br />
* Transfer learning and multilinguality with deep models<br />
<br />
= Important Dates =<br />
<br />
* Paper submission: August 2, 2019<br />
* Notification of acceptance: August 18, 2019<br />
* Final workshop papers due: September 6, 2019<br />
* Workshop: September 30, 2019<br />
<br />
= Paper Submission =<br />
<br />
We invite papers ...<br />
Please see the<br />
[http://wiki.nlpl.eu/index.php/Community/workshop/2019/submissions detailed instructions for authors] for additional information<br />
and available template files.<br />
<br />
= Programme Committee =<br />
<br />
The workshop is co-organized by members of the NLPL team,<br />
with<br />
[https://cl.lingfil.uu.se/~nivre/ Joakim Nivre] serving as the chair of the<br />
[http://wiki.nlpl.eu/index.php/Community/workshop/2019/committee Programme Committee].</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Community/workshop&diff=653Community/workshop2019-04-08T07:34:36Z<p>Nivre: /* Topic and Goals */</p>
<hr />
<div>The First NLPL Workshop on ''Deep Learning for Natural Language Processing''<br />
will be held on September 30, 2019 in Turku, Finland<br />
(co-located with the [https://nodalida2019.org/ NoDaLiDa Conference]).<br />
<br />
= Topic and Goals =<br />
<br />
The use of deep neural networks and related techniques has led to major improvements on many NLP tasks and has in a short time profoundly changed the research landscape in our field. Making adequate use of these techniques, however, not only requires new technical knowledge on the part of researchers but also presupposes access to large-scale computing resources often with dedicated, specialized hardware. Keeping up with the development is therefore a challenge, especially for smaller research groups in academic environments, which are common in the Nordic countries.<br />
<br />
The Nordic Language Processing Laboratory (NLPL) is a collaboration of university research groups in the Nordic countries, with support from the Nordic e-Infrastructure Collaboration (NeIC). The goal of NLPL is to create a virtual laboratory for data- and compute-intensive NLP research based on a common software, data and service stack in multiple Nordic HPC centers, to pool competencies within the user community and among expert support teams, and thereby to enable internationally competitive, data-intensive research and experimentation on a scale that would be difficult to sustain for individual research groups on commodity computing resources.<br />
<br />
The First NLPL Workshop on Deep Learning for Natural Language Processing seeks to facilitate platform building and knowledge exchange among NLP researchers in the Nordic countries. The goal of the workshop is to support NLP research based on deep learning or other techniques that require high-performance computing and to provide a forum for researchers in the Nordic countries to exchange ideas and discuss ongoing research in the area. A special focus of the workshop will be how to enable competitive research given the limitations of available resources and how to ease entry into the field for new researchers. Consequently, the workshop will not be limited solely to methodological and resource description papers, but will welcome all kinds of contributions addressing the abovementioned focus.<br />
<br />
We invite papers on all relevant topics, including but not limited to:<br />
<br />
* Using deep learning to solve NLP problems<br />
* Making deep learning more usable and accessible for NLP researchers<br />
* Improving our understanding of deep learning models in the context of NLP<br />
* Representation learning for natural language<br />
* Interpretation of neural representations<br />
* Intrinsic and extrinsic evaluation of deep learning models<br />
* Contextualized and multimodal representations<br />
* Multitask learning with deep neural networks<br />
* Deep learning with scarce resources and noisy data<br />
* Transfer learning and multilinguality with deep models<br />
<br />
= Important Dates =<br />
<br />
= Paper Submission =<br />
<br />
We invite papers ...<br />
Please see the<br />
[http://wiki.nlpl.eu/index.php/Community/workshop/2019/submissions detailed instructions for authors] for additional information<br />
and available template files.<br />
<br />
= Programme Committee =<br />
<br />
The workshop is co-organized by members of the NLPL team,<br />
with<br />
[https://cl.lingfil.uu.se/~nivre/ Joakim Nivre] serving as the chair of the<br />
[http://wiki.nlpl.eu/index.php/Community/workshop/2019/committee Programme Committee].</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/home&diff=578Parsing/home2019-01-30T17:42:55Z<p>Nivre: /* Preprocessing Tools */</p>
<hr />
<div>= Background =<br />
<br />
An experimentation environment for data-driven dependency parsing is maintained for NLPL under the coordination of Uppsala University (UU).<br />
Initially, the software and data are commissioned on the Norwegian Abel supercluster.<br />
<br />
= Preprocessing Tools =<br />
<br />
* [http://wiki.nlpl.eu/index.php/Parsing/repp REPP Tokenizer (English and Norwegian)]<br />
<br />
Additionally, a variety of tools for sentence splitting, tokenization, lemmatization, et al.<br />
are available through the NLPL installations of the<br />
[http://nltk.org Natural Language Processing Toolkit (NLTK)] and the<br />
[https://spacy.io spaCy: Natural Language Processing in Python] tools.<br />
<br />
= Parsing Systems =<br />
<br />
* [http://wiki.nlpl.eu/index.php/Parsing/uuparser The Uppsala Parser]<br />
* [http://wiki.nlpl.eu/index.php/Parsing/udpipe UDPipe]<br />
* [http://wiki.nlpl.eu/index.php/Parsing/dozat Stanford Graph-Based Parser by Tim Dozat]<br />
<br />
= Training and Evaluation Data = <br />
<br />
* [http://wiki.nlpl.eu/index.php/Parsing/ud Universal Dependencies v2.0–2.3]<br />
* [http://wiki.nlpl.eu/index.php/Parsing/sdp Semantic Dependency Parsing]</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/home&diff=577Parsing/home2019-01-30T17:40:36Z<p>Nivre: /* Preprocessing Tools */</p>
<hr />
<div>= Background =<br />
<br />
An experimentation environment for data-driven dependency parsing is maintained for NLPL under the coordination of Uppsala University (UU).<br />
Initially, the software and data are commissioned on the Norwegian Abel supercluster.<br />
<br />
= Preprocessing Tools =<br />
<br />
* [http://wiki.nlpl.eu/index.php/Parsing/repp REPP Tokenizer (English and Norwegian)]<br />
<br />
Additionally, a variety of tools for sentence splitting, tokenization, lemmatization, et al.<br />
are available through the NLPL installations of the<br />
[http://nltk.org Natural Language Processing Toolkit (NLTK)] and the<br />
[https://en.wikipedia.org/wiki/SpaCy spaCy: Natural Language Processing in Python] tools.<br />
<br />
= Parsing Systems =<br />
<br />
* [http://wiki.nlpl.eu/index.php/Parsing/uuparser The Uppsala Parser]<br />
* [http://wiki.nlpl.eu/index.php/Parsing/udpipe UDPipe]<br />
* [http://wiki.nlpl.eu/index.php/Parsing/dozat Stanford Graph-Based Parser by Tim Dozat]<br />
<br />
= Training and Evaluation Data = <br />
<br />
* [http://wiki.nlpl.eu/index.php/Parsing/ud Universal Dependencies v2.0–2.3]<br />
* [http://wiki.nlpl.eu/index.php/Parsing/sdp Semantic Dependency Parsing]</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/home&diff=576Parsing/home2019-01-30T17:39:59Z<p>Nivre: /* Preprocessing Tools */</p>
<hr />
<div>= Background =<br />
<br />
An experimentation environment for data-driven dependency parsing is maintained for NLPL under the coordination of Uppsala University (UU).<br />
Initially, the software and data are commissioned on the Norwegian Abel supercluster.<br />
<br />
= Preprocessing Tools =<br />
<br />
* [http://wiki.nlpl.eu/index.php/Parsing/repp REPP Tokenizer (English and Norwegian)]<br />
<br />
Additionally, a variety of tools for sentence splitting, tokenization, lemmatization, et al.<br />
are available through the NLPL installations of the<br />
[http://nltk.org Natural Language Processing Toolkit (NLTK)]<br />
[https://en.wikipedia.org/wiki/SpaCy spaCy: Natural Language Processing in Python] tools.<br />
<br />
= Parsing Systems =<br />
<br />
* [http://wiki.nlpl.eu/index.php/Parsing/uuparser The Uppsala Parser]<br />
* [http://wiki.nlpl.eu/index.php/Parsing/udpipe UDPipe]<br />
* [http://wiki.nlpl.eu/index.php/Parsing/dozat Stanford Graph-Based Parser by Tim Dozat]<br />
<br />
= Training and Evaluation Data = <br />
<br />
* [http://wiki.nlpl.eu/index.php/Parsing/ud Universal Dependencies v2.0–2.3]<br />
* [http://wiki.nlpl.eu/index.php/Parsing/sdp Semantic Dependency Parsing]</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Infrastructure/software/catalogue&diff=524Infrastructure/software/catalogue2019-01-03T16:34:06Z<p>Nivre: /* Activity C: Data-Driven Parsing */</p>
<hr />
<div>= Background =<br />
<br />
This page provides a high-level summary of NLPL-specific software installed on either of our two systems.<br />
As a rule of thumb, NLPL aims to build on generic software installations provided by the<br />
system maintainers (e.g. development tools and libraries that are not discipline-specific),<br />
using the [http://modules.sourceforge.net/ <tt>module</tt>s infrastructure].<br />
For example, an environment like OpenNMT is unlikely to be used by other disciplines,<br />
and NLPL stands to gain from in-house, shared expertise that comes with maintaining<br />
a project-specific installation.<br />
On the other hand, the CUDA libraries are general extensions to the operating system<br />
that most users of deep learning frameworks on gpus will want to use; hence, CUDA is<br />
most appropriately installed by the core system maintainers.<br />
Frameworks like PyTorch and TensorFlow, arguably, present a middle ground to this<br />
rule of thumb:<br />
In principle, they are not discipline-specific, but in mid-2018 at least the demand for<br />
installations of these frameworks is strong within NLPL, and the project will likely<br />
benefit from growing its competencies in this area.<br />
<br />
= Module Catalogue =<br />
<br />
The discipline-specific modules maintained by NLPL are not activated by default.<br />
To make available the NLPL directory of module configurations, on top of the<br />
pre-configured, system-wide modules, one needs to:<br />
<br />
<pre><br />
module use -a /proj*/nlpl/software/modulefiles/<br />
</pre><br />
<br />
We will at times assume a shell variable <tt>$NLPLROOT</tt> that points to the<br />
top-level project directory, i.e. <tt>/projects/nlpl/</tt> (on Abel) or<br />
<tt>/proj/nlpl/</tt> (on Taito).<br />
For NLPL users, we recommend that one adds the above <tt>module use</tt> command<br />
to the shell start-up script, e.g. <tt>.bashrc</tt> in the user home directory.<br />
<br />
= Activity A: Basic Infrastructure =<br />
<br />
Interoperability of NLPL installations with each other, as well as with system-wide<br />
software that is maintained by the core operations teams for Abel and Taito, is no<br />
small challenge; neither is parallelism across the two systems, for example in<br />
available software (and versions) and techniques for ‘mixing and matching’.<br />
These challenges are discussed in some more detail with regard to the<br />
[http://wiki.nlpl.eu/index.php/Infrastructure/software/python Python programming environment]<br />
and with regard to<br />
[http://wiki.nlpl.eu/index.php/Infrastructure/software/frameworks common Deep Learning frameworks].<br />
<br />
{| class="wikitable"<br />
|-<br />
! Module Name/Version !! Description !! System !! Install Date !! Maintainer<br />
|-<br />
| nlpl-cython/0.29.1 || C Extensions for Python || Abel || December 2018 || Stephan Oepen<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Infrastructure/software/nltk nlpl-nltk/3.3] || Natural Language Toolkit (NLTK) || Abel, Taito || September 2018 || Stephan Oepen<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Infrastructure/software/pytorch nlpl-pytorch/0.4.1] || PyTorch Deep Learning Framework (CPU and GPU) || Abel, Taito || September 2018 || Stephan Oepen<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Infrastructure/software/spacy nlpl-spacy/2.0.12] || spaCy: Natural Language Processing in Python || Abel, Taito || October 2018 || Stephan Oepen<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Infrastructure/software/tensorflow nlpl-tensorflow/1.11] || TensorFlow Deep Learning Framework (CPU and GPU) || Abel, Taito || September 2018 || Stephan Oepen<br />
|}<br />
<br />
= Activity B: Statistical and Neural Machine Translation =<br />
<br />
{| class="wikitable"<br />
|-<br />
! Module Name/Version !! Description !! System !! Install Date !! Maintainer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Moses_module nlpl-moses/mmt-mvp-v0.12.1-2739-gdc42bcb] || Moses SMT system, including GIZA++, MGIZA, fast_align || Taito || July 2017 || Yves Scherrer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Moses_module nlpl-moses/4.0-65c75ff] || Moses SMT System Release 4.0, including GIZA++, MGIZA, fast_align, SALM<br/>Some minor fixes added to existing install 2/2018.<br/> Should not break compatibility except when using tokenizer.perl for Finnish or Swedish. || Taito, Abel || November 2017 || Yves Scherrer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Efmaral_module nlpl-efmaral/0.1_2017_07_20] || efmaral and eflomal word alignment tools || Taito || July 2017 || Yves Scherrer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Efmaral_module nlpl-efmaral/0.1_2017_11_24] || efmaral and eflomal word alignment tools || Taito, Abel || November 2017 || Yves Scherrer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Efmaral_module nlpl-efmaral/0.1_2018_12_13/17] || efmaral and eflomal word alignment tools || Taito, Abel || December 2018 || Yves Scherrer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_HNMT_module nlpl-hnmt/1.0.1] || HNMT neural machine translation system || Taito || March 2018 || Yves Scherrer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/opennmt-py nlpl-opennmt-py/0.2.1] || OpenNMT Python Library || Abel, Taito || September 2018 || Stephan Oepen<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Marian_module nlpl-marian/1.2.0] || Marian neural machine translation system || Taito || March 2018 || Yves Scherrer<br />
|-<br />
| marian/1.5 || Marian neural machine translation system || Taito || June 2018 || CSC staff<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_mttools_module nlpl-mttools/2018_12_23] || A collection of preprocessing and evaluation script for machine translation || Taito, Abel || December 2018 || Yves Scherrer<br />
|}<br />
<br />
= Activity C: Data-Driven Parsing =<br />
<br />
{| class="wikitable"<br />
|-<br />
! Module Name/Version !! Description !! System !! Install Date !! Maintainer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Parsing/uuparser nlpl-uuparser] || Uppsala Parser || Abel || December 2018 || <br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Parsing/udpipe nlpl-udpipe] || UDPipe 1.2 with Pre-Trained Models || Taito, Abel || November 2017 ||<br />
|}<br />
<br />
= Activity E: Pre-Trained Word Embeddings =<br />
<br />
{| class="wikitable"<br />
|-<br />
! Module Name/Version !! Description !! System !! Install Date !! Maintainer<br />
|-<br />
| nlpl-gensim/3.6.0 || GenSim: Topic Modeling for Humans || Taito, Abel || October 2018 || Stephan Oepen<br />
|}<br />
<br />
= Activity G: OPUS Parallel Corpus =<br />
<br />
{| class="wikitable"<br />
|-<br />
! Module Name/Version !! Description !! System !! Install Date !! Maintainer<br />
|-<br />
| nlpl-cwb/3.4.12 || Corpus Work Bench (CWB) || Taito, Abel || November 2017 ||<br />
|-<br />
| nlpl-opus/0.1 || Various OPUS Tools || Taito, Abel || November 2017 ||<br />
|-<br />
| nlpl-uplug/0.3.8dev || UPlug Parallel Corpus Tools || Taito, Abel || November 2017 ||<br />
|}</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Infrastructure/software/catalogue&diff=523Infrastructure/software/catalogue2019-01-03T16:31:05Z<p>Nivre: /* Activity C: Data-Driven Parsing */</p>
<hr />
<div>= Background =<br />
<br />
This page provides a high-level summary of NLPL-specific software installed on either of our two systems.<br />
As a rule of thumb, NLPL aims to build on generic software installations provided by the<br />
system maintainers (e.g. development tools and libraries that are not discipline-specific),<br />
using the [http://modules.sourceforge.net/ <tt>module</tt>s infrastructure].<br />
For example, an environment like OpenNMT is unlikely to be used by other disciplines,<br />
and NLPL stands to gain from in-house, shared expertise that comes with maintaining<br />
a project-specific installation.<br />
On the other hand, the CUDA libraries are general extensions to the operating system<br />
that most users of deep learning frameworks on gpus will want to use; hence, CUDA is<br />
most appropriately installed by the core system maintainers.<br />
Frameworks like PyTorch and TensorFlow, arguably, present a middle ground to this<br />
rule of thumb:<br />
In principle, they are not discipline-specific, but in mid-2018 at least the demand for<br />
installations of these frameworks is strong within NLPL, and the project will likely<br />
benefit from growing its competencies in this area.<br />
<br />
= Module Catalogue =<br />
<br />
The discipline-specific modules maintained by NLPL are not activated by default.<br />
To make available the NLPL directory of module configurations, on top of the<br />
pre-configured, system-wide modules, one needs to:<br />
<br />
<pre><br />
module use -a /proj*/nlpl/software/modulefiles/<br />
</pre><br />
<br />
We will at times assume a shell variable <tt>$NLPLROOT</tt> that points to the<br />
top-level project directory, i.e. <tt>/projects/nlpl/</tt> (on Abel) or<br />
<tt>/proj/nlpl/</tt> (on Taito).<br />
For NLPL users, we recommend that one adds the above <tt>module use</tt> command<br />
to the shell start-up script, e.g. <tt>.bashrc</tt> in the user home directory.<br />
<br />
= Activity A: Basic Infrastructure =<br />
<br />
Interoperability of NLPL installations with each other, as well as with system-wide<br />
software that is maintained by the core operations teams for Abel and Taito, is no<br />
small challenge; neither is parallelism across the two systems, for example in<br />
available software (and versions) and techniques for ‘mixing and matching’.<br />
These challenges are discussed in some more detail with regard to the<br />
[http://wiki.nlpl.eu/index.php/Infrastructure/software/python Python programming environment]<br />
and with regard to<br />
[http://wiki.nlpl.eu/index.php/Infrastructure/software/frameworks common Deep Learning frameworks].<br />
<br />
{| class="wikitable"<br />
|-<br />
! Module Name/Version !! Description !! System !! Install Date !! Maintainer<br />
|-<br />
| nlpl-cython/0.29.1 || C Extensions for Python || Abel || December 2018 || Stephan Oepen<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Infrastructure/software/nltk nlpl-nltk/3.3] || Natural Language Toolkit (NLTK) || Abel, Taito || September 2018 || Stephan Oepen<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Infrastructure/software/pytorch nlpl-pytorch/0.4.1] || PyTorch Deep Learning Framework (CPU and GPU) || Abel, Taito || September 2018 || Stephan Oepen<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Infrastructure/software/spacy nlpl-spacy/2.0.12] || spaCy: Natural Language Processing in Python || Abel, Taito || October 2018 || Stephan Oepen<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Infrastructure/software/tensorflow nlpl-tensorflow/1.11] || TensorFlow Deep Learning Framework (CPU and GPU) || Abel, Taito || September 2018 || Stephan Oepen<br />
|}<br />
<br />
= Activity B: Statistical and Neural Machine Translation =<br />
<br />
{| class="wikitable"<br />
|-<br />
! Module Name/Version !! Description !! System !! Install Date !! Maintainer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Moses_module nlpl-moses/mmt-mvp-v0.12.1-2739-gdc42bcb] || Moses SMT system, including GIZA++, MGIZA, fast_align || Taito || July 2017 || Yves Scherrer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Moses_module nlpl-moses/4.0-65c75ff] || Moses SMT System Release 4.0, including GIZA++, MGIZA, fast_align, SALM<br/>Some minor fixes added to existing install 2/2018.<br/> Should not break compatibility except when using tokenizer.perl for Finnish or Swedish. || Taito, Abel || November 2017 || Yves Scherrer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Efmaral_module nlpl-efmaral/0.1_2017_07_20] || efmaral and eflomal word alignment tools || Taito || July 2017 || Yves Scherrer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Efmaral_module nlpl-efmaral/0.1_2017_11_24] || efmaral and eflomal word alignment tools || Taito, Abel || November 2017 || Yves Scherrer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Efmaral_module nlpl-efmaral/0.1_2018_12_13/17] || efmaral and eflomal word alignment tools || Taito, Abel || December 2018 || Yves Scherrer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_HNMT_module nlpl-hnmt/1.0.1] || HNMT neural machine translation system || Taito || March 2018 || Yves Scherrer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/opennmt-py nlpl-opennmt-py/0.2.1] || OpenNMT Python Library || Abel, Taito || September 2018 || Stephan Oepen<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_Marian_module nlpl-marian/1.2.0] || Marian neural machine translation system || Taito || March 2018 || Yves Scherrer<br />
|-<br />
| marian/1.5 || Marian neural machine translation system || Taito || June 2018 || CSC staff<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Translation/home#Using_the_mttools_module nlpl-mttools/2018_12_23] || A collection of preprocessing and evaluation script for machine translation || Taito, Abel || December 2018 || Yves Scherrer<br />
|}<br />
<br />
= Activity C: Data-Driven Parsing =<br />
<br />
{| class="wikitable"<br />
|-<br />
! Module Name/Version !! Description !! System !! Install Date !! Maintainer<br />
|-<br />
| [http://wiki.nlpl.eu/index.php/Parsing/uuparser] || Uppsala Parser || Abel || December 2018 || <br />
|-<br />
| http://wiki.nlpl.eu/index.php/Parsing/udpipe || UDPipe 1.2 with Pre-Trained Models || Taito, Abel || November 2017 ||<br />
|}<br />
<br />
= Activity E: Pre-Trained Word Embeddings =<br />
<br />
{| class="wikitable"<br />
|-<br />
! Module Name/Version !! Description !! System !! Install Date !! Maintainer<br />
|-<br />
| nlpl-gensim/3.6.0 || GenSim: Topic Modeling for Humans || Taito, Abel || October 2018 || Stephan Oepen<br />
|}<br />
<br />
= Activity G: OPUS Parallel Corpus =<br />
<br />
{| class="wikitable"<br />
|-<br />
! Module Name/Version !! Description !! System !! Install Date !! Maintainer<br />
|-<br />
| nlpl-cwb/3.4.12 || Corpus Work Bench (CWB) || Taito, Abel || November 2017 ||<br />
|-<br />
| nlpl-opus/0.1 || Various OPUS Tools || Taito, Abel || November 2017 ||<br />
|-<br />
| nlpl-uplug/0.3.8dev || UPlug Parallel Corpus Tools || Taito, Abel || November 2017 ||<br />
|}</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/uuparser&diff=490Parsing/uuparser2018-12-20T12:25:37Z<p>Nivre: /* Training a multilingual parsing model */</p>
<hr />
<div>= The Uppsala Parser =<br />
<br />
The Uppsala Parser is a neural transition-based dependency parser based on bist-parser by Eli Kiperwasser and Yoav Goldberg and developed primarily in the context of the CoNLL shared tasks on universal dependency parsing in 2017 and 2018. The Uppsala Parser is publicly available at https://github.com/UppsalaNLP/uuparser. <br />
Note that the version installed here may exhibit some slight differences, designed to improve ease of use.<br />
<br />
== Using the Uppsala Parser on Abel ==<br />
<br />
* Log into Abel<br />
* Activate the NLPL module repository:<br />
module use -a /projects/nlpl/software/modulefiles/<br />
* Load the most recent version of the uuparser module:<br />
module load nlpl-uuparser<br />
<br />
== Training a parsing model ==<br />
<br />
To train a set of parsing models on treebanks from Universal Dependencies (v2.2 or later):<br />
<br />
uuparser --include [languages to include denoted by their treebank id] --outdir my-output-directory<br />
<br />
For example:<br />
<br />
uuparser --include "sv_talbanken en_partut ru_syntagrus" --outdir ~/experiments<br />
<br />
will train separate models on UD Swedish-Talbanken, UD English-ParTUT and UD Russian-SynTagRus and store the results in the <code>experiments</code> folder in your home directory. Model selection is included in the training process by default; that is, at each epoch the current model is evaluated on the UD dev data, and at the end of training the best performing model for each language is selected.<br />
<br />
== Predicting with a pre-trained parsing model ==<br />
<br />
To predict on UD test data with the models trained above:<br />
<br />
uuparser --include "sv_talbanken en_partut ru_syntagrus" --outdir ~/experiments --predict<br />
<br />
== Options ==<br />
<br />
The parser has numerous options to allow you to fine-control its behaviour. For a full list, type<br />
<br />
uuparser --help | less<br />
<br />
We recommend you set the <code>--dynet-mem</code> option to a larger number when running the full training procedure on larger treebanks. Commonly used values are 5000 and 10000 (in MB). Dynet is the neural network library on which the parser is built.<br />
<br />
Note that due to random initialization and other non-deterministic elements in the training process, you will not obtain the same results even when training twice under exactly the same circumstances (e.g. languages, number of epochs etc.). To ensure identical results between two runs, we recommend setting the <code>--dynet-seed</code> option to the same value both times (e.g. <code> --dynet-seed 123456789</code>) and adding the <code>--use-default-seed</code> flag. This ensures that Python's random number generator and Dynet both produce the same sequence of random numbers.<br />
<br />
== Training a multilingual parsing model ==<br />
<br />
Our parser supports multilingual parsing, that is a single parsing model for one or more languages. To train a ''multilingual'' model for the three languages in the examples above, we simply add the <code>--multiling</code> flag when training<br />
<br />
uuparser --include "sv_talbanken en_partut ru_syntagrus" --outdir ~/experiments --multiling --dynet-mem 5000<br />
<br />
In this case, instead of creating three separate models in the language-specific subdirectories within ~/experiments, a single model will be created directly in this folder. Predicting on test data is then as easy as:<br />
<br />
uuparser --include "sv_talbanken en_partut ru_syntagrus" --outdir ~/experiments --multiling --dynet-mem 5000 --predict<br />
<br />
Note that if you want to have different output directories for training and predicting, the <code>--modeldir</code> option can be specified when predicting to tell the parser where the pre-trained model can be found.<br />
<br />
== Segmentation ==<br />
<br />
In the above examples, we assume pre-segmented input data already in the [http://universaldependencies.org/format.html CONLL-U] format. If your input is raw text, we recommend using UDPipe to segment first. The UDPipe module can be loaded using <code>module load nlpl-udpipe</code> and then run by typing <code>udpipe</code> at the command line, see below.</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/uuparser&diff=489Parsing/uuparser2018-12-20T12:23:01Z<p>Nivre: /* Predicting with a pre-trained parsing model */</p>
<hr />
<div>= The Uppsala Parser =<br />
<br />
The Uppsala Parser is a neural transition-based dependency parser based on bist-parser by Eli Kiperwasser and Yoav Goldberg and developed primarily in the context of the CoNLL shared tasks on universal dependency parsing in 2017 and 2018. The Uppsala Parser is publicly available at https://github.com/UppsalaNLP/uuparser. <br />
Note that the version installed here may exhibit some slight differences, designed to improve ease of use.<br />
<br />
== Using the Uppsala Parser on Abel ==<br />
<br />
* Log into Abel<br />
* Activate the NLPL module repository:<br />
module use -a /projects/nlpl/software/modulefiles/<br />
* Load the most recent version of the uuparser module:<br />
module load nlpl-uuparser<br />
<br />
== Training a parsing model ==<br />
<br />
To train a set of parsing models on treebanks from Universal Dependencies (v2.2 or later):<br />
<br />
uuparser --include [languages to include denoted by their treebank id] --outdir my-output-directory<br />
<br />
For example:<br />
<br />
uuparser --include "sv_talbanken en_partut ru_syntagrus" --outdir ~/experiments<br />
<br />
will train separate models on UD Swedish-Talbanken, UD English-ParTUT and UD Russian-SynTagRus and store the results in the <code>experiments</code> folder in your home directory. Model selection is included in the training process by default; that is, at each epoch the current model is evaluated on the UD dev data, and at the end of training the best performing model for each language is selected.<br />
<br />
== Predicting with a pre-trained parsing model ==<br />
<br />
To predict on UD test data with the models trained above:<br />
<br />
uuparser --include "sv_talbanken en_partut ru_syntagrus" --outdir ~/experiments --predict<br />
<br />
== Options ==<br />
<br />
The parser has numerous options to allow you to fine-control its behaviour. For a full list, type<br />
<br />
uuparser --help | less<br />
<br />
We recommend you set the <code>--dynet-mem</code> option to a larger number when running the full training procedure on larger treebanks. Commonly used values are 5000 and 10000 (in MB). Dynet is the neural network library on which the parser is built.<br />
<br />
Note that due to random initialization and other non-deterministic elements in the training process, you will not obtain the same results even when training twice under exactly the same circumstances (e.g. languages, number of epochs etc.). To ensure identical results between two runs, we recommend setting the <code>--dynet-seed</code> option to the same value both times (e.g. <code> --dynet-seed 123456789</code>) and adding the <code>--use-default-seed</code> flag. This ensures that Python's random number generator and Dynet both produce the same sequence of random numbers.<br />
<br />
== Training a multilingual parsing model ==<br />
<br />
Our parser supports multilingual parsing, that is a single parsing model for one or more languages. To train a ''multilingual'' model for the three languages in the examples above, we simply add the <code>--multilingual</code> flag when training<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000<br />
<br />
In this case, instead of creating three separate models in the language-specific subdirectories within ~/experiments, a single model will be created directly in this folder. Predicting on test data is then as easy as:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000 --predict<br />
<br />
Note that if you want to have different output directories for training and predicting, the <code>--modeldir</code> option can be specified when predicting to tell the parser where the pre-trained model can be found.<br />
<br />
== Segmentation ==<br />
<br />
In the above examples, we assume pre-segmented input data already in the [http://universaldependencies.org/format.html CONLL-U] format. If your input is raw text, we recommend using UDPipe to segment first. The UDPipe module can be loaded using <code>module load nlpl-udpipe</code> and then run by typing <code>udpipe</code> at the command line, see below.</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/uuparser&diff=488Parsing/uuparser2018-12-20T12:22:46Z<p>Nivre: /* Training a parsing model */</p>
<hr />
<div>= The Uppsala Parser =<br />
<br />
The Uppsala Parser is a neural transition-based dependency parser based on bist-parser by Eli Kiperwasser and Yoav Goldberg and developed primarily in the context of the CoNLL shared tasks on universal dependency parsing in 2017 and 2018. The Uppsala Parser is publicly available at https://github.com/UppsalaNLP/uuparser. <br />
Note that the version installed here may exhibit some slight differences, designed to improve ease of use.<br />
<br />
== Using the Uppsala Parser on Abel ==<br />
<br />
* Log into Abel<br />
* Activate the NLPL module repository:<br />
module use -a /projects/nlpl/software/modulefiles/<br />
* Load the most recent version of the uuparser module:<br />
module load nlpl-uuparser<br />
<br />
== Training a parsing model ==<br />
<br />
To train a set of parsing models on treebanks from Universal Dependencies (v2.2 or later):<br />
<br />
uuparser --include [languages to include denoted by their treebank id] --outdir my-output-directory<br />
<br />
For example:<br />
<br />
uuparser --include "sv_talbanken en_partut ru_syntagrus" --outdir ~/experiments<br />
<br />
will train separate models on UD Swedish-Talbanken, UD English-ParTUT and UD Russian-SynTagRus and store the results in the <code>experiments</code> folder in your home directory. Model selection is included in the training process by default; that is, at each epoch the current model is evaluated on the UD dev data, and at the end of training the best performing model for each language is selected.<br />
<br />
== Predicting with a pre-trained parsing model ==<br />
<br />
To predict on UD test data with the models trained above:<br />
<br />
uuparser --include "sv_talbanken en_partut ru_syntagrus" --outdir ~/experiments --predict<br />
<br />
If you trained a single multi-treebank model, the flag --multiling should be added at prediction time too.<br />
<br />
== Options ==<br />
<br />
The parser has numerous options to allow you to fine-control its behaviour. For a full list, type<br />
<br />
uuparser --help | less<br />
<br />
We recommend you set the <code>--dynet-mem</code> option to a larger number when running the full training procedure on larger treebanks. Commonly used values are 5000 and 10000 (in MB). Dynet is the neural network library on which the parser is built.<br />
<br />
Note that due to random initialization and other non-deterministic elements in the training process, you will not obtain the same results even when training twice under exactly the same circumstances (e.g. languages, number of epochs etc.). To ensure identical results between two runs, we recommend setting the <code>--dynet-seed</code> option to the same value both times (e.g. <code> --dynet-seed 123456789</code>) and adding the <code>--use-default-seed</code> flag. This ensures that Python's random number generator and Dynet both produce the same sequence of random numbers.<br />
<br />
== Training a multilingual parsing model ==<br />
<br />
Our parser supports multilingual parsing, that is a single parsing model for one or more languages. To train a ''multilingual'' model for the three languages in the examples above, we simply add the <code>--multilingual</code> flag when training<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000<br />
<br />
In this case, instead of creating three separate models in the language-specific subdirectories within ~/experiments, a single model will be created directly in this folder. Predicting on test data is then as easy as:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000 --predict<br />
<br />
Note that if you want to have different output directories for training and predicting, the <code>--modeldir</code> option can be specified when predicting to tell the parser where the pre-trained model can be found.<br />
<br />
== Segmentation ==<br />
<br />
In the above examples, we assume pre-segmented input data already in the [http://universaldependencies.org/format.html CONLL-U] format. If your input is raw text, we recommend using UDPipe to segment first. The UDPipe module can be loaded using <code>module load nlpl-udpipe</code> and then run by typing <code>udpipe</code> at the command line, see below.</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/uuparser&diff=487Parsing/uuparser2018-12-20T12:22:17Z<p>Nivre: /* Predicting with a pre-trained parsing model */</p>
<hr />
<div>= The Uppsala Parser =<br />
<br />
The Uppsala Parser is a neural transition-based dependency parser based on bist-parser by Eli Kiperwasser and Yoav Goldberg and developed primarily in the context of the CoNLL shared tasks on universal dependency parsing in 2017 and 2018. The Uppsala Parser is publicly available at https://github.com/UppsalaNLP/uuparser. <br />
Note that the version installed here may exhibit some slight differences, designed to improve ease of use.<br />
<br />
== Using the Uppsala Parser on Abel ==<br />
<br />
* Log into Abel<br />
* Activate the NLPL module repository:<br />
module use -a /projects/nlpl/software/modulefiles/<br />
* Load the most recent version of the uuparser module:<br />
module load nlpl-uuparser<br />
<br />
== Training a parsing model ==<br />
<br />
To train a set of parsing models on treebanks from Universal Dependencies (v2.2 or later):<br />
<br />
uuparser --include [languages to include denoted by their treebank id] --outdir my-output-directory<br />
<br />
For example:<br />
<br />
uuparser --include "sv_talbanken en_partut ru_syntagrus" --outdir ~/experiments<br />
<br />
will train separate models on UD Swedish-Talbanken, UD English-ParTUT and UD Russian-SynTagRus and store the results in the <code>experiments</code> folder in your home directory. To train a single multi-treebank model on all training sets, simply add the --multiling flag.<br />
<br />
Model selection is included in the training process by default; that is, at each epoch the current model is evaluated on the UD dev data, and at the end of training the best performing model for each language is selected.<br />
<br />
== Predicting with a pre-trained parsing model ==<br />
<br />
To predict on UD test data with the models trained above:<br />
<br />
uuparser --include "sv_talbanken en_partut ru_syntagrus" --outdir ~/experiments --predict<br />
<br />
If you trained a single multi-treebank model, the flag --multiling should be added at prediction time too.<br />
<br />
== Options ==<br />
<br />
The parser has numerous options to allow you to fine-control its behaviour. For a full list, type<br />
<br />
uuparser --help | less<br />
<br />
We recommend you set the <code>--dynet-mem</code> option to a larger number when running the full training procedure on larger treebanks. Commonly used values are 5000 and 10000 (in MB). Dynet is the neural network library on which the parser is built.<br />
<br />
Note that due to random initialization and other non-deterministic elements in the training process, you will not obtain the same results even when training twice under exactly the same circumstances (e.g. languages, number of epochs etc.). To ensure identical results between two runs, we recommend setting the <code>--dynet-seed</code> option to the same value both times (e.g. <code> --dynet-seed 123456789</code>) and adding the <code>--use-default-seed</code> flag. This ensures that Python's random number generator and Dynet both produce the same sequence of random numbers.<br />
<br />
== Training a multilingual parsing model ==<br />
<br />
Our parser supports multilingual parsing, that is a single parsing model for one or more languages. To train a ''multilingual'' model for the three languages in the examples above, we simply add the <code>--multilingual</code> flag when training<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000<br />
<br />
In this case, instead of creating three separate models in the language-specific subdirectories within ~/experiments, a single model will be created directly in this folder. Predicting on test data is then as easy as:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000 --predict<br />
<br />
Note that if you want to have different output directories for training and predicting, the <code>--modeldir</code> option can be specified when predicting to tell the parser where the pre-trained model can be found.<br />
<br />
== Segmentation ==<br />
<br />
In the above examples, we assume pre-segmented input data already in the [http://universaldependencies.org/format.html CONLL-U] format. If your input is raw text, we recommend using UDPipe to segment first. The UDPipe module can be loaded using <code>module load nlpl-udpipe</code> and then run by typing <code>udpipe</code> at the command line, see below.</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/uuparser&diff=486Parsing/uuparser2018-12-20T12:21:29Z<p>Nivre: /* Training a parsing model */</p>
<hr />
<div>= The Uppsala Parser =<br />
<br />
The Uppsala Parser is a neural transition-based dependency parser based on bist-parser by Eli Kiperwasser and Yoav Goldberg and developed primarily in the context of the CoNLL shared tasks on universal dependency parsing in 2017 and 2018. The Uppsala Parser is publicly available at https://github.com/UppsalaNLP/uuparser. <br />
Note that the version installed here may exhibit some slight differences, designed to improve ease of use.<br />
<br />
== Using the Uppsala Parser on Abel ==<br />
<br />
* Log into Abel<br />
* Activate the NLPL module repository:<br />
module use -a /projects/nlpl/software/modulefiles/<br />
* Load the most recent version of the uuparser module:<br />
module load nlpl-uuparser<br />
<br />
== Training a parsing model ==<br />
<br />
To train a set of parsing models on treebanks from Universal Dependencies (v2.2 or later):<br />
<br />
uuparser --include [languages to include denoted by their treebank id] --outdir my-output-directory<br />
<br />
For example:<br />
<br />
uuparser --include "sv_talbanken en_partut ru_syntagrus" --outdir ~/experiments<br />
<br />
will train separate models on UD Swedish-Talbanken, UD English-ParTUT and UD Russian-SynTagRus and store the results in the <code>experiments</code> folder in your home directory. To train a single multi-treebank model on all training sets, simply add the --multiling flag.<br />
<br />
Model selection is included in the training process by default; that is, at each epoch the current model is evaluated on the UD dev data, and at the end of training the best performing model for each language is selected.<br />
<br />
== Predicting with a pre-trained parsing model ==<br />
<br />
To predict on UD test data with the models trained above:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --predict<br />
<br />
You may again include the <code>--mini</code> flag if you prefer to test on a subset of 50 test sentences.<br />
<br />
== Options ==<br />
<br />
The parser has numerous options to allow you to fine-control its behaviour. For a full list, type<br />
<br />
uuparser --help | less<br />
<br />
We recommend you set the <code>--dynet-mem</code> option to a larger number when running the full training procedure on larger treebanks. Commonly used values are 5000 and 10000 (in MB). Dynet is the neural network library on which the parser is built.<br />
<br />
Note that due to random initialization and other non-deterministic elements in the training process, you will not obtain the same results even when training twice under exactly the same circumstances (e.g. languages, number of epochs etc.). To ensure identical results between two runs, we recommend setting the <code>--dynet-seed</code> option to the same value both times (e.g. <code> --dynet-seed 123456789</code>) and adding the <code>--use-default-seed</code> flag. This ensures that Python's random number generator and Dynet both produce the same sequence of random numbers.<br />
<br />
== Training a multilingual parsing model ==<br />
<br />
Our parser supports multilingual parsing, that is a single parsing model for one or more languages. To train a ''multilingual'' model for the three languages in the examples above, we simply add the <code>--multilingual</code> flag when training<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000<br />
<br />
In this case, instead of creating three separate models in the language-specific subdirectories within ~/experiments, a single model will be created directly in this folder. Predicting on test data is then as easy as:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000 --predict<br />
<br />
Note that if you want to have different output directories for training and predicting, the <code>--modeldir</code> option can be specified when predicting to tell the parser where the pre-trained model can be found.<br />
<br />
== Segmentation ==<br />
<br />
In the above examples, we assume pre-segmented input data already in the [http://universaldependencies.org/format.html CONLL-U] format. If your input is raw text, we recommend using UDPipe to segment first. The UDPipe module can be loaded using <code>module load nlpl-udpipe</code> and then run by typing <code>udpipe</code> at the command line, see below.</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/uuparser&diff=485Parsing/uuparser2018-12-20T12:20:41Z<p>Nivre: /* Training a parsing model */</p>
<hr />
<div>= The Uppsala Parser =<br />
<br />
The Uppsala Parser is a neural transition-based dependency parser based on bist-parser by Eli Kiperwasser and Yoav Goldberg and developed primarily in the context of the CoNLL shared tasks on universal dependency parsing in 2017 and 2018. The Uppsala Parser is publicly available at https://github.com/UppsalaNLP/uuparser. <br />
Note that the version installed here may exhibit some slight differences, designed to improve ease of use.<br />
<br />
== Using the Uppsala Parser on Abel ==<br />
<br />
* Log into Abel<br />
* Activate the NLPL module repository:<br />
module use -a /projects/nlpl/software/modulefiles/<br />
* Load the most recent version of the uuparser module:<br />
module load nlpl-uuparser<br />
<br />
== Training a parsing model ==<br />
<br />
To train a set of parsing models on treebanks from Universal Dependencies (v2.2 or later):<br />
<br />
uuparser --include [languages to include denoted by their treebank id] --outdir my-output-directory<br />
<br />
For example:<br />
<br />
uuparser --include "sv_talbanken en_partut ru_syntagrus" --outdir ~/experiments<br />
<br />
will train separate models on UD Swedish-Talbanken, UD English-ParTUT and UD Russian-SynTagRus and store the results in the <code>experiments</code> folder in your home directory. To train a multi-treebank model, simply add the --multiling flag at both training and test time.<br />
<br />
Model selection is included in the training process by default; that is, at each epoch the current model is evaluated on the UD dev data, and at the end of training the best performing model for each language is selected.<br />
<br />
== Predicting with a pre-trained parsing model ==<br />
<br />
To predict on UD test data with the models trained above:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --predict<br />
<br />
You may again include the <code>--mini</code> flag if you prefer to test on a subset of 50 test sentences.<br />
<br />
== Options ==<br />
<br />
The parser has numerous options to allow you to fine-control its behaviour. For a full list, type<br />
<br />
uuparser --help | less<br />
<br />
We recommend you set the <code>--dynet-mem</code> option to a larger number when running the full training procedure on larger treebanks. Commonly used values are 5000 and 10000 (in MB). Dynet is the neural network library on which the parser is built.<br />
<br />
Note that due to random initialization and other non-deterministic elements in the training process, you will not obtain the same results even when training twice under exactly the same circumstances (e.g. languages, number of epochs etc.). To ensure identical results between two runs, we recommend setting the <code>--dynet-seed</code> option to the same value both times (e.g. <code> --dynet-seed 123456789</code>) and adding the <code>--use-default-seed</code> flag. This ensures that Python's random number generator and Dynet both produce the same sequence of random numbers.<br />
<br />
== Training a multilingual parsing model ==<br />
<br />
Our parser supports multilingual parsing, that is a single parsing model for one or more languages. To train a ''multilingual'' model for the three languages in the examples above, we simply add the <code>--multilingual</code> flag when training<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000<br />
<br />
In this case, instead of creating three separate models in the language-specific subdirectories within ~/experiments, a single model will be created directly in this folder. Predicting on test data is then as easy as:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000 --predict<br />
<br />
Note that if you want to have different output directories for training and predicting, the <code>--modeldir</code> option can be specified when predicting to tell the parser where the pre-trained model can be found.<br />
<br />
== Segmentation ==<br />
<br />
In the above examples, we assume pre-segmented input data already in the [http://universaldependencies.org/format.html CONLL-U] format. If your input is raw text, we recommend using UDPipe to segment first. The UDPipe module can be loaded using <code>module load nlpl-udpipe</code> and then run by typing <code>udpipe</code> at the command line, see below.</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/uuparser&diff=484Parsing/uuparser2018-12-20T12:20:26Z<p>Nivre: /* Training a parsing model */</p>
<hr />
<div>= The Uppsala Parser =<br />
<br />
The Uppsala Parser is a neural transition-based dependency parser based on bist-parser by Eli Kiperwasser and Yoav Goldberg and developed primarily in the context of the CoNLL shared tasks on universal dependency parsing in 2017 and 2018. The Uppsala Parser is publicly available at https://github.com/UppsalaNLP/uuparser. <br />
Note that the version installed here may exhibit some slight differences, designed to improve ease of use.<br />
<br />
== Using the Uppsala Parser on Abel ==<br />
<br />
* Log into Abel<br />
* Activate the NLPL module repository:<br />
module use -a /projects/nlpl/software/modulefiles/<br />
* Load the most recent version of the uuparser module:<br />
module load nlpl-uuparser<br />
<br />
== Training a parsing model ==<br />
<br />
To train a set of parsing models on treebanks from Universal Dependencies (v2.2 or later):<br />
<br />
uuparser --include [languages to include denoted by their treebank id] --outdir my-output-directory<br />
<br />
For example:<br />
<br />
uuparser --include ""sv_talbanken en_partut ru_syntagrus"" --outdir ~/experiments<br />
<br />
will train separate models on UD Swedish-Talbanken, UD English-ParTUT and UD Russian-SynTagRus and store the results in the <code>experiments</code> folder in your home directory. To train a multi-treebank model, simply add the --multiling flag at both training and test time.<br />
<br />
Model selection is included in the training process by default; that is, at each epoch the current model is evaluated on the UD dev data, and at the end of training the best performing model for each language is selected.<br />
<br />
== Predicting with a pre-trained parsing model ==<br />
<br />
To predict on UD test data with the models trained above:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --predict<br />
<br />
You may again include the <code>--mini</code> flag if you prefer to test on a subset of 50 test sentences.<br />
<br />
== Options ==<br />
<br />
The parser has numerous options to allow you to fine-control its behaviour. For a full list, type<br />
<br />
uuparser --help | less<br />
<br />
We recommend you set the <code>--dynet-mem</code> option to a larger number when running the full training procedure on larger treebanks. Commonly used values are 5000 and 10000 (in MB). Dynet is the neural network library on which the parser is built.<br />
<br />
Note that due to random initialization and other non-deterministic elements in the training process, you will not obtain the same results even when training twice under exactly the same circumstances (e.g. languages, number of epochs etc.). To ensure identical results between two runs, we recommend setting the <code>--dynet-seed</code> option to the same value both times (e.g. <code> --dynet-seed 123456789</code>) and adding the <code>--use-default-seed</code> flag. This ensures that Python's random number generator and Dynet both produce the same sequence of random numbers.<br />
<br />
== Training a multilingual parsing model ==<br />
<br />
Our parser supports multilingual parsing, that is a single parsing model for one or more languages. To train a ''multilingual'' model for the three languages in the examples above, we simply add the <code>--multilingual</code> flag when training<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000<br />
<br />
In this case, instead of creating three separate models in the language-specific subdirectories within ~/experiments, a single model will be created directly in this folder. Predicting on test data is then as easy as:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000 --predict<br />
<br />
Note that if you want to have different output directories for training and predicting, the <code>--modeldir</code> option can be specified when predicting to tell the parser where the pre-trained model can be found.<br />
<br />
== Segmentation ==<br />
<br />
In the above examples, we assume pre-segmented input data already in the [http://universaldependencies.org/format.html CONLL-U] format. If your input is raw text, we recommend using UDPipe to segment first. The UDPipe module can be loaded using <code>module load nlpl-udpipe</code> and then run by typing <code>udpipe</code> at the command line, see below.</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/udpipe&diff=482Parsing/udpipe2018-12-18T10:25:06Z<p>Nivre: /* Using UDPipe on Abel */</p>
<hr />
<div>= UDPipe =<br />
<br />
UDPipe is an end-to-end system for morphosyntactic parsing in the UD framework developed by Milan Straka. It was used as the baseline system in the CoNLL shared tasks on universal dependency parsing in 2017 and 2018.<br />
<br />
== Using UDPipe on Abel ==<br />
<br />
UDPipe is available as a module on Abel. It was installed as part of the OPUS activity.<br />
<br />
How to use UDPipe:<br />
* Log into Abel<br />
* Activate the NLPL module repository:<br />
module use -a /projects/nlpl/software/modulefiles/<br />
* Load the most recent version of the udpipe module:<br />
module load nlpl-udpipe<br />
<br />
To learn more about using UDPipe, check the official [UDPipe User's Manual](http://ufal.mff.cuni.cz/udpipe/users-manual)</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/home&diff=454Parsing/home2018-11-28T15:59:50Z<p>Nivre: </p>
<hr />
<div>= Background =<br />
<br />
An experimentation environment for data-driven dependency parsing is maintained for NLPL under the coordination of Uppsala University (UU).<br />
Initially, the software and data are commissioned on the Norwegian Abel supercluster.<br />
<br />
= Available Parsers =<br />
<br />
* [http://wiki.nlpl.eu/index.php/Parsing/uuparser The Uppsala Parser]<br />
* [http://wiki.nlpl.eu/index.php/Parsing/udpipe UDPipe]<br />
<br />
= Available Data Sets = <br />
<br />
* [http://wiki.nlpl.eu/index.php/Parsing/universal_dependencies Universal Dependencies v2.0-2.3]</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/ud&diff=453Parsing/ud2018-11-28T15:59:16Z<p>Nivre: Created page with "= Universal Dependencies = For syntactic parsing experiments we provide data from the [http://universaldependencies.org/ Universal Dependencies (UD) project] for a high numbe..."</p>
<hr />
<div>= Universal Dependencies =<br />
<br />
For syntactic parsing experiments we provide data from the [http://universaldependencies.org/ Universal Dependencies (UD) project] for a high number of languages. The data is provided in v2.0, which was used for the CoNLL shared task 2017, v2.1, v2.2, which was used for the CoNLL shared task 2018, and v2.3. <br />
<br />
All data is available on Abel at <code>/projects/nlpl/data/parsing/universal_dependencies</code><br />
<br />
== UD version 2.0 ==<br />
<br />
folders:<br><br />
<code>/projects/nlpl/data/parsing/universal_dependencies/ud-treebanks-v2.0-conll2017</code><br><br />
<code>/projects/nlpl/data/parsing/universal_dependencies/ud-test-v2.0-conll2017</code><br />
<br />
info:<br><br />
Version 2.0 treebanks, archived at http://hdl.handle.net/11234/1-1983. <br><br />
70 treebanks, 50 languages, released March 1, 2017.<br><br />
Test data 2.0 are archived at http://hdl.handle.net/11234/1-2184. <br><br />
81 treebanks, 49 languages, released May 18, 2017.<br />
<br />
Release 2.0 has test data released separately from the test data,<br />
which is reflected in our folder structure. This data was released for<br />
the CoNLL 2017 shared task.<br />
<br />
== UD version 2.1 ==<br />
<br />
folders:<br><br />
<code>/projects/nlpl/data/parsing/universal_dependencies/ud-treebanks-v2.1</code><br />
<br />
info:<br><br />
Version 2.1 treebanks are available at http://hdl.handle.net/11234/1-2515. <br><br />
102 treebanks, 60 languages, released November 15, 2017.<br />
<br />
== UD version 2.2 ==<br />
<br />
folders:<br><br />
<code>/projects/nlpl/data/parsing/universal_dependencies/ud-treebanks-v2.2</code><br />
<br />
info:<br><br />
Version 2.2 treebanks are available at http://hdl.handle.net/11234/1-2837. <br><br />
122 treebanks, 71 languages, released July 1, 2018.<br />
<br />
== UD version 2.3 ==<br />
<br />
folders:<br><br />
<code>/projects/nlpl/data/parsing/universal_dependencies/ud-treebanks-v2.3</code><br />
<br />
info:<br><br />
Version 2.2 treebanks are available at http://hdl.handle.net/11234/1-2895. <br><br />
129 treebanks, 76 languages, released November 15, 2018.<br />
<br />
= Contact =<br />
Joakim Nivre, Uppsala University<br/><br />
Sara Stymne, Uppsala University<br/><br />
firstname.lastname@lingfil.uu.se</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/uuparser&diff=452Parsing/uuparser2018-11-28T15:54:10Z<p>Nivre: </p>
<hr />
<div>= The Uppsala Parser =<br />
<br />
The Uppsala Parser is a neural transition-based dependency parser based on bist-parser by Eli Kiperwasser and Yoav Goldberg and developed primarily in the context of the CoNLL shared tasks on universal dependency parsing in 2017 and 2018. The Uppsala Parser is publicly available at https://github.com/UppsalaNLP/uuparser. <br />
Note that the version installed here may exhibit some slight differences, designed to improve ease of use.<br />
<br />
== Using the Uppsala Parser on Abel ==<br />
<br />
* Log into Abel<br />
* Activate the NLPL module repository:<br />
module use -a /projects/nlpl/software/modulefiles/<br />
* Load the most recent version of the uuparser module:<br />
module load nlpl-uuparser<br />
<br />
== Training a parsing model ==<br />
<br />
To train a set of parsing models on treebanks from Universal Dependencies (v2.1):<br />
<br />
uuparser --include [languages to include denoted by their ISO id] --outdir my-output-directory<br />
<br />
If you want to quickly test the parser is correctly loaded and running without waiting for the full training procedure, add the <code>--mini</code> flag.<br />
For example:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --mini<br />
<br />
will train separate models for UD Swedish, English and Russian and store the results in the <code>experiments</code> folder in your home directory. The <code>--mini</code> flag tells the parser to train on just the first 150 sentences of each language, and evaluate on the first 100 sentences of development data. It also tells the parser to train for just 3 epochs, as opposed to the default 30 (see more below under "Options")<br />
<br />
Model selection is included in the training process by default; that is, at each epoch the current model is evaluated on the UD dev data, and at the end of training the best performing model for each language is selected. <br />
<br />
== Predicting with a pre-trained parsing model ==<br />
<br />
To predict on UD test data with the models trained above:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --predict<br />
<br />
You may again include the <code>--mini</code> flag if you prefer to test on a subset of 50 test sentences.<br />
<br />
== Options ==<br />
<br />
The parser has numerous options to allow you to fine-control its behaviour. For a full list, type<br />
<br />
uuparser --help | less<br />
<br />
We recommend you set the <code>--dynet-mem</code> option to a larger number when running the full training procedure on larger treebanks. Commonly used values are 5000 and 10000 (in MB). Dynet is the neural network library on which the parser is built.<br />
<br />
Note that due to random initialization and other non-deterministic elements in the training process, you will not obtain the same results even when training twice under exactly the same circumstances (e.g. languages, number of epochs etc.). To ensure identical results between two runs, we recommend setting the <code>--dynet-seed</code> option to the same value both times (e.g. <code> --dynet-seed 123456789</code>) and adding the <code>--use-default-seed</code> flag. This ensures that Python's random number generator and Dynet both produce the same sequence of random numbers.<br />
<br />
== Training a multilingual parsing model ==<br />
<br />
Our parser supports multilingual parsing, that is a single parsing model for one or more languages. To train a ''multilingual'' model for the three languages in the examples above, we simply add the <code>--multilingual</code> flag when training<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000<br />
<br />
In this case, instead of creating three separate models in the language-specific subdirectories within ~/experiments, a single model will be created directly in this folder. Predicting on test data is then as easy as:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000 --predict<br />
<br />
Note that if you want to have different output directories for training and predicting, the <code>--modeldir</code> option can be specified when predicting to tell the parser where the pre-trained model can be found.<br />
<br />
== Segmentation ==<br />
<br />
In the above examples, we assume pre-segmented input data already in the [http://universaldependencies.org/format.html CONLL-U] format. If your input is raw text, we recommend using UDPipe to segment first. The UDPipe module can be loaded using <code>module load nlpl-udpipe</code> and then run by typing <code>udpipe</code> at the command line, see below.</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/udpipe&diff=451Parsing/udpipe2018-11-28T15:51:18Z<p>Nivre: Created page with "= UDPipe = UDPipe is an end-to-end system for morphosyntactic parsing in the UD framework developed by Milan Straka. It was used as the baseline system in the CoNLL shared ta..."</p>
<hr />
<div>= UDPipe =<br />
<br />
UDPipe is an end-to-end system for morphosyntactic parsing in the UD framework developed by Milan Straka. It was used as the baseline system in the CoNLL shared tasks on universal dependency parsing in 2017 and 2018.<br />
<br />
== Using UDPipe on Abel ==<br />
<br />
UDPipe is available as a module on Abel. It was installed as part of the OPUS activity.<br />
<br />
How to use UDPipe:<br />
* Log into Abel<br />
* Activate the NLPL module repository:<br />
module use -a /projects/nlpl/software/modulefiles/<br />
* Load the most recent version of the uuparser module:<br />
module load nlpl-udpipe<br />
<br />
To learn more about using UDPipe, check the official [UDPipe User's Manual](http://ufal.mff.cuni.cz/udpipe/users-manual)</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/uuparser&diff=450Parsing/uuparser2018-11-28T15:49:01Z<p>Nivre: </p>
<hr />
<div>= The Uppsala Parser =<br />
<br />
The Uppsala Parser is a neural transition-based dependency parser based on bist-parser by Eli Kiperwasser and Yoav Goldberg and developed primarily in the context of the CoNLL shared tasks on universal dependency parsing in 2017 and 2018.<br />
<br />
== Using the Uppsala Parser on Abel ==<br />
<br />
* Log into Abel<br />
* Activate the NLPL module repository:<br />
module use -a /projects/nlpl/software/modulefiles/<br />
* Load the most recent version of the uuparser module:<br />
module load nlpl-uuparser<br />
<br />
== Training a parsing model ==<br />
<br />
To train a set of parsing models on treebanks from Universal Dependencies (v2.1):<br />
<br />
uuparser --include [languages to include denoted by their ISO id] --outdir my-output-directory<br />
<br />
If you want to quickly test the parser is correctly loaded and running without waiting for the full training procedure, add the <code>--mini</code> flag.<br />
For example:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --mini<br />
<br />
will train separate models for UD Swedish, English and Russian and store the results in the <code>experiments</code> folder in your home directory. The <code>--mini</code> flag tells the parser to train on just the first 150 sentences of each language, and evaluate on the first 100 sentences of development data. It also tells the parser to train for just 3 epochs, as opposed to the default 30 (see more below under "Options")<br />
<br />
Model selection is included in the training process by default; that is, at each epoch the current model is evaluated on the UD dev data, and at the end of training the best performing model for each language is selected. <br />
<br />
== Predicting with a pre-trained parsing model ==<br />
<br />
To predict on UD test data with the models trained above:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --predict<br />
<br />
You may again include the <code>--mini</code> flag if you prefer to test on a subset of 50 test sentences.<br />
<br />
== Options ==<br />
<br />
The parser has numerous options to allow you to fine-control its behaviour. For a full list, type<br />
<br />
uuparser --help | less<br />
<br />
We recommend you set the <code>--dynet-mem</code> option to a larger number when running the full training procedure on larger treebanks. Commonly used values are 5000 and 10000 (in MB). Dynet is the neural network library on which the parser is built.<br />
<br />
Note that due to random initialization and other non-deterministic elements in the training process, you will not obtain the same results even when training twice under exactly the same circumstances (e.g. languages, number of epochs etc.). To ensure identical results between two runs, we recommend setting the <code>--dynet-seed</code> option to the same value both times (e.g. <code> --dynet-seed 123456789</code>) and adding the <code>--use-default-seed</code> flag. This ensures that Python's random number generator and Dynet both produce the same sequence of random numbers.<br />
<br />
== Training a multilingual parsing model ==<br />
<br />
Our parser supports multilingual parsing, that is a single parsing model for one or more languages. To train a ''multilingual'' model for the three languages in the examples above, we simply add the <code>--multilingual</code> flag when training<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000<br />
<br />
In this case, instead of creating three separate models in the language-specific subdirectories within ~/experiments, a single model will be created directly in this folder. Predicting on test data is then as easy as:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000 --predict<br />
<br />
Note that if you want to have different output directories for training and predicting, the <code>--modeldir</code> option can be specified when predicting to tell the parser where the pre-trained model can be found.<br />
<br />
== Segmentation ==<br />
<br />
In the above examples, we assume pre-segmented input data already in the [http://universaldependencies.org/format.html CONLL-U] format. If your input is raw text, we recommend using UDPipe to segment first. The UDPipe module can be loaded using <code>module load nlpl-udpipe</code> and then run by typing <code>udpipe</code> at the command line, see below.</div>Nivrehttp://wiki.nlpl.eu/index.php?title=Parsing/uuparser&diff=449Parsing/uuparser2018-11-28T15:46:54Z<p>Nivre: Created page with "= The Uppsala Parser = To use the Uppsala parser: * Log into Abel * Activate the NLPL module repository: module use -a /projects/nlpl/software/modulefiles/ * Load the most ..."</p>
<hr />
<div>= The Uppsala Parser =<br />
<br />
To use the Uppsala parser:<br />
<br />
* Log into Abel<br />
* Activate the NLPL module repository:<br />
module use -a /projects/nlpl/software/modulefiles/<br />
* Load the most recent version of the uuparser module:<br />
module load nlpl-uuparser<br />
<br />
== Training a parsing model ==<br />
<br />
To train a set of parsing models on treebanks from Universal Dependencies (v2.1):<br />
<br />
uuparser --include [languages to include denoted by their ISO id] --outdir my-output-directory<br />
<br />
If you want to quickly test the parser is correctly loaded and running without waiting for the full training procedure, add the <code>--mini</code> flag.<br />
For example:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --mini<br />
<br />
will train separate models for UD Swedish, English and Russian and store the results in the <code>experiments</code> folder in your home directory. The <code>--mini</code> flag tells the parser to train on just the first 150 sentences of each language, and evaluate on the first 100 sentences of development data. It also tells the parser to train for just 3 epochs, as opposed to the default 30 (see more below under "Options")<br />
<br />
Model selection is included in the training process by default; that is, at each epoch the current model is evaluated on the UD dev data, and at the end of training the best performing model for each language is selected. <br />
<br />
== Predicting with a pre-trained parsing model ==<br />
<br />
To predict on UD test data with the models trained above:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --predict<br />
<br />
You may again include the <code>--mini</code> flag if you prefer to test on a subset of 50 test sentences.<br />
<br />
== Options ==<br />
<br />
The parser has numerous options to allow you to fine-control its behaviour. For a full list, type<br />
<br />
uuparser --help | less<br />
<br />
We recommend you set the <code>--dynet-mem</code> option to a larger number when running the full training procedure on larger treebanks. Commonly used values are 5000 and 10000 (in MB). Dynet is the neural network library on which the parser is built.<br />
<br />
Note that due to random initialization and other non-deterministic elements in the training process, you will not obtain the same results even when training twice under exactly the same circumstances (e.g. languages, number of epochs etc.). To ensure identical results between two runs, we recommend setting the <code>--dynet-seed</code> option to the same value both times (e.g. <code> --dynet-seed 123456789</code>) and adding the <code>--use-default-seed</code> flag. This ensures that Python's random number generator and Dynet both produce the same sequence of random numbers.<br />
<br />
== Training a multilingual parsing model ==<br />
<br />
Our parser supports multilingual parsing, that is a single parsing model for one or more languages. To train a ''multilingual'' model for the three languages in the examples above, we simply add the <code>--multilingual</code> flag when training<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000<br />
<br />
In this case, instead of creating three separate models in the language-specific subdirectories within ~/experiments, a single model will be created directly in this folder. Predicting on test data is then as easy as:<br />
<br />
uuparser --include "sv en ru" --outdir ~/experiments --multiling --dynet-mem 5000 --predict<br />
<br />
Note that if you want to have different output directories for training and predicting, the <code>--modeldir</code> option can be specified when predicting to tell the parser where the pre-trained model can be found.<br />
<br />
== Segmentation ==<br />
<br />
In the above examples, we assume pre-segmented input data already in the [http://universaldependencies.org/format.html CONLL-U] format. If your input is raw text, we recommend using UDPipe to segment first. The UDPipe module can be loaded using <code>module load nlpl-udpipe</code> and then run by typing <code>udpipe</code> at the command line, see below.</div>Nivre