Latest revision as of 23:53, 20 December 2019
Universal Dependencies
For syntactic parsing experiments we provide data from the Universal Dependencies (UD) project for a large number of languages. The data is provided in versions 2.0 (used for the CoNLL 2017 shared task), 2.1, 2.2 (used for the CoNLL 2018 shared task), 2.3, 2.4, and 2.5.
All data is available on Saga at /cluster/shared/nlpl/data/parsing/ud and automatically replicated to Puhti into /projappl/nlpl/data/parsing/ud (below, folder names abbreviate the site-specific prefix to the NLPL community directory as just ...).
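As a minimal sketch of how the site-specific prefix and the per-version folder names fit together, the following Python helper builds the treebank folder for a given site and UD version. The two root paths are the ones stated above; the function name `ud_treebank_dir` and the `SITE_ROOTS` mapping are hypothetical names introduced here for illustration only.

```python
from pathlib import PurePosixPath

# Root directories as stated on this page for the two sites.
SITE_ROOTS = {
    "saga": PurePosixPath("/cluster/shared/nlpl/data/parsing/ud"),
    "puhti": PurePosixPath("/projappl/nlpl/data/parsing/ud"),
}


def ud_treebank_dir(site: str, version: str) -> PurePosixPath:
    """Return the treebank folder for a UD version on the given site.

    Hypothetical helper: it only assembles a path and does not check
    that the folder exists.
    """
    # On this page only the 2.0 folder carries the "-conll2017" suffix.
    suffix = "-conll2017" if version == "2.0" else ""
    return SITE_ROOTS[site.lower()] / f"ud-treebanks-v{version}{suffix}"
```

For example, `ud_treebank_dir("puhti", "2.5")` yields the v2.5 folder under the Puhti root, so the same script can be used on either system by changing only the site name.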
UD version 2.0
folders:
.../nlpl/data/parsing/ud/ud-treebanks-v2.0-conll2017
.../nlpl/data/parsing/ud/ud-test-v2.0-conll2017
info:
Version 2.0 treebanks are archived at http://hdl.handle.net/11234/1-1983.
70 treebanks, 50 languages, released March 1, 2017.
Test data 2.0 are archived at http://hdl.handle.net/11234/1-2184.
81 treebanks, 49 languages, released May 18, 2017.
In release 2.0 the test data were released separately from the treebanks themselves, which is reflected in our folder structure. This data was released for the CoNLL 2017 shared task.
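Because the 2.0 data is split across two folders, a script that needs all files has to look in both. The sketch below gathers every file ending in `.conllu` from a list of folders. It assumes that the treebank files carry the `.conllu` extension somewhere below each folder, and makes no further assumption about the internal layout; `collect_conllu` is a hypothetical name.

```python
from pathlib import Path
from typing import Iterable, List


def collect_conllu(folders: Iterable[Path]) -> List[Path]:
    """Return all *.conllu files found recursively under the given folders."""
    found: List[Path] = []
    for folder in folders:
        # rglob descends into subdirectories, so the exact nesting
        # inside each release folder does not need to be known.
        found.extend(Path(folder).rglob("*.conllu"))
    return sorted(found)
```

Called with the `ud-treebanks-v2.0-conll2017` and `ud-test-v2.0-conll2017` folders (each prefixed with the site-specific root), it returns one sorted list covering both.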
UD version 2.1
folders:
.../nlpl/data/parsing/ud/ud-treebanks-v2.1
info:
Version 2.1 treebanks are available at http://hdl.handle.net/11234/1-2515.
102 treebanks, 60 languages, released November 15, 2017.
UD version 2.2
folders:
.../nlpl/data/parsing/ud/ud-treebanks-v2.2
info:
Version 2.2 treebanks are available at http://hdl.handle.net/11234/1-2837.
122 treebanks, 71 languages, released July 1, 2018.
UD version 2.3
folders:
.../nlpl/data/parsing/ud/ud-treebanks-v2.3
info:
Version 2.3 treebanks are available at http://hdl.handle.net/11234/1-2895.
129 treebanks, 76 languages, released November 15, 2018.
UD version 2.4
folders:
.../nlpl/data/parsing/ud/ud-treebanks-v2.4
info:
Version 2.4 treebanks are available at http://hdl.handle.net/11234/1-2988.
146 treebanks, 83 languages, released May 15, 2019.
UD version 2.5
folders:
.../nlpl/data/parsing/ud/ud-treebanks-v2.5
info:
Version 2.5 treebanks are available at http://hdl.handle.net/11234/1-3105.
157 treebanks, 90 languages, released November 15, 2019.
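UD treebanks are distributed in the CoNLL-U format: one token per line with ten tab-separated fields, comment lines starting with `#`, and a blank line after each sentence. As a rough illustration of how the files in the folders above can be read for parsing experiments, here is a minimal reader. It is a sketch, not a full implementation: it keeps multiword-token and empty-node lines as ordinary rows and does no validation. The names `FIELDS` and `read_conllu` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

# The ten CoNLL-U columns, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Sentence-level comments (e.g. sent_id, text) are skipped.
            continue
        else:
            sentence.append(dict(zip(FIELDS, line.split("\t"))))
    # Emit a final sentence if the input lacks a trailing blank line.
    if sentence:
        yield sentence
```

Since the function accepts any iterable of lines, it can be fed an open file handle directly, for example `with open(path, encoding="utf-8") as f: sentences = list(read_conllu(f))`, where `path` points to one of the treebank files.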
Contact
Joakim Nivre, Uppsala University
Sara Stymne, Uppsala University
firstname.lastname@lingfil.uu.se