Corpora/home

From Nordic Language Processing Laboratory

Revision as of 14:07, 25 January 2018

Background

NLPL creates and makes available several very large collections of textual data, drawing for example on Wikipedia and the Common Crawl.

These corpora are available on the connected computing infrastructure. Please check the individual page of each resource for details.

Corpus Catalogue

Directory | Description | System | Install Date  | Maintainer
conll17   |             | Taito  | November 2017 | Jenna Kanerva


Many Languages: The CoNLL 2017 Text Collection (Turku)

English: 130 Billion Words Extracted from the Common Crawl (Oslo)

English: Two Variants of Text Extraction from Wikipedia (Oslo)
