Latest revision as of 18:42, 12 January 2021
The Nordic Language Processing Laboratory (NLPL) is a collaboration of university research groups in Natural Language Processing (NLP) in Northern Europe. Our vision is to implement a virtual laboratory for large-scale NLP research by (a) creating new ways to enable data- and compute-intensive NLP research by implementing a common software, data, and service stack in multiple Nordic HPC centres, (b) pooling competencies within the user community and among expert support teams, and (c) enabling internationally competitive, data-intensive research and experimentation on a scale that would be difficult to sustain on commodity computing resources.
Activities
As part of its ‘virtual laboratory’, NLPL prepares and maintains software and data infrastructures for (A) Collaboration and Software Management; (B) Statistical and Neural Machine Translation; (C) Data-Driven Dependency Parsing; (D) Very Large Corpora; (E) Pre-Trained Word Embeddings; (F) Automated Extrinsic Evaluation; (G) Parallel Corpora and OPUS; and (H) Community Formation and Outreach. Please see the catalogue of available software and the links above for information on how to gain access to and use the NLPL virtual laboratory.
Resources
Since mid-2017, NLPL has been making some of its resources and services available to the public:
- The Open Parallel Corpus (OPUS; now maintained as a dedicated service under the NLPL umbrella);
- 90 billion tokens of ‘raw’ text extracted from web data, covering the 45 languages in the 2017 UD Parsing Shared Task;
- The EngC3 corpus of some 130 billion tokens of clean English text extracted from the Common Crawl;
- Large-scale contextualized language models for Norwegian, including the ELMo and BERT architectures;
- A repository of pre-trained word embeddings on very large corpora and an on-line explorer for these models;
- The Extrinsic Parser Evaluation (EPE) Shared Tasks at the IWPT 2017 and CoNLL 2018 conferences;
- An annual winter school series on machine learning and scientific programming for NLP research.
Partners
The NLPL consortium comprises Nordic research groups in NLP and the national e-infrastructure providers of Finland and Norway. The academic partners are Helsinki University (Finland), IT University Copenhagen (Denmark), University of Copenhagen (Denmark), University of Oslo (Norway), Turku University (Finland), and Uppsala University (Sweden).
Between 2017 and 2019, NLPL was supported by the Nordic e-Infrastructure Collaboration (NeIC) and the national e-Infrastructure providers in Finland (CSC) and Norway (Sigma2). Starting in 2020, the Oslo and Helsinki groups represent the NLPL community in the EOSC-Nordic initiative, and the partners jointly continue development and maintenance of the NLPL virtual laboratory.
Associates
NLPL welcomes the involvement of additional research groups in Language Technology in the Nordics, including the Baltic region, that wish to make use of the virtual laboratory. The project has established an associate program through which users can get access to NLPL resources. Please email the contact address below to ask for access. As part of your initial contact, please indicate the expected types of computing, software, and data to be used and the anticipated group of users (including details on affiliation).
As of November 2019, the following research groups are NLPL associates:
- Center for Linguistic Theory and Studies of Probability at Gothenburg University (Sweden)
- Section for Computational Linguistics at Stockholm University (Sweden)
- Natural Language Processing Research Group at the University of Tartu (Estonia)
- Research Group on Natural Language Processing (NLPLAB) at Linköping University (Sweden)
- The Árni Magnússon Institute for Icelandic Studies and the Language and Voice Lab at Reykjavik University (Iceland)
Contact
To email NLPL project management and its Steering Group, please use the address infrastructure@nlpl.eu.
Since mid-2017, the project has welcomed expressions of interest from additional NLP research groups in Northern Europe.
For additional background and the archive of official project documents (including the work plan and Steering Group minutes), please see the NLPL page on the NeIC wiki.