Infrastructure/resources

From Nordic Language Processing Laboratory

Background

The NLPL initiative is supported by CSC and Sigma2, the Finnish and Norwegian national e-infrastructure providers, respectively. The project has received computing allocations on the Puhti (Finland; the successor to Taito) and Saga (Norway; the successor to Abel) superclusters, which can be used by all project members. Additionally, moderate ‘on-line’ storage allocations are provided on both machines (accessible as /projappl/nlpl/ on Puhti and /cluster/shared/nlpl/ on Saga), as well as a more generous storage allocation of 50 terabytes (as of mid-2017) on NIRD.

Gaining Access

For the time being, there is no synchronization of user accounts across countries. Thus, project members need to obtain accounts separately for Saga and Puhti. As of mid-2017, staff and students at project member sites (and associate partners) are welcome to use the NLPL virtual laboratory. (Sub-)allocation decisions within NLPL are made by the infrastructure task force; please contact us before submitting account applications.

For Saga, there is an on-line account application form at https://www.metacenter.no/user/application/. Please request association with Notur project NN9447K and suggest a plausible end date for your association with the project. It will usually take a day or two before account activation is complete, and you will receive status updates by email and text messages. Stephan Oepen (UiO) is the Norwegian point of contact for these projects, but please direct all inquiries to the NLPL infrastructure task force.

For Puhti, there is an on-line account application form at https://sui.csc.fi/web/guest/csc-customer-registration/. Martin Matthiesen approves NLPL-related account requests and manages allocation ‘projects’ (one per site). Stephan Oepen manages access rights to the community directory on Puhti (‘/projappl/nlpl/’). For all technical questions, please contact the NLPL infrastructure task force.

Resource Allocations

At present (mid-2019), there are different allocation mechanisms for the two systems.

For Abel (and Saga), allocations are made by the Norwegian Resource Allocation Committee for six-month periods, which start on April 1 and October 1 each year. NLPL received an allocation of 500,000 CPU hours for allocation period 2016.2; less than one third of these had been used by March 2017, and the allocation for period 2017.1 (lasting until the end of September 2017) was reduced (from the original estimates, by mutual agreement) to a fresh 500,000 hours. Towards the end of August 2017, NLPL usage on Abel appeared to have declined to some 50,000 hours in allocation period 2017.1; Stephan Oepen will optimistically request 200,000 hours for the next six-month period. All NLPL project members share these allocations and, over time, will need to find ways of using and distributing these resources fairly.

For Taito (and Puhti), each NLPL member site can be allocated its ‘own’ project, and CSC tends to make smaller allocations more frequently. The Steering Group has yet to develop a policy for making sure that the sum of these micro-allocations fits (fairly) within the bounds of the ‘blanket’ allocation of three million core hours per year granted to NLPL as the CSC in-kind contribution, but so far this has not been a practical concern.
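
As a rough illustration of the bookkeeping such a policy would involve, the sketch below subtracts the sum of per-site allocations from the blanket allocation. Only the figure of three million core hours comes from the text above; the function name, the site names, and the per-site numbers are hypothetical and chosen purely for the example.

```python
# Blanket allocation per year, as stated in the text (core hours).
BLANKET_CORE_HOURS_PER_YEAR = 3_000_000


def remaining_budget(site_allocations, blanket=BLANKET_CORE_HOURS_PER_YEAR):
    """Return the core hours left after summing all per-site allocations.

    A negative result means the micro-allocations exceed the blanket.
    """
    return blanket - sum(site_allocations.values())


# Hypothetical per-site allocations; these are NOT actual NLPL figures.
example_allocations = {
    "site-a": 1_200_000,
    "site-b": 900_000,
    "site-c": 600_000,
}

if __name__ == "__main__":
    left = remaining_budget(example_allocations)
    status = "within" if left >= 0 else "over"
    print(f"{left:,} core hours remaining ({status} the blanket allocation)")
```

How any remaining hours should be shared fairly among sites is a policy question that this sketch does not attempt to answer.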

Taito Usage Statistics

Id | Project | Principal Investigator | Users 2017 | Units 2017 | Users 2018 | Units 2018 | Users 2019 | Units 2019
2000509 | Deep Learning for Natural Language Processing | Joakim Nivre | 7 | 531,099 | 10 | 609,617 | |
2000582 | Neic-NLPL | Stephan Oepen | - | - | 3 | 62,298 | |
2000661 | NLPL-OPUS | Jörg Tiedemann | 1 | 28,959 | 2 | 270,915 | |
2000288 | BAULT | Jörg Tiedemann | 1 | 992 | 1 | 7,089 | |
2000309 | CrossNLP | Jörg Tiedemann | 10 | 844,404 | 15 | 1,808,416 | |
tuy4622 | Textual Data Mining for Bioinformation Management | Filip Ginter | 9 | 891,837 | 10 | 558,113 | |
2000391 | TurkuNLP EDU | Filip Ginter | 1 | 142,808 | 2 | 177,932 | |
2000989 | UCPH part of NeIC-NLPL | Anders Søgaard | - | - | 5 | 107,657 | |
2001006 | NLPL-ITUNLP | Leon Derczynski | - | - | 7 | 345 | |
Total | | | 26 | 2,440,099 | 47 | 3,602,382 | |
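
The unit totals in the table can be recomputed from the per-project rows. The sketch below does so under one assumption: the two oddly punctuated figures in the raw listing, 84,4404 and 27,0915, are read as 844,404 and 270,915. With that reading, both column sums match the stated totals exactly. The dictionary and function names are illustrative only.

```python
# Units per project id, transcribed from the Taito table.
# Projects with no usage in a year ("-") are simply omitted.
UNITS_2017 = {
    "2000509": 531_099,
    "2000661": 28_959,
    "2000288": 992,
    "2000309": 844_404,  # assumption: listed as "84,4404" in the raw table
    "tuy4622": 891_837,
    "2000391": 142_808,
}

UNITS_2018 = {
    "2000509": 609_617,
    "2000582": 62_298,
    "2000661": 270_915,  # assumption: listed as "27,0915" in the raw table
    "2000288": 7_089,
    "2000309": 1_808_416,
    "tuy4622": 558_113,
    "2000391": 177_932,
    "2000989": 107_657,
    "2001006": 345,
}


def column_total(units_by_project):
    """Sum the units of all projects in one year's column."""
    return sum(units_by_project.values())


if __name__ == "__main__":
    print(f"2017: {column_total(UNITS_2017):,}")
    print(f"2018: {column_total(UNITS_2018):,}")
```

The user counts are not checked in the same way, since the per-project counts for 2017 add up to more than the stated total of 26, presumably because some users are active in more than one project.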

Abel Usage Statistics

Allocations (and statistics) on Abel are organized into six-month periods (called, for example, 2018.1), which start in April and October each year. The table below shows the sum of hours for the two allocation periods in a given ‘year’ (e.g. 2016.2 and 2017.1 for the first NLPL project year), and the maximum count of active users across the two periods.
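
The period naming can be made concrete with a small sketch. It assumes, based on the examples on this page, that period YYYY.1 runs from April 1 to September 30 of year YYYY and that period YYYY.2 runs from October 1 of year YYYY to March 31 of the following year. The function names are illustrative and not part of any Sigma2 tooling.

```python
from datetime import date


def allocation_period(day):
    """Return the label of the six-month allocation period containing `day`."""
    if day.month >= 10:
        # October through December: second period of the current year.
        return f"{day.year}.2"
    if day.month >= 4:
        # April through September: first period of the current year.
        return f"{day.year}.1"
    # January through March still belong to the period that began last October.
    return f"{day.year - 1}.2"


def periods_for_table_year(year):
    """Return the two periods summed under one 'year' column of the table."""
    return (f"{year - 1}.2", f"{year}.1")


if __name__ == "__main__":
    print(allocation_period(date(2017, 3, 15)))
    print(periods_for_table_year(2017))
```

For example, a date in March 2017 falls into period 2016.2, and the 2017 column of the table combines periods 2016.2 and 2017.1.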

Id | Project | Principal Investigator | Users 2017 | Hours 2017 | Users 2018 | Hours 2018 | Users 2019 | Hours 2019
NN9447K | NLPL | Stephan Oepen | 7 | 202,080 | 16 | 203,590 | 41 | 960,000
NN9107K | DELPH-IN21 | Stephan Oepen | 8 | 1,981,377 | 10 | 214,644 | |
Total | | | 15 | 2,183,457 | 26 | 418,234 | 41 | 960,000