Eosc/nuggets

From Nordic Language Processing Laboratory

Background

Smaller, tangible activities to be implemented under the EOSC-Nordic umbrella, in support of the NLPL virtual laboratory.


User Management

The project maintains an email list for all users with access to the NLPL virtual laboratory. We need to determine programmatically, on Saga and Puhti, which users are associated with any of the NLPL billing projects; the list of projects should be read from a local file. This mechanism should run as a nightly cron(8) job and rewrite another local file with the active email addresses (which is then used at NLPL headquarters to automatically update the set of mailing list subscribers).
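
As a minimal sketch of such a nightly job, the following Python code reads project names from a file, collects the users of each project, and rewrites the address file. It rests on assumptions not stated above: that each billing project corresponds to a Unix group of the same name, and that some site-specific lookup (here the hypothetical parameter email_of) maps a user name to an email address. All function names are illustrative.

```python
import grp
import os
from pathlib import Path


def read_projects(path):
    """Return project names from a file, one per line; '#' starts a comment."""
    names = []
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.append(line)
    return names


def unix_group_members(project):
    """Assumption: a billing project is a Unix group of the same name.

    Note that gr_mem only lists supplementary members, not users who have
    the group as their primary group.
    """
    try:
        return list(grp.getgrnam(project).gr_mem)
    except KeyError:
        return []


def active_emails(projects, members_of, email_of):
    """Collect the sorted, de-duplicated addresses of all project members.

    members_of(project) returns user names; email_of(user) returns an
    address or None. Both are hypothetical, site-specific lookups.
    """
    users = set()
    for project in projects:
        users.update(members_of(project))
    emails = {email_of(user) for user in users}
    emails.discard(None)
    return sorted(emails)


def write_atomically(path, lines):
    """Rewrite the output file in one step so readers never see a partial file."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text("".join(line + "\n" for line in lines))
    os.replace(tmp, path)
```

A cron(8) entry would then call a small wrapper that chains these functions, passing unix_group_members (or whatever the site actually offers) as members_of.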

File System Management

With multiple users maintaining a shared collection of directories and files (aka the community directory, the shared software and data installations that make up the core of the virtual laboratory), we frequently run into permission challenges that at times are tedious to resolve or require administrator privileges (which the NLPL developers cannot have). We would like to see an interface where a cron(8) job regularly reads a specification of permissions and related actions from a file (maintained by NLPL), quality-controls the contents, and then executes the corresponding actions, e.g. something like

software/modules/etc/nlpl/tmp     rm -r
software/modules/                 chgrp -R nlpl
software/modules/                 chmod g+rwX,o+rX
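
The quality-control step could be sketched in Python as follows. This is one possible reading of the format above, not an agreed design: each line is taken to be a relative path followed by a command, only a small allowlist of commands and argument shapes is accepted, and the whole file is rejected if any line fails. The allowlist, the function names, and the root directory passed in are all assumptions.

```python
import re
import shlex
import subprocess
from pathlib import PurePosixPath

# Allowed commands and the exact argument shapes accepted for each (assumed policy).
ALLOWED = {
    "rm": re.compile(r"-r"),
    "chgrp": re.compile(r"-R [a-z][a-z0-9_-]*"),
    "chmod": re.compile(r"(-R )?[ugoa]*[-+=][rwxX]+(,[ugoa]*[-+=][rwxX]+)*"),
}


def parse_line(line, root):
    """Turn one '<path> <command>' line into an argv list, or raise ValueError."""
    fields = line.split(None, 1)
    if len(fields) != 2:
        raise ValueError(f"expected '<path> <command>': {line!r}")
    path, command = PurePosixPath(fields[0]), shlex.split(fields[1])
    # Lexical check only; it does not protect against symbolic links.
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"path must stay below the root: {fields[0]!r}")
    name, args = command[0], " ".join(command[1:])
    pattern = ALLOWED.get(name)
    if pattern is None or not pattern.fullmatch(args):
        raise ValueError(f"command not allowed: {fields[1]!r}")
    # '--' keeps the path from ever being interpreted as an option.
    return command + ["--", str(PurePosixPath(root) / path)]


def parse_spec(text, root):
    """Validate every line first; any error rejects the whole specification."""
    actions = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            actions.append(parse_line(line, root))
    return actions


def execute(actions):
    """Run the validated actions without a shell."""
    for argv in actions:
        subprocess.run(argv, check=True)
```

Validating the complete file before running anything means a typo in one line cannot leave the directory tree half-processed.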

Module Usage Statistics

On Abel, we used to have a crude mechanism tracking the use of NLPL software modules, viz. generating a log entry every time a module definition was loaded. This was accomplished by calling out to the logger(1) utility. To generalize this approach, we would minimally need a mechanism for logging remotely to a central NLPL server.
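
One way to generalize this, sketched here in Python, is to send each load event as a syslog message to a remote host. The message layout, the logger name, and the function names are assumptions for illustration; the central server would need to accept syslog over UDP on the given port.

```python
import getpass
import logging
import logging.handlers
import socket


def usage_message(module, user=None, host=None):
    """Format one log entry for a module load (layout is an assumption)."""
    user = user or getpass.getuser()
    host = host or socket.gethostname()
    return f"nlpl-module-load user={user} host={host} module={module}"


def remote_logger(server, port=514):
    """Return a logger that forwards records to a remote syslog server via UDP."""
    handler = logging.handlers.SysLogHandler(address=(server, port))
    logger = logging.getLogger("nlpl.modules")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

A module definition could then call a small script that does remote_logger(server).info(usage_message(name)), with the server name supplied by the site configuration.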

Accounting and Allocations

Each country has its own mechanisms for user management, allocations, and accounting. In Norway, all NLPL users are currently organized in a single project (for allocation), making it difficult for the project manager to keep track of users, including their affiliations and period of association with the project. Project association and access to accounting information in both Finland and Norway are handled through in-house web services. Could one imagine building an API-like abstraction over the national services, e.g. a RESTful interface to retrieve usage statistics for a project or individual user in a specific period?

At present, there appears to be no systematic distinction for accounting purposes between CPU and GPU usage (at least not in Norway).
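
To make the idea of a common abstraction concrete, here is a purely hypothetical Python sketch of the query such an interface might answer: usage for a project, optionally narrowed to one user, within a period, with CPU and GPU hours kept apart. The record fields and the function name are invented for illustration; neither national service is known to expose data in this shape.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsageRecord:
    """One accounting entry (hypothetical, service-neutral shape)."""
    project: str
    user: str
    day: date
    cpu_hours: float
    gpu_hours: float


def usage(records, project, start, end, user=None):
    """Sum CPU and GPU hours for a project (and optionally one user) in [start, end]."""
    total = {"cpu_hours": 0.0, "gpu_hours": 0.0}
    for record in records:
        if record.project != project or not (start <= record.day <= end):
            continue
        if user is not None and record.user != user:
            continue
        total["cpu_hours"] += record.cpu_hours
        total["gpu_hours"] += record.gpu_hours
    return total
```

Per-country adapters would be responsible for translating whatever the in-house web services return into such records; a RESTful front end could then serve the result of usage() as JSON.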


Software Management

We are still struggling to provide fully uniform software environments for NLPL users on Puhti and Saga (the current two instances of the virtual laboratory). Assuming we continue to build on the software modules infrastructure, two challenges are: (a) the inventory of basic building blocks differs (e.g. compilers, Python, CUDA, MPI) and (b) installation of add-on modules is only partially automated.
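
Challenge (a) can at least be made visible with a small comparison script. The Python sketch below assumes that each system can produce a plain list of installed modules with one name/version entry per line; the function names and the report layout are illustrative only.

```python
def parse_inventory(lines):
    """Map each module name to its set of versions, from 'name/version' lines."""
    inventory = {}
    for line in lines:
        line = line.strip()
        if "/" not in line:
            continue  # skip headers, blank lines, and unversioned entries
        name, version = line.split("/", 1)
        inventory.setdefault(name, set()).add(version)
    return inventory


def compare(a, b):
    """Report, per module name, the versions present on only one of two systems."""
    report = {}
    for name in sorted(set(a) | set(b)):
        versions_a, versions_b = a.get(name, set()), b.get(name, set())
        if versions_a != versions_b:
            report[name] = {
                "only_a": sorted(versions_a - versions_b),
                "only_b": sorted(versions_b - versions_a),
            }
    return report
```

Running compare() on the inventories of Saga and Puhti would yield the list of building blocks that a standardized inventory needs to reconcile.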

Could one agree with CSC and Sigma2 on a standardized inventory of basic modules and a common mechanism to build on top of that, e.g. either EasyBuild or Spack? Since the spring of 2020, USIT and UiO-Inf have been evaluating EasyBuild for that purpose.