Difference between revisions of "Eosc/nuggets"

From Nordic Language Processing Laboratory

Revision as of 17:47, 28 January 2020

Background

Smaller, tangible activities to be implemented under the EOSC-Nordic umbrella, in support of the NLPL virtual laboratory.


User Management

The project maintains an email list for all users with access to the NLPL virtual laboratory. On both Saga and Puhti, we need to extract programmatically which users are associated with any of the NLPL billing projects; the list of projects should be read from a local file. This mechanism should run as a nightly cron(8) job and rewrite another local file with the active email addresses, which is then used at NLPL headquarters to automatically update the set of mailing list subscribers.
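
The nightly job described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an agreed design: it assumes that each billing project is mirrored by a Unix group of the same name, and the function names, file layout, and the email lookup (`email_of`) are hypothetical, since the text does not say where email addresses would come from on Saga or Puhti.

```python
import grp
from pathlib import Path
from typing import Callable, Iterable, Optional


def read_projects(path: Path) -> list[str]:
    """Read project names from a local file, one per line; '#' starts a comment."""
    names = []
    for line in path.read_text().splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


def group_members(project: str) -> list[str]:
    """Assumption: a billing project corresponds to a Unix group of the same name.

    Note that gr_mem lists supplementary members only; users who have the
    group as their primary group would need a separate lookup.
    """
    try:
        return list(grp.getgrnam(project).gr_mem)
    except KeyError:
        return []


def active_emails(
    projects: Iterable[str],
    members_of: Callable[[str], Iterable[str]],
    email_of: Callable[[str], Optional[str]],
) -> list[str]:
    """Collect the sorted, de-duplicated addresses of all project members."""
    users = {user for project in projects for user in members_of(project)}
    emails = {email_of(user) for user in users}
    return sorted(email for email in emails if email)


def write_atomically(path: Path, lines: Iterable[str]) -> None:
    """Write to a temporary file first so readers never see a partial list."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text("".join(f"{line}\n" for line in lines))
    tmp.replace(path)
```

A cron(8) entry would then call `read_projects`, pass `group_members` as `members_of`, and hand the result to `write_atomically`. Keeping the membership and email lookups as parameters means the same script could run on both systems even if the underlying sources differ.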

File System Management

With multiple users maintaining a shared collection of directories and files (aka the community directory, the shared software and data installations that make up the core of the virtual laboratory), we frequently run into permission challenges that at times are tedious to resolve or require administrator privileges (which the NLPL developers cannot have). We would like to see an interface where a cron(8) job regularly reads a specification of permissions and related actions from a file (maintained by NLPL), quality-controls the contents, and then executes the corresponding actions, e.g. something like

software/modules/etc/nlpl/tmp     rm -r
software/modules/                 chgrp -R nlpl
software/modules/                 chmod g+rwX,o+rX
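
The quality-control step could look something like the sketch below, which parses a specification in the format shown above and turns each line into an argument list without executing anything. The allow-list of commands, the rule that paths must stay below the community directory, and the root path passed in are all assumptions made for illustration; the actual policy would have to be agreed with the system administrators.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical allow-list: only these commands may appear in the specification.
ALLOWED_COMMANDS = {"rm", "chgrp", "chmod"}


def parse_spec(text: str, root: str) -> list[list[str]]:
    """Validate a specification and return one argument list per action.

    Each non-empty line is '<relative path> <command> [arguments...]'.
    Nothing is executed here; the caller decides whether to run the result.
    """
    actions = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)
        if len(fields) < 2:
            raise ValueError(f"line {number}: expected a path and a command")
        path, command = PurePosixPath(fields[0]), fields[1:]
        # Reject anything that could escape the community directory.
        if path.is_absolute() or ".." in path.parts:
            raise ValueError(f"line {number}: path must stay below the root")
        if command[0] not in ALLOWED_COMMANDS:
            raise ValueError(f"line {number}: command {command[0]!r} not allowed")
        # '--' stops option parsing, so the path is never read as a flag.
        actions.append(command + ["--", str(PurePosixPath(root) / path)])
    return actions
```

The resulting lists could be passed to `subprocess.run(action, check=True)` one at a time. Returning argument lists rather than shell strings avoids shell interpretation of the file contents.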

Software Management

We are still struggling to provide fully uniform software environments for NLPL users on Puhti and Saga (the current two instances of the virtual laboratory). Assuming we continue to build on the software modules infrastructure, two challenges are: (a) the inventory of basic building blocks differs (e.g. compilers, Python, CUDA, MPI) and (b) installation of add-on modules is only partially automated.

Could one agree with CSC and Sigma2 on a standardized inventory of basic modules and a common mechanism to build on top of that, e.g. either EasyBuild or Spack?
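
As a first step towards a standardized inventory, one could compare what is currently installed on the two systems. The sketch below assumes each inventory is available as plain text with one `name/version` entry per line; how that listing is produced on Puhti and Saga is left open, and the function names are hypothetical.

```python
from collections import defaultdict


def parse_inventory(text: str) -> dict[str, set[str]]:
    """Map each module name to the set of versions listed for it."""
    versions = defaultdict(set)
    for line in text.splitlines():
        entry = line.strip()
        if not entry or "/" not in entry:
            continue  # skip blank lines and entries without a version
        name, version = entry.split("/", 1)
        versions[name].add(version)
    return dict(versions)


def compare_inventories(
    first: dict[str, set[str]], second: dict[str, set[str]]
) -> dict[str, dict[str, list[str]]]:
    """Report every module whose available versions differ between two sites."""
    report = {}
    for name in sorted(set(first) | set(second)):
        a, b = first.get(name, set()), second.get(name, set())
        if a != b:
            report[name] = {
                "shared": sorted(a & b),
                "only_first": sorted(a - b),
                "only_second": sorted(b - a),
            }
    return report
```

Modules that are identical on both systems are omitted from the report, so what remains is the list of differences that a common inventory would need to resolve.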

Module Usage Statistics

On Abel, we used to have a crude mechanism tracking the use of NLPL software modules, viz. generating a log entry every time a module definition was loaded. This was accomplished by calling out to the logger(1) utility. To generalize this approach, we would minimally need a mechanism for logging remotely to a central NLPL server.
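
One way to generalize the logger(1) approach is to send the same kind of entry to a remote syslog server. The sketch below uses the `SysLogHandler` from Python's standard `logging.handlers` module, which sends over UDP by default; the message layout, the logger name, and the assumption that a central NLPL server accepts syslog traffic on port 514 are all hypothetical.

```python
import getpass
import logging
import logging.handlers
import socket


def format_module_event(module: str, user: str, host: str) -> str:
    """Build one log line; this key=value layout is an assumption."""
    return f"nlpl-module-load module={module} user={user} host={host}"


def make_remote_logger(server: str, port: int = 514) -> logging.Logger:
    """Return a logger that forwards records to a remote syslog server."""
    logger = logging.getLogger("nlpl.modules")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = logging.handlers.SysLogHandler(address=(server, port))
        logger.addHandler(handler)
    return logger


def log_module_load(logger: logging.Logger, module: str) -> None:
    """Record that the current user loaded the given module on this host."""
    event = format_module_event(module, getpass.getuser(), socket.gethostname())
    logger.info(event)
```

A module definition could call a small wrapper script around `log_module_load` in the same place where it previously called logger(1). Since UDP delivery is not guaranteed, occasional entries may be lost, which seems acceptable for coarse usage statistics.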

Accounting and Allocations

Each country has its own mechanisms for user management, allocations, and accounting. In Norway, all NLPL users are currently organized in a single project (for allocation), making it difficult for the project manager to keep track of users, including their affiliations and period of association with the project. In both Finland and Norway, project association and access to accounting information are handled through in-house web services. Could one imagine building an API-like abstraction over the national services, e.g. a RESTful interface to retrieve usage statistics for a project or individual user in a specific period?

At present, there appears to be no systematic distinction for accounting purposes between CPU and GPU usage.
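
The abstraction over the national services could be sketched as a common record type plus one adapter per country. Everything named below is hypothetical: the record fields, the `AccountingBackend` interface, and the shape of the summary are illustrative assumptions, and the in-memory backend merely stands in for adapters that would query the in-house web services. Keeping CPU and GPU hours as separate fields is one way to make that distinction explicit.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Protocol


@dataclass(frozen=True)
class UsageRecord:
    user: str
    project: str
    day: date
    cpu_hours: float
    gpu_hours: float


class AccountingBackend(Protocol):
    """Interface each national adapter would implement (hypothetical)."""

    def records(self, project: str) -> Iterable[UsageRecord]: ...


class InMemoryBackend:
    """Stand-in for a national service, useful for testing the abstraction."""

    def __init__(self, records: Iterable[UsageRecord]):
        self._records = list(records)

    def records(self, project: str) -> Iterable[UsageRecord]:
        return [r for r in self._records if r.project == project]


def usage_summary(
    backend: AccountingBackend,
    project: str,
    start: date,
    end: date,
    user: Optional[str] = None,
) -> dict:
    """Sum CPU and GPU hours for a project (or one user) in [start, end]."""
    cpu = gpu = 0.0
    for record in backend.records(project):
        if not (start <= record.day <= end):
            continue
        if user is not None and record.user != user:
            continue
        cpu += record.cpu_hours
        gpu += record.gpu_hours
    return {
        "project": project,
        "user": user,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "cpu_hours": cpu,
        "gpu_hours": gpu,
    }
```

The returned dictionary contains only JSON-serializable values, so it could serve directly as the response body of a RESTful endpoint that takes the project, an optional user, and the period as query parameters.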