Difference between revisions of "Eosc/horovod"

From Nordic Language Processing Laboratory

Revision as of 22:11, 1 February 2020

Background

To evaluate different approaches to provisioning software collections to NLPL users in a uniform manner across different systems, this page gives a high-level description of a software request that was current in early 2020. The request is discussed from various perspectives, including different approaches to software provisioning, interfaces to the host system and 'core' software modules, and requirements on the target compute systems (e.g. support for EasyBuild or container creation and execution).

High-Level Goal: Multi-GPU Training

Building on top of either PyTorch or TensorFlow, multiple NLPL users want to train models that require running on multiple GPUs (8 or 16, say; i.e. multiple nodes) for at least several days. One currently popular framework for this is Horovod, which among other things combines MPI and NCCL for multi-GPU and multi-node communication.

Providing a functional and effective Horovod installation is no small feat, and even the more technically sophisticated NLPL users may struggle with putting together all the right pieces. Instead, the following components should be pre-installed in the NLPL virtual laboratory, for users to activate and run with minimal effort: Horovod 0.19.0, with either PyTorch 1.4.0 or TensorFlow 1.15.2 as its deep learning (DL) backend, all in a Python 3.7.2 environment. Horovod has a number of dependencies, including a basic tool chain, NCCL 2 and a suitable MPI implementation (e.g. OpenMPI 3.1.2 or 4.0.0, but not 3.1.3). The DL backends each bring their own set of dependencies, currently among other things CUDA 10.0 and cuDNN 7.6 for TensorFlow, and CUDA 10.1 for PyTorch. Additionally, irrespective of the choice of DL framework, users will need a range of discipline-specific Python add-ons, say Gensim 3.8.1 and spaCy 2.2.3 (including all its pre-trained models). It is required that all of these components can be loaded into one Python universe, i.e. the same process.
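The requirement that all components load into the same process can be checked with a short smoke test. The following is a minimal sketch, not part of any NLPL tooling: the function name report_versions is hypothetical, and the importable module names (horovod, torch, tensorflow, gensim, spacy) are assumptions about how the listed components are exposed in Python.

```python
import importlib


def report_versions(module_names):
    """Import each module into the current process and collect its version.

    Returns a dict mapping each module name to its `__version__` string,
    to "unknown" if the module defines no such attribute, or to None if
    the import fails for any reason.
    """
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A broken installation can fail in several ways (missing
            # package, missing shared library), so any error counts.
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Assumed module names for the components named in the text.
    wanted = ["horovod", "torch", "tensorflow", "gensim", "spacy"]
    for name, version in report_versions(wanted).items():
        status = version if version is not None else "NOT IMPORTABLE"
        print(f"{name}: {status}")
```

Running the script inside an activated environment lists one line per component; any line reading NOT IMPORTABLE indicates that the environment does not satisfy the single-process requirement.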

Two Weeks Later: Gensim Updates

At last, Gensim 4.0 is released. Two of the most active Horovod users (one using PyTorch, the other a loyal TensorFlow user) desperately want to update their software environment, keeping everything as before but swapping out the older version of Gensim for the new release. But Gensim 4.0 does not maintain backwards compatibility with the 3.x releases, hence other NLPL users plead to not have their software environment altered until after they have submitted their doctoral theses.
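One way to think about this conflict is to treat each environment as an immutable set of version pins and to derive variants from it rather than editing it in place. The sketch below is only an illustration under that assumption; the names BASELINE, derive_environment, and changed_pins are hypothetical, and the version numbers are the ones mentioned on this page.

```python
# Baseline pins for the PyTorch flavour of the environment described above.
BASELINE = {
    "python": "3.7.2",
    "horovod": "0.19.0",
    "pytorch": "1.4.0",
    "gensim": "3.8.1",
    "spacy": "2.2.3",
}


def derive_environment(baseline, **overrides):
    """Return a new pin set with some versions replaced.

    The baseline dict is copied, never modified, so users who depend on
    the original environment are unaffected by the variant.
    """
    unknown = set(overrides) - set(baseline)
    if unknown:
        raise KeyError(f"not part of the baseline: {sorted(unknown)}")
    variant = dict(baseline)
    variant.update(overrides)
    return variant


def changed_pins(baseline, variant):
    """Map each component whose pin differs to its (old, new) versions."""
    return {
        name: (baseline[name], variant[name])
        for name in baseline
        if baseline[name] != variant[name]
    }


# The variant requested by the two Horovod users: only Gensim changes.
gensim4 = derive_environment(BASELINE, gensim="4.0")
print(changed_pins(BASELINE, gensim4))
```

Under this view, the Gensim 4.0 request results in a second environment alongside the first, and the provisioning question becomes how cheaply such variants can be built and kept side by side.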

Four Weeks Later: Performance Optimization

A team of NLPL developers sets out to adapt for Norwegian a large-scale experiment originally published by Google AI researchers, who report that they cumulatively used two TPU years (two weeks of exclusive access to 64 units) on this computation. From colleagues in Finland, they understand that running Horovod on top of the Intel MPI Library (instead of OpenMPI) can give a performance gain of up to 40 percent. Reliable Intel MPI utilization, however, requires installation of the most recent Horovod 0.20.0 release candidate (while keeping TensorFlow, Gensim, spaCy, et al. versions as before).

The Inevitable: Backgrading

At about the same time, a new NLPL user reaches out because they fail to get their code running in TensorFlow 1.15 and Python 3.7 (for reasons beyond their control). They ask for an installation of TensorFlow 1.12 in a Python 3.5 environment. They can make do without Horovod, but they badly need Gensim and spaCy.

Reflections: Responsibilities

The software stack sketched above combines a large number of modules. These can be categorized into three distinct layers, according to how closely they are tied to a specific HPC system and to what degree they serve specific subsets of users. In broad terms, these can be characterized as (a) core, (b) intermediate, or (c) custom modules.

Core components like the C++ tool chain, CUDA and associated extensions, different MPI implementations, or just a vanilla Python 3.7 interpreter arguably should be installed and maintained by the system administrators. They can be intricately linked to the available hardware (CPU and GPU types, and the available interconnect) and should be expertly supported irrespective of individual disciplines or user groups.

Although less intricately tied to the host environment, general-purpose DL frameworks are not discipline-specific either and could in principle be considered part of the core software inventory provided with each HPC system. However, both PyTorch and TensorFlow provide optional and contributed extensions that are tailored for NLP (e.g. torchtext, keras-preprocessing, or the contributed CRF implementation in TensorFlow). Another deep learning framework (DyNet) appears to be used near-exclusively for natural language processing. Also, because the NLPL community appears to have been perhaps the most active user community for these frameworks on the Abel and Taito systems in recent years, it has been both convenient and efficient for the community to maintain its own installations of these DL frameworks.

Finally, tools and libraries like Gensim, spaCy, and others are specific to NLP and are in some cases co-developed by NLPL community members. There are about two dozen such tools that are widely used, i.e. will be required by multiple users. Some are easy to install, others less so; often, these tools provide pre-trained models (for a variety of languages) that need to be installed separately and can be somewhat space-consuming. These tools and their associated data should not be installed (redundantly) by individual users; rather, they motivate the community ‘self-help’ approach of the NLPL virtual laboratory: expert members of the community are best positioned to install these tools and assist others in using them effectively.
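The three-layer categorization can be written down as a small lookup table. The sketch below is one possible reading of the paragraphs above and not an agreed policy: the names Layer, MAINTAINER, MODULE_LAYER, and modules_by_layer are hypothetical, and components whose layer the text does not state explicitly (Horovod, for instance) are left out.

```python
from enum import Enum


class Layer(Enum):
    CORE = "core"
    INTERMEDIATE = "intermediate"
    CUSTOM = "custom"


# Who maintains each layer, as suggested (not mandated) by the text.
MAINTAINER = {
    Layer.CORE: "system administrators",
    Layer.INTERMEDIATE: "NLPL community (own DL framework installations)",
    Layer.CUSTOM: "NLPL community experts",
}

# Example assignments taken from the components named in the text.
MODULE_LAYER = {
    "C++ tool chain": Layer.CORE,
    "CUDA": Layer.CORE,
    "MPI": Layer.CORE,
    "Python": Layer.CORE,
    "PyTorch": Layer.INTERMEDIATE,
    "TensorFlow": Layer.INTERMEDIATE,
    "DyNet": Layer.INTERMEDIATE,
    "Gensim": Layer.CUSTOM,
    "spaCy": Layer.CUSTOM,
}


def modules_by_layer(assignments):
    """Group module names by layer, preserving their listed order."""
    grouped = {layer: [] for layer in Layer}
    for module, layer in assignments.items():
        grouped[layer].append(module)
    return grouped


if __name__ == "__main__":
    for layer, modules in modules_by_layer(MODULE_LAYER).items():
        print(f"{layer.value}: {', '.join(modules)} -> {MAINTAINER[layer]}")
```

Keeping the assignment in one table makes it easy to revisit borderline cases, such as whether the general-purpose DL frameworks should move to the core layer.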