Eosc/pretraining/nvidia

From Nordic Language Processing Laboratory
Revision as of 23:45, 28 November 2020

Background

This page provides a recipe for large-scale pre-training of a BERT neural language model, using the high-efficiency NVIDIA BERT implementation (which is based on TensorFlow, in contrast to the PyTorch-based NVIDIA Megatron code).

Software Installation

Prerequisites

We assume that EasyBuild and Lmod are already installed on the host machine.

We also assume that core software (compilers, most toolchains, CUDA drivers, etc.) is already installed system-wide, or at least that the corresponding easyconfigs are available to the system-wide EasyBuild installation.

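Finally, the host machine must have an Internet connection: the repository is cloned from a remote server, and EasyBuild downloads the source code of any software it has to build from scratch.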
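The prerequisites can be checked up front with a small script. This is a minimal sketch, not part of the NLPL repository: the functions check_prereqs and have_cmd are hypothetical names, and the script assumes that Lmod provides the module shell function, that EasyBuild is exposed as a module named EasyBuild/<version>, and that git is installed. It probes the repository URL and branch used in the setup steps below, so reaching them also confirms the Internet connection.

```shell
# Preflight check for the prerequisites (a sketch; source this file, then run check_prereqs).
# Assumptions: Lmod's `module` shell function is available, EasyBuild is provided
# as a module named "EasyBuild/<version>", and git is installed.

REPO_URL="https://source.coderefinery.org/nlpl/easybuild.git"
REPO_BRANCH="ak-dev"

# Succeeds if NAME is an executable, builtin or shell function
# (`type` also recognises functions such as Lmod's `module`).
have_cmd() {
    type "$1" >/dev/null 2>&1
}

check_prereqs() {
    local failed=0

    if have_cmd module; then
        echo "ok: Lmod 'module' command found"
        # Lmod prints `avail` output on stderr; -t lists one module per line.
        if module -t avail EasyBuild 2>&1 | grep -q '^EasyBuild/'; then
            echo "ok: an EasyBuild module is available"
        else
            echo "missing: no EasyBuild module visible to Lmod"
            failed=1
        fi
    else
        echo "missing: Lmod 'module' command"
        failed=1
    fi

    # Listing the remote branch requires git and a working Internet connection.
    if have_cmd git &&
        git ls-remote --exit-code --heads "$REPO_URL" "$REPO_BRANCH" >/dev/null 2>&1; then
        echo "ok: $REPO_URL (branch $REPO_BRANCH) is reachable"
    else
        echo "missing: git, or network access to $REPO_URL"
        failed=1
    fi

    return "$failed"
}
```

A non-zero exit status from check_prereqs means at least one prerequisite is missing. The script does not inspect compilers, toolchains or CUDA drivers; missing ones will surface in the EasyBuild dry run.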

Setting things up

  • Clone our repository: git clone https://source.coderefinery.org/nlpl/easybuild.git --branch ak-dev --single-branch
  • The cloned directory ('easybuild') will serve as your build factory. You may rename it as you see fit; then change into this directory.
  • To make the same procedure work across different systems, we provide a custom preparation script.
  • Run it to prepare the directories and obtain the path settings:

./setdir.sh

  • It will create the file SETUP.local with the settings you will use from then on (simply run source SETUP.local after loading EasyBuild).
  • It will also print a command that your users need to run in order to load your custom modules.
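The setup steps above can be combined into a single shell function. This is a minimal sketch: setup_factory is a hypothetical name, and instead of renaming the clone afterwards it passes the chosen directory name straight to git clone, which has the same effect. It assumes that setdir.sh needs no arguments, as in the instructions.

```shell
# Clone the NLPL EasyBuild repository, enter it and run the preparation script.
# Source this file and call setup_factory, so that the `cd` affects your current shell.
setup_factory() {
    # Target directory name (hypothetical); defaults to the repository's own name.
    local target="${1:-easybuild}"

    git clone https://source.coderefinery.org/nlpl/easybuild.git \
        --branch ak-dev --single-branch "$target" || return 1
    cd "$target" || return 1

    ./setdir.sh || return 1

    # setdir.sh is expected to have written SETUP.local.
    if [ ! -f SETUP.local ]; then
        echo "SETUP.local was not created" >&2
        return 1
    fi
}
```

For example, setup_factory nlpl-factory (a hypothetical directory name) leaves you inside nlpl-factory with SETUP.local in place. The function does not capture the command that setdir.sh prints for your users, so copy it from the terminal output.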

Building software and installing custom modules

  • Load EasyBuild (e.g., module load EasyBuild/4.3.0)
  • Load your settings: source SETUP.local
  • Check your settings: eb --show-config
  • Simulate the NVIDIA BERT installation (a dry run) by running

eb --robot NLPL-nvidia_BERT_TF-20.06.08-gomkl-2019b-Python3.7.4.eb --dry-run

  • EasyBuild will show the list of required modules, marking those which have to be installed from scratch (by downloading and building the corresponding software).
  • If no warnings or errors were shown, build everything required by NVIDIA BERT:

eb --robot NLPL-nvidia_BERT_TF-20.06.08-gomkl-2019b-Python3.7.4.eb

  • Once your modules are built, they will be visible along with the system-provided ones via module avail, and can be loaded with module load.
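The build steps above can also be scripted, keeping the review point between the dry run and the real build. This is a minimal sketch, to be sourced from the factory directory: plan_bert, build_bert and modules_to_build are hypothetical names, EasyBuild/4.3.0 is the example module version from above, and the parser assumes EasyBuild's dry-run format, in which each easyconfig appears as a line like " * [x] <path> (module: <name>)", with [x] marking modules that are already installed and [ ] marking those still to be built.

```shell
# Sketch of the build steps; source this file in the factory directory.
# Assumption: dry-run lines look like " * [ ] $CFGS/.../foo.eb (module: foo/1.0)".

EC="NLPL-nvidia_BERT_TF-20.06.08-gomkl-2019b-Python3.7.4.eb"

# Read EasyBuild dry-run output on stdin and print the modules marked "[ ]" (not yet installed).
modules_to_build() {
    sed -n 's/^ *\* \[ \] .*(module: \(.*\)) *$/\1/p'
}

# Load EasyBuild and the local settings, show the configuration and run the dry run.
plan_bert() {
    module load EasyBuild/4.3.0 || return 1
    source ./SETUP.local || return 1
    eb --show-config || return 1

    # Stop if the dry run itself fails (non-zero exit status from eb).
    local plan
    plan=$(eb --robot "$EC" --dry-run) || return 1
    printf '%s\n' "$plan"
    echo "Modules to be built from scratch:"
    printf '%s\n' "$plan" | modules_to_build
}

# Real build of NVIDIA BERT and its missing dependencies.
# Run it in the same shell session, after reviewing the output of plan_bert.
build_bert() {
    eb --robot "$EC"
}
```

plan_bert only reacts to a failing exit status, so warnings still have to be read in its output before you run build_bert.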

Data Preparation

Training Example