Vectors/elmo/tutorial

Background

ELMo is a family of contextualized word embeddings first introduced in [Peters et al. 2018].

Using pre-trained models

Pre-trained ELMo models are available from the NLPL Word Embeddings repository.

Python code to infer contextualized word vectors from any input text, given a pre-trained model:

https://github.com/ltgoslo/simple_elmo
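
A minimal sketch of such inference, assuming the simple_elmo code is installed as a Python package; the class and method names follow the repository's README, but check your version, and the model path is a hypothetical placeholder:

 from simple_elmo import ElmoModel
 model = ElmoModel()
 # Hypothetical path to a model downloaded from the NLPL repository:
 model.load("/path/to/elmo_model")
 sentences = [["Hello", "world", "!"], ["ELMo", "embeddings", "are", "contextual", "."]]  # tokenized input
 vectors = model.get_elmo_vectors(sentences)
 print(vectors.shape)  # (number of sentences, max sentence length, vector dimension)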

Training ELMo on Saga

We recommend using the code from https://github.com/ltgoslo/simple_elmo_training to train an ELMo model with TensorFlow.

After cloning the repository and installing its dependencies, training boils down to running the command:

python3 bin/train_elmo.py --train_prefix $DATA --size $SIZE --vocab_file $VOCAB --save_dir $OUT

where

$DATA is a path to the directory containing any number of (possibly gzipped) plain text files: your training corpus.

$SIZE is the number of word tokens in $DATA (necessary to properly construct and log batches).

$VOCAB is a (possibly gzipped) one-word-per-line vocabulary file to be used for language modeling; it should always contain at least <S>, </S> and <UNK>.

$OUT is a directory where the TensorFlow checkpoints will be saved.
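
For illustration, the first lines of a vocabulary file could look like this (the tokens after the three special symbols are hypothetical; the upstream bilm-tf implementation additionally expects the vocabulary to be sorted by descending frequency):

 <S>
 </S>
 <UNK>
 the
 ,
 .
 of

A complete invocation could then look as follows (all paths and the token count are hypothetical placeholders):

 DATA=/cluster/work/users/$USER/corpus           # directory with (gzipped) plain-text files
 SIZE=100000000                                  # number of word tokens in $DATA
 VOCAB=/cluster/work/users/$USER/vocab.txt.gz    # one word per line
 OUT=/cluster/work/users/$USER/elmo_checkpoints  # will hold the TensorFlow checkpoints
 python3 bin/train_elmo.py --train_prefix $DATA --size $SIZE --vocab_file $VOCAB --save_dir $OUT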


There are currently three options for training ELMo on Saga with GPU-enabled TensorFlow:

  • using the NLPL environment (recommended)
  • using a system TensorFlow module
  • using Anaconda

The speed is comparable across the three options: one epoch over 100 million word tokens takes about 3 hours with 2 NVIDIA P100 GPUs and a batch size of 192.

Using the NLPL environment

You will need to load the NLPL-provided modules nlpl-python-candy/201912/3.7 and nlpl-tensorflow/1.15.2/3.7.

Example SLURM file:

 #!/bin/bash
 #SBATCH --job-name=ELMo
 #SBATCH --mail-type=FAIL
 #SBATCH --account=nn9447k    # Use your project number
 #SBATCH --partition=accel    # To use the accelerator nodes
 #SBATCH --gres=gpu:2         # To specify how many GPUs to use
 #SBATCH --time=10:00:00      # Max walltime is 14 days.
 #SBATCH --mem-per-cpu=6G
 #SBATCH --ntasks=8
 set -o errexit  # Recommended for easier debugging
 ## Load your modules
 module purge   # Recommended for reproducibility
 module use -a /cluster/shared/nlpl/software/modules/etc
 module load nlpl-python-candy/201912/3.7 nlpl-tensorflow/1.15.2/3.7
 python3 bin/train_elmo.py --train_prefix $DATA --size $SIZE  --vocab_file $VOCAB --save_dir $OUT
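
The script can then be saved to a file and submitted as a regular SLURM job (the file name is a hypothetical placeholder):

 sbatch train_elmo.slurm
 squeue -u $USER    # monitor the job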

Using system TensorFlow

If you use the system TensorFlow module (TensorFlow/1.13.1-fosscuda-2018b-Python-3.6.6), you do not have to install TensorFlow locally.
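
Which TensorFlow builds are installed on Saga can be checked with the standard module command:

 module avail TensorFlow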

Example SLURM file:

#!/bin/bash
#SBATCH --job-name=elmo
#SBATCH --mail-type=FAIL
#SBATCH --account=nn9447k    # Use your project number
#SBATCH --partition=accel    # To use the accelerator nodes
#SBATCH --gres=gpu:2         # To specify how many GPUs to use
#SBATCH --time=10:00:00      # Max walltime is 14 days.
#SBATCH --mem-per-cpu=6G
#SBATCH --ntasks=8
set -o errexit  # Recommended for easier debugging
## Load your modules
module purge   # Recommended for reproducibility
module load TensorFlow/1.13.1-fosscuda-2018b-Python-3.6.6
python3 bin/train_elmo.py --train_prefix $DATA --size $SIZE  --vocab_file $VOCAB --save_dir $OUT

Using Anaconda

If using Anaconda, you should install the tensorflow-gpu Python package locally. The benefit is that you can choose (to some extent) which TensorFlow version to use.
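
A one-time environment setup could look like the following sketch (the environment name matches the conda activate line in the script below; the pinned version is an assumption, any GPU-enabled TensorFlow 1.x build compatible with the training code should do):

 module load Anaconda3/2019.03
 source /cluster/software/Anaconda3/2019.03/etc/profile.d/conda.sh
 conda create --name python3.6 python=3.6
 conda activate python3.6
 pip install tensorflow-gpu==1.15.2    # assumption: choose a TF 1.x GPU build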

Example SLURM file:

#!/bin/bash
#SBATCH --job-name=elmo
#SBATCH --mail-type=FAIL
#SBATCH --account=nn9447k  # Use your project number
#SBATCH --partition=accel    # To use the accelerator nodes
#SBATCH --gres=gpu:2         # To specify how many GPUs to use
#SBATCH --time=10:00:00      # Max walltime is 14 days.
#SBATCH --mem-per-cpu=6G
#SBATCH --ntasks=8
set -o errexit  # Recommended for easier debugging
module purge   # Recommended for reproducibility
module load Anaconda3/2019.03
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/cluster/software/Anaconda3/2019.03/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
   eval "$__conda_setup"
else
   if [ -f "/cluster/software/Anaconda3/2019.03/etc/profile.d/conda.sh" ]; then
               . "/cluster/software/Anaconda3/2019.03/etc/profile.d/conda.sh"
   else
       export PATH="/cluster/software/Anaconda3/2019.03/bin:$PATH"
   fi
fi
unset __conda_setup
# <<< conda initialize <<<
conda activate python3.6
python3 bin/train_elmo.py --train_prefix $DATA --size $SIZE --vocab_file $VOCAB --save_dir $OUT