Vectors/elmo/tutorial

Background

ELMo is a family of contextualized word embeddings first introduced in [Peters et al. 2018].

Using pre-trained models

Pre-trained ELMo models are available from the NLPL Word Embeddings repository.

Python code for inferring contextualized word vectors from any input text with a pre-trained model is available at:

https://github.com/ltgoslo/simple_elmo
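For example, a minimal inference sketch, assuming the simple_elmo package from that repository is installed; the ElmoModel class with its load and get_elmo_vectors methods follows the repository documentation, and the model path is hypothetical:

 from simple_elmo import ElmoModel

 model = ElmoModel()
 # Load a model downloaded from the NLPL repository (hypothetical path)
 model.load("/path/to/elmo_model")
 # Tokenized input sentences; returns an array of shape
 # (number of sentences, max sentence length, ELMo dimensionality)
 vectors = model.get_elmo_vectors([["This", "is", "a", "test", "."]])
 print(vectors.shape)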

Training ELMo on Saga

We recommend using the code from https://github.com/ltgoslo/simple_elmo_training to train an ELMo model with TensorFlow.

After cloning the repository and installing the dependencies, training boils down to running the command

python3 bin/train_elmo.py --train_prefix $DATA --size $SIZE --vocab_file $VOCAB --save_dir $OUT

where

$DATA is a path to the directory containing any number of (possibly gzipped) plain text files: your training corpus.

$SIZE is the number of word tokens in $DATA (necessary to properly construct and log batches).

$VOCAB is a (possibly gzipped) one-word-per-line vocabulary file to be used for language modeling; it should always contain at least <S>, </S> and <UNK>.

$OUT is a directory where the TensorFlow checkpoints will be saved.
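For illustration (all file names, paths, and counts here are hypothetical): the vocabulary file lists one token per line with the special tokens included, and the four variables can be set like this:

 # First lines of an example vocabulary file
 <S>
 </S>
 <UNK>
 the
 of

 # Example invocation with concrete (hypothetical) values
 DATA=/cluster/projects/nn9447k/corpus/   # directory with plain text files
 SIZE=100000000                           # 100 million word tokens in $DATA
 VOCAB=/cluster/projects/nn9447k/vocab.txt.gz
 OUT=/cluster/projects/nn9447k/elmo_out/
 python3 bin/train_elmo.py --train_prefix $DATA --size $SIZE --vocab_file $VOCAB --save_dir $OUT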


There are currently three options for training ELMo on Saga with GPU-enabled TensorFlow:

  • using the NLPL environment (recommended)
  • using a system TensorFlow module
  • using Anaconda.

The speed is comparable across all three options: one epoch over 100 million word tokens takes about 3 hours with 2 NVIDIA P100 GPUs and a batch size of 192.

Using the NLPL environment

You will need to load the NLPL-provided modules nlpl-python-candy/201912/3.7 and nlpl-tensorflow/1.15.2/3.7.

Example SLURM file:

 #!/bin/bash
 #SBATCH --job-name=ELMo
 #SBATCH --mail-type=FAIL
 #SBATCH --account=nn9447k    # Use your project number
 #SBATCH --partition=accel    # To use the accelerator nodes
 #SBATCH --gres=gpu:2         # To specify how many GPUs to use
 #SBATCH --time=10:00:00      # Max walltime is 14 days.
 #SBATCH --mem-per-cpu=6G
 #SBATCH --ntasks=8
 set -o errexit  # Recommended for easier debugging
 ## Load your modules
 module purge   # Recommended for reproducibility
 module use -a /cluster/shared/nlpl/software/modules/etc
 module load nlpl-python-candy/201912/3.7 nlpl-tensorflow/1.15.2/3.7
 python3 bin/train_elmo.py --train_prefix $DATA --size $SIZE --vocab_file $VOCAB --save_dir $OUT
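Save the script (e.g. as train_elmo.slurm, a hypothetical name) and submit it to the queue; the same applies to the other example files below:

 sbatch train_elmo.slurm
 squeue -u $USER   # check the job status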

Using system TensorFlow

If you use the system module TensorFlow/1.13.1-fosscuda-2018b-Python-3.6.6, you do not have to install TensorFlow locally.

Example SLURM file:

 #!/bin/bash
 #SBATCH --job-name=elmo
 #SBATCH --mail-type=FAIL
 #SBATCH --account=nn9447k    # Use your project number
 #SBATCH --partition=accel    # To use the accelerator nodes
 #SBATCH --gres=gpu:2         # To specify how many GPUs to use
 #SBATCH --time=10:00:00      # Max walltime is 14 days.
 #SBATCH --mem-per-cpu=6G
 #SBATCH --ntasks=8
 set -o errexit  # Recommended for easier debugging
 ## Load your modules
 module purge   # Recommended for reproducibility
 module load TensorFlow/1.13.1-fosscuda-2018b-Python-3.6.6
 python3 bin/train_elmo.py --train_prefix $DATA --size $SIZE --vocab_file $VOCAB --save_dir $OUT

Using Anaconda

If using Anaconda, you should install the tensorflow-gpu Python package locally. The benefit is that you can choose (to some extent) the version of TensorFlow.
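A one-time setup sketch for creating such an environment (the environment name python3.6 matches the script below; the pinned TensorFlow version is an assumption and can be changed):

 module load Anaconda3/2019.03
 conda create --name python3.6 python=3.6   # create the environment once
 conda activate python3.6
 pip install tensorflow-gpu==1.15           # pick the version you need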

Example SLURM file:

 #!/bin/bash
 #SBATCH --job-name=elmo
 #SBATCH --mail-type=FAIL
 #SBATCH --account=nn9447k    # Use your project number
 #SBATCH --partition=accel    # To use the accelerator nodes
 #SBATCH --gres=gpu:2         # To specify how many GPUs to use
 #SBATCH --time=10:00:00      # Max walltime is 14 days.
 #SBATCH --mem-per-cpu=6G
 #SBATCH --ntasks=8
 set -o errexit  # Recommended for easier debugging
 ## Load your modules
 module purge   # Recommended for reproducibility
 module load Anaconda3/2019.03
 # >>> conda initialize >>>
 # !! Contents within this block are managed by 'conda init' !!
 __conda_setup="$('/cluster/software/Anaconda3/2019.03/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
 if [ $? -eq 0 ]; then
     eval "$__conda_setup"
 else
     if [ -f "/cluster/software/Anaconda3/2019.03/etc/profile.d/conda.sh" ]; then
         . "/cluster/software/Anaconda3/2019.03/etc/profile.d/conda.sh"
     else
         export PATH="/cluster/software/Anaconda3/2019.03/bin:$PATH"
     fi
 fi
 unset __conda_setup
 # <<< conda initialize <<<
 # Activate the environment with tensorflow-gpu installed (see setup above)
 conda activate python3.6
 python3 bin/train_elmo.py --train_prefix $DATA --size $SIZE --vocab_file $VOCAB --save_dir $OUT
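After training, the checkpoints in $OUT can be converted to a single HDF5 weight file for inference; in the upstream bilm-tf codebase this is done with the bin/dump_weights.py script (assuming the training fork keeps this script):

 python3 bin/dump_weights.py --save_dir $OUT --outfile $OUT/weights.hdf5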