= Background =

This page provides an informal, technically oriented survey of available (and commonly used) architectures and implementations for large-scale pre-training (and fine-tuning) of contextualized neural language models.

The NLPL use case will install, validate, and maintain a selection of these implementations, in an automated and uniform manner, on multiple HPC systems.

= BERT =

Bidirectional Encoder Representations from Transformers (BERT) is a deep language model jointly conditioned on both left and right context in all layers. It is based on the Transformer neural architecture ([https://www.aclweb.org/anthology/N19-1423/ Devlin et al 2019]).

It is the de facto standard for contextualized representations in modern NLP.

== Available implementations ==

- [https://github.com/google-research/bert Reference Google implementation in TensorFlow]. Requirements: 1.11 <= TensorFlow < 2.0.

- [https://huggingface.co/transformers/model_doc/bert.html HuggingFace Transformers implementation]. Can train with either TensorFlow or PyTorch; a minimal usage sketch follows this list. Requirements: Python >= 3.6, TensorFlow >= 2.0, PyTorch >= 1.3.1.

- [https://github.com/soskek/bert-chainer Chainer implementation]. Of limited interest to us, since it supports only inference, not training.
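As a concrete illustration of the HuggingFace Transformers entry above, the sketch below loads a pre-trained BERT checkpoint with the PyTorch backend and predicts a masked token. This is a minimal smoke test under stated assumptions (a transformers 4.x-style API and the publicly available <code>bert-base-cased</code> checkpoint, neither of which is prescribed by this page), not a recipe for large-scale pre-training.

<pre>
# Minimal smoke test of the HuggingFace Transformers BERT implementation
# (PyTorch backend). Assumes transformers >= 4.0, torch installed, and network
# access to download the public "bert-base-cased" checkpoint.
import torch
from transformers import BertForMaskedLM, BertTokenizer

MODEL_NAME = "bert-base-cased"  # any compatible BERT checkpoint should work

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

# BERT is a masked language model: it predicts the [MASK] token from both
# the left and the right context.
text = f"The capital of Norway is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (1, sequence_length, vocab_size)

# Most probable token at the masked position.
mask_index = int((inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0])
predicted_id = int(logits[0, mask_index].argmax())
print(tokenizer.decode([predicted_id]))  # expected: something like "Oslo"
</pre>

For actual pre-training, the same library exposes these model classes to ordinary TensorFlow or PyTorch training loops, whereas the reference Google implementation ships its own TensorFlow pre-training scripts.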
  
 
= ELMo =

Embeddings from Language Models (ELMo) uses bidirectional LSTM language models to produce contextualized word token representations ([https://www.aclweb.org/anthology/N18-1202/ Peters et al 2018]).

It is the only architecture in this list based on recurrent neural networks rather than Transformers. Despite being much less computationally demanding, it often performs on par with BERT.

== Available implementations ==

- Reference TensorFlow implementation. Requirements: Python >= 3.5, 1.2 < TensorFlow < 1.13 (later versions produce too many deprecation warnings), h5py.

- [https://github.com/ltgoslo/simple_elmo_training LTG implementation]. Based on the reference implementation, but with improved data loading, hyperparameter handling, and code updated to more recent versions of TensorFlow. Requirements: Python >= 3.5, 1.15 <= TensorFlow < 2.0 (a 2.0 version is planned), h5py, smart_open. A [http://wiki.nlpl.eu/index.php/Vectors/elmo/tutorial tutorial] is available.

- [https://docs.allennlp.org/master/api/data/token_indexers/elmo_indexer/ PyTorch implementation in AllenNLP]. Of limited interest to us, since it supports only inference, not training; a minimal inference sketch follows this list. Requirements: Python >= 3.6, 1.6 <= PyTorch < 1.7.
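To make the inference-only AllenNLP entry above more concrete, here is a minimal sketch of extracting contextualized ELMo vectors with its PyTorch modules. The file paths are placeholders for a trained model (an options JSON plus an HDF5 weight file, as produced by the TensorFlow training code); the module names assume an AllenNLP release matching the requirements above.

<pre>
# Minimal ELMo inference sketch with the AllenNLP PyTorch implementation
# (assumes an allennlp release compatible with Python >= 3.6 and
# 1.6 <= PyTorch < 1.7, as listed above). The two paths are placeholders.
from allennlp.modules.elmo import Elmo, batch_to_ids

OPTIONS_FILE = "path/to/options.json"  # model hyperparameters (placeholder)
WEIGHT_FILE = "path/to/weights.hdf5"   # trained biLM weights (placeholder)

# One output representation: a learned weighted average over the biLM layers.
elmo = Elmo(OPTIONS_FILE, WEIGHT_FILE, num_output_representations=1, dropout=0.0)

# Input is a batch of pre-tokenized sentences; batch_to_ids maps the tokens
# to the character ids the model expects.
sentences = [["This", "is", "a", "test", "."],
             ["ELMo", "representations", "are", "contextual", "."]]
character_ids = batch_to_ids(sentences)

output = elmo(character_ids)
# Tensor of shape (batch_size, max_sentence_length, embedding_dim),
# e.g. 1024 dimensions for the standard model size.
vectors = output["elmo_representations"][0]
print(vectors.shape)
</pre>

Since this implementation does not support training, producing a new model for another language or domain still goes through one of the TensorFlow code bases listed above.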
 
= RoBERTa =


= ELECTRA =
