Eosc/norbert
Working Notes for Norwegian BERT-Like Models
Available Text Corpora
Preprocessing and Tokenization
The SentencePiece library finds 157 unique characters in the Norwegian Wikipedia dump.
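A character count like this can be sanity-checked independently of SentencePiece. The sketch below is a minimal pure-Python illustration, not the procedure used for these notes: the function names and the toy sample are hypothetical, whitespace is skipped by assumption, and the result on the real dump may differ from SentencePiece's figure depending on normalization. The second function estimates how many of the most frequent characters are needed to cover a given share of all character occurrences, which is the idea behind SentencePiece's `character_coverage` training option.

```python
from collections import Counter
from typing import Iterable


def count_characters(lines: Iterable[str]) -> Counter:
    """Count every non-whitespace character in an iterable of text lines."""
    counts: Counter = Counter()
    for line in lines:
        # Assumption: whitespace is not counted as part of the alphabet.
        counts.update(ch for ch in line if not ch.isspace())
    return counts


def alphabet_size_for_coverage(counts: Counter, coverage: float) -> int:
    """Return the smallest number of most-frequent characters whose
    occurrences make up at least `coverage` of all character occurrences."""
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    for size, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= coverage:
            return size
    return len(counts)


# Hypothetical toy sample; replace with lines read from the actual dump.
sample = ["Dette er en setning.", "Blåbærsyltetøy på brødskiva."]
sample_counts = count_characters(sample)
print(len(sample_counts), "unique characters in the sample")
print(alphabet_size_for_coverage(sample_counts, 0.9995), "characters needed for 99.95% coverage")
```

On a full corpus, passing an open file object (which iterates line by line) instead of a list keeps memory use low. Rare characters that fall outside the chosen coverage are the ones a tokenizer would map to an unknown token.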
Evaluation
Are there Norwegian test sets for typical NLP tasks that we can use to evaluate NorBERT?