Difference between revisions of "Eosc/norbert"

From Nordic Language Processing Laboratory

Revision as of 19:45, 16 September 2020

Working Notes for Norwegian BERT-Like Models

Available Text Corpora

Preprocessing and Tokenization

The SentencePiece library finds 157 unique characters in the Norwegian Wikipedia dump.
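The sketch below shows how such a character inventory can be estimated. It is written under stated assumptions and is not the procedure SentencePiece actually runs. It applies NFKC normalization, which only roughly mirrors SentencePiece's default normalizer. It skips whitespace, which SentencePiece instead handles as a meta symbol. It then keeps the most frequent characters until they cover a chosen fraction of all characters, in the spirit of SentencePiece's `character_coverage` training option. The function name `char_alphabet` is hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Iterable, List


def char_alphabet(lines: Iterable[str], coverage: float = 0.9995) -> List[str]:
    """Hypothetical helper: return the most frequent characters that together
    account for at least `coverage` of all (non-whitespace) characters."""
    counts: Counter = Counter()
    for line in lines:
        # NFKC normalization: an approximation of SentencePiece's default normalizer.
        text = unicodedata.normalize("NFKC", line.rstrip("\n"))
        # Assumption: whitespace is ignored here rather than mapped to a meta symbol.
        counts.update(ch for ch in text if not ch.isspace())

    total = sum(counts.values())
    if total == 0:
        return []

    alphabet: List[str] = []
    covered = 0
    # Add characters from most to least frequent until the coverage target is met.
    for ch, n in counts.most_common():
        alphabet.append(ch)
        covered += n
        if covered / total >= coverage:
            break
    return alphabet


if __name__ == "__main__":
    sample = ["Blåbærsyltetøy er godt.", "Æ, ø og å."]
    print(len(char_alphabet(sample, coverage=1.0)))
```

With `coverage=1.0` every distinct character is kept. Lower values drop the rarest characters, which is why a coverage-based count on a real dump can be smaller than the raw number of distinct code points.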

Evaluation

Are there Norwegian test sets for typical NLP tasks that we could use to evaluate NorBERT?