Eosc/norbert

From Nordic Language Processing Laboratory
Revision as of 18:05, 16 September 2020 by Andreku (talk | contribs) (Preprocessing and Tokenization)

Working Notes for Norwegian BERT-Like Models

Available Text Corpora

Preprocessing and Tokenization

The SentencePiece library finds 157 unique characters in the Norwegian Wikipedia dump.
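
As a rough cross-check of a figure like this, the distinct characters in a corpus can be counted directly in plain Python. The sketch below is not the SentencePiece procedure itself: the function name `count_unique_characters` and the sample sentences are hypothetical, and the NFKC step is only an approximation of the normalization SentencePiece applies by default, so the resulting count may differ from the one reported above.

```python
import unicodedata
from collections import Counter


def count_unique_characters(lines, normalize=True):
    """Return a Counter mapping each character to its frequency.

    `lines` is any iterable of strings (for example, an open text file).
    With normalize=True, each line is NFKC-normalized first; this is an
    assumption meant to approximate SentencePiece's default normalization.
    """
    counts = Counter()
    for line in lines:
        if normalize:
            line = unicodedata.normalize("NFKC", line)
        # Drop the trailing newline so it is not counted as a character.
        counts.update(line.rstrip("\n"))
    return counts


# Hypothetical sample; in practice, pass an open file handle for the dump.
sample = ["Blåbærsyltetøy er godt.", "Ærlig talt, år 2020!"]
char_counts = count_unique_characters(sample)
print(len(char_counts), "unique characters")
print(char_counts.most_common(5))
```

To run this over a real corpus, something like `count_unique_characters(open(path, encoding="utf-8"))` should work, where `path` points to the extracted plain-text dump. Inspecting the rarest characters in the result is a quick way to spot stray symbols that a tokenizer vocabulary might not need to cover.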