Difference between revisions of "Eosc/norbert"

From Nordic Language Processing Laboratory
(Created page with "= Working Notes for Norwegian BERT-Like Models = = Available Text Corpora = = Preprocessing and Tokenization =")
 
(Available Text Corpora)
 
= Available Text Corpora =
 
*[https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-4/ Norsk Aviskorpus]
*[https://dumps.wikimedia.org/nowiki/latest/ Norwegian Wikipedia]
*[https://www.hf.uio.no/iln/english/about/organization/text-laboratory/projects/nowac/index.html noWAC]
*[https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1989# CommonCrawl from CoNLL 2017]
*[https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-34/ NB Digital] ???
  
 
= Preprocessing and Tokenization =
 

Revision as of 12:23, 11 September 2020

Working Notes for Norwegian BERT-Like Models

Available Text Corpora

Preprocessing and Tokenization