Difference between revisions of "Eosc/norbert"
= Working Notes for Norwegian BERT-Like Models =
= Available Text Corpora =
*[https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-4/ Norsk Aviskorpus]
*[https://dumps.wikimedia.org/nowiki/latest/ Norwegian Wikipedia]
*[https://www.hf.uio.no/iln/english/about/organization/text-laboratory/projects/nowac/index.html NoWaC]
*[https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1989# Common Crawl texts from the CoNLL 2017 shared task]
*[https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-34/ NB Digital] (inclusion still to be decided)

= Preprocessing and Tokenization =