Difference between revisions of "Eosc/norbert"
(Created page with "= Working Notes for Norwegian BERT-Like Models = = Available Text Corpora = = Preprocessing and Tokenization =")
(→Available Text Corpora)
Line 3: | Line 3:
= Available Text Corpora = | = Available Text Corpora =
+
+ *[https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-4/ Norsk Aviskorpus]
+ *[https://dumps.wikimedia.org/nowiki/latest/ Norwegian Wikipedia]
+ *[https://www.hf.uio.no/iln/english/about/organization/text-laboratory/projects/nowac/index.html noWAC]
+ *[https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1989# CommonCrawl from CoNLL 2017]
+ *[https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-34/ NB Digital] ???
= Preprocessing and Tokenization = | = Preprocessing and Tokenization =