Difference between revisions of "Eosc/norbert"
(Created page with "= Working Notes for Norwegian BERT-Like Models = = Available Text Corpora = = Preprocessing and Tokenization =")
(→Available Text Corpora)
Line 3:
  = Available Text Corpora =
+
+ *[https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-4/ Norsk Aviskorpus]
+ *[https://dumps.wikimedia.org/nowiki/latest/ Norwegian Wikipedia]
+ *[https://www.hf.uio.no/iln/english/about/organization/text-laboratory/projects/nowac/index.html noWAC]
+ *[https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1989# CommonCrawl from CoNLL 2017]
+ *[https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-34/ NB Digital] ???
  = Preprocessing and Tokenization =