Difference between revisions of "Eosc/norbert/benchmark"
Line 15:
  *[https://github.com/UniversalDependencies/UD_Norwegian-NynorskLIA Spoken dialects]
− == Lexical
+ == Lexical ==
  *[https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-27/ Word sense disambiguation in context]
+ *[https://github.com/ltgoslo/norwegian-synonyms Norwegian synonyms]
+ *[https://github.com/ltgoslo/norwegian-analogies Norwegian analogies]
+ *[https://github.com/ltgoslo/norsentlex NorSentLex]: Sentiment lexicon
  == Text classification ==
Revision as of 11:41, 23 June 2021
Emerging Thoughts on Benchmarking
The following would be natural places to start. For most of these tasks we have baseline numbers to compare against, but no existing set-up into which we could simply plug a Norwegian BERT and run it, so we may need to identify suitable code from existing BERT-based architectures (e.g. for English) to re-use. For the first task, though (document-level sentiment analysis on NoReC), Jeremy would have an existing set-up using mBERT that we could perhaps use.
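To make the "plug in a model and run" idea concrete, here is a minimal sketch of what such a set-up could look like. It treats a model as any callable from texts to labels, so that swapping mBERT for a Norwegian BERT would only mean swapping the callable. All names (`evaluate`, `TOY_TASKS`, `keyword_baseline`, `always_positive`) and the toy examples are invented for illustration; nothing here is taken from NoReC or from an existing set-up.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "model" is any callable mapping a list of texts to predicted labels.
# A fine-tuned BERT wrapper would satisfy the same interface (assumption).
Model = Callable[[Sequence[str]], List[int]]
Example = Tuple[str, int]


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def evaluate(model: Model, tasks: Dict[str, List[Example]]) -> Dict[str, float]:
    """Run one model over every task and return a score per task name."""
    scores = {}
    for name, examples in tasks.items():
        texts = [text for text, _ in examples]
        gold = [label for _, label in examples]
        scores[name] = accuracy(gold, model(texts))
    return scores


# Toy data, invented for this sketch (1 = positive, 0 = negative).
TOY_TASKS: Dict[str, List[Example]] = {
    "document_sa": [
        ("veldig bra film", 1),
        ("dårlig bok", 0),
        ("bra plate", 1),
    ],
}


def keyword_baseline(texts: Sequence[str]) -> List[int]:
    """Hypothetical stand-in model: positive if the token 'bra' occurs."""
    return [1 if "bra" in text.split() else 0 for text in texts]


def always_positive(texts: Sequence[str]) -> List[int]:
    """Hypothetical majority-class stand-in model."""
    return [1 for _ in texts]


# Comparing models is then a matter of passing a different callable.
results = {
    "keyword_baseline": evaluate(keyword_baseline, TOY_TASKS),
    "always_positive": evaluate(always_positive, TOY_TASKS),
}
```

Keeping the task data and the metric outside the model is the design choice that matters here: baseline numbers and new models are scored by exactly the same code.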
NLP tasks
NoReC*
- NoReC_fine: structured sentiment analysis
- NoReC_sentences: sentence-level 2/3-way polarity
- NoReC_neg: negation cues and scopes
Linguistic pipeline (dependency parsing or PoS tagging)
Lexical
- Word sense disambiguation in context
- Norwegian synonyms
- Norwegian analogies
- NorSentLex: Sentiment lexicon
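For the analogy resource, one common way to score word vectors is the vector-offset method: for "a is to b as c is to ?", pick the vocabulary word closest (by cosine) to b - a + c, excluding the three query words. The sketch below assumes analogies come as four-word tuples and that vectors are available as a plain dictionary; the actual file format of the Norwegian analogies data set is not described here, and the toy vectors are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Quadruple = Tuple[str, str, str, str]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def solve_analogy(vectors: Dict[str, Vector], a: str, b: str, c: str) -> str:
    """Return the word closest to b - a + c, excluding a, b and c."""
    va, vb, vc = (normalize(vectors[w]) for w in (a, b, c))
    target = [y - x + z for x, y, z in zip(va, vb, vc)]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(vectors: Dict[str, Vector], quads: Sequence[Quadruple]) -> float:
    """Accuracy over quadruples whose four words are all in the vocabulary."""
    covered = [q for q in quads if all(w in vectors for w in q)]
    if not covered:
        return 0.0
    correct = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in covered)
    return correct / len(covered)


# Toy vectors, invented for this sketch; real ones would come from a model.
TOY_VECTORS: Dict[str, Vector] = {
    "mann": [1.0, 0.0, 0.0],
    "kvinne": [1.0, 1.0, 0.0],
    "konge": [1.0, 0.0, 1.0],
    "dronning": [1.0, 1.0, 1.0],
    "eple": [0.0, 0.0, 1.0],
}

TOY_QUADS: List[Quadruple] = [
    ("mann", "kvinne", "konge", "dronning"),
    ("konge", "dronning", "mann", "kvinne"),
    ("mann", "kvinne", "prins", "prinsesse"),  # skipped: out of vocabulary
]

score = analogy_accuracy(TOY_VECTORS, TOY_QUADS)
```

Skipping out-of-vocabulary quadruples, as done here, is only one possible convention; counting them as errors gives lower but more comparable numbers across models with different vocabularies.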
Text classification
- NoReC: document-level ratings
- Talk of Norway
- NorDial