Difference between revisions of "Eosc/norbert/benchmark"
(→NLP tasks) |
|||
(29 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
= Emerging Thoughts on Benchmarking = | = Emerging Thoughts on Benchmarking = | ||
− | + | The following would be natural places to start. For most of these we have baseline numbers to compare against, but no existing set-up into which we could simply plug a Norwegian BERT and run it, so we may need to identify suitable code from existing BERT-based architectures (e.g. for English) to re-use. For the first task though (document-level SA on NoReC), Jeremy has an existing set-up for mBERT that we could perhaps use. | |
− | + | == NLP tasks == | |
− | https://github.com/ltgoslo/ | + | * Structured sentiment analysis: [https://github.com/ltgoslo/norec_fine NoReC_fine] |
+ | * Sentence-level 2/3-way polarity: [https://github.com/ltgoslo/norec_sentence/ NoReC_sentences] | ||
+ | * Negation cues and scopes (evaluation is still being developed): [https://github.com/ltgoslo/norec_neg/ NoReC_neg] | ||
+ | * PoS tagging: [https://github.com/UniversalDependencies/UD_Norwegian-NynorskLIA LIA] + NDT [https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal Bokmaal] / [https://github.com/UniversalDependencies/UD_Norwegian-Nynorsk Nynorsk] | ||
+ | * Dependency parsing: [https://github.com/UniversalDependencies/UD_Norwegian-NynorskLIA LIA] + NDT [https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal Bokmaal] / [https://github.com/UniversalDependencies/UD_Norwegian-Nynorsk Nynorsk] | ||
+ | * NER: [https://github.com/ltgoslo/norne NorNE] (Bokmål+Nynorsk) | ||
+ | * Co-reference resolution (annotation ongoing) | ||
− | + | == Lexical == | |
− | https://github.com/ltgoslo/ | + | *[https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-27/ Word sense disambiguation in context] |
+ | *[https://github.com/ltgoslo/norwegian-synonyms Norwegian synonyms] (for static models) | ||
+ | *[https://github.com/ltgoslo/norwegian-analogies Norwegian analogies] (for static models) | ||
+ | *[https://github.com/ltgoslo/norsentlex NorSentLex]: Sentiment lexicon (for static models) | ||
− | + | == Text classification == | |
− | https:// | + | *[https://github.com/ltgoslo/norec NoReC]; document-level ratings. |
+ | *[https://github.com/ltgoslo/talk-of-norway Talk of Norway] | ||
+ | *[https://github.com/jerbarnes/norwegian_dialect NorDial] | ||
− | + | ==Other == | |
− |
Latest revision as of 11:56, 23 June 2021
Emerging Thoughts on Benchmarking
The following would be natural places to start. For most of these we have baseline numbers to compare against, but no existing set-up into which we could simply plug a Norwegian BERT and run it, so we may need to identify suitable code from existing BERT-based architectures (e.g. for English) to re-use. For the first task though (document-level SA on NoReC), Jeremy has an existing set-up for mBERT that we could perhaps use.
NLP tasks
- Structured sentiment analysis: NoReC_fine
- Sentence-level 2/3-way polarity: NoReC_sentences
- Negation cues and scopes (evaluation is still being developed): NoReC_neg
- PoS tagging: LIA + NDT Bokmaal / Nynorsk
- Dependency parsing: LIA + NDT Bokmaal / Nynorsk
- NER: NorNE (Bokmål+Nynorsk)
- Co-reference resolution (annotation ongoing)
Lexical
- Word sense disambiguation in context
- Norwegian synonyms (for static models)
- Norwegian analogies (for static models)
- NorSentLex: Sentiment lexicon (for static models)
Text classification
- NoReC; document-level ratings.
- Talk of Norway
- NorDial
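To make the candidate list easier to work with, it could be kept as a small machine-readable registry. The following is only a minimal sketch: the `BenchmarkTask` class, its field names and the `runnable_for_contextual_model` helper are hypothetical, and treating the negation and co-reference tasks as "not ready" is an assumption based on the notes above (evaluation still being developed, annotation ongoing). The URLs are the ones linked from this page.

```python
from dataclasses import dataclass
from typing import List, Tuple

UD = "https://github.com/UniversalDependencies/"
LTG = "https://github.com/ltgoslo/"


@dataclass(frozen=True)
class BenchmarkTask:
    """One candidate benchmark (hypothetical structure, for illustration only)."""
    name: str
    group: str                    # section heading on this page
    urls: Tuple[str, ...] = ()    # dataset locations linked from this page
    static_only: bool = False     # marked "(for static models)" above
    ready: bool = True            # assumption: False if evaluation/annotation is unfinished


# The three UD treebanks shared by PoS tagging and dependency parsing.
UD_TREEBANKS = (
    UD + "UD_Norwegian-NynorskLIA",
    UD + "UD_Norwegian-Bokmaal",
    UD + "UD_Norwegian-Nynorsk",
)

TASKS: List[BenchmarkTask] = [
    BenchmarkTask("Structured sentiment analysis", "NLP tasks", (LTG + "norec_fine",)),
    BenchmarkTask("Sentence-level polarity", "NLP tasks", (LTG + "norec_sentence/",)),
    BenchmarkTask("Negation cues and scopes", "NLP tasks", (LTG + "norec_neg/",), ready=False),
    BenchmarkTask("PoS tagging", "NLP tasks", UD_TREEBANKS),
    BenchmarkTask("Dependency parsing", "NLP tasks", UD_TREEBANKS),
    BenchmarkTask("NER", "NLP tasks", (LTG + "norne",)),
    BenchmarkTask("Co-reference resolution", "NLP tasks", ready=False),
    BenchmarkTask(
        "Word sense disambiguation in context", "Lexical",
        ("https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-27/",),
    ),
    BenchmarkTask("Norwegian synonyms", "Lexical", (LTG + "norwegian-synonyms",), static_only=True),
    BenchmarkTask("Norwegian analogies", "Lexical", (LTG + "norwegian-analogies",), static_only=True),
    BenchmarkTask("NorSentLex", "Lexical", (LTG + "norsentlex",), static_only=True),
    BenchmarkTask("NoReC document-level ratings", "Text classification", (LTG + "norec",)),
    BenchmarkTask("Talk of Norway", "Text classification", (LTG + "talk-of-norway",)),
    BenchmarkTask(
        "NorDial", "Text classification",
        ("https://github.com/jerbarnes/norwegian_dialect",),
    ),
]


def runnable_for_contextual_model(tasks: List[BenchmarkTask]) -> List[BenchmarkTask]:
    """Keep tasks that are ready and not restricted to static embeddings."""
    return [t for t in tasks if t.ready and not t.static_only]


if __name__ == "__main__":
    for task in runnable_for_contextual_model(TASKS):
        print(f"{task.group}: {task.name}")
```

A registry of this kind would let a benchmarking script iterate over the tasks that apply to a contextual model such as a Norwegian BERT, while the items marked for static models stay available for evaluating static embeddings.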