Latest revision as of 11:56, 23 June 2021

Emerging Thoughts on Benchmarking

The following would be natural places to start. For most of these we have baseline numbers to compare against, but no existing set-up into which we could simply plug a Norwegian BERT and run it, so we may need to identify suitable code from existing BERT-based architectures (e.g. for English) to re-use. For document-level SA on NoReC, though, Jeremy would have an existing mBERT set-up that we could perhaps use.
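As a rough illustration of what a set-up where a model can be "plugged in and run" might look like, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the names `Task`, `Predictor`, `run_benchmark`, and `always_positive` are hypothetical, the two toy examples are made up and not taken from NoReC or any other dataset listed here, and accuracy is used only as a stand-in for whatever metric each task actually calls for.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a "model" is any callable mapping a batch of texts to labels.
# A fine-tuned Norwegian BERT or mBERT would be wrapped to fit this shape.
Predictor = Callable[[Sequence[str]], List[str]]


@dataclass
class Task:
    """A hypothetical classification task: a name plus (text, gold label) pairs."""
    name: str
    examples: List[Tuple[str, str]]


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def run_benchmark(tasks: Sequence[Task], predictor: Predictor) -> Dict[str, float]:
    """Run one predictor over every task and collect one score per task."""
    scores: Dict[str, float] = {}
    for task in tasks:
        texts = [text for text, _ in task.examples]
        gold = [label for _, label in task.examples]
        scores[task.name] = accuracy(gold, predictor(texts))
    return scores


# Made-up placeholder data, only to show the calling convention.
toy_task = Task(
    name="toy_polarity",
    examples=[("veldig bra film", "positive"), ("kjedelig bok", "negative")],
)


def always_positive(texts: Sequence[str]) -> List[str]:
    """Trivial baseline standing in for a real model."""
    return ["positive"] * len(texts)


if __name__ == "__main__":
    print(run_benchmark([toy_task], always_positive))
```

With this shape, comparing models would amount to passing a different `predictor` to `run_benchmark`, while the task definitions and scoring stay fixed.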

NLP tasks

Structured sentiment analysis: NoReC_fine
Sentence-level 2/3-way polarity: NoReC_sentences
Negation cues and scopes (evaluation is still being developed): NoReC_neg
PoS tagging: LIA + NDT Bokmaal / Nynorsk
Dependency parsing: LIA + NDT Bokmaal / Nynorsk
NER: NorNE (Bokmål+Nynorsk)
Co-reference resolution (annotation ongoing)

Lexical

Word sense disambiguation in context
Norwegian synonyms (for static models)
Norwegian analogies (for static models)
NorSentLex: Sentiment lexicon (for static models)

Text classification

NoReC; document-level ratings.
Talk of Norway
NorDial

@@ Line 1: / Line 1: @@
 = Emerging Thoughts on Benchmarking =
-This would be natural places to start:
+The following would be natural places to start. For most of these we have baseline numbers to compare against, but no existing set-up into which we could simply plug a Norwegian BERT and run it, so we may need to identify suitable code from existing BERT-based architectures (e.g. for English) to re-use. For document-level SA on NoReC, though, Jeremy would have an existing mBERT set-up that we could perhaps use.
-NoReC, for document-level sentiment analysis (i.e. rating prediction)
+== NLP tasks ==
-https://github.com/ltgoslo/norec
+* Structured sentiment analysis: [https://github.com/ltgoslo/norec_fine NoReC_fine]
+* Sentence-level 2/3-way polarity: [https://github.com/ltgoslo/norec_sentence/ NoReC_sentences]
+* Negation cues and scopes (evaluation is still being developed): [https://github.com/ltgoslo/norec_neg/ NoReC_neg]
+* PoS tagging: [https://github.com/UniversalDependencies/UD_Norwegian-NynorskLIA LIA] + NDT [https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal Bokmaal] / [https://github.com/UniversalDependencies/UD_Norwegian-Nynorsk Nynorsk]
+* Dependency parsing: [https://github.com/UniversalDependencies/UD_Norwegian-NynorskLIA LIA] + NDT [https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal Bokmaal] / [https://github.com/UniversalDependencies/UD_Norwegian-Nynorsk Nynorsk]
+* NER: [https://github.com/ltgoslo/norne NorNE] (Bokmål+Nynorsk)
+* Co-reference resolution (annotation ongoing)
-NoReC_fine, for fine-grained sentiment analysis (e.g. predicting target expression + polarity)
+== Lexical  ==
-https://github.com/ltgoslo/norec_fine
+*[https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-27/ Word sense disambiguation in context]
+*[https://github.com/ltgoslo/norwegian-synonyms Norwegian synonyms] (for static models)
+*[https://github.com/ltgoslo/norwegian-analogies Norwegian analogies] (for static models)
+*[https://github.com/ltgoslo/norsentlex NorSentLex]: Sentiment lexicon (for static models)
-NDT, for dependency parsing or PoS tagging (perhaps best to use the UD version)
+== Text classification ==
-https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-10/
+*[https://github.com/ltgoslo/norec NoReC]; document-level ratings.
+*[https://github.com/ltgoslo/talk-of-norway Talk of Norway]
+*[https://github.com/jerbarnes/norwegian_dialect NorDial]
-NorNE, for named entity recognition, extends NDT (also available for the UD version)
+==Other ==
-https://github.com/ltgoslo/norne

Difference between revisions of "Eosc/norbert/benchmark"
