Difference between revisions of "Eosc/norbert/benchmark"

From Nordic Language Processing Laboratory
(Emerging Thoughts on Benchmarking)
(NLP tasks)
 
(22 intermediate revisions by 2 users not shown)
Line 3: Line 3:
 
The following would be natural places to start. For most of these we have baseline numbers to compare to, but no existing set-up where we could simply plug in a Norwegian BERT and run it, so we may need to identify suitable code from existing BERT-based architectures (e.g. for English) to re-use. For the first task, though (document-level SA on NoReC), Jeremy would have an existing set-up using mBERT that we could perhaps use.
  
*[https://github.com/ltgoslo/norec_fine NoReC]; for document-level sentiment analysis (i.e. rating prediction). Note that we would want to use another version than the current official release; this has 10k more sentences (and is soon to be officially released).
+
== NLP tasks ==
*[https://github.com/ltgoslo/norec_fine NoReC_fine]; subset of documents from NoReC annotated with fine-grained sentiment (e.g. for predicting target expression + polarity)
+
* Structured sentiment analysis: [https://github.com/ltgoslo/norec_fine NoReC_fine]
*[https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-10/ NDT]; for dependency parsing or PoS tagging (perhaps best to use the UD version)
+
* Sentence-level 2/3-way polarity: [https://github.com/ltgoslo/norec_sentence/ NoReC_sentences] 
*[https://github.com/ltgoslo/norne NorNE]; for named entity recognition, extends NDT (also available for the UD version)
+
* Negation cues and scopes (evaluation is still being developed): [https://github.com/ltgoslo/norec_neg/ NoReC_neg]
*NoReC_neg; soon to be released; adds negation cues and scopes to the same subset of sentences as in NoReC_fine.
+
* PoS tagging: [https://github.com/UniversalDependencies/UD_Norwegian-NynorskLIA LIA] + NDT [https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal Bokmaal] / [https://github.com/UniversalDependencies/UD_Norwegian-Nynorsk Nynorsk]
 +
* Dependency parsing: [https://github.com/UniversalDependencies/UD_Norwegian-NynorskLIA LIA] + NDT [https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal Bokmaal] / [https://github.com/UniversalDependencies/UD_Norwegian-Nynorsk Nynorsk]
 +
* NER: [https://github.com/ltgoslo/norne NorNE] (Bokmål+Nynorsk)
 +
* Co-reference resolution (annotation ongoing)
 +
 
 +
== Lexical ==
 +
*[https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-27/ Word sense disambiguation in context]
 +
*[https://github.com/ltgoslo/norwegian-synonyms Norwegian synonyms] (for static models)
 +
*[https://github.com/ltgoslo/norwegian-analogies Norwegian analogies] (for static models)
 +
*[https://github.com/ltgoslo/norsentlex NorSentLex]: Sentiment lexicon (for static models)
 +
 
 +
== Text classification ==
 +
*[https://github.com/ltgoslo/norec NoReC]; document-level ratings.
 
*[https://github.com/ltgoslo/talk-of-norway Talk of Norway]
 +
*[https://github.com/jerbarnes/norwegian_dialect NorDial]
 +
 +
== Other ==

Latest revision as of 11:56, 23 June 2021

Emerging Thoughts on Benchmarking

The following would be natural places to start. For most of these we have baseline numbers to compare to, but no existing set-up where we could simply plug in a Norwegian BERT and run it, so we may need to identify suitable code from existing BERT-based architectures (e.g. for English) to re-use. For the first task, though (document-level SA on NoReC), Jeremy would have an existing set-up using mBERT that we could perhaps use.

NLP tasks

Lexical

Text classification

Other