Difference between revisions of "Community/training"

From Nordic Language Processing Laboratory
 
(88 intermediate revisions by 3 users not shown)
Line 1: Line 1:
'''HPLT & NLPL Winter School on Large-Scale Language Modeling and Neural Machine Translation with Web Data'''
+
'''HPLT & NLPL Winter School on Large Language Models: Creation, Customization, Evaluation, and Use'''
  
[[File:skeikampen.2020.png|center]]
+
[[File:Skeikampen.2023.jpg|center]]
  
 
= Background =
 
= Background =
  
After a two-year pandemic hiatus, the NLPL network and Horizon Europe
+
Since 2023, the NLPL network and Horizon Europe
project ''High-Performance Language Technologies'' (HPLT) join forces
+
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)
to re-launch the successful winter school series on large-scale NLP.
+
have joined forces to organize the successful winter school series on Web-scale NLP.
 
The winter school seeks to stimulate ''community formation'',
 
The winter school seeks to stimulate ''community formation'',
i.e. strengthening interaction and collaboration among Nordic and
+
i.e. strengthening interaction and collaboration among
 
European research teams in NLP and advancing a shared level of knowledge
 
European research teams in NLP and advancing a shared level of knowledge
 
and experience in using high-performance e-infrastructures for large-scale
 
and experience in using high-performance e-infrastructures for large-scale
 
NLP research.
 
NLP research.
The 2023 edition of the winter school puts special emphasis on
+
The 2024 edition of the winter school puts special emphasis on
 
NLP researchers from countries who participate in the EuroHPC
 
NLP researchers from countries who participate in the EuroHPC
 
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].
 
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].
 
For additional background, please see the archival pages from the
 
For additional background, please see the archival pages from the
 
[http://wiki.nlpl.eu/index.php/Community/training/2018 2018],
 
[http://wiki.nlpl.eu/index.php/Community/training/2018 2018],
[http://wiki.nlpl.eu/index.php/Community/training/2019 2019], and
+
[http://wiki.nlpl.eu/index.php/Community/training/2019 2019],
[http://wiki.nlpl.eu/index.php/Community/training/2020 2020]
+
[http://wiki.nlpl.eu/index.php/Community/training/2020 2020], and
 +
[http://wiki.nlpl.eu/index.php/Community/training/2023 2023]
 
NLPL Winter Schools.
 
NLPL Winter Schools.
  
For early 2023, HPLT will hold its winter school from Monday, February 6, to
+
For early 2024, HPLT will hold its winter school from Sunday, February 4, to
Wednesday, February 8, 2023, at a
+
Tuesday, February 6, 2024, at a
 
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]
 
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]
 
(with skiing and walking opportunities) about two hours north of Oslo.
 
(with skiing and walking opportunities) about two hours north of Oslo.
 
The project will organize group bus transfer from and to the Oslo
 
The project will organize group bus transfer from and to the Oslo
airport ''Gardermoen'', leaving the airport at 9:30 on Monday morning
+
airport ''Gardermoen'', leaving the airport at 9:45 on Sunday morning
and returning there around 17:30 on Wednesday afternoon.
+
and returning there around 17:30 on Tuesday afternoon.
  
 
The winter school is subsidized by the HPLT project: there is no fee for
 
The winter school is subsidized by the HPLT project: there is no fee for
Line 35: Line 36:
 
All participants will have to cover their own travel and accommodation
 
All participants will have to cover their own travel and accommodation
 
at Skeikampen, however.
 
at Skeikampen, however.
Two nights at the hotel, including all meals, will come to NOK 3190 (NOK 2790 per person in a shared double room),  
+
Two nights at the hotel, including all meals, will come to NOK 3745 (NOK 3345 per person in a shared double room),  
 
to be paid to the hotel directly.
 
to be paid to the hotel directly.
  
 
= Programme =
 
= Programme =
  
The 2023 winter school will have a thematic focus on ''Web Data for Large-Scale Language Modeling and Neural Machine Translation''.
+
The 2024 winter school will have a thematic focus on ''Large Language Models: Creation, Customization, Evaluation, and Use''.
 
The programme will be comprised of in-depth technical presentations (possibly including some
 
The programme will be comprised of in-depth technical presentations (possibly including some
hands-on elements) from, among others, the
+
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,  
[https://bigscience.huggingface.co BigScience] and [https://commoncrawl.org Common Crawl] initiatives,
+
but also include critical reflections on current development trends in LLM-focussed NLP.
but also include critical reflections on working with massive, uncurated language data.
+
The programme will be complemented with a panel discussion and a ‘walk-through’ of available
The programme will be complemented with a panel discussion and a ‘walk-through’ of available infrastructure on the shared EuroHPC LUMI supercomputer.
+
infrastructure on the shared EuroHPC LUMI supercomputer.
  
 
Confirmed presenters include:
 
Confirmed presenters include:
  
* Mehdi Ali, Fraunhofer IAIS</br><b>OpenGPT-X: Development of a Gaia-X Node for Large AI Language Models and Innovative Language Application Service</b></br>The development of large language models is currently dominated by non-European organizations. The OpenGPT-X project aims to ensure the European data and AI sovereignty to develop this technology. The developed language models will be made open-source, facilitating the usage and research of these models. In this talk, we provide an overview of the project, describe the current status and give an outlook.
+
* [http://afra.alishahi.name Afra Alishahi, Tilburg University, The Netherlands]
* [https://faculty.washington.edu/ebender/ Emily M. Bender, University of Washington]</br><b>Towards Responsible Development and Application of Large Language Models</b></br>This session will begin with a problematization of the rush for scale in language models and the "foundation model" conceptualization and exploration of the risks of ever larger language models. I will then turn to some discussion of what can be done better, drawing on value sensitive design and with a focus on evaluation grounded in specific use cases and on thorough documentation. Finally, I will reflect on the dangers and responsibilities that come with working in an area with intense media and corporate interest.
+
* [https://di.ku.dk/english/staff/vip/?pure=en/persons/631668 Desmond Elliott, University of Copenhagen, Denmark]
* [https://huggingface.co/teven Teven Le Scao, Hugging Face]</br><b>Large Language Models: A How-To Starting Guide</b></br>The new capabilities of large language models (LLMs) have prompted a paradigm change in NLP. However, most are developed by resource-rich organizations and kept from the public. In the framework of the BigScience workshop, a collaboration of hundreds of researchers dedicated to democratizing this powerful technology, we created BLOOM, a 176B-parameter open-access multilingual language model. This talk will be a tutorial to share the learnings of this project and make it easier for others to build their own large language models.
+
* [https://muennighoff.github.io/ Niklas Muennighoff, Contextual AI]
* [https://nljubesi.github.io Nikola Ljubešić, Jožef Stefan Institute & University of Ljubljana]</br><b>MaCoCu Corpora: Why Top-Level-Domain Crawling and Web Data Enrichment Matter</b></br>Exploitation of huge crawl dumps seems not to be the most economical approach to obtaining data for smaller languages. While one might argue that the "needle in the haystack" problem of smaller languages in crawl dumps can be circumvented with "gathering all the different needles at the same time", in practice this approach often fails due to various reasons, one of which is the fact that language identification tools that cover many languages do not perform well enough on smaller languages. In our talk we will present the MaCoCu way of collecting web data which, beyond focusing on crawling top-level domains in the quest for high-quality up-to-date data, also encompasses various forms of data enrichment, crucial ingredients for understanding what kind of data we include in our language and translation models.
+
* [https://perso.limsi.fr/neveol/bio.html Aurélie Névéol, Interdisciplinary Laboratory of Numerical Sciences, France]
* [https://commoncrawl.org/about/team/#headshot-14714 Sebastian Nagel, Common Crawl]</br><b>Common Crawl: Data Collection and Use Cases for NLP</b></br>The Common Crawl data sets are sample collections of web pages made accessible free of charge to everyone interested in running machine-scale analysis on web data.  The presentation starts with a short outline of data collection, the crawlers and technologies used from 2008 until today with an emphasis on the challenges to achieve a balanced, both diverse and representative sample of web sites while operating an efficient and  polite crawler.  After an overview of the data formats used to store the primary web page captures, but also text and metadata extracts, indexes, hyperlink graphs, we showcase how Common Crawl data can be processed. We put the focus on three use cases for NLP: bulk processing of plain text and HTML pages, exploration and statistics based on the URL and metadata index, and the "vertical" use of data from specific sites or by content language.
 
* [https://annargrs.github.io Anna Rogers, University of Copenhagen]</br><b>Big Corpus Linguistics: Lessons from the BigScience Workshop</b></br>The continued growth of large language models and their wide-scale adoption in commercial applications make it increasingly important to investigate their training data, both for research and ethical reasons.  However, inspecting such large corpora has been problematic due to difficulties with data access, and the need for large-scale infrastructure. This talk will discuss some lessons learned during the BigScience workshop,  as well as an ongoing effort for investigating the 1.6 Tb multilingual ROOTS corpus.
 
* [https://portizs.eu/#about Pedro Ortiz Suarez, University of Mannheim and DFKI]</br><b>The OSCAR Project: Improving Data Quality in Multilingual Heterogeneous Web-Based Corpora</b></br>In this talk we will introduce the OSCAR project and present our recent efforts in overcoming the difficulties posed by the heterogeneity, noisiness and size of web resources; in order to produce higher quality textual data for as many languages as possible. We will also discuss recent developments in the project, including our data-processing pipelines to annotate and classify large amounts of textual data in constrained infrastructures, as well as our first attempts to become a fully open-source project and manage our growing community. Finally, we will present how the OSCAR initiative is currently collaborating with other projects in order to improve data quality and availability.
 
* Zeerak Talat, Simon Fraser University,</br><b>NLP and Futuring the Past</b></br>Machine learning and NLP are technological projects that implicitly seek to create possible futures and it is therefore important to consider the values that NLP projects into the future as a field. In this session, we will be discussing the values of language technology, how they arise, and their complicated relationship with the potential for equitable futures.
 
* [https://sites.google.com/site/ivanvulic/ Ivan Vulić, Cambridge University]</br><b>Modular and Parameter-Efficient Adaptation of Multilingual NLP Models</b></br>A key challenge in multilingual NLP is developing general language-independent architectures that will be equally applicable to any language. However, this ambition is hindered by the large variation in 1) structural and semantic properties of the world’s languages, as well as 2) raw and task data scarcity for many different languages, tasks, and application domains. As a consequence, existing language technology is still largely limited to a handful of resource-rich languages, leaving the vast majority of the world’s 7,000+ languages and their speakers behind, thus amplifying the problem of the “digital language divide”. In this lecture, we will demonstrate that modularity enables widening the reach of multilingual NLP to minor and low-resource languages and communities, also boosting efficiency and reusability of models' constituent components: modules. We will introduce a range of recent modular and parameter-efficient techniques, additionally pointing to their high-level similarities and differences, that aim to deal with large cross-language variations and low-data learning regimes. We will also demonstrate that low-resource languages, despite very positive research trends and results achieved in recent years, still lag behind major languages in terms of performance, resources, overall representation in NLP research and other key aspects, and will outline several crucial challenges for future research in this area.
 
  
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
!colspan=3|Monday, February 6, 2023
+
!colspan=3|Sunday, February 4, 2024
 
|-
 
|-
 
| 13:00 || 14:00 || Lunch
 
| 13:00 || 14:00 || Lunch
 
|-
 
|-
| 14:00 || 15:30 || '''Session 1''' Sebastian Nagel
+
| 14:00 || 15:30 || '''Session 1''': [http://svn.nlpl.eu/outreach/skeikampen/2024/alishahi.pdf Analyzing and Interpreting Deep Neural Models of Language] ([http://afra.alishahi.name Afra Alishahi])
 
|-
 
|-
 
| 15:30 || 15:50 || Coffee Break
 
| 15:30 || 15:50 || Coffee Break
 
|-
 
|-
| 15:50 || 17:20 || '''Session 2''' Nikola Ljubešić
+
| 16:00 || 17:30 || '''Session 2''': [http://svn.nlpl.eu/outreach/skeikampen/2024/alishahi.pdf Analyzing and Interpreting Deep Neural Models of Language] ([http://afra.alishahi.name Afra Alishahi])
 
|-
 
|-
| 17:20 || 17:40 || Coffee Break
+
| 17:30 || 17:50 || Coffee Break
 
|-
 
|-
| 17:40 || 19:10 || '''Session 3''' Panel Discussion  "Is the end of academic NLP research in sight?" with Joakim Nivre, Marco Kuhlmann and all the participants
+
| 17:50 || 19:20 || '''Session 3''': [http://svn.nlpl.eu/outreach/skeikampen/2024/muennighoff.pdf Scaling Data-constrained Language Models] ([https://muennighoff.github.io/ Niklas Muennighoff])
 +
 
 +
[https://docs.google.com/presentation/d/1WQDr_2sWkeBzAqNN521QJLyklM40fmv6KY_7JDhAntg/edit Slides]
 
|-
 
|-
 
| 19:30 ||  || Dinner
 
| 19:30 ||  || Dinner
Line 80: Line 78:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
!colspan=3|Tuesday, February 7, 2023
+
!colspan=3|Monday, February 5, 2024
 
|-
 
|-
 
|colspan=3 | Breakfast is available from 07:30
 
|colspan=3 | Breakfast is available from 07:30
 
|-
 
|-
| 08:30 || 10:00 || '''Session 4''' Anna Rogers
+
| 09:00 || 10:30 || '''Session 4''': [http://svn.nlpl.eu/outreach/skeikampen/2024/névéol1.pdf Bias in Natural Language Processing: focus on large language models] ([https://perso.limsi.fr/neveol/bio.html Aurélie Névéol])
 
|-
 
|-
|colspan=3| Lunch is available between 13:00 and 14:30
+
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)
 
|-
 
|-
| 15:00 || 16:20 || '''Session 5''' Teven Le Scao
+
| 15:00 || 16:30 || '''Session 5''': [http://svn.nlpl.eu/outreach/skeikampen/2024/elliot.pdf Multilingual and multimodal language models] ([https://di.ku.dk/english/staff/vip/?pure=en/persons/631668 Desmond Elliott])
 
|-
 
|-
| 16:20 || 16:40 || Coffee Break
+
| 16:30 || 16:50 || Coffee Break
 
|-
 
|-
| 16:40 || 18:00 || '''Session 6''' Mehdi Ali, Pedro Suarez
+
| 16:50 || 17:40 || '''Session 6''': [http://svn.nlpl.eu/outreach/skeikampen/2024/elliot.pdf Multilingual and multimodal language models] ([https://di.ku.dk/english/staff/vip/?pure=en/persons/631668 Desmond Elliott])
 
|-
 
|-
| 18:00 || 18:10 || Coffee Break
+
| 17:40 || 18:00 || Coffee Break
 
|-
 
|-
| 18:10 || 19:30 || '''Session 7''' Emily Bender
+
| 18:00 || 19:15 || '''Session 7'''. «Large vs. Small»: panel discussion. Panelists: Desmond Elliott (University of Copenhagen), Evangelia Gogoulou (RISE, Sweden), Afra Alishahi (Tilburg University), Jan Hajič (Charles University in Prague), and Aurélie Névéol (LISN, France)
 
|-
 
|-
 
| 19:30 ||  || Dinner
 
| 19:30 ||  || Dinner
 
|-
 
|-
| 21:00 || || '''Evening Session''' HPLT, LUMI, LLM & NMT
+
| 21:00 || || '''Evening Session'''. [http://svn.nlpl.eu/outreach/skeikampen/2024/lumi.pdf LUMI: BERT in an Hour, GPT in a Week] ([https://www.mn.uio.no/ifi/english/people/aca/davisamu/ David Samuel] and [https://www.utu.fi/en/people/risto-luukkonen Risto Luukkonen])
 
|}
 
|}
  
Line 106: Line 104:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
!colspan=3|Wednesday, February 8, 2023
+
!colspan=3|Tuesday, February 6, 2024
 
|-
 
|-
 
|colspan=3| Breakfast is available from 07:30
 
|colspan=3| Breakfast is available from 07:30
 
|-
 
|-
| 08:30 || 10:00 || '''Session 8''' Ivan Vulić
+
| 08:30 || 10:00 || '''Session 8''': [http://svn.nlpl.eu/outreach/skeikampen/2024/névéol2.pdf Reproducibility in Natural Language Processing] ([https://perso.limsi.fr/neveol/bio.html Aurélie Névéol])
 
|-
 
|-
 
| 10:00 || 10:30 || Coffee Break
 
| 10:00 || 10:30 || Coffee Break
 
|-
 
|-
| 10:30 || 12:00 || '''Session 9''' Zeerak Talat
+
| 10:30 || 12:00 || '''Session 9''': [http://svn.nlpl.eu/outreach/skeikampen/2024/névéol3.pdf Understanding and measuring the environmental impact of Natural Language Processing] ([https://perso.limsi.fr/neveol/bio.html Aurélie Névéol])
 
|-
 
|-
 
| 12:30 || 13:30 || Lunch
 
| 12:30 || 13:30 || Lunch
 +
|-
 +
| 14:00 || 17:00 || Bus transfer to OSL Airport
 
|}
 
|}
  
 
= Registration =
 
= Registration =
  
Registration is now closed.  The 2023 winter school was heavily over-subscribed.
+
In total, we anticipate around 55 participants at the 2024 winter school.
 
+
We have received more requests for participation than we will be able to accommodate,
In total, we anticipate up to 60 participants in the 2023 Winter School.
+
and the registration form has now been closed.
Please register your intent of participation through our
+
We processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.
[https://nettskjema.no/a/300790 on-line registration form].
+
Interested parties who have submitted the registration form were confirmed in three batches, on December 11, on December 15,
We will process requests for participation on a first-come, first-served
+
and on December 22, which was also the closing date for winter school registration.
basis, with an eye toward regional balance.
 
Interested parties who have submitted the registration form will be confirmed
 
in three batches, one on December 5, another one on December 12, and finally
 
after the closing date for registration, which is Thursday, December 15, 2022.
 
  
Once confirmed by the organizing team, participant names will be published
+
Once confirmed by the organizing team, participant names are published
on this page, and registration will establish a
+
on this page, and registration establishes a
 
''binding agreement'' with the hotel.
 
''binding agreement'' with the hotel.
 
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute
 
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute
Line 143: Line 139:
 
With a few exceptions, winter school participants travel to and from the conference hotel
 
With a few exceptions, winter school participants travel to and from the conference hotel
 
jointly on a chartered bus (the HPLT shuttle).
 
jointly on a chartered bus (the HPLT shuttle).
The bus will leave OSL airport no later than 9:30 CET on Monday, February 6.
+
The bus will leave OSL airport no later than 9:45 CET on Sunday, February 4.
Thus, please meet up at 9:15 and make your arrival known to your assigned
+
Thus, please meet up by 9:30 and make your arrival known to your assigned
 
‘tour guide’ (who will introduce themselves to you by email beforehand).
 
‘tour guide’ (who will introduce themselves to you by email beforehand).
  
The group will gather near the bus and taxi information booth in the downstairs
+
The group will gather near the DNB currency exchange booth in the downstairs
 
arrivals area, just outside the international arrivals luggage claims and slightly
 
arrivals area, just outside the international arrivals luggage claims and slightly
to the right, as one exits the customs area:
+
to the left as one exits the customs area:
The yellow dot numbered (17) on the
+
the yellow dot numbered (18) on the
 
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].
 
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].
The group will then walk over to the bus terminal, to leave the airport by 9:30.
+
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.
 
The drive to the Skeikampen conference hotel will take us about three hours, and the bus
 
The drive to the Skeikampen conference hotel will take us about three hours, and the bus
 
will make one stop along the way to stretch our legs and fill up on coffee.
 
will make one stop along the way to stretch our legs and fill up on coffee.
  
The winter school will end with lunch on Wednesday, February 8, before the group returns
+
The winter school will end with lunch on Tuesday, February 6, before the group returns
 
to OSL airport on the HPLT shuttle.
 
to OSL airport on the HPLT shuttle.
 
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL
 
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL
around 17:00 to 17:30 CET.
+
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.
  
 
= Organization =
 
= Organization =
  
The 2023 Winter School is organized by a team of volunteers from the NLPL and HPLT networks,
+
The 2024 Winter School is organized by a team of volunteers at the University
 +
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond,
 
please see below.
 
please see below.
 
For all inquiries regarding registration, the programme, logistics,
 
For all inquiries regarding registration, the programme, logistics,
 
or such, please contact <code>hplt-training@ifi.uio.no</code>.
 
or such, please contact <code>hplt-training@ifi.uio.no</code>.
  
The programme committee is comprised of (regrettably lacking in diversity)
+
The programme committee is comprised of:
  
* Hans Eide (Uninett Sigma2, Norway)
+
* Isabelle Augenstein (University of Copenhagen, Denmark)
* Filip Ginter (University of Turku, Finland)
+
* Emily M. Bender (University of Washington, USA)
* Barry Haddow (University of Edinburgh, UK)
+
* Kenneth Heafield (Edinburgh University, UK)
* Jan Hajič (Charles University in Prague, Czech Republic)
+
* Jindřich Helcl (Charles University, Czech Republic)
* Daniel Hershcovich (University of Copenhagen, Denmark)
 
 
* Marco Kuhlmann (Linköping University, Sweden)
 
* Marco Kuhlmann (Linköping University, Sweden)
 +
* Per Egil Kummervold (National Library of Norway)
 
* Andrey Kutuzov (University of Oslo, Norway)
 
* Andrey Kutuzov (University of Oslo, Norway)
 
* Joakim Nivre (RISE and Uppsala University, Sweden)
 
* Joakim Nivre (RISE and Uppsala University, Sweden)
Line 181: Line 178:
 
* Sampo Pyysalo (University of Turku, Finland)
 
* Sampo Pyysalo (University of Turku, Finland)
 
* Gema Ramirez (Prompsit Language Engineering, Spain)
 
* Gema Ramirez (Prompsit Language Engineering, Spain)
 +
* Anna Rogers (IT University of Copenhagen, Denmark)
 
* Magnus Sahlgren (AI Sweden)
 
* Magnus Sahlgren (AI Sweden)
 
* David Samuel (University of Oslo, Norway)
 
* David Samuel (University of Oslo, Norway)
 
* Jörg Tiedemann (University of Helsinki, Finland)
 
* Jörg Tiedemann (University of Helsinki, Finland)
 +
* Erik Velldal (University of Oslo, Norway)
  
 
= Participants =
 
= Participants =
  
# Mehdi Ali (Fraunhofer IAIS)
+
# Afra Alishahi, Tilburg University (The Netherlands)
# Chantal Amrhein (University of Zurich)
+
# Ali Allaith, University of Copenhagen (Denmark)
# Mark Anderson (Norsk regnesentral)
+
# Nikolay Arefev, University of Oslo (Norway)
# Nikolay Arefev (University of Oslo)
+
# Joseph Attieh, University of Helsinki (Finland)
# Mikko Aulamo (University of Helsinki)
+
# Christopher Brückner, Charles University in Prague (Czech Republic)
# Elisa Bassignana (IT University of Copenhagen)
+
# Lucas Charpentier, University of Oslo (Norway)
# Emily M. Bender (University of Washington)
+
# Konstantin Dobler, Hasso Plattner Institute (Germany)
# Vladimír Benko (Slovak Academy of Sciences)
+
# Aleksei Dorkin, University of Tartu (Estonia)
# Nikolay Bogoychev (Edinburgh University)
+
# Luise Dürlich, Uppsala University (Sweden)
# Dhairya Dalal (University of Galway)
+
# Simen Eide, Schibsted (Norway)
# Annerose Eichel (University of Stuttgart)
+
# Desmond Elliott, University of Copenhagen (Denmark)
# Kenneth Enevoldsen (Aarhus University)
+
# Kenneth Enevoldsen, Aarhus University (Denmark)
# Mehrdad Farahani (Chalmers University of Technology)
+
# Mariia Fedorova, University of Oslo (Norway)
# Ona de Gibert (University of Helsinki)
+
# Emilie Francis, Gothenburg University (Sweden)
# Janis Goldzycher (University of Zurich)
+
# Evangelia Gogoulou, RISE (Sweden)
# Jan Hajič (Charles University in Prague)
+
# Jan Hajič, Charles University in Prague (Czech Republic)
# Jindřich Helcl (Charles University in Prague)
+
# Lasse Hansen, Aarhus University Hospital (Denmark)
# Oskar Holmström (Linköping University)
+
# Jindřich Helcl, Charles University in Prague (Czech Republic)
# Sami Itkonen (University of Helsinki)
+
# Yiping Jin, Pompeu Fabra University (Spain)
# Shaoxiong Ji (University of Helsinki)
+
# Lars Johnsen, National Library (Norway)
# Antonia Karamolegkou (University of Copenhagen)
+
# Amanda Kann, Stockholm University (Sweden)
# Nina Khairova (Umeå universitet)
+
# Jan Kostkan, Aarhus University (Denmark)
# Marco Kuhlmann (Linköping University)
+
# Andrey Kutuzov, University of Oslo (Norway)
# Per Egil Kummervold (National Library of Norway)
+
# Tsz Kin Lam, University of Edinburgh (UK)
# Andrey Kutuzov (University of Oslo)
+
# Wenyan Li, University of Copenhagen (Denmark)
# Jelmer van der Linde (Edinburgh University)
+
# Pierre Lison, Norsk Regnesentral (Norway)
# Pierre Lison (Norsk regnesentral)
+
# Jouni Luoma, University of Turku (Finland)
# Nikola Ljubešić (Jožef Stefan Institute & University of Ljubljana)
+
# Risto Luukkonen, University of Turku (Finland)
# Yan Meng (University of Amsterdam)
+
# Arianna Masciolini, Gothenburg University (Sweden)
# Max Müller-Eberstein (IT University of Copenhagen)
+
# Petter Mæhlum, University of Oslo (Norway)
# Sebastian Nagel (Common Crawl)
+
# Vladislav Mikhailov, University of Oslo (Norway)
# Graeme Nail (Edinburgh University)
+
# Yousuf Ali Mohammed, Gothenburg University (Sweden)
# Anna Nikiforovskaja (Université de Lorraine)
+
# Aurélie Névéol, LISN & CNRS (France)
# Irina Nikishina (Universität Hamburg)
+
# Tobias Norlund, AI Sweden (Sweden)
# Joakim Nivre (RISE and Uppsala University)
+
# Stephan Oepen, University of Oslo (Norway)
# Stephan Oepen (University of Oslo)
+
# Lilja Øvrelid, University of Oslo (Norway)
# Anders Jess Pedersen (Alexandra Institute)
+
# Alberto Parola, University of Copenhagen (Denmark)
# Laura Cabello Piqueras (University of Copenhagen)
+
# Siddhesh Pawar, University of Copenhagen (Denmark)
# Myrthe Reuver (Vrije Universiteit Amsterdam)
+
# Erofili Psaltaki, University of Helsinki (Finland)
# Anna Rogers (University of Copenhagen)
+
# Akseli Reunamo, University of Turku (Finland)
# Frankie Robertson (University of Jyväskylä)
+
# David Samuel, University of Oslo (Norway)
# Javier De La Rosa (National Library of Norway)
+
# Ricardo Muñoz Sánchez, Gothenburg University (Sweden)
# Phillip Rust (University of Copenhagen)
+
# Gautam Kishore Shahi, University of Duisburg-Essen (Germany)
# Egil Rønnestad (University of Oslo)
+
# Janine Siewert, University of Helsinki (Finland)
# David Samuel (University of Oslo)
+
# Étienne Simon, University of Oslo (Norway)
# Diana Santos (University of Oslo)
+
# Inguna Skadiņa, University of Latvia (Latvia)
# Teven Le Scao (Hugging Face)
+
# Ondrej Sotolar, Masaryk University (Czech Republic)
# Yves Scherrer (University of Helsinki)
+
# Pavel Stranak, Charles University in Prague (Czech Republic)
# Edoardo Signoroni (Masaryk University)
+
# Maria Irena Szawerna, Gothenburg University (Sweden)
# Michal Štefánik (Masaryk University)
+
# Jörg Tiedemann, University of Helsinki (Finland)
# Pedro Ortiz Suarez (University of Mannheim and DFKI)
+
# Ekaterina Uetova, Technological University Dublin (Ireland)
# Zeerak Talat (Simon Fraser University)
+
# Erik Velldal, University of Oslo (Norway)
# Jörg Tiedemann (University of Helsinki)
+
# Tea Vojtěchová, Charles University in Prague (Czech Republic)
# Samia Touileb (University of Bergen)
+
# Jonas Waldendorf, University of Edinburgh (UK)
# Teemu Vahtola (University of Helsinki)
+
# Jaume Zaragoza-Bernabeu, Prompsit Language Engineering (Spain)
# Thomas Vakili (Stockholm University)
+
# Giulio Zhou, University of Edinburgh (UK)
# Dušan Variš (Charles University in Prague)
 
# Tea Vojtěchová (Charles University in Prague)
 
# Ivan Vulić (University of Cambridge)
 
# Nicholas Walker (Norsk regnesentral)
 
# Sondre Wold (University of Oslo)
 
# Jaume Zaragoza-Bernabeu (Prompsit)
 

Latest revision as of 17:46, 7 February 2024

HPLT & NLPL Winter School on Large Language Models: Creation, Customization, Evaluation, and Use


Background

Since 2023, the NLPL network and the Horizon Europe project High-Performance Language Technologies (HPLT) have joined forces to organize the successful winter school series on Web-scale NLP. The winter school seeks to stimulate community formation, i.e. strengthening interaction and collaboration among European research teams in NLP and advancing a shared level of knowledge and experience in using high-performance e-infrastructures for large-scale NLP research. The 2024 edition of the winter school puts special emphasis on NLP researchers from countries that participate in the EuroHPC LUMI consortium. For additional background, please see the archival pages from the 2018, 2019, 2020, and 2023 NLPL Winter Schools.

In early 2024, HPLT will hold its winter school from Sunday, February 4, to Tuesday, February 6, at a mountain-side hotel (with skiing and walking opportunities) about two hours north of Oslo. The project will organize a group bus transfer from and to the Oslo airport Gardermoen, leaving the airport at 9:45 on Sunday morning and returning there around 17:30 on Tuesday afternoon.

The winter school is subsidized by the HPLT project: there is no fee for participants and no charge for the bus transfer to and from the conference hotel. All participants will have to cover their own travel and accommodation at Skeikampen, however. Two nights at the hotel, including all meals, will come to NOK 3745 (NOK 3345 per person in a shared double room), to be paid to the hotel directly.

Programme

The 2024 winter school will have a thematic focus on Large Language Models: Creation, Customization, Evaluation, and Use. The programme will comprise in-depth technical presentations (possibly including some hands-on elements) by seasoned experts, with special emphasis on open science and European languages, and will also include critical reflections on current development trends in LLM-focussed NLP. The programme will be complemented by a panel discussion and a ‘walk-through’ of available infrastructure on the shared EuroHPC LUMI supercomputer.

Confirmed presenters include:

Sunday, February 4, 2024
13:00 14:00 Lunch
14:00 15:30 Session 1: Analyzing and Interpreting Deep Neural Models of Language (Afra Alishahi)
15:30 15:50 Coffee Break
16:00 17:30 Session 2: Analyzing and Interpreting Deep Neural Models of Language (Afra Alishahi)
17:30 17:50 Coffee Break
17:50 19:20 Session 3: Scaling Data-constrained Language Models (Niklas Muennighoff)

Slides

19:30 Dinner
Monday, February 5, 2024
Breakfast is available from 07:30
09:00 10:30 Session 4: Bias in Natural Language Processing: focus on large language models (Aurélie Névéol)
Free time (Lunch is available between 13:00 and 14:30)
15:00 16:30 Session 5: Multilingual and multimodal language models (Desmond Elliott)
16:30 16:50 Coffee Break
16:50 17:40 Session 6: Multilingual and multimodal language models (Desmond Elliott)
17:40 18:00 Coffee Break
18:00 19:15 Session 7. «Large vs. Small»: panel discussion. Panelists: Desmond Elliott (University of Copenhagen), Evangelia Gogoulou (RISE, Sweden), Afra Alishahi (Tilburg University), Jan Hajič (Charles University in Prague), and Aurélie Névéol (LISN, France)
19:30 Dinner
21:00 Evening Session. LUMI: BERT in an Hour, GPT in a Week (David Samuel and Risto Luukkonen)


Tuesday, February 6, 2024
Breakfast is available from 07:30
08:30 10:00 Session 8: Reproducibility in Natural Language Processing (Aurélie Névéol)
10:00 10:30 Coffee Break
10:30 12:00 Session 9: Understanding and measuring the environmental impact of Natural Language Processing (Aurélie Névéol)
12:30 13:30 Lunch
14:00 17:00 Bus transfer to OSL Airport

Registration

In total, we anticipate around 55 participants at the 2024 winter school. We have received more requests for participation than we will be able to accommodate, and the registration form has now been closed. We processed requests for participation on a first-come, first-served basis, with an eye toward regional balance. Interested parties who have submitted the registration form were confirmed in three batches, on December 11, on December 15, and on December 22, which was also the closing date for winter school registration.

Once confirmed by the organizing team, participant names are published on this page, and registration establishes a binding agreement with the hotel. Therefore, cancelling a confirmed registration will incur a cancellation fee (unless we can find someone else to ‘take over’ last-minute spaces), and no-shows will be charged the full price for at least one night by the hotel.

Logistics

With a few exceptions, winter school participants travel to and from the conference hotel jointly on a chartered bus (the HPLT shuttle). The bus will leave OSL airport no later than 9:45 CET on Sunday, February 4. Thus, please meet up by 9:30 and make your arrival known to your assigned ‘tour guide’ (who will introduce themselves to you by email beforehand).

The group will gather near the DNB currency exchange booth in the downstairs arrivals area, just outside the international arrivals luggage claims and slightly to the left as one exits the customs area: the yellow dot numbered (18) on the OSL arrivals map. The group will then walk over to the bus terminal, to leave the airport not long after 9:40. The drive to the Skeikampen conference hotel will take us about three hours, and the bus will make one stop along the way to stretch our legs and fill up on coffee.

The winter school will end with lunch on Tuesday, February 6, before the group returns to OSL airport on the HPLT shuttle. The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.

Organization

The 2024 Winter School is organized by a team of volunteers at the University of Oslo, supported by a programme committee drawn from the HPLT and NLPL networks and beyond (listed below). For all inquiries regarding registration, the programme, logistics, or similar matters, please contact hplt-training@ifi.uio.no.

The programme committee comprises:

  • Isabelle Augenstein (University of Copenhagen, Denmark)
  • Emily M. Bender (University of Washington, USA)
  • Kenneth Heafield (Edinburgh University, UK)
  • Jindřich Helcl (Charles University, Czech Republic)
  • Marco Kuhlmann (Linköping University, Sweden)
  • Per Egil Kummervold (National Library of Norway)
  • Andrey Kutuzov (University of Oslo, Norway)
  • Joakim Nivre (RISE and Uppsala University, Sweden)
  • Stephan Oepen (University of Oslo, Norway)
  • Sampo Pyysalo (University of Turku, Finland)
  • Gema Ramirez (Prompsit Language Engineering, Spain)
  • Anna Rogers (IT University of Copenhagen, Denmark)
  • Magnus Sahlgren (AI Sweden)
  • David Samuel (University of Oslo, Norway)
  • Jörg Tiedemann (University of Helsinki, Finland)
  • Erik Velldal (University of Oslo, Norway)

Participants

  1. Afra Alishahi, Tilburg University (The Netherlands)
  2. Ali Allaith, University of Copenhagen (Denmark)
  3. Nikolay Arefev, University of Oslo (Norway)
  4. Joseph Attieh, University of Helsinki (Finland)
  5. Christopher Brückner, Charles University in Prague (Czech Republic)
  6. Lucas Charpentier, University of Oslo (Norway)
  7. Konstantin Dobler, Hasso Plattner Institute (Germany)
  8. Aleksei Dorkin, University of Tartu (Estonia)
  9. Luise Dürlich, Uppsala University (Sweden)
  10. Simen Eide, Schibsted (Norway)
  11. Desmond Elliott, University of Copenhagen (Denmark)
  12. Kenneth Enevoldsen, Aarhus University (Denmark)
  13. Mariia Fedorova, University of Oslo (Norway)
  14. Emilie Francis, Gothenburg University (Sweden)
  15. Evangelia Gogoulou, RISE (Sweden)
  16. Jan Hajič, Charles University in Prague (Czech Republic)
  17. Lasse Hansen, Aarhus University Hospital (Denmark)
  18. Jindřich Helcl, Charles University in Prague (Czech Republic)
  19. Yiping Jin, Pompeu Fabra University (Spain)
  20. Lars Johnsen, National Library (Norway)
  21. Amanda Kann, Stockholm University (Sweden)
  22. Jan Kostkan, Aarhus University (Denmark)
  23. Andrey Kutuzov, University of Oslo (Norway)
  24. Tsz Kin Lam, University of Edinburgh (UK)
  25. Wenyan Li, University of Copenhagen (Denmark)
  26. Pierre Lison, Norsk Regnesentral
  27. Jouni Luoma, University of Turku (Finland)
  28. Risto Luukkonen, University of Turku (Finland)
  29. Arianna Masciolini, Gothenburg University (Sweden)
  30. Petter Mæhlum, University of Oslo (Norway)
  31. Vladislav Mikhailov, University of Oslo (Norway)
  32. Yousuf Ali Mohammed, Gothenburg University (Sweden)
  33. Aurélie Névéol, LISN & CNRS (France)
  34. Tobias Norlund, AI Sweden (Sweden)
  35. Stephan Oepen, University of Oslo (Norway)
  36. Lilja Øvrelid, University of Oslo (Norway)
  37. Alberto Parola, University of Copenhagen (Denmark)
  38. Siddhesh Pawar, University of Copenhagen (Denmark)
  39. Erofili Psaltaki, University of Helsinki (Finland)
  40. Akseli Reunamo, University of Turku (Finland)
  41. David Samuel, University of Oslo (Norway)
  42. Ricardo Muñoz Sánchez, Gothenburg University (Sweden)
  43. Gautam Kishore Shahi, University of Duisburg-Essen (Germany)
  44. Janine Siewert, University of Helsinki (Finland)
  45. Étienne Simon, University of Oslo (Norway)
  46. Inguna Skadiņa, University of Latvia
  47. Ondrej Sotolar, Masaryk University (Czech Republic)
  48. Pavel Stranak, Charles University in Prague (Czech Republic)
  49. Maria Irena Szawerna, Gothenburg University (Sweden)
  50. Jörg Tiedemann, University of Helsinki (Finland)
  51. Ekaterina Uetova, Technological University Dublin (Ireland)
  52. Erik Velldal, University of Oslo (Norway)
  53. Tea Vojtěchová, Charles University in Prague (Czech Republic)
  54. Jonas Waldendorf, University of Edinburgh (UK)
  55. Jaume Zaragoza-Bernabeu, Prompsit Language Engineering (Spain)
  56. Giulio Zhou, University of Edinburgh (UK)