Revision as of 11:43, 30 January 2026
Circle U, NLPL, & OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation
Background
In 2026, the NLPL network and the Digital Europe project OpenEuroLLM have joined forces to organize the successful winter school series on Web-scale NLP. The winter school seeks to stimulate community formation, i.e., strengthening interaction and collaboration among European research teams in NLP and advancing a shared level of knowledge and experience in using high-performance e-infrastructures for large-scale NLP research. This 2026 edition of the winter school puts special emphasis on NLP researchers from countries that participate in the EuroHPC consortium and is endorsed as a doctoral training event in the European Circle U university alliance. For additional background, please see the archival pages from the 2018, 2019, 2020, 2023, 2024, and 2025 NLPL Winter Schools.
For early 2026, NLPL will hold its winter school from Monday, February 2, to Wednesday, February 4, 2026, at a mountain-side hotel (with skiing and walking opportunities) about two hours north of Oslo. The project will organize a group bus transfer from and to the main Oslo airport, Gardermoen (OSL), leaving the airport at 9:45 on Monday morning and returning there around 17:30 on Wednesday afternoon.
The winter school is subsidized by the OpenEuroLLM project: there is no fee for participants and no charge for the bus transfer to and from the conference hotel. All participants will have to cover their own travel and accommodation at Skeikampen, however. Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), to be paid to the hotel directly upon arrival.
Programme
The 2026 winter school has a thematic focus on Multilinguality in LLM Development and Evaluation. The programme comprises in-depth technical presentations (possibly including some hands-on elements) by international experts, with special emphasis on open science and European languages, but also includes critical reflections on current development trends in LLM-focused NLP. The programme will be complemented by a ‘walk-through’ of example EuroHPC experience reports from the OpenEuroLLM consortium and by reflections on current LLM-oriented activities of the National Library of Norway.
Confirmed presenters and talks include:
- Barbara Plank, Ludwig Maximilian University of Munich
- Laurie Burchell and Pedro Ortiz Suarez, Common Crawl
- Max Idahl, ellamind
- Julia Kreutzer, Cohere Labs
- David Salinas, ELLIS Institute Tübingen
- François Yvon, Sorbonne Université
Schedule
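| Monday, February 2, 2026 | | |
|---|---|---|
| 13:00 | 14:00 | Lunch |
| 14:00 | 15:30 | Session 1: Laurie Burchell and Pedro Ortiz Suarez, ‘Multilinguality at Common Crawl: improving language coverage for the largest open web corpus’ |
| 15:30 | 15:50 | Coffee Break |
| 16:00 | 17:30 | Session 2: François Yvon, ‘Evaluating Multilingual Models’. Abstract: Large language models introduced in recent years have proven extremely helpful in advancing the state of the art in many natural language applications, notably due to their ability to compute numerical, high-dimensional representations of linguistic units such as words or sentences. Multilingual language models go one step further and add the ability to handle multiple languages, sometimes even multiple scripts, with a single model. In this presentation, I will discuss multilingual language models at length, with a focus on the evaluation of their multilingual abilities, which raises two difficult questions: (a) how to evaluate their performance as if they were just a collection of monolingual models; and (b) how to evaluate their performance as integrated multilingual models, capable of bridging between languages. |
| 17:30 | 17:50 | Coffee Break |
| 17:50 | 19:20 | Session 3: Julia Kreutzer, ‘Evaluating Generations Multilingually: Current challenges and Lessons from Machine Translation’ |
| 19:30 | | Dinner |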
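| Tuesday, February 3, 2026 | | |
|---|---|---|
| Breakfast is available from 07:30 | | |
| 09:00 | 10:30 | Session 4: François Yvon, ‘Text Generation: Know your Options!’ Abstract: Text generation, contextual or non-contextual, is ubiquitous in the current LLM era, as it serves as the most basic building block in multiple application contexts, from question answering and dialog systems to text summarization, machine translation, and many more. Generation is thus equally useful for computing deterministic and highly non-deterministic mappings, with various levels of output constraints. Furthermore, text generation is also used as a subroutine of more complex generation strategies, aiming to produce syntactically well-formed (e.g., for code generation) or semantically consistent outputs, possibly through multiple steps of generation (e.g., in chain-of-thought generation), or to collect diverse samples from the generating distribution. To cover this considerable diversity of uses, multiple text generation strategies have been proposed, some less well known than others. In this talk, I will review various families of generation algorithms, from the most basic to the more sophisticated approaches, so as to document, as far as possible, the options available to text generation users. The final part will survey some decoding issues that are specific to multilingual models. |
| Free time (Lunch is available between 13:00 and 14:30) | | |
| 15:30 | 17:00 | Session 5: Max Idahl, ‘Multilingual model-based quality filtering’ |
| 17:00 | 17:20 | Coffee Break |
| 17:20 | 19:20 | Session 6: David Salinas, ‘Challenges in Evaluating Generative Models’ |
| 19:30 | | Dinner |
| 21:00 | | Evening Session: National Library of Norway, OpenEuroLLM, MultiSynt |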
| Wednesday, February 4, 2026 | | |
|---|---|---|
| Breakfast is available from 07:30 | | |
| 08:30 | 10:00 | Session 8: Julia Kreutzer, ‘Optimizing data for multilingual post-training’ |
| 10:00 | 10:30 | Coffee Break |
| 10:30 | 12:00 | Session 9: Barbara Plank, ‘NLP Beyond the Standard: Dialects, Variation, and Shared Representations in Multilingual Language Models’ |
| 12:30 | 13:30 | Lunch |
| 13:45 | 16:45 | Bus transfer to OSL Airport |
Registration
In total, we expect 60–70 participants at the 2026 winter school. Registration is now closed. Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance. Applicants who submitted the registration form were confirmed in three batches: on November 28, on December 5, and on December 19, which was also the closing date for winter school registration.
Once confirmed by the organizing team, participant names are published on this page, and registration establishes a binding agreement with the hotel. Therefore, cancellations will incur a fee (unless we can find someone else to ‘take over’ last-minute spaces), and no-shows will be charged the full price for at least one night by the hotel.
Logistics
With a few exceptions, winter school participants travel to and from the conference hotel jointly on a chartered bus (the OpenEuroLLM shuttle). The bus will leave OSL airport no later than 9:45 CET on Monday, February 2. Thus, please meet up by 9:30 and make your arrival known to your assigned ‘tour guide’ (who will introduce themselves to you by email beforehand).
The group will gather near the DNB currency exchange booth in the downstairs arrivals area, just outside the international arrivals luggage claims and slightly to the left as one exits the customs area: the yellow dot numbered (18) on the OSL arrivals map. The group will then walk over to the bus terminal, to leave the airport not long after 9:40. The drive to the Skeikampen conference hotel will take about two to three hours, and the bus will make one stop along the way to stretch our legs and fill up on coffee.
The winter school will end with lunch on Wednesday, February 4, before the group returns to OSL airport on the OpenEuroLLM shuttle. The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.
Organization
The 2026 Winter School is organized by a team of volunteers at the University of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and NLPL networks and beyond (see below).
For all inquiries regarding registration, the programme, logistics, or other matters, please contact nlpl-training@ifi.uio.no.
The programme committee comprises (in alphabetical order):
- Jenia Jitsev (Forschungszentrum Jülich, Germany)
- Andrey Kutuzov (University of Oslo, Norway)
- Alessandro Lenci (University of Pisa, Italy)
- Stephan Oepen (University of Oslo, Norway)
- Sampo Pyysalo (University of Turku, Finland)
- Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)
- David Salinas (ELLIS Institute Tübingen, Germany)
- Jörg Tiedemann (University of Helsinki, Finland)
- Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)
- Guillaume Wisniewski (Paris Cité University, France)
Participants
- Adam Hrin, AMD Silo AI (Finland)
- Agnes Toftgård, The National Library (Sweden)
- Aitor Soroa, University of the Basque Country (Spain)
- Alicia Núñez Alcover, Prompsit (Spain)
- Anastasia Philipps, University of Oslo (Norway)
- Andrey Kutuzov, University of Oslo (Norway)
- Angelina Zanardi, National Library of Norway
- Anni Moisala, CSC – IT Center for Science (Finland)
- Artūrs Znotiņš, University of Latvia (Latvia)
- Barbara Heinisch, Eurac Research (Italy)
- Barbara Plank, Ludwig-Maximilians-Universität München (Germany)
- Charlotte Noel, LINAGORA Labs (France)
- Dalton Harmsen, Eindhoven University of Technology (Netherlands)
- David Salinas, ELLIS Institute Tübingen (Germany)
- Diana Kylymnyk, University of Exeter (UK)
- Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)
- Etienne Simon, University of Oslo (Norway)
- Faton Rekathati, The National Library (Sweden)
- Fedor Vitiugin, University of Turku (Finland)
- François Yvon, CNRS (France)
- Fred Philippy, University of Luxembourg (Luxembourg)
- Ghulam Muhammed Khan, University of Exeter (United Kingdom)
- Gianluca Barmina, University of Southern Denmark (Denmark)
- Hannah Clausen, University of Oslo (Norway)
- Hannan Mahadik, ELLIS Institute Tübingen (Germany)
- Iglika Nikolova-Stoupak, Sorbonne Université (France)
- Jan Hajič, Charles University (Czech Republic)
- Jiajing Wan, University of Bergen (Norway)
- Jindřich Helcl, University of Oslo (Norway)
- Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)
- Jouni Luoma, AMD Silo AI (Finland)
- Julia Kreutzer, Cohere Labs (Canada)
- Justyna Sikora, The National Library (Sweden)
- Katarina Strani, Heriot-Watt University (United Kingdom)
- Kevin Glocker, Linköping University (Sweden)
- Kristýna Onderková, Charles University (Czech Republic)
- Laurène Cave, Sorbonne Université (France)
- Lisa Yankovskaya, University of Tartu (Estonia)
- Maja Buljan, University of Oslo (Norway)
- Markus Heiervang, National Library of Norway
- Marthe Midtgaard, National Library of Norway
- Mattes Ruckdeschel, IT University of Copenhagen (Denmark)
- Maximilian Idahl, ellamind (Germany)
- Meihan Tong, University of Oslo (Norway)
- Muhammad Imran, University of A Coruña (Spain)
- Nam Luu, Charles University (Czech Republic)
- Neda Jamshidi, University of Siena (Italy)
- Nikolay Arefev, University of Oslo (Norway)
- Nils Grünefeld, IT University of Copenhagen (Denmark)
- Pedro Ortiz Suarez, Common Crawl Foundation (USA)
- Rolv-Arild Braaten, National Library of Norway
- Romina Oji, Linköping University (Sweden)
- Sampo Pyysalo, University of Turku (Finland)
- Shanshan Xu, University of Copenhagen (Denmark)
- Shenbin Qian, University of Oslo (Norway)
- Stephan Oepen, University of Oslo (Norway)
- Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)
- Tita Enstad, National Library of Norway
- Tommaso Green, University of Mannheim (Germany)
- Tudor Nicolae Mateiu, Prompsit (Spain)
- Vladislav Mikhailov, University of Oslo (Norway)
- Wafa Aissa, UCLouvain (Belgium)
- Xiaorui Yu, King's College London (UK)
- Yihang Lu, Sorbonne Université (France)
- Yiheng Wu, University of Helsinki (Finland)
- Yves Scherrer, University of Oslo (Norway)
- Zihao Li, University of Helsinki (Finland)