<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.nlpl.eu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Andreku</id>
	<title>Nordic Language Processing Laboratory - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.nlpl.eu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Andreku"/>
	<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/Special:Contributions/Andreku"/>
	<updated>2026-05-21T02:35:43Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.31.10</generator>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1858</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1858"/>
		<updated>2026-02-01T15:27:15Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school has a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme comprises in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, but also includes critical reflections on current development trends in LLM-focused NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium and with reflections about current LLM-oriented activities of the National Library of Norway. &lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://geoalgo.github.io/ David Salinas], ELLIS Institute Tübingen&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Laurie Burchell and Pedro Ortiz Suarez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilinguality at Common Crawl: improving language coverage for the largest open web corpus'''&amp;lt;br&amp;gt;The Common Crawl Foundation (CCF) provides the largest open corpus of web data, enabling a wide range of scientific and technical applications including large language model (LLM) development. However, our current data processing pipeline faces challenges when processing multilingual data, decreasing language representation and impacting downstream model performance. In this talk, we will discuss CCF’s initiatives to improve multilingual coverage and language identification of our web corpus. These efforts include soliciting crowd-sourced web seeds for under-served languages, running the First Workshop for Multilingual Data Quality Signals at COLM 2025, and creating CommonLID, a community-driven, human-annotated language identification benchmark for the web domain. Throughout, we emphasise the collaborative nature of our efforts, working in partnership with members of the NLP community to improve content available in their languages.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Multilingual Models'''&amp;lt;br&amp;gt; Large Language Models introduced in recent years have proved extremely helpful in advancing the state of the art in many Natural Language Applications, notably due to their ability to compute numerical, high-dimensional representations of linguistic units such as words or sentences. Multilingual language models go one step further and add the ability to handle multiple languages, sometimes even multiple scripts, with a single model. In this presentation, I will discuss multilingual language models at length, with a focus on the evaluation of their multilingual abilities, which raises two difficult questions: (a) how to evaluate their performance as if they were just a collection of monolingual models; (b) how to evaluate their performance as integrated multilingual models, capable of bridging between languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Generations Multilingually: Current challenges and Lessons from Machine Translation'''&amp;lt;br&amp;gt;In this session we will dive into the particular challenge of evaluating LLMs across many languages in generative tasks. We will take a look at the &amp;quot;sister field&amp;quot; of machine translation and inspect what principles have led to advances in understanding quality across languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Text Generation: Know your Options!'''&amp;lt;br&amp;gt;Text generation, contextual or non-contextual, is ubiquitous in the current LLM era, as it serves as the most basic building block in multiple application contexts, from question answering and dialog systems to text summarization and machine translation, and many more. Generation is thus equally useful to compute deterministic and highly non-deterministic mappings with various levels of output constraints. Furthermore, text generation is also used as a sub-routine of more complex generation strategies, aiming to produce syntactically well-formed (e.g. for code generation) or semantically consistent outputs, possibly through multiple steps of generation (e.g. in chain-of-thought generation), or to collect diverse samples from the generating distribution. To cover this considerable diversity of uses, multiple text generation strategies have been proposed, some less well-known than others. In this talk I will review various families of generation algorithms, from the most basic ones to the more sophisticated approaches, so as to document, as much as possible, the options that are available to text generation users. The final part will survey some decoding issues that are specific to multilingual models. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Max Idahl &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilingual Model-Based Quality Filtering for LLM Pretraining'''&amp;lt;br&amp;gt;Data quality is the highest-leverage factor for LLM performance, with recent work showing significant training efficiency gains through careful curation. This presentation traces the evolution from rule-based filtering to modern model-based approaches that now work across dozens of languages. We cover the progression from basic perplexity-based filters, to FastText and encoder-based scorers, to our newly released Propella models that annotate documents across 18 properties for 57 languages at scale. The talk includes practical insights into building multilingual filtering pipelines.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' David Salinas &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Challenges in Evaluating Generative Models'''&amp;lt;br&amp;gt;In this talk, we will discuss the evaluation of generative models, in particular Large Language Models (LLMs). Given that such models produce open-ended output, their evaluation requires different techniques than static evaluations such as simple question-answering benchmarks. We will first discuss human annotations and their use in leaderboards such as LMArena and ComparIA. We will then focus on automatic evaluation relying on LLM judges. In particular, we will describe current challenges with LLM judges before discussing their application in multilingual settings.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session''': National Library of Norway, OpenEuroLLM, MultiSynt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Barbara Plank &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''NLP Beyond the Standard: Dialects, Variation, and Shared Representations in Multilingual Language Models'''&amp;lt;br&amp;gt;Multilingual language models have primarily focused on cross-lingual differences, with intra-language variation only recently gaining more attention. Dialects and non-standard varieties challenge core assumptions about data, representation, and evaluation. In this talk, I discuss what makes dialects particularly challenging for multilingual models, review approaches starting from early encoder-based methods, and give an overview of resources developed for dialectal NLP, with a focus on German dialects. I then turn to recent work on multilingual training dynamics and shared representations, analyzing when linguistic information and shared concept spaces emerge during training and where alignment breaks down. Although dialects are not yet explicitly modeled in this analysis, the findings provide insight into multilingual representation learning during pre-training. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Optimizing data for multilingual post-training'''&amp;lt;br&amp;gt;In this session we will look into techniques for augmenting data collections for better multilingual coverage. We will discuss the role of translation and inference settings, and explore methods for optimizing multilingual data both on the prompt and the generation side.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Agnes Toftgård, The National Library (Sweden)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Anastasia Philipps, University of Oslo (Norway)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Angelina Zanardi, National Library of Norway&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology (Netherlands)&lt;br /&gt;
# David Salinas, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (United Kingdom)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library of Norway&lt;br /&gt;
# Marthe Midtgaard, National Library of Norway&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Rolv-Arild Braaten, National Library of Norway&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library of Norway&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (United Kingdom)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1857</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1857"/>
		<updated>2026-01-30T22:10:13Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school has a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme comprises in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, but also includes critical reflections on current development trends in LLM-focused NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium and with reflections about current LLM-oriented activities of the National Library of Norway. &lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://geoalgo.github.io/ David Salinas], ELLIS Institute Tübingen&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Laurie Burchell and Pedro Ortiz Suarez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilinguality at Common Crawl: improving language coverage for the largest open web corpus'''&amp;lt;br&amp;gt;The Common Crawl Foundation (CCF) provides the largest open corpus of web data, enabling a wide range of scientific and technical applications including large language model (LLM) development. However, our current data processing pipeline faces challenges when processing multilingual data, decreasing language representation and impacting downstream model performance. In this talk, we will discuss CCF’s initiatives to improve multilingual coverage and language identification of our web corpus. These efforts include soliciting crowd-sourced web seeds for under-served languages, running the First Workshop for Multilingual Data Quality Signals at COLM 2025, and creating CommonLID, a community-driven, human-annotated language identification benchmark for the web domain. Throughout, we emphasise the collaborative nature of our efforts, working in partnership with members of the NLP community to improve content available in their languages.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Multilingual Models'''&amp;lt;br&amp;gt; Large Language Models introduced in recent years have proved extremely helpful in advancing the state of the art in many Natural Language Applications, notably due to their ability to compute numerical, high-dimensional representations of linguistic units such as words or sentences. Multilingual language models go one step further and add the ability to handle multiple languages, sometimes even multiple scripts, with a single model. In this presentation, I will discuss multilingual language models at length, with a focus on the evaluation of their multilingual abilities, which raises two difficult questions: (a) how to evaluate their performance as if they were just a collection of monolingual models; (b) how to evaluate their performance as integrated multilingual models, capable of bridging between languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Generations Multilingually: Current challenges and Lessons from Machine Translation'''&amp;lt;br&amp;gt;In this session we will dive into the particular challenge of evaluating LLMs across many languages in generative tasks. We will take a look at the &amp;quot;sister field&amp;quot; of machine translation and inspect what principles have led to advances in understanding quality across languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Text Generation: Know your Options!'''&amp;lt;br&amp;gt;Text generation, contextual or non-contextual, is ubiquitous in the current LLM era, as it serves as the most basic building block in multiple application contexts, from question answering and dialog systems to text summarization and machine translation, and many more. Generation is thus equally useful to compute deterministic and highly non-deterministic mappings with various levels of output constraints. Furthermore, text generation is also used as a sub-routine of more complex generation strategies, aiming to produce syntactically well-formed (e.g. for code generation) or semantically consistent outputs, possibly through multiple steps of generation (e.g. in chain-of-thought generation), or to collect diverse samples from the generating distribution. To cover this considerable diversity of uses, multiple text generation strategies have been proposed, some less well-known than others. In this talk I will review various families of generation algorithms, from the most basic ones to the more sophisticated approaches, so as to document, as much as possible, the options that are available to text generation users. The final part will survey some decoding issues that are specific to multilingual models. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Max Idahl &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilingual model-based quality filtering'''&amp;lt;br&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' David Salinas &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Challenges in Evaluating Generative Models'''&amp;lt;br&amp;gt;In this talk, we will discuss the evaluation of generative models, in particular Large Language Models (LLMs). Given that such models produce open-ended output, their evaluation requires different techniques than static evaluations such as simple question-answering benchmarks. We will first discuss human annotations and their use in leaderboards such as LMArena and ComparIA. We will then focus on automatic evaluation relying on LLM judges. In particular, we will describe current challenges with LLM judges before discussing their application in multilingual settings.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session''': National Library of Norway, OpenEuroLLM, MultiSynt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Barbara Plank &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''NLP Beyond the Standard: Dialects, Variation, and Shared Representations in Multilingual Language Models'''&amp;lt;br&amp;gt;Multilingual language models have primarily focused on cross-lingual differences, with intra-language variation only recently gaining more attention. Dialects and non-standard varieties challenge core assumptions about data, representation, and evaluation. In this talk, I discuss what makes dialects particularly challenging for multilingual models, review approaches starting from early encoder-based methods, and give an overview of resources developed for dialectal NLP, with a focus on German dialects. I then turn to recent work on multilingual training dynamics and shared representations, analyzing when linguistic information and shared concept spaces emerge during training and where alignment breaks down. Although dialects are not yet explicitly modeled in this analysis, the findings provide insight into multilingual representation learning during pre-training. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Optimizing data for multilingual post-training'''&amp;lt;br&amp;gt;In this session we will look into techniques for augmenting data collections for better multilingual coverage. We will discuss the role of translation and inference settings, and explore methods for optimizing multilingual data both on the prompt and the generation side.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Agnes Toftgård, The National Library (Sweden)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Anastasia Philipps, University of Oslo (Norway)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Angelina Zanardi, National Library of Norway&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology (Netherlands)&lt;br /&gt;
# David Salinas, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library of Norway&lt;br /&gt;
# Marthe Midtgaard, National Library of Norway&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Rolv-Arild Braaten, National Library of Norway&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library of Norway&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1856</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1856"/>
		<updated>2026-01-30T15:55:22Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school has a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme is comprised of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, but also includes critical reflections on current development trends in LLM-focused NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium and with reflections about current LLM-oriented activities of the National Library of Norway. &lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://geoalgo.github.io/ David Salinas], ELLIS Institute Tübingen&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Laurie Burchell and Pedro Ortiz Suarez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilinguality at Common Crawl: improving language coverage for the largest open web corpus'''&amp;lt;br&amp;gt;The Common Crawl Foundation (CCF) provides the largest open corpus of web data, enabling a wide range of scientific and technical applications including large language model (LLM) development. However, our current data processing pipeline faces challenges when processing multilingual data, decreasing language representation and impacting downstream model performance. In this talk, we will discuss CCF’s initiatives to improve multilingual coverage and language identification of our web corpus. These efforts include soliciting crowd-sourced web seeds for under-served languages, running the First Workshop for Multilingual Data Quality Signals at COLM 2025, and creating CommonLID, a community-driven, human-annotated language identification benchmark for the web domain. Throughout, we emphasise the collaborative nature of our efforts, working in partnership with members of the NLP community to improve content available in their languages.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Multilingual Models'''&amp;lt;br&amp;gt; Large Language Models introduced in recent years have been found extremely helpful to advance the state of the art in many Natural Language Applications, notably due to their ability to compute numerical, high-dimensional representations of linguistic units such as words or sentences. Multilingual language models go one step further and add the ability to handle multiple languages, sometimes even multiple scripts, with just one single model. In this presentation, I will discuss multilingual language models at length, with a focus on the evaluation of their multilingual abilities, which raises two difficult questions: (a) how to evaluate their performance as if they were just a collection of monolingual models; (b) how to evaluate their performance as integrated multilingual models, capable of bridging between languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Generations Multilingually: Current challenges and Lessons from Machine Translation'''&amp;lt;br&amp;gt;In this session we will dive into the particular challenge of evaluating LLMs across many languages in generative tasks. We will take a look at the &amp;quot;sister field&amp;quot; of machine translation and inspect what principles have led to advances in understanding quality across languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Text Generation: Know your Options!'''&amp;lt;br&amp;gt;Text generation, contextual or non-contextual, is ubiquitous in the current LLM era, as it serves as the most basic building block in multiple application contexts, from question answering and dialog systems to text summarization and machine translation, and many more. Generation is thus equally useful to compute deterministic and highly non-deterministic mappings with various levels of output constraints. Furthermore, text generation is also used as a sub-routine of more complex generation strategies, aiming to produce syntactically well-formed (e.g. for code generation) or semantically consistent outputs, possibly through multiple steps of generation (e.g., in chain-of-thought generation) or to collect diverse samples from the generating distribution. To cover this considerable diversity of uses, multiple text generation strategies have been proposed, some less well-known than others. In this talk I will review various families of generation algorithms, from the most basic ones to the more sophisticated approaches, so as to document, as much as possible, the options that are available to text generation users. The final part will survey some decoding issues that are specific to multilingual models. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Max Idahl &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilingual model-based quality filtering'''&amp;lt;br&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' David Salinas &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Challenges in Evaluating Generative Models'''&amp;lt;br&amp;gt;In this talk, we will discuss the evaluation of generative models, in particular Large Language Models (LLMs). Given that such models produce open-ended output, their evaluation requires different techniques than static evaluations such as simple question-answering benchmarks. We will first discuss human annotations and their use in leaderboards such as LMArena and ComparIA. We will then focus on automatic evaluation relying on LLM judges. In particular, we will describe current challenges with LLM judges before discussing their application in multilingual settings.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session''': National Library of Norway, OpenEuroLLM, MultiSynt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Barbara Plank &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''NLP Beyond the Standard: Dialects, Variation, and Shared Representations in Multilingual Language Models'''&amp;lt;br&amp;gt;Multilingual language models have primarily focused on cross-lingual differences, with intra-language variation only recently gaining more attention. Dialects and non-standard varieties challenge core assumptions about data, representation, and evaluation. In this talk, I discuss what makes dialects particularly challenging for multilingual models, review approaches starting from early encoder-based methods, and give an overview of resources developed for dialectal NLP, with a focus on German dialects. I then turn to recent work on multilingual training dynamics and shared representations, analyzing when linguistic information and shared concept spaces emerge during training and where alignment breaks down. Although dialects are not yet explicitly modeled in this analysis, the findings provide insight into multilingual representation learning during pre-training. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Optimizing data for multilingual post-training'''&amp;lt;br&amp;gt;In this session we will look into techniques for augmenting data collections for better multilingual coverage. We will discuss the role of translation and inference settings, and explore methods for optimizing multilingual data both on the prompt and the generation side.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, The National Library (Sweden)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Anastasia Philipps, University of Oslo (Norway)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Angelina Zanardi, National Library of Norway&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology (Netherlands)&lt;br /&gt;
# David Salinas, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library of Norway&lt;br /&gt;
# Marthe Midtgaard, National Library of Norway&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Rolv-Arild Braaten, National Library of Norway&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library of Norway&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1855</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1855"/>
		<updated>2026-01-30T11:43:30Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school has a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme consists of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, but also includes critical reflections on current development trends in LLM-focused NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium and with reflections about current LLM-oriented activities of the National Library of Norway. &lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://geoalgo.github.io/ David Salinas], ELLIS Institute Tübingen&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Laurie Burchell and Pedro Ortiz Suarez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilinguality at Common Crawl: improving language coverage for the largest open web corpus'''&amp;lt;br&amp;gt;The Common Crawl Foundation (CCF) provides the largest open corpus of web data, enabling a wide range of scientific and technical applications including large language model (LLM) development. However, our current data processing pipeline faces challenges when processing multilingual data, decreasing language representation and impacting downstream model performance. In this talk, we will discuss CCF’s initiatives to improve multilingual coverage and language identification of our web corpus. These efforts include soliciting crowd-sourced web seeds for under-served languages, running the First Workshop for Multilingual Data Quality Signals at COLM 2025, and creating CommonLID, a community-driven, human-annotated language identification benchmark for the web domain. Throughout, we emphasise the collaborative nature of our efforts, working in partnership with members of the NLP community to improve content available in their languages.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Multilingual Models'''&amp;lt;br&amp;gt; Large Language Models introduced in recent years have been found extremely helpful in advancing the state of the art in many Natural Language Applications, notably due to their ability to compute numerical, high-dimensional representations of linguistic units such as words or sentences. Multilingual language models go one step further and add the ability to handle multiple languages, sometimes even multiple scripts, with a single model. In this presentation, I will discuss multilingual language models at length, with a focus on the evaluation of their multilingual abilities, which raises two difficult questions: (a) how to evaluate their performance as if they were just a collection of monolingual models; (b) how to evaluate their performance as integrated multilingual models, capable of bridging between languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Generations Multilingually: Current challenges and Lessons from Machine Translation'''&amp;lt;br&amp;gt;In this session we will dive into the particular challenge of evaluating LLMs across many languages in generative tasks. We will take a look at the &amp;quot;sister field&amp;quot; of machine translation and inspect what principles have led to advances in understanding quality across languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Text Generation: Know your Options!'''&amp;lt;br&amp;gt;Text generation, contextual or non-contextual, is ubiquitous in the current LLM era, as it serves as the most basic building block in multiple application contexts, from question answering and dialog systems to text summarization and machine translation, and many more. Generation is thus equally useful to compute deterministic and highly non-deterministic mappings with various levels of output constraints. Furthermore, text generation is also used as a sub-routine of more complex generation strategies, aiming to produce syntactically well-formed (e.g. for code generation) or semantically consistent outputs, possibly through multiple steps of generation (e.g., in chain-of-thought generation) or to collect diverse samples from the generating distribution. To cover this considerable diversity of uses, multiple text generation strategies have been proposed, some less well-known than others. In this talk I will review various families of generation algorithms, from the most basic ones to the more sophisticated approaches, so as to document, as much as possible, the options that are available to text generation users. The final part will survey some decoding issues that are specific to multilingual models. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Max Idahl &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilingual model-based quality filtering'''&amp;lt;br&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' David Salinas &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Challenges in Evaluating Generative Models'''&amp;lt;br&amp;gt;In this talk, we will discuss the evaluation of generative models, in particular Large Language Models (LLMs). Given that such models produce open-ended output, their evaluation requires different techniques than static evaluations such as simple question-answering benchmarks. We will first discuss human annotations and their use in leaderboards such as LMArena and ComparIA. We will then focus on automatic evaluation relying on LLM judges. In particular, we will describe current challenges with LLM judges before discussing their application in multilingual settings.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session''': National Library of Norway, OpenEuroLLM, MultiSynt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Optimizing data for multilingual post-training'''&amp;lt;br&amp;gt;In this session we will look into techniques for augmenting data collections for better multilingual coverage. We will discuss the role of translation and inference settings, and explore methods for optimizing multilingual data both on the prompt and the generation side.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Barbara Plank &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''NLP Beyond the Standard: Dialects, Variation, and Shared Representations in Multilingual Language Models'''&amp;lt;br&amp;gt;Multilingual language models have primarily focused on cross-lingual differences, with intra-language variation only recently gaining more attention. Dialects and non-standard varieties challenge core assumptions about data, representation, and evaluation. In this talk, I discuss what makes dialects particularly challenging for multilingual models, review approaches starting from early encoder-based methods, and give an overview of resources developed for dialectal NLP, with a focus on German dialects. I then turn to recent work on multilingual training dynamics and shared representations, analyzing when linguistic information and shared concept spaces emerge during training and where alignment breaks down. Although dialects are not yet explicitly modeled in this analysis, the findings provide insight into multilingual representation learning during pre-training. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee consists of (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, The National Library (Sweden)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Anastasia Philipps, University of Oslo (Norway)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Angelina Zanardi, National Library of Norway&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology (Netherlands)&lt;br /&gt;
# David Salinas, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library of Norway&lt;br /&gt;
# Marthe Midtgaard, National Library of Norway&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Rolv-Arild Braaten, National Library of Norway&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library of Norway&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1854</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1854"/>
		<updated>2026-01-29T09:16:05Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school has a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme consists of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, but also includes critical reflections on current development trends in LLM-focused NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium and with reflections about current LLM-oriented activities of the National Library of Norway. &lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://geoalgo.github.io/ David Salinas], ELLIS Institute Tübingen&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Laurie Burchell and Pedro Ortiz Suarez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilinguality at Common Crawl: improving language coverage for the largest open web corpus'''&amp;lt;br&amp;gt;The Common Crawl Foundation (CCF) provides the largest open corpus of web data, enabling a wide range of scientific and technical applications including large language model (LLM) development. However, our current data processing pipeline faces challenges when processing multilingual data, decreasing language representation and impacting downstream model performance. In this talk, we will discuss CCF’s initiatives to improve multilingual coverage and language identification of our web corpus. These efforts include soliciting crowd-sourced web seeds for under-served languages, running the First Workshop for Multilingual Data Quality Signals at COLM 2025, and creating CommonLID, a community-driven, human-annotated language identification benchmark for the web domain. Throughout, we emphasise the collaborative nature of our efforts, working in partnership with members of the NLP community to improve content available in their languages.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Text Generation: Know your Options!'''&amp;lt;br&amp;gt;Text generation, contextual or non-contextual, is ubiquitous in the current LLM era, as it serves as the most basic building block in multiple application contexts, from question answering and dialog systems to text summarization and machine translation, and many more. Generation is thus equally useful to compute deterministic and highly non-deterministic mappings with various levels of output constraints. Furthermore, text generation is also used as a sub-routine of more complex generation strategies, aiming to produce syntactically well-formed (e.g. for code generation) or semantically consistent outputs, possibly through multiple steps of generation (e.g., in chain-of-thought generation) or to collect diverse samples from the generating distribution. To cover this considerable diversity of uses, multiple text generation strategies have been proposed, some less well-known than others. In this talk I will review various families of generation algorithms, from the most basic ones to the more sophisticated approaches, so as to document, as much as possible, the options that are available to text generation users. The final part will survey some decoding issues that are specific to multilingual models. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Generations Multilingually: Current challenges and Lessons from Machine Translation'''&amp;lt;br&amp;gt;In this session we will dive into the particular challenge of evaluating LLMs across many languages in generative tasks. We will take a look at the &amp;quot;sister field&amp;quot; of machine translation and inspect what principles have led to advances in understanding quality across languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Max Idahl &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilingual model-based quality filtering'''&amp;lt;br&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Multilingual Models'''&amp;lt;br&amp;gt; Large Language Models introduced in recent years have been found extremely helpful in advancing the state of the art in many Natural Language Applications, notably due to their ability to compute numerical, high-dimensional representations of linguistic units such as words or sentences. Multilingual language models go one step further and add the ability to handle multiple languages, sometimes even multiple scripts, with a single model. In this presentation, I will discuss multilingual language models at length, with a focus on the evaluation of their multilingual abilities, which raises two difficult questions: (a) how to evaluate their performance as if they were just a collection of monolingual models; (b) how to evaluate their performance as integrated multilingual models, capable of bridging between languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' David Salinas &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Challenges in Evaluating Generative Models'''&amp;lt;br&amp;gt;In this talk, we will discuss the evaluation of generative models, in particular Large Language Models (LLMs). Given that such models produce open-ended output, their evaluation requires different techniques than static evaluations such as simple question-answering benchmarks. We will first discuss human annotations and their use in leaderboards such as LMArena and ComparIA. We will then focus on automatic evaluation relying on LLM judges. In particular, we will describe current challenges with LLM judges before discussing their application in multilingual settings.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session''': National Library of Norway, OpenEuroLLM, MultiSynt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Optimizing data for multilingual post-training'''&amp;lt;br&amp;gt;In this session we will look into techniques for augmenting data collections for better multilingual coverage. We will discuss the role of translation and inference settings, and explore methods for optimizing multilingual data both on the prompt and the generation side.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Barbara Plank &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''NLP Beyond the Standard: Dialects, Variation, and Shared Representations in Multilingual Language Models'''&amp;lt;br&amp;gt;Multilingual language models have primarily focused on cross-lingual differences, with intra-language variation only recently gaining more attention. Dialects and non-standard varieties challenge core assumptions about data, representation, and evaluation. In this talk, I discuss what makes dialects particularly challenging for multilingual models, review approaches starting from early encoder-based methods, and give an overview of resources developed for dialectal NLP, with a focus on German dialects. I then turn to recent work on multilingual training dynamics and shared representations, analyzing when linguistic information and shared concept spaces emerge during training and where alignment breaks down. Although dialects are not yet explicitly modeled in this analysis, the findings provide insight into multilingual representation learning during pre-training. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or similar matters, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, The National Library (Sweden)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Anastasia Philipps, University of Oslo (Norway)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Angelina Zanardi, National Library of Norway&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology (Netherlands)&lt;br /&gt;
# David Salinas, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (United Kingdom)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library of Norway&lt;br /&gt;
# Marthe Midtgaard, National Library of Norway&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Rolv-Arild Braaten, National Library of Norway&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library of Norway&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (United Kingdom)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1853</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1853"/>
		<updated>2026-01-28T19:52:14Z</updated>

		<summary type="html">&lt;p&gt;Andreku: Schedule confirmed&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school has a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme comprises in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, but also includes critical reflections on current development trends in LLM-focused NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium and with reflections about current LLM-oriented activities of the National Library of Norway. &lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://geoalgo.github.io/ David Salinas], ELLIS Institute Tübingen&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Laurie Burchell and Pedro Ortiz Suarez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilinguality at Common Crawl: improving language coverage for the largest open web corpus'''&amp;lt;br&amp;gt;The Common Crawl Foundation (CCF) provides the largest open corpus of web data, enabling a wide range of scientific and technical applications including large language model (LLM) development. However, our current data processing pipeline faces challenges when processing multilingual data, decreasing language representation and impacting downstream model performance. In this talk, we will discuss CCF’s initiatives to improve multilingual coverage and language identification of our web corpus. These efforts include soliciting crowd-sourced web seeds for under-served languages, running the First Workshop for Multilingual Data Quality Signals at COLM 2025, and creating CommonLID, a community-driven, human-annotated language identification benchmark for the web domain. Throughout, we emphasise the collaborative nature of our efforts, working in partnership with members of the NLP community to improve content available in their languages.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Text Generation: Know your Options!'''&amp;lt;br&amp;gt;Text generation, contextual or non-contextual, is ubiquitous in the current LLM era, as it serves as the most basic block in multiple application contexts, from question answering and dialog systems to text summarization and machine translation, and many more. Generation is thus equally useful to compute deterministic and highly non-deterministic mappings with various levels of output constraints. Furthermore, text generation is also used as a sub-routine of more complex generation strategies, aiming to produce syntactically well-formed (e.g. for code generation) or semantically consistent outputs, possibly through multiple steps of generation (e.g. in chain-of-thought generation) or to collect diverse samples from the generating distribution. To cover this considerable diversity of uses, multiple text generation strategies have been proposed, some less well-known than others. In this talk I will review various families of generation algorithms, from the most basic ones to the more sophisticated approaches, so as to document, as much as possible, the possible options that are available to text generation users. The final part will survey some decoding issues that are specific to multilingual models. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Generations Multilingually: Current Challenges and Lessons from Machine Translation'''&amp;lt;br&amp;gt;In this session we will dive into the particular challenge of evaluating LLMs across many languages in generative tasks. We will take a look at the &amp;quot;sister field&amp;quot; of machine translation and inspect what principles have led to advances in understanding quality across languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Max Idahl &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilingual model-based quality filtering'''&amp;lt;br&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Multilingual Models'''&amp;lt;br&amp;gt; Large Language Models introduced in recent years have been found extremely helpful in advancing the state of the art in many Natural Language Applications, notably due to their ability to compute numerical, high-dimensional representations of linguistic units such as words or sentences. Multilingual language models go one step further and add the ability to handle multiple languages, sometimes even multiple scripts, with a single model. In this presentation, I will discuss multilingual language models at length, with a focus on the evaluation of their multilingual abilities, which raises two difficult questions: (a) how to evaluate their performance as if they were just a collection of monolingual models; (b) how to evaluate their performance as integrated multilingual models, capable of bridging between languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' David Salinas &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Challenges in Evaluating Generative Models'''&amp;lt;br&amp;gt;In this talk, we will discuss the evaluation of generative models, in particular Large Language Models (LLMs). Given that such models produce open-ended output, their evaluation requires different techniques than static evaluations such as simple question-answering benchmarks. We will first discuss human annotations and their use in leaderboards such as LMArena and ComparIA. We will then focus on automatic evaluation relying on LLM judges. In particular, we will describe current challenges with LLM judges before discussing their application in multilingual settings.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session''': National Library of Norway, OpenEuroLLM, MultiSynt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Optimizing data for multilingual post-training'''&amp;lt;br&amp;gt;In this session we will look into techniques for augmenting data collections for better multilingual coverage. We will discuss the role of translation and inference settings, and explore methods for optimizing multilingual data both on the prompt and the generation side.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Barbara Plank &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''NLP Beyond the Standard: Dialects, Variation, and Shared Representations in Multilingual Language Models'''&amp;lt;br&amp;gt;Multilingual language models have primarily focused on cross-lingual differences, with intra-language variation only recently gaining more attention. Dialects and non-standard varieties challenge core assumptions about data, representation, and evaluation. In this talk, I discuss what makes dialects particularly challenging for multilingual models, review approaches starting from early encoder-based methods, and give an overview of resources developed for dialectal NLP, with a focus on German dialects. I then turn to recent work on multilingual training dynamics and shared representations, analyzing when linguistic information and shared concept spaces emerge during training and where alignment breaks down. Although dialects are not yet explicitly modeled in this analysis, the findings provide insight into multilingual representation learning during pre-training. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or similar matters, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, The National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Anastasia Philipps, University of Oslo (Norway)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Angelina Zanardi, National Library of Norway&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology (Netherlands)&lt;br /&gt;
# David Salinas, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (United Kingdom)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library of Norway&lt;br /&gt;
# Marthe Midtgaard, National Library of Norway&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Rolv-Arild Braaten, National Library of Norway&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library of Norway&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (United Kingdom)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1852</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1852"/>
		<updated>2026-01-28T17:27:12Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Preliminary schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, as well as critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://geoalgo.github.io/ David Salinas], ELLIS Institute Tübingen&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Laurie Burchell and Pedro Ortiz Suarez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilinguality at Common Crawl: improving language coverage for the largest open web corpus'''&amp;lt;br&amp;gt;The Common Crawl Foundation (CCF) provides the largest open corpus of web data, enabling a wide range of scientific and technical applications including large language model (LLM) development. However, our current data processing pipeline faces challenges when processing multilingual data, decreasing language representation and impacting downstream model performance. In this talk, we will discuss CCF’s initiatives to improve multilingual coverage and language identification of our web corpus. These efforts include soliciting crowd-sourced web seeds for under-served languages, running the First Workshop for Multilingual Data Quality Signals at COLM 2025, and creating CommonLID, a community-driven, human-annotated language identification benchmark for the web domain. Throughout, we emphasise the collaborative nature of our efforts, working in partnership with members of the NLP community to improve content available in their languages.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Text Generation: Know your Options!'''&amp;lt;br&amp;gt;Text generation, contextual or non-contextual, is ubiquitous in the current LLM era, as it serves as the most basic block in multiple application contexts, from question answering and dialog systems to text summarization and machine translation, and many more. Generation is thus equally useful to compute deterministic and highly non-deterministic mappings with various levels of output constraints. Furthermore, text generation is also used as a sub-routine of more complex generation strategies, aiming to produce syntactically well-formed (e.g. for code generation) or semantically consistent outputs, possibly through multiple steps of generation (e.g., in chain-of-thought generation) or to collect diverse samples from the generating distribution. To cover this considerable diversity of uses, multiple text generation strategies have been proposed, some less well-known than others. In this talk I will review various families of generation algorithms, from the most basic ones to the more sophisticated approaches, so as to document, as much as possible, the possible options that are available to text generation users. The final part will survey some decoding issues that are specific to multilingual models. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Generations Multilingually: Current challenges and Lessons from Machine Translation'''&amp;lt;br&amp;gt;In this session we will dive into the particular challenge of evaluating LLMs across many languages in generative tasks. We will take a look at the &amp;quot;sister field&amp;quot; of machine translation and inspect what principles have led to advances in understanding quality across languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Max Idahl &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilingual model-based quality filtering'''&amp;lt;br&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Multilingual Models'''&amp;lt;br&amp;gt; Large Language Models introduced in recent years have been found extremely helpful in advancing the state of the art in many natural language applications, notably due to their ability to compute numerical, high-dimensional representations of linguistic units such as words or sentences. Multilingual language models go one step further and add the ability to handle multiple languages, sometimes even multiple scripts, with just one single model. In this presentation, I will discuss multilingual language models at length, with a focus on the evaluation of their multilingual abilities, which raises two difficult questions: (a) how to evaluate their performance as if they were just a collection of monolingual models; (b) how to evaluate their performance as integrated multilingual models, capable of bridging between languages. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' David Salinas &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Challenges in Evaluating Generative Models'''&amp;lt;br&amp;gt;In this talk, we will discuss the evaluation of generative models, in particular Large Language Models (LLMs). Given that such models produce open-ended output, their evaluation requires different techniques than static evaluations such as simple question-answering benchmarks. We will first discuss human annotations and their use in leaderboards such as LMArena and ComparIA. We will then focus on automatic evaluation relying on LLM judges. In particular, we will describe current challenges with LLM judges before discussing their application in multilingual settings.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session''': National Library of Norway, OpenEuroLLM, MultiSynt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Optimizing data for multilingual post-training'''&amp;lt;br&amp;gt;In this session we will look into techniques for augmenting data collections for better multilingual coverage. We will discuss the role of translation and inference settings, and explore methods for optimizing multilingual data both on the prompt and the generation side.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Barbara Plank &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''NLP Beyond the Standard: Dialects, Variation, and Shared Representations in Multilingual Language Models'''&amp;lt;br&amp;gt;Multilingual language models have primarily focused on cross-lingual differences, with intra-language variation only recently gaining more attention. Dialects and non-standard varieties challenge core assumptions about data, representation, and evaluation. In this talk, I discuss what makes dialects particularly challenging for multilingual models, review approaches starting from early encoder-based methods, and give an overview of resources developed for dialectal NLP, with a focus on German dialects. I then turn to recent work on multilingual training dynamics and shared representations, analyzing when linguistic information and shared concept spaces emerge during training and where alignment breaks down. Although dialects are not yet explicitly modeled in this analysis, the findings provide insight into multilingual representation learning during pre-training. &amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond; please see below.&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, The National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Anastasia Philipps, University of Oslo (Norway)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Angelina Zanardi, National Library of Norway&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology (Netherlands)&lt;br /&gt;
# David Salinas, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library of Norway&lt;br /&gt;
# Marthe Midtgaard, National Library of Norway&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Rolv-Arild Braaten, National Library of Norway&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library of Norway&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1851</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1851"/>
		<updated>2026-01-28T17:21:32Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Preliminary schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, as well as critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://geoalgo.github.io/ David Salinas], ELLIS Institute Tübingen&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Laurie Burchell and Pedro Ortiz Suarez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilinguality at Common Crawl: improving language coverage for the largest open web corpus'''&amp;lt;br&amp;gt;Abstract&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Text Generation: Know your Options!'''&amp;lt;br&amp;gt;Abstract&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Generations Multilingually: Current challenges and Lessons from Machine Translation'''&amp;lt;br&amp;gt;Abstract&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Max Idahl &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Multilingual model-based quality filtering'''&amp;lt;br&amp;gt;Abstract&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' François Yvon &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Evaluating Multilingual Models'''&amp;lt;br&amp;gt;Abstract&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' David Salinas &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Challenges in Evaluating Generative Models'''&amp;lt;br&amp;gt;Abstract&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session''': National Library of Norway, OpenEuroLLM, MultiSynt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Julia Kreutzer &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Optimizing data for multilingual post-training'''&amp;lt;br&amp;gt;Abstract&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Barbara Plank &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''NLP Beyond the Standard: Dialects, Variation, and Shared Representations in Multilingual Language Models'''&amp;lt;br&amp;gt;Abstract&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond; please see below.&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, The National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Anastasia Philipps, University of Oslo (Norway)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Angelina Zanardi, National Library of Norway&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology (Netherlands)&lt;br /&gt;
# David Salinas, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (United Kingdom)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library of Norway&lt;br /&gt;
# Marthe Midtgaard, National Library of Norway&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Rolv-Arild Braaten, National Library of Norway&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library of Norway&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (United Kingdom)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1850</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1850"/>
		<updated>2026-01-23T11:21:05Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will consist of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, but also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://geoalgo.github.io/ David Salinas], ELLIS Institute Tübingen&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee consists of (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, The National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Anastasia Philipps, University of Oslo (Norway)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Angelina Zanardi, National Library of Norway&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology (Netherlands)&lt;br /&gt;
# David Salinas, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (United Kingdom)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library of Norway&lt;br /&gt;
# Marthe Midtgaard, National Library of Norway&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Rolv-Arild Braaten, National Library of Norway&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library of Norway&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (United Kingdom)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1849</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1849"/>
		<updated>2026-01-08T14:26:06Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will consist of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, but also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://geoalgo.github.io/ David Salinas], ELLIS Institute Tübingen&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee consists of (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, The National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Angelina Zanardi, National Library of Norway&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology (Netherlands)&lt;br /&gt;
# David Salinas, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (United Kingdom)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library of Norway&lt;br /&gt;
# Marthe Midtgaard, National Library of Norway&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Rolv-Arild Braaten, National Library of Norway&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library of Norway&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1843</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1843"/>
		<updated>2025-12-23T18:09:17Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, as well as critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://geoalgo.github.io/ David Salinas], ELLIS institute Tübingen / University of Freiburg&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who have submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond; please see below.&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology, OpenEuroLLM (Netherlands)&lt;br /&gt;
# David Salinas, ELLIS institute Tübingen (Germany)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1842</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1842"/>
		<updated>2025-12-23T17:28:04Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Programme */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, as well as critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://geoalgo.github.io/ David Salinas], ELLIS institute Tübingen / University of Freiburg&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who have submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond; please see below.&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology, OpenEuroLLM (Netherlands)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1841</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1841"/>
		<updated>2025-12-21T11:00:40Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer to and from the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond; please see below.&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or similar matters, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology, OpenEuroLLM (Netherlands)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1840</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1840"/>
		<updated>2025-12-20T15:20:37Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Registration */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer to and from the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–70 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is now closed.&lt;br /&gt;
Requests for participation were processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond; please see below.&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or similar matters, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology, OpenEuroLLM (Netherlands)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Roberts Darģis, University of Latvia (Latvia)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1839</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1839"/>
		<updated>2025-12-19T23:33:39Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer to and from the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–80 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is open.&lt;br /&gt;
[https://nettskjema.no/a/381438 Requests for participation] will be processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who have submitted the registration form will be confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which is also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel. [https://sites.google.com/view/sogstikollen-24f &amp;lt;span style=&amp;quot;colour: white;&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or similar matters, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology, OpenEuroLLM (Netherlands)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (United Kingdom)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Etienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Hannan Mahadik, ELLIS Institute Tübingen (Germany)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Maja Buljan, University of Oslo (Norway)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Roberts Darģis, University of Latvia (Latvia)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (United Kingdom)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1838</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1838"/>
		<updated>2025-12-18T12:23:01Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer to and from the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, but will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–80 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is open.&lt;br /&gt;
[https://nettskjema.no/a/381438 Requests for participation] will be processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who have submitted the registration form will be confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which is also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel. [https://sites.google.com/view/sogstikollen-24f &amp;lt;span style=&amp;quot;colour: white;&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or similar matters, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology, OpenEuroLLM (Netherlands)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (United Kingdom)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Roberts Darģis, University of Latvia (Latvia)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (United Kingdom)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1837</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1837"/>
		<updated>2025-12-17T16:25:08Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer to and from the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, but will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–80 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is open.&lt;br /&gt;
[https://nettskjema.no/a/381438 Requests for participation] will be processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who have submitted the registration form will be confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which is also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel. [https://sites.google.com/view/sogstikollen-24f &amp;lt;span style=&amp;quot;colour: white;&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond; please see below.&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee consists of (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Ankit Sonthalia, Tübingen AI Center (Germany)&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology, OpenEuroLLM (Netherlands)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Roberts Darģis, University of Latvia (Latvia)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1836</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1836"/>
		<updated>2025-12-16T12:44:30Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–80 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is open.&lt;br /&gt;
[https://nettskjema.no/a/381438 Requests for participation] will be processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who have submitted the registration form will be confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which is also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond; please see below.&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee consists of (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Ankit Sonthalia, Tübingen AI Center (Germany)&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology, OpenEuroLLM (Netherlands)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Konstantin Dobler, Hasso Plattner Institute (Germany)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Roberts Darģis, University of Latvia (Latvia)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1835</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1835"/>
		<updated>2025-12-15T20:02:44Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–80 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is open.&lt;br /&gt;
[https://nettskjema.no/a/381438 Requests for participation] will be processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who have submitted the registration form will be confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which is also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond; please see below.&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee consists of (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Ankit Sonthalia, Tübingen AI Center (Germany)&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology, OpenEuroLLM (Netherlands)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fedor Vitiugin, University of Turku (Finland)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Ghulam Muhammed Khan, University of Exeter (United Kingdom)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Katarina Strani, Heriot-Watt University (United Kingdom)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Konstantin Dobler, Hasso Plattner Institute (Germany)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Manisha Venkat, University of Essex (UK)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Roberts Darģis, University of Latvia (Latvia)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1832</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1832"/>
		<updated>2025-12-06T19:30:44Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, but also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–80 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is open.&lt;br /&gt;
[https://nettskjema.no/a/381438 Requests for participation] will be processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who have submitted the registration form will be confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which is also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Ankit Sonthalia, Tübingen AI Center (Germany)&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology, OpenEuroLLM (Netherlands)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Konstantin Dobler, Hasso Plattner Institute (Germany)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Manisha Venkat, University of Essex (UK)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nalin Kumar, Charles University (Czech Republic)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Roberts Darģis, University of Latvia (Latvia)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1831</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1831"/>
		<updated>2025-12-06T19:27:32Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, but also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere Labs&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–80 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is open.&lt;br /&gt;
[https://nettskjema.no/a/381438 Requests for participation] will be processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who have submitted the registration form will be confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which is also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Agnes Toftgård, National Library (Sweden)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Ankit Sonthalia, Tübingen AI Center (Germany)&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology, OpenEuroLLM (Netherlands)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Konstantin Dobler, Hasso Plattner Institute (Germany)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Manisha Venkat, University of Essex (UK)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nalin Kumar, Charles University (Czech Republic)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Roberts Darģis, University of Latvia (Latvia)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1830</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1830"/>
		<updated>2025-12-06T19:11:22Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere for Labs&lt;br /&gt;
* [https://www.isir.upmc.fr/personnel/yvon/?lang=en François Yvon], Sorbonne Université&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–80 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is open.&lt;br /&gt;
[https://nettskjema.no/a/381438 Requests for participation] will be processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who have submitted the registration form will be confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which is also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond; please see below.&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Adam Hrin, AMD Silo AI (Finland)&lt;br /&gt;
# Aitor Soroa, University of the Basque Country (Spain)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Ankit Sonthalia, Tübingen AI Center (Germany)&lt;br /&gt;
# Anni Moisala, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Artūrs Znotiņš, University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, Ludwig-Maximilians-Universität München (Germany)&lt;br /&gt;
# Charlotte Noel, LINAGORA Labs (France)&lt;br /&gt;
# Dalton Harmsen, Eindhoven University of Technology, OpenEuroLLM (Netherlands)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# François Yvon, CNRS (France)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (Luxembourg)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne Université (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Jouni Luoma, AMD Silo AI (Finland)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Konstantin Dobler, Hasso Plattner Institute (Germany)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Manisha Venkat, University of Essex (UK)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Meihan Tong, University of Oslo (Norway)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nalin Kumar, Charles University (Czech Republic)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Roberts Darģis, University of Latvia (Latvia)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Sampo Pyysalo, University of Turku (Finland)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1826</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1826"/>
		<updated>2025-12-02T01:13:47Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere for Labs&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–80 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is open.&lt;br /&gt;
[https://nettskjema.no/a/381438 Requests for participation] will be processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who have submitted the registration form will be confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which is also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond; please see below.&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
# Aitor Soroa, Hitz research center / University of the Basque Country (Spain)&lt;br /&gt;
# Aleksandra Krasnodębska, NASK PIB (Poland)&lt;br /&gt;
# Alicia Núñez Alcover, Prompsit (Spain)&lt;br /&gt;
# Aman Sinha, Université de Lorraine (France)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Ankit Sonthalia, Tuebingen AI Center (Germany)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Barbara Heinisch, Eurac Research (Italy)&lt;br /&gt;
# Barbara Plank, LMU Munich (Germany)&lt;br /&gt;
# Charlotte Noel, IRIT / LINAGORA (France)&lt;br /&gt;
# Diana Kylymnyk, University of Exeter (UK)&lt;br /&gt;
# Elizaveta Kuzmenko, Université Libre de Bruxelles (Belgium)&lt;br /&gt;
# Faton Rekathati, The National Library (Sweden)&lt;br /&gt;
# Fred Philippy, University of Luxembourg (SnT) (Luxembourg)&lt;br /&gt;
# Gianluca Barmina, University of Southern Denmark, Danish Foundation Models (Denmark)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Iglika Nikolova-Stoupak, Sorbonne University (France)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jiajing Wan, University of Bergen (Norway)&lt;br /&gt;
# Jindřich Helcl, University of Oslo (Norway)&lt;br /&gt;
# Johannes Gabriel Sindlinger, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Julia Kreutzer, Cohere Labs (Canada)&lt;br /&gt;
# Justyna Sikora, The National Library (Sweden)&lt;br /&gt;
# Kevin Glocker, Linköping University (Sweden)&lt;br /&gt;
# Konstantin Dobler, Hasso Plattner Institute (Germany)&lt;br /&gt;
# Kristýna Onderková, Charles University (Czech Republic)&lt;br /&gt;
# Laurie Burchell, Common Crawl Foundation (UK)&lt;br /&gt;
# Laurène Cave, Sorbonne Université (France)&lt;br /&gt;
# Lisa Yankovskaya, University of Tartu (Estonia)&lt;br /&gt;
# Manisha Venkat, University of Essex (UK)&lt;br /&gt;
# Markus Heiervang, National Library (Norway)&lt;br /&gt;
# Mattes Ruckdeschel, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Maximilian Idahl, ellamind (Germany)&lt;br /&gt;
# Muhammad Imran, University of A Coruña (Spain)&lt;br /&gt;
# Nalin Kumar, Charles University (Czech Republic)&lt;br /&gt;
# Nam Luu, Charles University (Czech Republic)&lt;br /&gt;
# Neda Jamshidi, University of Siena (Italy)&lt;br /&gt;
# Nils Grünefeld, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (USA)&lt;br /&gt;
# Roberts Darģis, University of Latvia (Latvia)&lt;br /&gt;
# Romina Oji, Linköping University (Sweden)&lt;br /&gt;
# Shanshan Xu, University of Copenhagen (Denmark)&lt;br /&gt;
# Shenbin Qian, University of Oslo (Norway)&lt;br /&gt;
# Shibingfeng Zhang, University of Bologna (Italy)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Taja Kuzman Pungeršek, Jožef Stefan Institute (Slovenia)&lt;br /&gt;
# Tommaso Green, University of Mannheim (Germany)&lt;br /&gt;
# Tudor Nicolae Mateiu, Prompsit (Spain)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Wafa Aissa, UCLouvain (Belgium)&lt;br /&gt;
# Xiaorui Yu, King's College London (UK)&lt;br /&gt;
# Yihang Lu, Sorbonne Université (France)&lt;br /&gt;
# Yiheng Wu, University of Helsinki (Finland)&lt;br /&gt;
# Yves Scherrer, University of Oslo (Norway)&lt;br /&gt;
# Zihao Li, University of Helsinki (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1824</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1824"/>
		<updated>2025-11-19T21:49:37Z</updated>

		<summary type="html">&lt;p&gt;Andreku: 2026 updates&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Circle U, NLPL, &amp;amp; OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation''' =&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
In 2026, the NLPL network and Digital Europe&lt;br /&gt;
project ''[https://openeurollm.eu OpenEuroLLM]''&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2026 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.eurohpc-ju.europa.eu/supercomputers/our-supercomputers_en consortium]&lt;br /&gt;
and is endorsed as a doctoral training event in the European&lt;br /&gt;
[https://www.circle-u.eu Circle U university alliance].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2025 2025]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2026, NLPL will hold its winter school from Monday, February 2, to&lt;br /&gt;
Wednesday, February 4, 2026, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer from and to the main Oslo&lt;br /&gt;
airport ''Gardermoen'' (OSL), leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the OpenEuroLLM project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3885 (NOK 3485 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2026 winter school will have a thematic focus on ''Multilinguality in LLM Development and Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by international experts, with special emphasis on open science and European languages, and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example EuroHPC experience&lt;br /&gt;
reports from the OpenEuroLLM consortium.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include (more can be added later):&lt;br /&gt;
&lt;br /&gt;
* [https://bplank.github.io Barbara Plank], Ludwig Maximilian University of Munich&lt;br /&gt;
* [https://commoncrawl.org/team/laurie-burchell Laurie Burchell] and [https://commoncrawl.org/team/pedro-ortiz-suarez Pedro Ortiz Suarez], Common Crawl&lt;br /&gt;
* [https://www.linkedin.com/in/maximilianidahl/?originalSubdomain=de Max Idahl], ellamind&lt;br /&gt;
* [https://juliakreutzer.github.io Julia Kreutzer], Cohere for Labs&lt;br /&gt;
&lt;br /&gt;
= Preliminary schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 2, 2026&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1'''&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' &lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 3, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4'''&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5'''&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6'''&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from OpenEuroLLM'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 4, 2026&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8'''&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9'''&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we expect 60–80 participants at the 2026 winter school.&lt;br /&gt;
Registration for interested participants is open.&lt;br /&gt;
[https://nettskjema.no/a/381438 Requests for participation] will be processed on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who have submitted the registration form will be confirmed in three batches, on '''November 28''', on '''December 5''',&lt;br /&gt;
and on '''December 19''', which is also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the OpenEuroLLM shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 2.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://www.avinor.no/siteassets/flyplasser/oslo-lufthavn/info/kart-over-flyplassen/kart-over-flyplassen-ankomst-oslo-lufthavn-avinor.jpg OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take about two to three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 4, before the group returns&lt;br /&gt;
to OSL airport on the OpenEuroLLM shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2026 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the OpenEuroLLM, Circle U, and&lt;br /&gt;
NLPL networks and beyond (see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;nlpl-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises (in alphabetical order):&lt;br /&gt;
&lt;br /&gt;
* Jenia Jitsev (Forschungszentrum Jülich, Germany)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Alessandro Lenci (University of Pisa, Italy)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Gema Ramírez-Sánchez (Prompsit Language Engineering, Spain)&lt;br /&gt;
* David Salinas (ELLIS Institute, Germany)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
* Joaquin Vanschoren (Eindhoven University of Technology, The Netherlands)&lt;br /&gt;
* Guillaume Wisniewski (Paris Cité University, France)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Eosc/easybuild/modules&amp;diff=1820</id>
		<title>Eosc/easybuild/modules</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Eosc/easybuild/modules&amp;diff=1820"/>
		<updated>2025-11-05T13:14:13Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* &amp;quot;Bundle&amp;quot; modules */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= NLPL virtual laboratory =&lt;br /&gt;
&lt;br /&gt;
The laboratory is a reproducible custom-built set of NLP software. &lt;br /&gt;
It is currently installed on ''Saga'' and ''Fox'' HPC clusters.&lt;br /&gt;
&lt;br /&gt;
- To use on ''Saga'': run the following command (can be put in the ''~/.bashrc'' file to be run automatically at login):&lt;br /&gt;
&lt;br /&gt;
''module use -a /cluster/shared/nlpl/software/eb/etc/all/''&lt;br /&gt;
&lt;br /&gt;
- To use on ''Fox'': run the following command (can be put in the ''~/.bashrc'' file to be run automatically at login):&lt;br /&gt;
&lt;br /&gt;
''module use -a /fp/projects01/ec30/software/easybuild/modules/all/''&lt;br /&gt;
&lt;br /&gt;
After that, the &amp;quot;nlpl&amp;quot;-branded modules will be available via ''module avail'', ''module load'', etc. See all the NLPL modules with the ''module avail nlpl'' command.&lt;br /&gt;
&lt;br /&gt;
It is highly recommended to use them, instead of installing a copy in one's own home directory.&lt;br /&gt;
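&lt;br /&gt;
As a minimal sketch of the ''~/.bashrc'' approach mentioned above, the module path can be added only when the directory is actually present. The variable name ''NLPL_MODULES'' and the directory check are illustrative assumptions, not part of the official setup; the path shown is the ''Saga'' one from the text.&lt;br /&gt;

```shell
# Hypothetical ~/.bashrc fragment; NLPL_MODULES is an illustrative variable name.
# The path below is the Saga one from the text; on Fox use
# /fp/projects01/ec30/software/easybuild/modules/all/ instead.
NLPL_MODULES="/cluster/shared/nlpl/software/eb/etc/all/"

# Extend the module search path only if the directory exists,
# so the same fragment stays harmless on other machines.
if [ -d "$NLPL_MODULES" ]; then
    module use -a "$NLPL_MODULES"
fi
```

After logging in again, ''module avail nlpl'' should list the NLPL modules if the path was added.&lt;br /&gt;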
&lt;br /&gt;
== List of modules ==&lt;br /&gt;
From time to time, updated modules with newer software versions will be added, &lt;br /&gt;
but the older modules will never be removed (for reproducibility).&lt;br /&gt;
&lt;br /&gt;
Note that the modules which have &amp;quot;gomkl&amp;quot; or &amp;quot;intel&amp;quot; in their names are built using&lt;br /&gt;
the Intel Math Kernel Library, making them (somewhat) faster in CPU tasks&lt;br /&gt;
with Intel processors (for example, on ''Saga'', '''except the ''a100'' partition''', which uses AMD CPUs). &lt;br /&gt;
&lt;br /&gt;
Those with &amp;quot;foss&amp;quot; in their names are CPU-agnostic and will run on machines with all kinds of CPUs (for example, on ''Fox'', but also on all ''Saga'' partitions). These are the default ones.&lt;br /&gt;
&lt;br /&gt;
The next element in the module name after &amp;quot;foss&amp;quot;, &amp;quot;gomkl&amp;quot; or &amp;quot;intel&amp;quot; is the virtual laboratory (stack) version: for example, &amp;quot;2021a&amp;quot;, &amp;quot;2022b&amp;quot; or &amp;quot;2024a&amp;quot;. &lt;br /&gt;
Modules from different stack versions are incompatible with each other: you cannot load a module from &amp;quot;foss-2021a&amp;quot; and a module from &amp;quot;foss-2024a&amp;quot; simultaneously.&lt;br /&gt;
'''Currently, the &amp;quot;2024a&amp;quot; version is the recommended one, with the most up-to-date software built with the most up-to-date compilers.'''&lt;br /&gt;
&lt;br /&gt;
Below, we use the placeholder '''ARCH''': replace it with the toolchain (&amp;quot;gomkl&amp;quot;, &amp;quot;intel&amp;quot; or &amp;quot;foss&amp;quot;) followed by the stack version (&amp;quot;2022b&amp;quot; or &amp;quot;2024a&amp;quot;), for example &amp;quot;foss-2024a&amp;quot;, depending on which machine you are working on and which stack version you want to use.&lt;br /&gt;
Some modules also have the Python version specified in their names (for example, &amp;quot;''nlpl-transformers/4.55.4-foss-2024a-Python-3.12.3''&amp;quot;).&lt;br /&gt;
For stack version &amp;quot;2022b&amp;quot; it is usually Python 3.10.8, for stack version &amp;quot;2024a&amp;quot; it is usually Python 3.12.3.&lt;br /&gt;
Check the output of the '''module avail nlpl''' command for the exact module names.&lt;br /&gt;
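&lt;br /&gt;
The naming scheme described above can be sketched as follows. The values are taken from the example module name in the text; the variable names are made up for illustration, and the command is only printed, not run.&lt;br /&gt;

```shell
# Illustrative only: assemble a full module name from the parts described above.
# Values come from the example in the text; the variable names are hypothetical.
name="nlpl-transformers"
version="4.55.4"
toolchain="foss"    # or "gomkl" / "intel"
stack="2024a"       # or "2022b"
python="3.12.3"     # usually 3.10.8 for the 2022b stack

module_name="${name}/${version}-${toolchain}-${stack}-Python-${python}"

# Print the command rather than running it; confirm the exact name
# with "module avail nlpl" before loading.
echo "module load ${module_name}"
```

Not every module carries the Python suffix, so the output of ''module avail nlpl'' remains the authoritative list of names.&lt;br /&gt;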
&lt;br /&gt;
=== &amp;quot;Bundle&amp;quot; modules ===&lt;br /&gt;
These are the modules with the most cryptic names. Each of them contains a bunch of software pieces (Python packages, as a rule).&lt;br /&gt;
Many of these modules will be loaded automatically as dependencies of the regular modules, but sometimes they can be useful themselves.&lt;br /&gt;
They have their own bundle versions: &amp;quot;2022.01&amp;quot; or simply &amp;quot;01&amp;quot;, etc. (referred to below as '''VERS''').&lt;br /&gt;
Here are the details:&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-python-candy/VERS-ARCH''': various utility packages not directly related to NLP&lt;br /&gt;
** tqdm&lt;br /&gt;
** pydot&lt;br /&gt;
** smart_open&lt;br /&gt;
** cached-property&lt;br /&gt;
** filelock&lt;br /&gt;
** termcolor&lt;br /&gt;
** regex&lt;br /&gt;
** sacremoses&lt;br /&gt;
** mpi4py&lt;br /&gt;
** jsonlines&lt;br /&gt;
** jsonschema&lt;br /&gt;
** typing_extensions&lt;br /&gt;
** packaging&lt;br /&gt;
** pyhocon&lt;br /&gt;
** blis&lt;br /&gt;
** pathspec&lt;br /&gt;
** hatchling&lt;br /&gt;
** multidict&lt;br /&gt;
** yarl&lt;br /&gt;
** black&lt;br /&gt;
** click&lt;br /&gt;
** plotly&lt;br /&gt;
** toolz&lt;br /&gt;
** msgspec&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-nlptools/VERS-ARCH''': various utility packages related to NLP&lt;br /&gt;
** evaluate&lt;br /&gt;
** conllu&lt;br /&gt;
** seqeval&lt;br /&gt;
** langdetect&lt;br /&gt;
** levenshtein&lt;br /&gt;
** rouge_score&lt;br /&gt;
** sacrebleu&lt;br /&gt;
** udapi&lt;br /&gt;
** word2number&lt;br /&gt;
** ufal.chu-liu-edmonds&lt;br /&gt;
** gensim&lt;br /&gt;
** fastText&lt;br /&gt;
* '''nlpl-scipy-ecosystem/VERS-ARCH''': everything that constitutes the [https://scipy.org/ SciPy ecosystem]. Too many packages to enumerate them all, but the most important are:&lt;br /&gt;
** scipy&lt;br /&gt;
** pandas&lt;br /&gt;
** matplotlib&lt;br /&gt;
** ipython&lt;br /&gt;
** jupyter_core&lt;br /&gt;
** jupyter_client&lt;br /&gt;
** networkx&lt;br /&gt;
** sympy&lt;br /&gt;
** beautifulsoup4&lt;br /&gt;
** numexpr&lt;br /&gt;
** einops&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-llmtools/VERS-ARCH''': various utility packages for working with large language models (LLMs)&lt;br /&gt;
** peft ([https://pypi.org/project/peft/ HuggingFace PEFT]: State-of-the-art Parameter-Efficient Fine-Tuning)&lt;br /&gt;
** promptsource ([https://pypi.org/project/promptsource/ Toolkit for creating, sharing and using natural language prompts])&lt;br /&gt;
** lm-evaluation-harness ([https://github.com/EleutherAI/lm-evaluation-harness EleutherAI Language Model Evaluation Harness])&lt;br /&gt;
** bert_score: [https://pypi.org/project/bert-score/ BERTScore] to evaluate NLG tasks&lt;br /&gt;
** llguidance ([https://github.com/microsoft/llguidance Low-level Guidance]: constrained decoding for LLMs)&lt;br /&gt;
** mistral_common ([https://pypi.org/project/mistral_common/ Mistral-common]: common utilities for Mistral AI)&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-torch-audio-vision/VERS-ARCH''': multimodal extensions for PyTorch&lt;br /&gt;
** torch-vision ([https://github.com/pytorch/vision torchvision]: image and video datasets and models for PyTorch deep learning)&lt;br /&gt;
** torch-audio ([https://github.com/pytorch/audio torchaudio]: an audio library for PyTorch)&lt;br /&gt;
&lt;br /&gt;
=== &amp;quot;Regular&amp;quot; modules ===&lt;br /&gt;
These modules are more self-explanatory; each one provides a single piece of software:&lt;br /&gt;
&lt;br /&gt;
==== Most important ====&lt;br /&gt;
* '''nlpl-pytorch/1.6.0-ARCH-cuda-10.1.243-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.6.0 (for CUDA 10)&lt;br /&gt;
* '''nlpl-pytorch/1.7.1-ARCH-cuda-10.1.243-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.7.1 (for CUDA 10)&lt;br /&gt;
* '''nlpl-pytorch/1.7.1-ARCH-cuda-11.1.1-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.7.1 (for CUDA 11)&lt;br /&gt;
* '''nlpl-pytorch/1.11.0-ARCH-cuda-11.3.1-Python-3.9.5''': [https://pytorch.org/ PyTorch] 1.11.0 (for CUDA 11)&lt;br /&gt;
* '''nlpl-pytorch/2.1.2-ARCH-cuda-12.0.0-Python-3.10.8''': [https://pytorch.org/ PyTorch] 2.1.2 (for CUDA 12)&lt;br /&gt;
* '''nlpl-pytorch/2.6.0-ARCH-cuda-12.6.0-Python-3.12.3''': [https://pytorch.org/ PyTorch] 2.6.0 (for CUDA 12.6)&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-tensorflow/1.15.2-ARCH-cuda-10.1.243-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 1.15.2 (for CUDA 10)&lt;br /&gt;
* '''nlpl-tensorflow/2.3.2-ARCH-cuda-10.1.243-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 2.3.2 (for CUDA 10)&lt;br /&gt;
* '''nlpl-tensorflow/2.6.2-ARCH-cuda-11.1.1-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 2.6.2 (for CUDA 11)&lt;br /&gt;
* '''nlpl-tensorflow/2.6.5-ARCH-cuda-11.3.1-Python-3.9.5''': [https://www.tensorflow.org/ TensorFlow] 2.6.5 (for CUDA 11)&lt;br /&gt;
* '''nlpl-tensorflow/2.15.0-ARCH-cuda-12.0.0-Python-3.10.8''': [https://www.tensorflow.org/ TensorFlow] 2.15.0 (for CUDA 12)&lt;br /&gt;
* '''nlpl-tensorflow/2.18.1-ARCH-cuda-12.6.0-Python-3.12.3''': [https://www.tensorflow.org/ TensorFlow] 2.18.1 (for CUDA 12.6)&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-accelerate/0.13.2-ARCH-Python-3.9.5''': [https://pypi.org/project/accelerate/ Accelerate] 0.13.2&lt;br /&gt;
* '''nlpl-accelerate/0.27.2-ARCH-Python-3.10.8''': [https://pypi.org/project/accelerate/ Accelerate] 0.27.2&lt;br /&gt;
* '''nlpl-accelerate/1.9.0-ARCH-Python-3.12.3''': [https://pypi.org/project/accelerate/ Accelerate] 1.9.0&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-transformers/VERS-ARCH''': [https://huggingface.co/transformers/ HuggingFace Transformers]&lt;br /&gt;
* '''nlpl-vllm/VERS-ARCH''': [https://github.com/vllm-project/vllm vLLM] (also includes ''flash-attention'', ''xformers'', ''openai'', ''Ray'')&lt;br /&gt;
&lt;br /&gt;
==== Others ====&lt;br /&gt;
* '''nlpl-bitsandbytes/VERS-ARCH''': [https://pypi.org/project/bitsandbytes/ BitsAndBytes]&lt;br /&gt;
* '''nlpl-bm25s/VERS-ARCH''': [https://github.com/xhluca/bm25s BM25S]&lt;br /&gt;
* '''nlpl-cython/VERS-ARCH''': [http://cython.org/ Cython]&lt;br /&gt;
* '''nlpl-datasets/VERS-ARCH''': [https://github.com/huggingface/datasets HuggingFace Datasets]&lt;br /&gt;
* '''nlpl-gensim/VERS-ARCH''': [https://github.com/RaRe-Technologies/gensim Gensim]&lt;br /&gt;
* '''nlpl-horovod/VERS-ARCH''': [https://github.com/horovod/horovod Horovod]&lt;br /&gt;
* '''nlpl-huggingface-hub/VERS-ARCH''': [https://pypi.org/project/huggingface-hub/ HuggingFace Hub]&lt;br /&gt;
* '''nlpl-nltk/VERS-ARCH''': [https://www.nltk.org/ NLTK], together with '''all''' the corpora and datasets (no need to download them separately!)&lt;br /&gt;
* '''nlpl-numpy/VERS-ARCH''': [https://numpy.org/ NumPy]&lt;br /&gt;
* '''nlpl-pytorch-lightning/VERS-ARCH''': [https://www.pytorchlightning.ai/ PyTorch Lightning] &lt;br /&gt;
* '''nlpl-scikit-bundle/VERS-ARCH''': [https://scikit-learn.org/ Scikit-Learn]&lt;br /&gt;
* '''nlpl-sentencepiece/VERS-ARCH''': [https://github.com/google/sentencepiece SentencePiece]&lt;br /&gt;
* '''nlpl-sentence-transformers/VERS-ARCH''': [https://sbert.net SentenceTransformers]&lt;br /&gt;
* '''nlpl-simple_elmo/VERS-ARCH''': [https://pypi.org/project/simple-elmo/ Simple_elmo]&lt;br /&gt;
* '''nlpl-stanza/VERS-ARCH''': [https://stanfordnlp.github.io/stanza/ Stanza]&lt;br /&gt;
* '''nlpl-tensorboard/VERS-ARCH''': [https://github.com/tensorflow/tensorboard TensorBoard]&lt;br /&gt;
* '''nlpl-tokenizers/VERS-ARCH''': [https://github.com/huggingface/tokenizers HuggingFace Tokenizers]&lt;br /&gt;
* '''nlpl-torch-geometric/VERS-ARCH''': [https://pyg.org/ PyTorch Geometric]&lt;br /&gt;
* '''nlpl-torchmetrics/VERS-ARCH''': [https://pypi.org/project/torchmetrics/ TorchMetrics]&lt;br /&gt;
* '''nlpl-trl/VERS-ARCH''': [https://huggingface.co/docs/trl/index HuggingFace Transformer Reinforcement Learning]&lt;br /&gt;
* '''nlpl-wandb/VERS-ARCH''': [https://pypi.org/project/wandb/ Weights and Biases (wandb)]&lt;br /&gt;
* '''nlpl-warc2text/VERS-ARCH''': [https://github.com/bitextor/warc2text warc2text]&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-torchtext/VERS-ARCH''': [https://pypi.org/project/torchtext/ TorchText] ''(not maintained any more, don't expect much)''&lt;br /&gt;
* '''nlpl-dllogger/0.1.0-ARCH-2019b-Python-3.7.4''': [https://github.com/NVIDIA/dllogger DLLogger] 0.1.0 ''(status unclear)''&lt;br /&gt;
* '''nlpl-nvidia-bert/20.06.8-ARCH-2019b-tensorflow-1.15.2-Python-3.7.4''': [https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT NVIDIA's BERT implementation] for TensorFlow 1 ''(status unclear)''&lt;br /&gt;
&lt;br /&gt;
= Source =&lt;br /&gt;
&lt;br /&gt;
Currently, the virtual laboratory is generated using EasyBuild; all the code and easyconfigs are available [https://source.coderefinery.org/nlpl/easybuild in this repository].&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Eosc/easybuild/modules&amp;diff=1804</id>
		<title>Eosc/easybuild/modules</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Eosc/easybuild/modules&amp;diff=1804"/>
		<updated>2025-10-27T11:22:13Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* &amp;quot;Regular&amp;quot; modules */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= NLPL virtual laboratory =&lt;br /&gt;
&lt;br /&gt;
The laboratory is a reproducible custom-built set of NLP software. &lt;br /&gt;
It is currently installed on ''Saga'' and ''Fox'' HPC clusters.&lt;br /&gt;
&lt;br /&gt;
- To use on ''Saga'': run the following command (can be put in the ''~/.bashrc'' file to be run automatically at login):&lt;br /&gt;
&lt;br /&gt;
''module use -a /cluster/shared/nlpl/software/eb/etc/all/''&lt;br /&gt;
&lt;br /&gt;
- To use on ''Fox'': run the following command (can be put in the ''~/.bashrc'' file to be run automatically at login):&lt;br /&gt;
&lt;br /&gt;
''module use -a /fp/projects01/ec30/software/easybuild/modules/all/''&lt;br /&gt;
&lt;br /&gt;
After that, the &amp;quot;nlpl&amp;quot;-branded modules will be available via ''module avail'', ''module load'', etc. See all the NLPL modules with the ''module avail nlpl'' command.&lt;br /&gt;
&lt;br /&gt;
It is highly recommended to use them, instead of installing a copy in one's own home directory.&lt;br /&gt;
&lt;br /&gt;
== List of modules ==&lt;br /&gt;
From time to time, updated modules with newer software versions will be added, &lt;br /&gt;
but the older modules will never be removed (for reproducibility).&lt;br /&gt;
&lt;br /&gt;
Note that the modules which have &amp;quot;gomkl&amp;quot; or &amp;quot;intel&amp;quot; in their names are built using&lt;br /&gt;
the Intel Math Kernel Library, making them (somewhat) faster in CPU tasks&lt;br /&gt;
with Intel processors (for example, on ''Saga'', '''except the ''a100'' partition''', which uses AMD CPUs). &lt;br /&gt;
&lt;br /&gt;
Those with &amp;quot;foss&amp;quot; in their names are CPU-agnostic and will run on machines with all kinds of CPUs (for example, on ''Fox'', but also on all ''Saga'' partitions). These are the default ones.&lt;br /&gt;
&lt;br /&gt;
The next element in the module name after &amp;quot;foss&amp;quot;, &amp;quot;gomkl&amp;quot; or &amp;quot;intel&amp;quot; is the virtual laboratory (stack) version: for example, &amp;quot;2021a&amp;quot;, &amp;quot;2022b&amp;quot; or &amp;quot;2024a&amp;quot;. &lt;br /&gt;
Modules from different stack versions are incompatible with each other: you cannot load a module from &amp;quot;foss-2021a&amp;quot; and a module from &amp;quot;foss-2024a&amp;quot; simultaneously.&lt;br /&gt;
'''Currently, the &amp;quot;2024a&amp;quot; version is the recommended one, with the most up-to-date software built with the most up-to-date compilers.'''&lt;br /&gt;
&lt;br /&gt;
Below, we use the placeholder '''ARCH''': replace it with the toolchain (&amp;quot;gomkl&amp;quot;, &amp;quot;intel&amp;quot; or &amp;quot;foss&amp;quot;) followed by the stack version (&amp;quot;2022b&amp;quot; or &amp;quot;2024a&amp;quot;), for example &amp;quot;foss-2024a&amp;quot;, depending on which machine you are working on and which stack version you want to use.&lt;br /&gt;
Some modules also have the Python version specified in their names (for example, &amp;quot;''nlpl-transformers/4.55.4-foss-2024a-Python-3.12.3''&amp;quot;).&lt;br /&gt;
For stack version &amp;quot;2022b&amp;quot; it is usually Python 3.10.8, for stack version &amp;quot;2024a&amp;quot; it is usually Python 3.12.3.&lt;br /&gt;
Check the output of the '''module avail nlpl''' command for the exact module names.&lt;br /&gt;
&lt;br /&gt;
=== &amp;quot;Bundle&amp;quot; modules ===&lt;br /&gt;
These are the modules with the most cryptic names. Each of them contains a bunch of software pieces (Python packages, as a rule).&lt;br /&gt;
Many of these modules will be loaded automatically as dependencies of the regular modules, but sometimes they can be useful themselves.&lt;br /&gt;
They have their own bundle versions: &amp;quot;2022.01&amp;quot; or simply &amp;quot;01&amp;quot;, etc. (referred to below as '''VERS''').&lt;br /&gt;
Here are the details:&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-python-candy/VERS-ARCH''': various utility packages not directly related to NLP&lt;br /&gt;
** tqdm&lt;br /&gt;
** pydot&lt;br /&gt;
** smart_open&lt;br /&gt;
** cached-property&lt;br /&gt;
** filelock&lt;br /&gt;
** termcolor&lt;br /&gt;
** regex&lt;br /&gt;
** sacremoses&lt;br /&gt;
** mpi4py&lt;br /&gt;
** jsonlines&lt;br /&gt;
** jsonschema&lt;br /&gt;
** typing_extensions&lt;br /&gt;
** packaging&lt;br /&gt;
** pyhocon&lt;br /&gt;
** blis&lt;br /&gt;
** pathspec&lt;br /&gt;
** hatchling&lt;br /&gt;
** multidict&lt;br /&gt;
** yarl&lt;br /&gt;
** black&lt;br /&gt;
** click&lt;br /&gt;
** plotly&lt;br /&gt;
** toolz&lt;br /&gt;
** msgspec&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-nlptools/VERS-ARCH''': various utility packages related to NLP&lt;br /&gt;
** evaluate&lt;br /&gt;
** conllu&lt;br /&gt;
** seqeval&lt;br /&gt;
** langdetect&lt;br /&gt;
** levenshtein&lt;br /&gt;
** rouge_score&lt;br /&gt;
** sacrebleu&lt;br /&gt;
** udapi&lt;br /&gt;
** word2number&lt;br /&gt;
** ufal.chu-liu-edmonds&lt;br /&gt;
** gensim&lt;br /&gt;
* '''nlpl-scipy-ecosystem/VERS-ARCH''': everything that constitutes the [https://scipy.org/ SciPy ecosystem]. Too many packages to enumerate them all, but the most important are:&lt;br /&gt;
** scipy&lt;br /&gt;
** pandas&lt;br /&gt;
** matplotlib&lt;br /&gt;
** ipython&lt;br /&gt;
** jupyter_core&lt;br /&gt;
** jupyter_client&lt;br /&gt;
** networkx&lt;br /&gt;
** sympy&lt;br /&gt;
** beautifulsoup4&lt;br /&gt;
** numexpr&lt;br /&gt;
** einops&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-llmtools/VERS-ARCH''': various utility packages for working with large language models (LLMs)&lt;br /&gt;
** peft ([https://pypi.org/project/peft/ HuggingFace PEFT]: State-of-the-art Parameter-Efficient Fine-Tuning)&lt;br /&gt;
** promptsource ([https://pypi.org/project/promptsource/ Toolkit for creating, sharing and using natural language prompts])&lt;br /&gt;
** lm-evaluation-harness ([https://github.com/EleutherAI/lm-evaluation-harness EleutherAI Language Model Evaluation Harness])&lt;br /&gt;
** bert_score: [https://pypi.org/project/bert-score/ BERTScore] to evaluate NLG tasks&lt;br /&gt;
** llguidance ([https://github.com/microsoft/llguidance Low-level Guidance]: constrained decoding for LLMs)&lt;br /&gt;
** mistral_common ([https://pypi.org/project/mistral_common/ Mistral-common]: common utilities for Mistral AI)&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-torch-audio-vision/VERS-ARCH''': multimodal extensions for PyTorch&lt;br /&gt;
** torch-vision ([https://github.com/pytorch/vision torchvision]: image and video datasets and models for PyTorch deep learning)&lt;br /&gt;
** torch-audio ([https://github.com/pytorch/audio torchaudio]: an audio library for PyTorch)&lt;br /&gt;
&lt;br /&gt;
=== &amp;quot;Regular&amp;quot; modules ===&lt;br /&gt;
These modules are more self-explanatory; each one provides a single piece of software:&lt;br /&gt;
&lt;br /&gt;
==== Most important ====&lt;br /&gt;
* '''nlpl-pytorch/1.6.0-ARCH-cuda-10.1.243-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.6.0 (for CUDA 10)&lt;br /&gt;
* '''nlpl-pytorch/1.7.1-ARCH-cuda-10.1.243-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.7.1 (for CUDA 10)&lt;br /&gt;
* '''nlpl-pytorch/1.7.1-ARCH-cuda-11.1.1-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.7.1 (for CUDA 11)&lt;br /&gt;
* '''nlpl-pytorch/1.11.0-ARCH-cuda-11.3.1-Python-3.9.5''': [https://pytorch.org/ PyTorch] 1.11.0 (for CUDA 11)&lt;br /&gt;
* '''nlpl-pytorch/2.1.2-ARCH-cuda-12.0.0-Python-3.10.8''': [https://pytorch.org/ PyTorch] 2.1.2 (for CUDA 12)&lt;br /&gt;
* '''nlpl-pytorch/2.6.0-ARCH-cuda-12.6.0-Python-3.12.3''': [https://pytorch.org/ PyTorch] 2.6.0 (for CUDA 12.6)&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-tensorflow/1.15.2-ARCH-cuda-10.1.243-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 1.15.2 (for CUDA 10)&lt;br /&gt;
* '''nlpl-tensorflow/2.3.2-ARCH-cuda-10.1.243-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 2.3.2 (for CUDA 10)&lt;br /&gt;
* '''nlpl-tensorflow/2.6.2-ARCH-cuda-11.1.1-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 2.6.2 (for CUDA 11)&lt;br /&gt;
* '''nlpl-tensorflow/2.6.5-ARCH-cuda-11.3.1-Python-3.9.5''': [https://www.tensorflow.org/ TensorFlow] 2.6.5 (for CUDA 11)&lt;br /&gt;
* '''nlpl-tensorflow/2.15.0-ARCH-cuda-12.0.0-Python-3.10.8''': [https://www.tensorflow.org/ TensorFlow] 2.15.0 (for CUDA 12)&lt;br /&gt;
* '''nlpl-tensorflow/2.18.1-ARCH-cuda-12.6.0-Python-3.12.3''': [https://www.tensorflow.org/ TensorFlow] 2.18.1 (for CUDA 12.6)&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-accelerate/0.13.2-ARCH-Python-3.9.5''': [https://pypi.org/project/accelerate/ Accelerate] 0.13.2&lt;br /&gt;
* '''nlpl-accelerate/0.27.2-ARCH-Python-3.10.8''': [https://pypi.org/project/accelerate/ Accelerate] 0.27.2&lt;br /&gt;
* '''nlpl-accelerate/1.9.0-ARCH-Python-3.12.3''': [https://pypi.org/project/accelerate/ Accelerate] 1.9.0&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-transformers/VERS-ARCH''': [https://huggingface.co/transformers/ HuggingFace Transformers]&lt;br /&gt;
* '''nlpl-vllm/VERS-ARCH''': [https://github.com/vllm-project/vllm vLLM] (also includes ''flash-attention'', ''xformers'', ''openai'', ''Ray'')&lt;br /&gt;
&lt;br /&gt;
==== Others ====&lt;br /&gt;
* '''nlpl-bitsandbytes/VERS-ARCH''': [https://pypi.org/project/bitsandbytes/ BitsAndBytes]&lt;br /&gt;
* '''nlpl-bm25s/VERS-ARCH''': [https://github.com/xhluca/bm25s BM25S]&lt;br /&gt;
* '''nlpl-cython/VERS-ARCH''': [http://cython.org/ Cython]&lt;br /&gt;
* '''nlpl-datasets/VERS-ARCH''': [https://github.com/huggingface/datasets HuggingFace Datasets]&lt;br /&gt;
* '''nlpl-gensim/VERS-ARCH''': [https://github.com/RaRe-Technologies/gensim Gensim]&lt;br /&gt;
* '''nlpl-horovod/VERS-ARCH''': [https://github.com/horovod/horovod Horovod]&lt;br /&gt;
* '''nlpl-huggingface-hub/VERS-ARCH''': [https://pypi.org/project/huggingface-hub/ HuggingFace Hub]&lt;br /&gt;
* '''nlpl-nltk/VERS-ARCH''': [https://www.nltk.org/ NLTK], together with '''all''' the corpora and datasets (no need to download them separately!)&lt;br /&gt;
* '''nlpl-numpy/VERS-ARCH''': [https://numpy.org/ NumPy]&lt;br /&gt;
* '''nlpl-pytorch-lightning/VERS-ARCH''': [https://www.pytorchlightning.ai/ PyTorch Lightning] &lt;br /&gt;
* '''nlpl-scikit-bundle/VERS-ARCH''': [https://scikit-learn.org/ Scikit-Learn]&lt;br /&gt;
* '''nlpl-sentencepiece/VERS-ARCH''': [https://github.com/google/sentencepiece SentencePiece]&lt;br /&gt;
* '''nlpl-sentence-transformers/VERS-ARCH''': [https://sbert.net SentenceTransformers]&lt;br /&gt;
* '''nlpl-simple_elmo/VERS-ARCH''': [https://pypi.org/project/simple-elmo/ Simple_elmo]&lt;br /&gt;
* '''nlpl-stanza/VERS-ARCH''': [https://stanfordnlp.github.io/stanza/ Stanza]&lt;br /&gt;
* '''nlpl-tensorboard/VERS-ARCH''': [https://github.com/tensorflow/tensorboard TensorBoard]&lt;br /&gt;
* '''nlpl-tokenizers/VERS-ARCH''': [https://github.com/huggingface/tokenizers HuggingFace Tokenizers]&lt;br /&gt;
* '''nlpl-torch-geometric/VERS-ARCH''': [https://pyg.org/ PyTorch Geometric]&lt;br /&gt;
* '''nlpl-torchmetrics/VERS-ARCH''': [https://pypi.org/project/torchmetrics/ TorchMetrics]&lt;br /&gt;
* '''nlpl-trl/VERS-ARCH''': [https://huggingface.co/docs/trl/index HuggingFace Transformer Reinforcement Learning]&lt;br /&gt;
* '''nlpl-wandb/VERS-ARCH''': [https://pypi.org/project/wandb/ Weights and Biases (wandb)]&lt;br /&gt;
* '''nlpl-warc2text/VERS-ARCH''': [https://github.com/bitextor/warc2text warc2text]&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-torchtext/VERS-ARCH''': [https://pypi.org/project/torchtext/ TorchText] ''(not maintained any more, don't expect much)''&lt;br /&gt;
* '''nlpl-dllogger/0.1.0-ARCH-2019b-Python-3.7.4''': [https://github.com/NVIDIA/dllogger DLLogger] 0.1.0 ''(status unclear)''&lt;br /&gt;
* '''nlpl-nvidia-bert/20.06.8-ARCH-2019b-tensorflow-1.15.2-Python-3.7.4''': [https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT NVIDIA's BERT implementation] for TensorFlow 1 ''(status unclear)''&lt;br /&gt;
&lt;br /&gt;
= Source =&lt;br /&gt;
&lt;br /&gt;
Currently, the virtual laboratory is generated using EasyBuild; all the code and easyconfigs are available [https://source.coderefinery.org/nlpl/easybuild in this repository].&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Eosc/easybuild/modules&amp;diff=1803</id>
		<title>Eosc/easybuild/modules</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Eosc/easybuild/modules&amp;diff=1803"/>
		<updated>2025-10-21T23:24:54Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* List of modules */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= NLPL virtual laboratory =&lt;br /&gt;
&lt;br /&gt;
The laboratory is a reproducible custom-built set of NLP software. &lt;br /&gt;
It is currently installed on ''Saga'' and ''Fox'' HPC clusters.&lt;br /&gt;
&lt;br /&gt;
- To use on ''Saga'': run the following command (can be put in the ''~/.bashrc'' file to be run automatically at login):&lt;br /&gt;
&lt;br /&gt;
''module use -a /cluster/shared/nlpl/software/eb/etc/all/''&lt;br /&gt;
&lt;br /&gt;
- To use on ''Fox'': run the following command (can be put in the ''~/.bashrc'' file to be run automatically at login):&lt;br /&gt;
&lt;br /&gt;
''module use -a /fp/projects01/ec30/software/easybuild/modules/all/''&lt;br /&gt;
&lt;br /&gt;
After that, the &amp;quot;nlpl&amp;quot;-branded modules will be available via ''module avail'', ''module load'', etc. See all the NLPL modules with the ''module avail nlpl'' command.&lt;br /&gt;
&lt;br /&gt;
It is highly recommended to use them, instead of installing a copy in one's own home directory.&lt;br /&gt;
&lt;br /&gt;
== List of modules ==&lt;br /&gt;
From time to time, updated modules with newer software versions will be added, &lt;br /&gt;
but the older modules will never be removed (for reproducibility).&lt;br /&gt;
&lt;br /&gt;
Note that the modules which have &amp;quot;gomkl&amp;quot; or &amp;quot;intel&amp;quot; in their names are built with the&lt;br /&gt;
Intel Math Kernel Library, making them (somewhat) faster in CPU tasks&lt;br /&gt;
on Intel processors (for example, on ''Saga'', '''except the ''a100'' partition''', which uses AMD CPUs).&lt;br /&gt;
&lt;br /&gt;
Those with &amp;quot;foss&amp;quot; in their names are CPU-agnostic and will run on machines with all kinds of CPUs (for example, on ''Fox'', but also on all ''Saga'' partitions). These are the default ones.&lt;br /&gt;
&lt;br /&gt;
The next element in the module name after &amp;quot;foss&amp;quot;, &amp;quot;gomkl&amp;quot; or &amp;quot;intel&amp;quot; is the virtual laboratory (stack) version: for example, &amp;quot;2021a&amp;quot;, &amp;quot;2022b&amp;quot; or &amp;quot;2024a&amp;quot;.&lt;br /&gt;
Modules from different stack versions are incompatible with each other: you cannot load a module from &amp;quot;foss-2021a&amp;quot; and a module from &amp;quot;foss-2024a&amp;quot; simultaneously.&lt;br /&gt;
'''Currently, the &amp;quot;2024a&amp;quot; version is the recommended one, with the most up-to-date software built with the most up-to-date compilers.'''&lt;br /&gt;
&lt;br /&gt;
Below, we use the placeholder '''ARCH''' for this part of the name: replace it with the toolchain (&amp;quot;gomkl&amp;quot;, &amp;quot;intel&amp;quot; or &amp;quot;foss&amp;quot;) and the stack version (&amp;quot;2022b&amp;quot; or &amp;quot;2024a&amp;quot;), for example &amp;quot;foss-2024a&amp;quot;, depending on which machine you are working on and which stack version you want to use.&lt;br /&gt;
Some modules also have the Python version specified in their names (for example, &amp;quot;''nlpl-transformers/4.55.4-foss-2024a-Python-3.12.3''&amp;quot;).&lt;br /&gt;
For stack version &amp;quot;2022b&amp;quot; it is usually Python 3.10.8, for stack version &amp;quot;2024a&amp;quot; it is usually Python 3.12.3.&lt;br /&gt;
Check the output of the '''module avail nlpl''' command for the exact module names.&lt;br /&gt;
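&lt;br /&gt;
As a rough illustration of the naming scheme described above, the following shell sketch splits the example module name into its parts. It is not part of the NLPL tooling: the variable names are made up for this example, and it handles only the simple &amp;quot;software/version-toolchain-stack-Python-X&amp;quot; pattern, not names with extra elements such as &amp;quot;cuda&amp;quot;.&lt;br /&gt;

```shell
# Illustrative only: split a module name into the parts described above.
name="nlpl-transformers/4.55.4-foss-2024a-Python-3.12.3"

software="${name%%/*}"   # text before the slash: nlpl-transformers
rest="${name#*/}"        # text after the slash: 4.55.4-foss-2024a-Python-3.12.3

# Split the remainder on "-" into the positional parameters.
old_ifs="$IFS"
IFS="-"
set -- $rest
IFS="$old_ifs"

version="$1"             # software version
toolchain="$2"           # foss, gomkl or intel
stack="$3"               # virtual laboratory (stack) version
python=""
if [ "$4" = "Python" ]; then
    python="$5"          # Python version, when the name includes one
fi

echo "$software $version $toolchain $stack $python"
```

Two modules can be loaded together only if their stack parts (here &amp;quot;2024a&amp;quot;) are the same.&lt;br /&gt;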
&lt;br /&gt;
=== &amp;quot;Bundle&amp;quot; modules ===&lt;br /&gt;
These are the modules with the most cryptic names. Each of them contains a bunch of software pieces (Python packages, as a rule).&lt;br /&gt;
Many of these modules will be loaded automatically as dependencies of the regular modules, but sometimes they can be useful themselves.&lt;br /&gt;
They have their own bundle versions: &amp;quot;2022.01&amp;quot; or simply &amp;quot;01&amp;quot;, etc. (referred to below as '''VERS''').&lt;br /&gt;
Here are the details:&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-python-candy/VERS-ARCH''': various utility packages not directly related to NLP&lt;br /&gt;
** tqdm&lt;br /&gt;
** pydot&lt;br /&gt;
** smart_open&lt;br /&gt;
** cached-property&lt;br /&gt;
** filelock&lt;br /&gt;
** termcolor&lt;br /&gt;
** regex&lt;br /&gt;
** sacremoses&lt;br /&gt;
** mpi4py&lt;br /&gt;
** jsonlines&lt;br /&gt;
** jsonschema&lt;br /&gt;
** typing_extensions&lt;br /&gt;
** packaging&lt;br /&gt;
** pyhocon&lt;br /&gt;
** blis&lt;br /&gt;
** pathspec&lt;br /&gt;
** hatchling&lt;br /&gt;
** multidict&lt;br /&gt;
** yarl&lt;br /&gt;
** black&lt;br /&gt;
** click&lt;br /&gt;
** plotly&lt;br /&gt;
** toolz&lt;br /&gt;
** msgspec&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-nlptools/VERS-ARCH''': various utility packages related to NLP&lt;br /&gt;
** evaluate&lt;br /&gt;
** conllu&lt;br /&gt;
** seqeval&lt;br /&gt;
** langdetect&lt;br /&gt;
** levenshtein&lt;br /&gt;
** rouge_score&lt;br /&gt;
** sacrebleu&lt;br /&gt;
** udapi&lt;br /&gt;
** word2number&lt;br /&gt;
** ufal.chu-liu-edmonds&lt;br /&gt;
** gensim&lt;br /&gt;
* '''nlpl-scipy-ecosystem/VERS-ARCH''': everything that constitutes the [https://scipy.org/ SciPy ecosystem]. Too many packages to enumerate them all, but the most important are:&lt;br /&gt;
** scipy&lt;br /&gt;
** pandas&lt;br /&gt;
** matplotlib&lt;br /&gt;
** ipython&lt;br /&gt;
** jupyter_core&lt;br /&gt;
** jupyter_client&lt;br /&gt;
** networkx&lt;br /&gt;
** sympy&lt;br /&gt;
** beautifulsoup4&lt;br /&gt;
** numexpr&lt;br /&gt;
** einops&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-llmtools/VERS-ARCH''': various utility packages for working with large language models (LLMs)&lt;br /&gt;
** peft ([https://pypi.org/project/peft/ HuggingFace PEFT]: State-of-the-art Parameter-Efficient Fine-Tuning)&lt;br /&gt;
** promptsource ([https://pypi.org/project/promptsource/ Toolkit for creating, sharing and using natural language prompts])&lt;br /&gt;
** lm-evaluation-harness ([https://github.com/EleutherAI/lm-evaluation-harness EleutherAI Language Model Evaluation Harness])&lt;br /&gt;
** bert_score: [https://pypi.org/project/bert-score/ BERTScore] to evaluate NLG tasks&lt;br /&gt;
** llguidance ([https://github.com/microsoft/llguidance Low-level Guidance]: constrained decoding for LLMs)&lt;br /&gt;
** mistral_common ([https://pypi.org/project/mistral_common/ Mistral-common]: common utilities for Mistral AI)&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-torch-audio-vision/VERS-ARCH''': multimodal extensions for PyTorch&lt;br /&gt;
** torch-vision ([https://github.com/pytorch/vision torchvision]: image and video datasets and models for PyTorch deep learning)&lt;br /&gt;
** torch-audio ([https://github.com/pytorch/audio torchaudio]: an audio library for PyTorch)&lt;br /&gt;
&lt;br /&gt;
=== &amp;quot;Regular&amp;quot; modules ===&lt;br /&gt;
These modules are more self-explanatory; each one provides a single piece of software:&lt;br /&gt;
&lt;br /&gt;
==== Most important ====&lt;br /&gt;
* '''nlpl-pytorch/1.6.0-ARCH-cuda-10.1.243-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.6.0 (for CUDA 10)&lt;br /&gt;
* '''nlpl-pytorch/1.7.1-ARCH-cuda-10.1.243-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.7.1 (for CUDA 10)&lt;br /&gt;
* '''nlpl-pytorch/1.7.1-ARCH-cuda-11.1.1-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.7.1 (for CUDA 11)&lt;br /&gt;
* '''nlpl-pytorch/1.11.0-ARCH-cuda-11.3.1-Python-3.9.5''': [https://pytorch.org/ PyTorch] 1.11.0 (for CUDA 11)&lt;br /&gt;
* '''nlpl-pytorch/2.1.2-ARCH-cuda-12.0.0-Python-3.10.8''': [https://pytorch.org/ PyTorch] 2.1.2 (for CUDA 12)&lt;br /&gt;
* '''nlpl-pytorch/2.6.0-ARCH-cuda-12.6.0-Python-3.12.3''': [https://pytorch.org/ PyTorch] 2.6.0 (for CUDA 12.6)&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-tensorflow/1.15.2-ARCH-cuda-10.1.243-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 1.15.2 (for CUDA 10)&lt;br /&gt;
* '''nlpl-tensorflow/2.3.2-ARCH-cuda-10.1.243-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 2.3.2 (for CUDA 10)&lt;br /&gt;
* '''nlpl-tensorflow/2.6.2-ARCH-cuda-11.1.1-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 2.6.2 (for CUDA 11)&lt;br /&gt;
* '''nlpl-tensorflow/2.6.5-ARCH-cuda-11.3.1-Python-3.9.5''': [https://www.tensorflow.org/ TensorFlow] 2.6.5 (for CUDA 11)&lt;br /&gt;
* '''nlpl-tensorflow/2.15.0-ARCH-cuda-12.0.0-Python-3.10.8''': [https://www.tensorflow.org/ TensorFlow] 2.15.0 (for CUDA 12)&lt;br /&gt;
* '''nlpl-tensorflow/2.18.1-ARCH-cuda-12.6.0-Python-3.12.3''': [https://www.tensorflow.org/ TensorFlow] 2.18.1 (for CUDA 12.6)&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-accelerate/0.13.2-ARCH-Python-3.9.5''': [https://pypi.org/project/accelerate/ Accelerate] 0.13.2&lt;br /&gt;
* '''nlpl-accelerate/0.27.2-ARCH-Python-3.10.8''': [https://pypi.org/project/accelerate/ Accelerate] 0.27.2&lt;br /&gt;
* '''nlpl-accelerate/1.9.0-ARCH-Python-3.12.3''': [https://pypi.org/project/accelerate/ Accelerate] 1.9.0&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-transformers/VERS-ARCH''': [https://huggingface.co/transformers/ HuggingFace Transformers]&lt;br /&gt;
* '''nlpl-vllm/VERS-ARCH''': [https://github.com/vllm-project/vllm vLLM] (also includes ''flash-attention'', ''xformers'', ''openai'', ''Ray'')&lt;br /&gt;
&lt;br /&gt;
==== Others ====&lt;br /&gt;
* '''nlpl-bitsandbytes/VERS-ARCH''': [https://pypi.org/project/bitsandbytes/ BitsAndBytes]&lt;br /&gt;
* '''nlpl-bm25s/VERS-ARCH''': [https://github.com/xhluca/bm25s BM25S]&lt;br /&gt;
* '''nlpl-cython/VERS-ARCH''': [http://cython.org/ Cython]&lt;br /&gt;
* '''nlpl-datasets/VERS-ARCH''': [https://github.com/huggingface/datasets HuggingFace Datasets]&lt;br /&gt;
* '''nlpl-gensim/VERS-ARCH''': [https://github.com/RaRe-Technologies/gensim Gensim]&lt;br /&gt;
* '''nlpl-horovod/VERS-ARCH''': [https://github.com/horovod/horovod Horovod]&lt;br /&gt;
* '''nlpl-huggingface-hub/VERS-ARCH''': [https://pypi.org/project/huggingface-hub/ HuggingFace Hub]&lt;br /&gt;
* '''nlpl-nltk/VERS-ARCH''': [https://www.nltk.org/ NLTK], together with '''all''' the corpora and datasets (no need to download them separately!)&lt;br /&gt;
* '''nlpl-numpy/VERS-ARCH''': [https://numpy.org/ NumPy]&lt;br /&gt;
* '''nlpl-pytorch-lightning/VERS-ARCH''': [https://www.pytorchlightning.ai/ PyTorch Lightning] &lt;br /&gt;
* '''nlpl-scikit-bundle/VERS-ARCH''': [https://scikit-learn.org/ Scikit-Learn]&lt;br /&gt;
* '''nlpl-sentencepiece/VERS-ARCH''': [https://github.com/google/sentencepiece SentencePiece]&lt;br /&gt;
* '''nlpl-sentence-transformers/VERS-ARCH''': [https://sbert.net SentenceTransformers]&lt;br /&gt;
* '''nlpl-simple_elmo/VERS-ARCH''': [https://pypi.org/project/simple-elmo/ Simple_elmo]&lt;br /&gt;
* '''nlpl-stanza/VERS-ARCH''': [https://stanfordnlp.github.io/stanza/ Stanza]&lt;br /&gt;
* '''nlpl-tensorboard/VERS-ARCH''': [https://github.com/tensorflow/tensorboard TensorBoard]&lt;br /&gt;
* '''nlpl-tokenizers/VERS-ARCH''': [https://github.com/huggingface/tokenizers HuggingFace Tokenizers]&lt;br /&gt;
* '''nlpl-torch-geometric/VERS-ARCH''': [https://pyg.org/ PyTorch Geometric]&lt;br /&gt;
* '''nlpl-torchmetrics/VERS-ARCH''': [https://pypi.org/project/torchmetrics/ TorchMetrics]&lt;br /&gt;
* '''nlpl-torchtext/VERS-ARCH''': [https://pypi.org/project/torchtext/ TorchText]&lt;br /&gt;
* '''nlpl-trl/VERS-ARCH''': [https://huggingface.co/docs/trl/index HuggingFace Transformer Reinforcement Learning]&lt;br /&gt;
* '''nlpl-wandb/VERS-ARCH''': [https://pypi.org/project/wandb/ Weights and Biases (wandb)]&lt;br /&gt;
* '''nlpl-warc2text/VERS-ARCH''': [https://github.com/bitextor/warc2text warc2text]&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-dllogger/0.1.0-ARCH-2019b-Python-3.7.4''': [https://github.com/NVIDIA/dllogger DLLogger] 0.1.0 ''(status unclear)''&lt;br /&gt;
* '''nlpl-nvidia-bert/20.06.8-ARCH-2019b-tensorflow-1.15.2-Python-3.7.4''': [https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT NVIDIA's BERT implementation] for TensorFlow 1 ''(status unclear)''&lt;br /&gt;
&lt;br /&gt;
= Source =&lt;br /&gt;
&lt;br /&gt;
Currently, the virtual laboratory is generated using EasyBuild; all the code and easyconfigs are available [https://source.coderefinery.org/nlpl/easybuild in this repository].&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Eosc/easybuild/modules&amp;diff=1802</id>
		<title>Eosc/easybuild/modules</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Eosc/easybuild/modules&amp;diff=1802"/>
		<updated>2025-10-21T23:23:19Z</updated>

		<summary type="html">&lt;p&gt;Andreku: 2024a&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= NLPL virtual laboratory =&lt;br /&gt;
&lt;br /&gt;
The laboratory is a reproducible custom-built set of NLP software. &lt;br /&gt;
It is currently installed on the ''Saga'' and ''Fox'' HPC clusters.&lt;br /&gt;
&lt;br /&gt;
- To use on ''Saga'': run the following command (can be put in the ''~/.bashrc'' file to be run automatically at login):&lt;br /&gt;
&lt;br /&gt;
''module use -a /cluster/shared/nlpl/software/eb/etc/all/''&lt;br /&gt;
&lt;br /&gt;
- To use on ''Fox'': run the following command (can be put in the ''~/.bashrc'' file to be run automatically at login):&lt;br /&gt;
&lt;br /&gt;
''module use -a /fp/projects01/ec30/software/easybuild/modules/all/''&lt;br /&gt;
&lt;br /&gt;
After that, the &amp;quot;nlpl&amp;quot;-branded modules will be available via ''module avail'', ''module load'', etc. See all the NLPL modules with the ''module avail nlpl'' command.&lt;br /&gt;
&lt;br /&gt;
It is highly recommended to use these modules instead of installing your own copy of the software in your home directory.&lt;br /&gt;
&lt;br /&gt;
== List of modules ==&lt;br /&gt;
From time to time, updated modules with newer software versions will be added, &lt;br /&gt;
but the older modules will never be removed (for reproducibility).&lt;br /&gt;
&lt;br /&gt;
Note that the modules which have &amp;quot;gomkl&amp;quot; or &amp;quot;intel&amp;quot; in their names are built with the&lt;br /&gt;
Intel Math Kernel Library, making them (somewhat) faster in CPU tasks&lt;br /&gt;
on Intel processors (for example, on ''Saga'', '''except the ''a100'' partition''', which uses AMD CPUs).&lt;br /&gt;
&lt;br /&gt;
Those with &amp;quot;foss&amp;quot; in their names are CPU-agnostic and will run on machines with all kinds of CPUs (for example, on ''Fox'', but also on all ''Saga'' partitions). These are the default ones.&lt;br /&gt;
&lt;br /&gt;
The next element in the module name after &amp;quot;foss&amp;quot;, &amp;quot;gomkl&amp;quot; or &amp;quot;intel&amp;quot; is the virtual laboratory (stack) version: for example, &amp;quot;2021a&amp;quot;, &amp;quot;2022b&amp;quot; or &amp;quot;2024a&amp;quot;.&lt;br /&gt;
Modules from different stack versions are incompatible with each other: you cannot load a module from &amp;quot;foss-2021a&amp;quot; and a module from &amp;quot;foss-2024a&amp;quot; simultaneously.&lt;br /&gt;
'''Currently, the &amp;quot;2024a&amp;quot; version is the recommended one, with the most up-to-date software built with the most up-to-date compilers.'''&lt;br /&gt;
&lt;br /&gt;
Below, we use the placeholder '''ARCH''' for this part of the name: replace it with the toolchain (&amp;quot;gomkl&amp;quot;, &amp;quot;intel&amp;quot; or &amp;quot;foss&amp;quot;) and the stack version (&amp;quot;2022b&amp;quot; or &amp;quot;2024a&amp;quot;), for example &amp;quot;foss-2024a&amp;quot;, depending on which machine you are working on and which stack version you want to use.&lt;br /&gt;
Some modules also have the Python version specified in their names (for example, &amp;quot;''nlpl-transformers/4.55.4-foss-2024a-Python-3.12.3''&amp;quot;).&lt;br /&gt;
For stack version &amp;quot;2022b&amp;quot; it is usually Python 3.10.8, for stack version &amp;quot;2024a&amp;quot; it is usually Python 3.12.3.&lt;br /&gt;
Check the output of the '''module avail nlpl''' command for the exact module names.&lt;br /&gt;
&lt;br /&gt;
=== &amp;quot;Bundle&amp;quot; modules ===&lt;br /&gt;
These are the modules with the most cryptic names. Each of them contains a bunch of software pieces (Python packages, as a rule).&lt;br /&gt;
Many of these modules will be loaded automatically as dependencies of the regular modules, but sometimes they can be useful themselves.&lt;br /&gt;
They have their own bundle versions: &amp;quot;2022.01&amp;quot; or simply &amp;quot;01&amp;quot;, etc. (referred to below as '''VERS''').&lt;br /&gt;
Here are the details:&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-python-candy/VERS-ARCH''': various utility packages not directly related to NLP&lt;br /&gt;
** tqdm&lt;br /&gt;
** pydot&lt;br /&gt;
** smart_open&lt;br /&gt;
** cached-property&lt;br /&gt;
** filelock&lt;br /&gt;
** termcolor&lt;br /&gt;
** regex&lt;br /&gt;
** sacremoses&lt;br /&gt;
** mpi4py&lt;br /&gt;
** jsonlines&lt;br /&gt;
** jsonschema&lt;br /&gt;
** typing_extensions&lt;br /&gt;
** packaging&lt;br /&gt;
** pyhocon&lt;br /&gt;
** blis&lt;br /&gt;
** pathspec&lt;br /&gt;
** hatchling&lt;br /&gt;
** multidict&lt;br /&gt;
** yarl&lt;br /&gt;
** black&lt;br /&gt;
** click&lt;br /&gt;
** plotly&lt;br /&gt;
** toolz&lt;br /&gt;
** msgspec&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-nlptools/VERS-ARCH''': various utility packages related to NLP&lt;br /&gt;
** evaluate&lt;br /&gt;
** conllu&lt;br /&gt;
** seqeval&lt;br /&gt;
** langdetect&lt;br /&gt;
** levenshtein&lt;br /&gt;
** rouge_score&lt;br /&gt;
** sacrebleu&lt;br /&gt;
** udapi&lt;br /&gt;
** word2number&lt;br /&gt;
** ufal.chu-liu-edmonds&lt;br /&gt;
** gensim&lt;br /&gt;
* '''nlpl-scipy-ecosystem/VERS-ARCH''': everything that constitutes the [https://scipy.org/ SciPy ecosystem]. Too many packages to enumerate them all, but the most important are:&lt;br /&gt;
** scipy&lt;br /&gt;
** pandas&lt;br /&gt;
** matplotlib&lt;br /&gt;
** ipython&lt;br /&gt;
** jupyter_core&lt;br /&gt;
** jupyter_client&lt;br /&gt;
** networkx&lt;br /&gt;
** sympy&lt;br /&gt;
** beautifulsoup4&lt;br /&gt;
** numexpr&lt;br /&gt;
** einops&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-llmtools/VERS-ARCH''': various utility packages for working with large language models (LLMs)&lt;br /&gt;
** peft ([https://pypi.org/project/peft/ HuggingFace PEFT]: State-of-the-art Parameter-Efficient Fine-Tuning)&lt;br /&gt;
** promptsource ([https://pypi.org/project/promptsource/ Toolkit for creating, sharing and using natural language prompts])&lt;br /&gt;
** lm-evaluation-harness ([https://github.com/EleutherAI/lm-evaluation-harness EleutherAI Language Model Evaluation Harness])&lt;br /&gt;
** bert_score: [https://pypi.org/project/bert-score/ BERTScore] to evaluate NLG tasks&lt;br /&gt;
** llguidance ([https://github.com/microsoft/llguidance Low-level Guidance]: constrained decoding for LLMs)&lt;br /&gt;
** mistral_common ([https://pypi.org/project/mistral_common/ Mistral-common]: common utilities for Mistral AI)&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-torch-audio-vision/VERS-ARCH''': multimodal extensions for PyTorch&lt;br /&gt;
** torch-vision ([https://github.com/pytorch/vision torchvision]: image and video datasets and models for PyTorch deep learning)&lt;br /&gt;
** torch-audio ([https://github.com/pytorch/audio torchaudio]: an audio library for PyTorch)&lt;br /&gt;
&lt;br /&gt;
=== &amp;quot;Regular&amp;quot; modules ===&lt;br /&gt;
These modules are more self-explanatory; each one provides a single piece of software:&lt;br /&gt;
&lt;br /&gt;
==== Most important ====&lt;br /&gt;
* '''nlpl-pytorch/1.6.0-ARCH-cuda-10.1.243-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.6.0 (for CUDA 10)&lt;br /&gt;
* '''nlpl-pytorch/1.7.1-ARCH-cuda-10.1.243-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.7.1 (for CUDA 10)&lt;br /&gt;
* '''nlpl-pytorch/1.7.1-ARCH-cuda-11.1.1-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.7.1 (for CUDA 11)&lt;br /&gt;
* '''nlpl-pytorch/1.11.0-ARCH-cuda-11.3.1-Python-3.9.5''': [https://pytorch.org/ PyTorch] 1.11.0 (for CUDA 11)&lt;br /&gt;
* '''nlpl-pytorch/2.1.2-ARCH-cuda-12.0.0-Python-3.10.8''': [https://pytorch.org/ PyTorch] 2.1.2 (for CUDA 12)&lt;br /&gt;
* '''nlpl-pytorch/2.6.0-ARCH-cuda-12.6.0-Python-3.12.3''': [https://pytorch.org/ PyTorch] 2.6.0 (for CUDA 12.6)&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-tensorflow/1.15.2-ARCH-cuda-10.1.243-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 1.15.2 (for CUDA 10)&lt;br /&gt;
* '''nlpl-tensorflow/2.3.2-ARCH-cuda-10.1.243-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 2.3.2 (for CUDA 10)&lt;br /&gt;
* '''nlpl-tensorflow/2.6.2-ARCH-cuda-11.1.1-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 2.6.2 (for CUDA 11)&lt;br /&gt;
* '''nlpl-tensorflow/2.6.5-ARCH-cuda-11.3.1-Python-3.9.5''': [https://www.tensorflow.org/ TensorFlow] 2.6.5 (for CUDA 11)&lt;br /&gt;
* '''nlpl-tensorflow/2.15.0-ARCH-cuda-12.0.0-Python-3.10.8''': [https://www.tensorflow.org/ TensorFlow] 2.15.0 (for CUDA 12)&lt;br /&gt;
* '''nlpl-tensorflow/2.18.1-ARCH-cuda-12.6.0-Python-3.12.3''': [https://www.tensorflow.org/ TensorFlow] 2.18.1 (for CUDA 12.6)&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-accelerate/0.13.2-ARCH-Python-3.9.5''': [https://pypi.org/project/accelerate/ Accelerate] 0.13.2&lt;br /&gt;
* '''nlpl-accelerate/0.27.2-ARCH-Python-3.10.8''': [https://pypi.org/project/accelerate/ Accelerate] 0.27.2&lt;br /&gt;
* '''nlpl-accelerate/1.9.0-ARCH-Python-3.12.3''': [https://pypi.org/project/accelerate/ Accelerate] 1.9.0&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-transformers/VERS-ARCH''': [https://huggingface.co/transformers/ HuggingFace Transformers]&lt;br /&gt;
* '''nlpl-vllm/VERS-ARCH''': [https://github.com/vllm-project/vllm vLLM] (also includes ''flash-attention'', ''xformers'', ''openai'', ''Ray'')&lt;br /&gt;
&lt;br /&gt;
==== Others ====&lt;br /&gt;
* '''nlpl-bitsandbytes/VERS-ARCH''': [https://pypi.org/project/bitsandbytes/ BitsAndBytes]&lt;br /&gt;
* '''nlpl-bm25s/VERS-ARCH''': [https://github.com/xhluca/bm25s BM25S]&lt;br /&gt;
* '''nlpl-cython/VERS-ARCH''': [http://cython.org/ Cython]&lt;br /&gt;
* '''nlpl-datasets/VERS-ARCH''': [https://github.com/huggingface/datasets HuggingFace Datasets]&lt;br /&gt;
* '''nlpl-gensim/VERS-ARCH''': [https://github.com/RaRe-Technologies/gensim Gensim]&lt;br /&gt;
* '''nlpl-horovod/VERS-ARCH''': [https://github.com/horovod/horovod Horovod]&lt;br /&gt;
* '''nlpl-huggingface-hub/VERS-ARCH''': [https://pypi.org/project/huggingface-hub/ HuggingFace Hub]&lt;br /&gt;
* '''nlpl-nltk/VERS-ARCH''': [https://www.nltk.org/ NLTK], together with '''all''' the corpora and datasets (no need to download them separately!)&lt;br /&gt;
* '''nlpl-numpy/VERS-ARCH''': [https://numpy.org/ NumPy]&lt;br /&gt;
* '''nlpl-pytorch-lightning/VERS-ARCH''': [https://www.pytorchlightning.ai/ PyTorch Lightning] &lt;br /&gt;
* '''nlpl-scikit-bundle/VERS-ARCH''': [https://scikit-learn.org/ Scikit-Learn]&lt;br /&gt;
* '''nlpl-sentencepiece/VERS-ARCH''': [https://github.com/google/sentencepiece SentencePiece]&lt;br /&gt;
* '''nlpl-sentence-transformers/VERS-ARCH''': [https://sbert.net SentenceTransformers]&lt;br /&gt;
* '''nlpl-simple_elmo/VERS-ARCH''': [https://pypi.org/project/simple-elmo/ Simple_elmo]&lt;br /&gt;
* '''nlpl-stanza/VERS-ARCH''': [https://stanfordnlp.github.io/stanza/ Stanza]&lt;br /&gt;
* '''nlpl-tensorboard/VERS-ARCH''': [https://github.com/tensorflow/tensorboard TensorBoard]&lt;br /&gt;
* '''nlpl-tokenizers/VERS-ARCH''': [https://github.com/huggingface/tokenizers HuggingFace Tokenizers]&lt;br /&gt;
* '''nlpl-torch-geometric/VERS-ARCH''': [https://pyg.org/ PyTorch Geometric]&lt;br /&gt;
* '''nlpl-torchmetrics/VERS-ARCH''': [https://pypi.org/project/torchmetrics/ TorchMetrics]&lt;br /&gt;
* '''nlpl-torchtext/VERS-ARCH''': [https://pypi.org/project/torchtext/ TorchText]&lt;br /&gt;
* '''nlpl-trl/VERS-ARCH''': [https://huggingface.co/docs/trl/index HuggingFace Transformer Reinforcement Learning]&lt;br /&gt;
* '''nlpl-wandb/VERS-ARCH''': [https://pypi.org/project/wandb/ Weights and Biases (wandb)]&lt;br /&gt;
* '''nlpl-warc2text/VERS-ARCH''': [https://github.com/bitextor/warc2text warc2text]&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-dllogger/0.1.0-ARCH-2019b-Python-3.7.4''': [https://github.com/NVIDIA/dllogger DLLogger] 0.1.0 ''(status unclear)''&lt;br /&gt;
* '''nlpl-nvidia-bert/20.06.8-ARCH-2019b-tensorflow-1.15.2-Python-3.7.4''': [https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT NVIDIA's BERT implementation] for TensorFlow 1 ''(status unclear)''&lt;br /&gt;
&lt;br /&gt;
= Source =&lt;br /&gt;
&lt;br /&gt;
Currently, the virtual laboratory is generated using EasyBuild; all the code and easyconfigs are available [https://source.coderefinery.org/nlpl/easybuild in this repository].&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Eosc/easybuild/modules&amp;diff=1781</id>
		<title>Eosc/easybuild/modules</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Eosc/easybuild/modules&amp;diff=1781"/>
		<updated>2025-03-24T18:21:26Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Others */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= NLPL virtual laboratory =&lt;br /&gt;
&lt;br /&gt;
The laboratory is a reproducible custom-built set of NLP software. &lt;br /&gt;
It is currently installed on the ''Saga'', ''Fox'', and ''Puhti'' HPC clusters.&lt;br /&gt;
&lt;br /&gt;
- To use on ''Saga'': run the following command (can be put in the ''~/.bashrc'' file to be run automatically at login):&lt;br /&gt;
&lt;br /&gt;
''module use -a /cluster/shared/nlpl/software/eb/etc/all/''&lt;br /&gt;
&lt;br /&gt;
- To use on ''Fox'': run the following command (can be put in the ''~/.bashrc'' file to be run automatically at login):&lt;br /&gt;
&lt;br /&gt;
''module use -a /fp/projects01/ec30/software/easybuild/modules/all/''&lt;br /&gt;
&lt;br /&gt;
After that, the &amp;quot;nlpl&amp;quot;-branded modules will be available via ''module avail'', ''module load'', etc.&lt;br /&gt;
&lt;br /&gt;
It is highly recommended to use these modules instead of installing your own copy of the software in your home directory.&lt;br /&gt;
&lt;br /&gt;
== List of modules ==&lt;br /&gt;
From time to time, updated modules with newer software versions will be added, &lt;br /&gt;
but the older modules will never be removed (for reproducibility).&lt;br /&gt;
&lt;br /&gt;
Note that the modules which have &amp;quot;gomkl&amp;quot; or &amp;quot;intel&amp;quot; in their names are built with the&lt;br /&gt;
Intel Math Kernel Library, making them (somewhat) faster in CPU tasks&lt;br /&gt;
on Intel processors (for example, on ''Saga'', '''except the ''a100'' partition''', which uses AMD CPUs).&lt;br /&gt;
&lt;br /&gt;
Those with &amp;quot;foss&amp;quot; in their names are CPU-agnostic and will run on machines with all kinds of CPUs (for example, on ''Fox'', but also on all ''Saga'' partitions).&lt;br /&gt;
&lt;br /&gt;
The next element in the module name after &amp;quot;foss&amp;quot;, &amp;quot;gomkl&amp;quot; or &amp;quot;intel&amp;quot; is the virtual laboratory (stack) version: for example, &amp;quot;2019b&amp;quot;, &amp;quot;2021a&amp;quot; or &amp;quot;2022b&amp;quot;.&lt;br /&gt;
Modules from different stack versions are incompatible with each other: you cannot load a module from &amp;quot;foss-2019b&amp;quot; and a module from &amp;quot;foss-2022b&amp;quot; simultaneously.&lt;br /&gt;
'''Currently, the &amp;quot;2022b&amp;quot; version is the recommended one, with the most up-to-date software built with the most up-to-date compilers.'''&lt;br /&gt;
&lt;br /&gt;
Below, we use the placeholder '''ARCH''' for this part of the name: replace it with the toolchain (&amp;quot;gomkl&amp;quot;, &amp;quot;intel&amp;quot; or &amp;quot;foss&amp;quot;) and the stack version (&amp;quot;2021a&amp;quot; or &amp;quot;2022b&amp;quot;), for example &amp;quot;foss-2022b&amp;quot;, depending on which machine you are working on and which stack version you want to use.&lt;br /&gt;
Some modules also have the Python version specified in their names (for example, &amp;quot;''nlpl-numpy/1.24.4-foss-2022b-Python-3.10.8''&amp;quot;).&lt;br /&gt;
For stack version &amp;quot;2021a&amp;quot; it is usually Python 3.9.5, for stack version &amp;quot;2022b&amp;quot; it is usually Python 3.10.8.&lt;br /&gt;
Check the output of the '''module avail''' command for the exact module names.&lt;br /&gt;
&lt;br /&gt;
=== &amp;quot;Bundle&amp;quot; modules ===&lt;br /&gt;
These are the modules with the most cryptic names. Each of them contains a bunch of software pieces (Python packages, as a rule).&lt;br /&gt;
Many of these modules will be loaded automatically as dependencies of the regular modules, but sometimes they can be useful themselves.&lt;br /&gt;
They have their own bundle versions: &amp;quot;2022.01&amp;quot; or simply &amp;quot;01&amp;quot;, etc. (referred to below as '''VERS''').&lt;br /&gt;
Here are the details:&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-python-candy/VERS-ARCH''': various utility packages not directly related to NLP&lt;br /&gt;
** tqdm&lt;br /&gt;
** pydot&lt;br /&gt;
** smart_open&lt;br /&gt;
** cached-property&lt;br /&gt;
** filelock&lt;br /&gt;
** termcolor&lt;br /&gt;
** regex&lt;br /&gt;
** sacremoses&lt;br /&gt;
** mpi4py&lt;br /&gt;
** jsonlines&lt;br /&gt;
** jsonschema&lt;br /&gt;
** typing_extensions&lt;br /&gt;
** packaging&lt;br /&gt;
** pyhocon&lt;br /&gt;
** blis&lt;br /&gt;
** pathspec&lt;br /&gt;
** hatchling&lt;br /&gt;
** multidict&lt;br /&gt;
** yarl&lt;br /&gt;
** black&lt;br /&gt;
** click&lt;br /&gt;
** plotly&lt;br /&gt;
** toolz&lt;br /&gt;
** ...&lt;br /&gt;
* '''nlpl-nlptools/VERS-ARCH''': various utility packages related to NLP&lt;br /&gt;
** evaluate&lt;br /&gt;
** conllu&lt;br /&gt;
** seqeval&lt;br /&gt;
** langdetect&lt;br /&gt;
** leven&lt;br /&gt;
** lxml&lt;br /&gt;
** portalocker&lt;br /&gt;
** rouge_score&lt;br /&gt;
** sacrebleu&lt;br /&gt;
** udapi&lt;br /&gt;
** word2number&lt;br /&gt;
* '''nlpl-scipy-ecosystem/VERS-ARCH''': everything that constitutes the [https://scipy.org/ SciPy ecosystem]. Too many packages to enumerate them all, but the most important are:&lt;br /&gt;
** scipy&lt;br /&gt;
** pandas&lt;br /&gt;
** matplotlib&lt;br /&gt;
** ipython&lt;br /&gt;
** jupyter_core&lt;br /&gt;
** jupyter_client&lt;br /&gt;
** networkx&lt;br /&gt;
** sympy&lt;br /&gt;
** beautifulsoup4&lt;br /&gt;
** numexpr&lt;br /&gt;
* '''nlpl-llmtools/VERS-ARCH''': various utility packages for working with large language models (LLMs)&lt;br /&gt;
** peft ([https://pypi.org/project/peft/ HuggingFace PEFT]: State-of-the-art Parameter-Efficient Fine-Tuning)&lt;br /&gt;
** promptsource ([https://pypi.org/project/promptsource/ Toolkit for creating, sharing and using natural language prompts])&lt;br /&gt;
** lm-evaluation-harness ([https://github.com/EleutherAI/lm-evaluation-harness EleutherAI Language Model Evaluation Harness])&lt;br /&gt;
** bert_score: [https://pypi.org/project/bert-score/ BERTScore] to evaluate NLG tasks&lt;br /&gt;
** ...&lt;br /&gt;
&lt;br /&gt;
=== &amp;quot;Regular&amp;quot; modules ===&lt;br /&gt;
These modules are more self-explanatory; each one provides a single piece of software:&lt;br /&gt;
&lt;br /&gt;
==== Most important ====&lt;br /&gt;
* '''nlpl-pytorch/1.6.0-ARCH-cuda-10.1.243-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.6.0 (for CUDA 10)&lt;br /&gt;
* '''nlpl-pytorch/1.7.1-ARCH-cuda-10.1.243-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.7.1 (for CUDA 10)&lt;br /&gt;
* '''nlpl-pytorch/1.7.1-ARCH-cuda-11.1.1-Python-3.7.4''': [https://pytorch.org/ PyTorch] 1.7.1 (for CUDA 11)&lt;br /&gt;
* '''nlpl-pytorch/1.11.0-ARCH-cuda-11.3.1-Python-3.9.5''': [https://pytorch.org/ PyTorch] 1.11.0 (for CUDA 11)&lt;br /&gt;
* '''nlpl-pytorch/2.1.2-ARCH-cuda-12.0.0-Python-3.10.8''': [https://pytorch.org/ PyTorch] 2.1.2 (for CUDA 12)&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-tensorflow/1.15.2-ARCH-cuda-10.1.243-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 1.15.2 (for CUDA 10)&lt;br /&gt;
* '''nlpl-tensorflow/2.3.2-ARCH-cuda-10.1.243-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 2.3.2 (for CUDA 10)&lt;br /&gt;
* '''nlpl-tensorflow/2.6.2-ARCH-cuda-11.1.1-Python-3.7.4''': [https://www.tensorflow.org/ TensorFlow] 2.6.2 (for CUDA 11)&lt;br /&gt;
* '''nlpl-tensorflow/2.6.5-ARCH-cuda-11.3.1-Python-3.9.5''': [https://www.tensorflow.org/ TensorFlow] 2.6.5 (for CUDA 11)&lt;br /&gt;
* '''nlpl-tensorflow/2.15.0-ARCH-cuda-12.0.0-Python-3.10.8''': [https://www.tensorflow.org/ TensorFlow] 2.15.0 (for CUDA 12)&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-accelerate/0.13.2-ARCH-Python-3.9.5''': [https://pypi.org/project/accelerate/ Accelerate] 0.13.2&lt;br /&gt;
* '''nlpl-accelerate/0.27.2-ARCH-Python-3.10.8''': [https://pypi.org/project/accelerate/ Accelerate] 0.27.2&lt;br /&gt;
&lt;br /&gt;
==== Others ====&lt;br /&gt;
* '''nlpl-bitsandbytes/VERS-ARCH''': [https://pypi.org/project/bitsandbytes/ BitsAndBytes]&lt;br /&gt;
* '''nlpl-cython/VERS-ARCH''': [http://cython.org/ Cython]&lt;br /&gt;
* '''nlpl-datasets/VERS-ARCH''': [https://github.com/huggingface/datasets HuggingFace Datasets]&lt;br /&gt;
* '''nlpl-gensim/VERS-ARCH''': [https://github.com/RaRe-Technologies/gensim Gensim]&lt;br /&gt;
* '''nlpl-horovod/VERS-ARCH''': [https://github.com/horovod/horovod Horovod]&lt;br /&gt;
* '''nlpl-huggingface-hub/VERS-ARCH-2019b-Python-3.7.4''': [https://pypi.org/project/huggingface-hub/ HuggingFace Hub]&lt;br /&gt;
* '''nlpl-nltk/VERS-ARCH''': [https://www.nltk.org/ NLTK], together with '''all''' the corpora and datasets (no need to download them separately!)&lt;br /&gt;
* '''nlpl-numpy/VERS-ARCH''': [https://numpy.org/ NumPy]&lt;br /&gt;
* '''nlpl-pytorch-lightning/VERS-ARCH-cuda-11.3.1''': [https://www.pytorchlightning.ai/ PyTorch Lightning] &lt;br /&gt;
* '''nlpl-scikit-bundle/VERS-ARCH''': [https://scikit-learn.org/ Scikit-Learn]&lt;br /&gt;
* '''nlpl-sentencepiece/VERS-ARCH''': [https://github.com/google/sentencepiece SentencePiece]&lt;br /&gt;
* '''nlpl-sentence-transformers/VERS-ARCH''': [https://sbert.net SentenceTransformers]&lt;br /&gt;
* '''nlpl-simple_elmo/VERS-ARCH''': [https://pypi.org/project/simple-elmo/ Simple_elmo]&lt;br /&gt;
* '''nlpl-stanza/VERS-ARCH''': [https://stanfordnlp.github.io/stanza/ Stanza]&lt;br /&gt;
* '''nlpl-tensorboard/VERS-ARCH''': [https://github.com/tensorflow/tensorboard TensorBoard]&lt;br /&gt;
* '''nlpl-tokenizers/VERS-ARCH''': [https://github.com/huggingface/tokenizers HuggingFace Tokenizers]&lt;br /&gt;
* '''nlpl-torch-geometric/VERS-ARCH''': [https://pyg.org/ PyTorch Geometric]&lt;br /&gt;
* '''nlpl-torchmetrics/VERS-ARCH''': [https://pypi.org/project/torchmetrics/ TorchMetrics]&lt;br /&gt;
* '''nlpl-torchtext/VERS-ARCH''': [https://pypi.org/project/torchtext/ TorchText]&lt;br /&gt;
* '''nlpl-transformers/VERS-ARCH''': [https://huggingface.co/transformers/ HuggingFace Transformers]&lt;br /&gt;
* '''nlpl-trl/VERS-ARCH''': [https://huggingface.co/docs/trl/index HuggingFace Transformer Reinforcement Learning]&lt;br /&gt;
* '''nlpl-wandb/VERS-ARCH''': [https://pypi.org/project/wandb/ Weights and Biases (wandb)]&lt;br /&gt;
* '''nlpl-warc2text/VERS-ARCH''': [https://github.com/bitextor/warc2text warc2text]&lt;br /&gt;
&lt;br /&gt;
* '''nlpl-dllogger/0.1.0-ARCH-2019b-Python-3.7.4''': [https://github.com/NVIDIA/dllogger DLLogger] 0.1.0 ''(status unclear)''&lt;br /&gt;
* '''nlpl-nvidia-bert/20.06.8-ARCH-2019b-tensorflow-1.15.2-Python-3.7.4''': [https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT NVIDIA's BERT implementation] for TensorFlow 1 ''(status unclear)''&lt;br /&gt;
&lt;br /&gt;
= Source =&lt;br /&gt;
&lt;br /&gt;
Currently, the virtual laboratory is generated using EasyBuild; all the code and easyconfigs are available [https://source.coderefinery.org/nlpl/easybuild in this repository].&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1780</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1780"/>
		<updated>2025-02-24T14:05:14Z</updated>

		<summary type="html">&lt;p&gt;Andreku: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:Winter school 2025.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries who participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will be comprised of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages, &lt;br /&gt;
but also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;/br&amp;gt;'''EuroLLM and FinLLM – Stories from the Trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;/br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;/br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;/br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''  &lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;/br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;/br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;/br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data, collected since 2008, that can be used by anyone. Over the years, the foundation has focused on achieving a balanced, diverse, and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethics of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt1.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data (data mixes, cleaning pipelines, and training recipes) and of creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;br&amp;gt;FineWeb2 is a recent multilingual web-based dataset for large language model (LLM) pretraining that produces better-performing LLMs than other popular datasets. In this talk, we discuss in depth the many challenges involved in adapting processing pipelines commonly used for English data to over 1000 languages, including evaluation task selection for ablation experiments, language identification, filtering, and deduplication.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/FineWeb2_90min.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making the smallest evaluation task an enormous task. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what are the ups and downs revealed by it.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Gema-Ramírez-HPLT-Winter-School-2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Open_Foundation_Models_Scaling_Laws-pre_final_2024.pdf Slides 1]&amp;lt;br&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Pitfalls_in_measuring_generalization.pdf Slides 2]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&amp;lt;/br&amp;gt;&lt;br /&gt;
Stephan Oepen, Nikolay Arefev, Farrokh Mehryary, Elaine Zosa, Vladislav Mikhailov, David Samuel&amp;lt;/br&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/evening.pdf Slides]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün (online) &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&amp;lt;br&amp;gt;Post-training is a crucial step for building state-of-the-art LLMs and aligning them with human preferences. Although many public post-training datasets are available, they are predominantly curated for English, and multilingual datasets are extremely scarce. This lecture will cover methods for collecting high-quality post-training datasets, such as human annotation, multilingual templates, and synthetic data generation. We will also complement methods for high-quality data collection with post-training recipes from Aya-101, Aya-23, and the recently released Aya Expanse models, to make the best use of the curated data.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/HPLT_Winter_School_Aya.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts about LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture critically examines a set of common claims about modern LLMs, including the claims of their high performance, robustness, general-purpose technology status, and &amp;quot;emergent properties&amp;quot;. I will also re-examine the &amp;quot;bitter lesson&amp;quot; as applied to LLMs, and its implications for the future of the field.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt2.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who had submitted the registration form have been confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez-Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=File:Winter_school_2025.jpg&amp;diff=1779</id>
		<title>File:Winter school 2025.jpg</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=File:Winter_school_2025.jpg&amp;diff=1779"/>
		<updated>2025-02-24T14:04:51Z</updated>

		<summary type="html">&lt;p&gt;Andreku: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1777</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1777"/>
		<updated>2025-02-08T23:19:48Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries who participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will be comprised of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages, &lt;br /&gt;
but also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;/br&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;/br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;/br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;/br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''  &lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;/br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;/br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;/br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data, collected since 2008, that can be used by anyone. Over the years, the foundation has focused on achieving a balanced, diverse, and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethics of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt1.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data (data mixes, cleaning pipelines, and training recipes) and of creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;br&amp;gt;FineWeb2 is a recent multilingual web-based dataset for large language model (LLM) pretraining that produces better-performing LLMs than other popular datasets. In this talk, we discuss in depth the many challenges involved in adapting processing pipelines commonly used for English data to over 1000 languages, including evaluation task selection for ablation experiments, language identification, filtering, and deduplication.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/FineWeb2_90min.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making the smallest evaluation task an enormous task. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what are the ups and downs revealed by it.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Gema-Ramírez-HPLT-Winter-School-2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Open_Foundation_Models_Scaling_Laws-pre_final_2024.pdf Slides 1]&amp;lt;br&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Pitfalls_in_measuring_generalization.pdf Slides 2]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&amp;lt;br&amp;gt;&lt;br /&gt;
Stephan Oepen, Nikolay Arefev, Farrokh Mehryary, Elaine Zosa, Vladislav Mikhailov, David Samuel&amp;lt;br&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/evening.pdf Slides]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün (online) &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&amp;lt;br&amp;gt;Post-training is a crucial step for building state-of-the-art LLMs and aligning them with human preferences. Although many public post-training datasets are available, they are predominantly curated for English, and multilingual datasets are extremely scarce. This lecture will cover methods for collecting high-quality post-training datasets, such as human annotation, multilingual templates, and synthetic data generation. We will also complement these data collection methods with post-training recipes from Aya-101, Aya-23, and the recently released Aya Expanse models, to make the best use of the curated data.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/HPLT_Winter_School_Aya.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts about LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture critically examines a set of common claims about modern LLMs, including the claims of their high performance, robustness, general-purpose technology status, and &amp;quot;emergent properties&amp;quot;. I will also re-examine the &amp;quot;bitter lesson&amp;quot; as applied to LLMs, and its implications for the future of the field.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt2.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who had submitted the registration form have been confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1775</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1775"/>
		<updated>2025-02-05T09:04:43Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will be comprised of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages, &lt;br /&gt;
but will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data, collected since 2008, that can be used by anyone. Throughout the years, the foundation has focused on maintaining a diverse and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, the interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethics of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt1.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data, including data mixes, cleaning pipelines, and training recipes, and also of creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;br&amp;gt;FineWeb2 is a recent multilingual web-based dataset for large language model (LLM) pretraining that produces better-performing LLMs than other popular datasets. In this talk, we discuss in depth the many challenges involved in adapting processing pipelines commonly used for English data to over 1000 languages, including evaluation task selection for ablation experiments, language identification, filtering, and deduplication.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/FineWeb2_90min.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making the smallest evaluation task an enormous task. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what are the ups and downs revealed by it.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Gema-Ramírez-HPLT-Winter-School-2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Open_Foundation_Models_Scaling_Laws-pre_final_2024.pdf Slides 1]&amp;lt;br&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Pitfalls_in_measuring_generalization.pdf Slides 2]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün (online) &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&amp;lt;br&amp;gt;Post-training is a crucial step for building state-of-the-art LLMs and aligning them with human preferences. Although many public post-training datasets are available, they are predominantly curated for English, and multilingual datasets are extremely scarce. This lecture will cover methods for collecting high-quality post-training datasets, such as human annotation, multilingual templates, and synthetic data generation. We will also complement these data collection methods with post-training recipes from Aya-101, Aya-23, and the recently released Aya Expanse models, to make the best use of the curated data.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/HPLT_Winter_School_Aya.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts about LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture critically examines a set of common claims about modern LLMs, including the claims of their high performance, robustness, general-purpose technology status, and &amp;quot;emergent properties&amp;quot;. I will also re-examine the &amp;quot;bitter lesson&amp;quot; as applied to LLMs, and its implications for the future of the field.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt2.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who had submitted the registration form have been confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1774</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1774"/>
		<updated>2025-02-05T07:56:13Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer to and from the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data that can be used by anyone, crawled since 2008. Throughout the years, the foundation has focused on achieving a balanced, diverse, and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, the interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethical considerations of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt1.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data (data mixes, cleaning pipelines, and training recipes) and of creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;br&amp;gt;FineWeb2 is a recent multilingual web-based dataset for large language model (LLM) pretraining that produces better-performing LLMs than other popular datasets. In this talk, we discuss in depth the many challenges involved in adapting processing pipelines commonly used for English data to over 1000 languages, including evaluation task selection for ablation experiments, language identification, filtering, and deduplication.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/FineWeb2_90min.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making the smallest evaluation task an enormous task. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what are the ups and downs revealed by it.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Gema-Ramírez-HPLT-Winter-School-2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Open_Foundation_Models_Scaling_Laws-pre_final_2024.pdf Slides 1]&amp;lt;br&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Pitfalls_in_measuring_generalization.pdf Slides 2]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün (online) &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&amp;lt;br&amp;gt;Post-training is a crucial step for building state-of-the-art LLMs and aligning them according to human preferences. Although many public post-training datasets are available, they are predominantly curated for English, and multilingual datasets are extremely scarce. This lecture will cover methods for collecting high-quality post-training datasets such as human annotation, multilingual templates, and synthetic data generation. We will also complement methods for high-quality data collection with post-training recipes from Aya-101, Aya-23, and the recently released Aya Expanse models, to best leverage the curated data.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/HPLT_Winter_School_Aya.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts about LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture critically examines a set of common claims about modern LLMs, including the claims of their high performance, robustness, general-purpose technology status, and &amp;quot;emergent properties&amp;quot;. I will also re-examine the &amp;quot;bitter lesson&amp;quot; as applied to LLMs, and its implications for the future of the field.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who had submitted the registration form have been confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1773</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1773"/>
		<updated>2025-02-05T07:55:14Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer to and from the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data that can be used by anyone, crawled since 2008. Throughout the years, the foundation has focused on achieving a balanced, diverse, and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, the interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethical considerations of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt1.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data (data mixes, cleaning pipelines, and training recipes) and of creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;br&amp;gt;FineWeb2 is a recent multilingual web-based dataset for large language model (LLM) pretraining that produces better-performing LLMs than other popular datasets. In this talk, we discuss in depth the many challenges involved in adapting processing pipelines commonly used for English data to over 1000 languages, including evaluation task selection for ablation experiments, language identification, filtering, and deduplication.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/FineWeb2_90min.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making the smallest evaluation task an enormous task. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what are the ups and downs revealed by it.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Gema-Ramírez-HPLT-Winter-School-2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Open_Foundation_Models_Scaling_Laws-pre_final_2024.pdf Slides 1]&amp;lt;br&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Pitfalls_in_measuring_generalization.pdf Slides 2]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün (online) &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&amp;lt;br&amp;gt;Post-training is a crucial step for building state-of-the-art LLMs and aligning them according to human preferences. Although many public post-training datasets are available, they are predominantly curated for English, and multilingual datasets are extremely scarce. This lecture will cover methods for collecting high-quality post-training datasets such as human annotation, multilingual templates, and synthetic data generation. We will also complement methods for high-quality data collection with post-training recipes from Aya-101, Aya-23, and the recently released Aya Expanse models, to best leverage the curated data.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/HPLT_Winter_School_Aya.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts about LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture critically examines a set of common claims about modern LLMs, including claims of their high performance, robustness, general-purpose technology status, and &amp;quot;emergent properties&amp;quot;. I will also re-examine the &amp;quot;bitter lesson&amp;quot; as applied to LLMs, and its implications for the future of the field.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who had submitted the registration form have been confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1772</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1772"/>
		<updated>2025-02-04T16:32:06Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries who participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data that can be used by anyone, crawled since 2008. Throughout the years, the foundation has focused on achieving a balanced, diverse, and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, the interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethical considerations of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt1.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data: data mixes, cleaning pipelines, and training recipes, and also look at creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;br&amp;gt;FineWeb2 is a recent multilingual web-based dataset for large language model (LLM) pretraining that produces better-performing LLMs than other popular datasets. In this talk, we discuss in depth the many challenges involved in adapting processing pipelines commonly used for English data to over 1000 languages, including evaluation task selection for ablation experiments, language identification, filtering, and deduplication.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/FineWeb2_90min.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making the smallest evaluation task an enormous task. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what are the ups and downs revealed by it.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Gema-Ramírez-HPLT-Winter-School-2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Open_Foundation_Models_Scaling_Laws-pre_final_2024.pdf Slides 1]&amp;lt;br&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Pitfalls_in_measuring_generalization.pdf Slides 2]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün (online) &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&amp;lt;br&amp;gt;Post-training is a crucial step for building state-of-the-art LLMs and aligning them with human preferences. Although many public post-training datasets are available, they are predominantly curated for English, and multilingual datasets are extremely scarce. This lecture will cover methods for collecting high-quality post-training datasets, such as human annotation, multilingual templates, and synthetic data generation. We will also complement methods for high-quality data collection with post-training recipes from Aya-101, Aya-23, and the recently released Aya Expanse models, to make the best use of the curated data.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts about LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture critically examines a set of common claims about modern LLMs, including claims of their high performance, robustness, general-purpose technology status, and &amp;quot;emergent properties&amp;quot;. I will also re-examine the &amp;quot;bitter lesson&amp;quot; as applied to LLMs, and its implications for the future of the field.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who had submitted the registration form have been confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1771</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1771"/>
		<updated>2025-02-04T14:37:03Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries who participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data that can be used by anyone, crawled since 2008. Throughout the years the foundation has focused on achieving a balanced, diverse, and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, the interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethical considerations of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt1.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data (data mixes, cleaning pipelines, and training recipes) and of creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;br&amp;gt;FineWeb2 is a recent multilingual web-based dataset for large language model (LLM) pretraining that produces better-performing LLMs than other popular datasets. In this talk, we discuss in depth the many challenges involved in adapting processing pipelines commonly used for English data to over 1000 languages, including evaluation task selection for ablation experiments, language identification, filtering, and deduplication.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/FineWeb2_90min.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making even the smallest evaluation task an enormous undertaking. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what ups and downs it reveals.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Gema-Ramírez-HPLT-Winter-School-2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün (online) &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&amp;lt;br&amp;gt;Post-training is a crucial step for building state-of-the-art LLMs and aligning them according to human preferences. Although many public post-training datasets are available, they are predominantly curated for English, and multilingual datasets are extremely scarce. This lecture will cover methods for collecting high-quality post-training datasets such as human annotation, multilingual templates, and synthetic data generation. We will also complement methods for high-quality data collection with post-training recipes from Aya-101, Aya-23, and the recently released Aya Expanse models, to make the best use of the curated data.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts about LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture critically examines a set of common claims about modern LLMs, including the claims of their high performance, robustness, general-purpose technology status, and &amp;quot;emergent properties&amp;quot;. I will also re-examine the &amp;quot;bitter lesson&amp;quot; as applied to LLMs, and its implications for the future of the field.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we welcome 62 participants to the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''December 6''', '''December 13''',&lt;br /&gt;
and '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(listed below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez-Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1770</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1770"/>
		<updated>2025-02-04T08:16:41Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will consist of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages, &lt;br /&gt;
but also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data that can be used by anyone, crawled since 2008. Throughout the years the foundation has focused on achieving a balanced, diverse, and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, the interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethical considerations of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt1.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data (data mixes, cleaning pipelines, and training recipes) and of creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;br&amp;gt;FineWeb2 is a recent multilingual web-based dataset for large language model (LLM) pretraining that produces better-performing LLMs than other popular datasets. In this talk, we discuss in depth the many challenges involved in adapting processing pipelines commonly used for English data to over 1000 languages, including evaluation task selection for ablation experiments, language identification, filtering, and deduplication.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/FineWeb2_90min.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making even the smallest evaluation task an enormous undertaking. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what ups and downs it reveals.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/Gema-Ramírez-HPLT-Winter-School-2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts about LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture critically examines a set of common claims about modern LLMs, including the claims of their high performance, robustness, general-purpose technology status, and &amp;quot;emergent properties&amp;quot;. I will also re-examine the &amp;quot;bitter lesson&amp;quot; as applied to LLMs, and its implications for the future of the field.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we welcome 62 participants to the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''December 6''', '''December 13''',&lt;br /&gt;
and '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(listed below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1769</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1769"/>
		<updated>2025-02-03T21:48:48Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will consist of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
but will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br /&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br /&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br /&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br /&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br /&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br /&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br /&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data that can be used by anyone, crawled since 2008. Over the years, the foundation has focused on obtaining a diverse and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethical considerations of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt1.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data (data mixes, cleaning pipelines, and training recipes) and of creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;br&amp;gt;FineWeb2 is a recent multilingual web-based dataset for large language model (LLM) pretraining that produces better-performing LLMs than other popular datasets. In this talk, we discuss in depth the many challenges involved in adapting processing pipelines commonly used for English data to over 1000 languages, including evaluation task selection for ablation experiments, language identification, filtering, and deduplication.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/FineWeb2_90min.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the time of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming to gather the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making even the smallest evaluation task an enormous one. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what ups and downs it reveals.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts about LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture critically examines a set of common claims about modern LLMs, including claims of their high performance, robustness, general-purpose technology status, and &amp;quot;emergent properties&amp;quot;. I will also re-examine the &amp;quot;bitter lesson&amp;quot; as applied to LLMs, and its implications for the future of the field.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who had submitted the registration form have been confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or similar matters, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee consists of:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1768</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1768"/>
		<updated>2025-02-03T15:38:34Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will consist of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
but will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br /&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br /&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br /&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br /&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br /&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br /&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br /&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data that can be used by anyone, crawled since 2008. Over the years, the foundation has focused on obtaining a diverse and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethical considerations of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt1.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data (data mixes, cleaning pipelines, and training recipes) and of creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;br&amp;gt;FineWeb2 is a recent multilingual web-based dataset for large language model (LLM) pretraining that produces better-performing LLMs than other popular datasets. In this talk, we discuss in depth the many challenges involved in adapting processing pipelines commonly used for English data to over 1000 languages, including evaluation task selection for ablation experiments, language identification, filtering, and deduplication.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the time of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming to gather the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making even the smallest evaluation task an enormous one. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what ups and downs it reveals.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts about LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture critically examines a set of common claims about modern LLMs, including the claims of their high performance, robustness, general-purpose technology status, and &amp;quot;emergent properties&amp;quot;. I will also re-examine the &amp;quot;bitter lesson&amp;quot; as applied to LLMs, and its implications for the future of the field.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez-Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1767</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1767"/>
		<updated>2025-02-03T14:54:14Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will be comprised of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages, &lt;br /&gt;
but also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data, collected since 2008, that can be used by anyone. Over the years, the foundation has focused on achieving a diverse and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethics of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/nlpl_rogers_pt1.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data: data mixes, cleaning pipelines, and training recipes, and also look at creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making the smallest evaluation task an enormous task. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what are the ups and downs revealed by it.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts about LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture critically examines a set of common claims about modern LLMs, including the claims of their high performance, robustness, general-purpose technology status, and &amp;quot;emergent properties&amp;quot;. I will also re-examine the &amp;quot;bitter lesson&amp;quot; as applied to LLMs, and its implications for the future of the field.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez-Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1766</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1766"/>
		<updated>2025-02-03T14:26:38Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will be comprised of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages, &lt;br /&gt;
but also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data, collected since 2008, that anyone can use. Over the years, the foundation has focused on collecting a diverse and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethics of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data (data mixes, cleaning pipelines, and training recipes) and of creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making even the smallest evaluation an enormous task. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what ups and downs it reveals.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts about LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture critically examines a set of common claims about modern LLMs, including claims of their high performance, robustness, general-purpose technology status, and &amp;quot;emergent properties&amp;quot;. I will also re-examine the &amp;quot;bitter lesson&amp;quot; as applied to LLMs, and its implications for the future of the field.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or similar matters, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1765</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1765"/>
		<updated>2025-02-03T14:14:06Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Programme */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries who participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;/br&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;/br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;/br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;/br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''  &lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;/br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;/br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;/br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data, collected since 2008, that anyone can use. Over the years, the foundation has focused on collecting a diverse and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethics of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data (data mixes, cleaning pipelines, and training recipes) and of creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making even the smallest evaluation an enormous task. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what ups and downs it reveals.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or similar matters, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1764</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1764"/>
		<updated>2025-02-03T14:11:29Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries who participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br /&amp;gt;'''EuroLLM – A language model for Europe'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br /&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br /&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br /&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br /&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br /&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br /&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data, collected since 2008, that can be used by anyone. Throughout the years the foundation has focused on achieving a balanced, diverse and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, the interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethical considerations of large-scale data curation. After a quick introduction into the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM and FinLLM – stories from the trenches'''&amp;lt;br&amp;gt;&lt;br /&gt;
In this talk, we share our experiences building two large language models: EuroLLM, a multilingual model designed to serve the diverse linguistic and cultural landscape of Europe, and FinLLM, a financial LLM tailored for the UK’s highly specialized finance industry with our partners Aveni.ai, Lloyds, and Nationwide. We will discuss the challenges of curating high-quality training data (data mixes, cleaning pipelines, and training recipes) and of creating meaningful benchmarks.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/2025-02-EuroLLM_and_FinLLM_Birch.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making even the smallest evaluation task an enormous undertaking. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what ups and downs it reveals.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who had submitted the registration form have been confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez-Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1763</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1763"/>
		<updated>2025-02-03T13:27:57Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries who participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br /&amp;gt;'''EuroLLM – A language model for Europe'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br /&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br /&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br /&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br /&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br /&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br /&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data, collected since 2008, that can be used by anyone. Throughout the years the foundation has focused on achieving a balanced, diverse and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, the interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethical considerations of large-scale data curation. After a quick introduction into the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
[https://data.hplt-project.org/transfer/commoncrawl_2025.pdf Slides]&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM – A language model for Europe'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the moment of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming at gathering the best set to get the best model performance. These curated datasets are huge and in many cases multilingual, making even the smallest evaluation task an enormous undertaking. But we can always ask stats for help, and data will confess. In this session we will have a look at several pretraining (textual) datasets through the stats glass, and see together what ups and downs it reveals.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who had submitted the registration form have been confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee is comprised of:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1762</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1762"/>
		<updated>2025-02-03T13:21:33Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will consist of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br&amp;gt;'''EuroLLM – A language model for Europe'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data, collected since 2008, that can be used by anyone. Over the years, the foundation has focused on achieving a diverse and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, the interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethical considerations of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM – A language model for Europe'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Having a look at pretraining data through the stats glass'''&amp;lt;br&amp;gt;At the time of speaking, zillions of tokens of pretraining data are being collected and curated to train LLMs by several initiatives, all aiming to gather the best set for the best model performance. These curated datasets are huge and in many cases multilingual, making even the smallest evaluation task an enormous one. But we can always ask stats for help, and the data will confess. In this session we will look at several pretraining (textual) datasets through the stats glass, and see together what ups and downs it reveals.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''December 6''', '''December 13''',&lt;br /&gt;
and '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from HPLT, the NLPL network, and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee consists of:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentie, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1761</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1761"/>
		<updated>2025-02-03T11:12:00Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize group bus transfer from and to the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will consist of in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br&amp;gt;'''EuroLLM – A language model for Europe'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&amp;lt;br&amp;gt;&lt;br /&gt;
Common Crawl is a free, open repository of web crawl data, collected since 2008, that can be used by anyone. Over the years, the foundation has focused on achieving a diverse and representative sample of web sites while operating an efficient and polite crawler. In recent years, with the advent of LLMs and multimodal models, the interest in obtaining large amounts of high-quality data has skyrocketed, while also raising concerns about the ethical considerations of large-scale data curation. After a quick introduction to the history of the Common Crawl Foundation, we present our recent efforts to respond to these new data requirements while also expanding the language and cultural coverage of our dataset, and addressing the practical and ethical questions that have arisen around web crawling in the era of LLMs.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''LLMs and Factuality: facts from LLMs'''&amp;lt;br&amp;gt;&lt;br /&gt;
This lecture focuses on the workflows for using LLMs as information sources, the types of problems that may result from that, and the main current mitigation strategies (RAG and CoT). Finally, I will discuss the problem of detecting generated texts, and the impact of LLMs on the information ecosphere and content economy.&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch &amp;lt;p  class=&amp;quot;mw-collapsible mw-collapsed&amp;quot;&amp;gt;'''EuroLLM – A language model for Europe'''&amp;lt;/p&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, this year we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''December 6''', '''December 13''',&lt;br /&gt;
and '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from HPLT, the NLPL network, and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee consists of:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentie, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1760</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1760"/>
		<updated>2025-02-03T10:55:51Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer to and from the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br /&amp;gt;'''EuroLLM – A language model for Europe'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br /&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br /&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br /&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br /&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br /&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br /&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 19:20 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who had submitted the registration form have been confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or similar matters, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentie, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1759</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1759"/>
		<updated>2025-02-02T20:57:38Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer to and from the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
and will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;br /&amp;gt;'''EuroLLM – A language model for Europe'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;br /&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;br /&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;br /&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''&lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;br /&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;br /&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;br /&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 18:50 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we welcome 62 participants at the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who had submitted the registration form have been confirmed in three batches, on '''December 6''', on '''December 13''',&lt;br /&gt;
and on '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals luggage claims and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from the HPLT and NLPL network and beyond&lt;br /&gt;
(please see below).&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or similar matters, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentie, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, The University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, The University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1758</id>
		<title>Community/training</title>
		<link rel="alternate" type="text/html" href="https://wiki.nlpl.eu/index.php?title=Community/training&amp;diff=1758"/>
		<updated>2025-01-31T16:16:51Z</updated>

		<summary type="html">&lt;p&gt;Andreku: /* Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''HPLT &amp;amp; NLPL 2025 Winter School on Pretraining Data Quality and Multilingual LLM Evaluation'''&lt;br /&gt;
&lt;br /&gt;
[[File:HPLT and NLPL Winter School 2024.jpg|center|thumb|upright=2.0]]&lt;br /&gt;
&lt;br /&gt;
= Background =&lt;br /&gt;
&lt;br /&gt;
Since 2023, the NLPL network and Horizon Europe&lt;br /&gt;
project ''[https://hplt-project.org High-Performance Language Technologies]'' (HPLT)&lt;br /&gt;
have joined forces to organize the successful winter school series on Web-scale NLP.&lt;br /&gt;
The winter school seeks to stimulate ''community formation'',&lt;br /&gt;
i.e. strengthening interaction and collaboration among&lt;br /&gt;
European research teams in NLP and advancing a shared level of knowledge&lt;br /&gt;
and experience in using high-performance e-infrastructures for large-scale&lt;br /&gt;
NLP research.&lt;br /&gt;
This 2025 edition of the winter school puts special emphasis on&lt;br /&gt;
NLP researchers from countries that participate in the EuroHPC&lt;br /&gt;
[https://www.lumi-supercomputer.eu/lumi-consortium/ LUMI consortium].&lt;br /&gt;
For additional background, please see the archival pages from the&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2018 2018],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2019 2019],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2020 2020],&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2023 2023], and&lt;br /&gt;
[https://wiki.nlpl.eu/index.php/Community/training/2024 2024]&lt;br /&gt;
NLPL Winter Schools.&lt;br /&gt;
&lt;br /&gt;
For early 2025, HPLT will hold its winter school from Monday, February 3, to&lt;br /&gt;
Wednesday, February 5, 2025, at a&lt;br /&gt;
[https://www.thonhotels.com/our-hotels/norway/skeikampen/ mountain-side hotel]&lt;br /&gt;
(with skiing and walking opportunities) about two hours north of Oslo.&lt;br /&gt;
The project will organize a group bus transfer to and from the Oslo&lt;br /&gt;
airport ''Gardermoen'', leaving the airport at 9:45 on Monday morning&lt;br /&gt;
and returning there around 17:30 on Wednesday afternoon.&lt;br /&gt;
&lt;br /&gt;
The winter school is subsidized by the HPLT project: there is no fee for&lt;br /&gt;
participants and no charge for the bus transfer to and from the&lt;br /&gt;
conference hotel.&lt;br /&gt;
All participants will have to cover their own travel and accommodation&lt;br /&gt;
at Skeikampen, however.&lt;br /&gt;
Two nights at the hotel, including all meals, will come to NOK 3855 (NOK 3455 per person in a shared double room), &lt;br /&gt;
to be paid to the hotel directly upon arrival.&lt;br /&gt;
&lt;br /&gt;
= Programme =&lt;br /&gt;
&lt;br /&gt;
The 2025 winter school will have a thematic focus on ''Pretraining Data Quality and Multilingual LLM Evaluation''.&lt;br /&gt;
The programme will comprise in-depth technical presentations (possibly including some&lt;br /&gt;
hands-on elements) by seasoned experts, with special emphasis on open science and European languages,&lt;br /&gt;
but will also include critical reflections on current development trends in LLM-focussed NLP.&lt;br /&gt;
The programme will be complemented with a ‘walk-through’ of example experience&lt;br /&gt;
reports on the shared EuroHPC LUMI supercomputer.&lt;br /&gt;
&lt;br /&gt;
Confirmed presenters and talks include:&lt;br /&gt;
&lt;br /&gt;
* [https://sites.google.com/view/alexandra-birch Alexandra Birch], University of Edinburgh&amp;lt;/br&amp;gt;'''EuroLLM – A language model for Europe'''&lt;br /&gt;
* [https://laion.ai/team/ Jenia Jitsev] and [https://laion.ai/team/ Marianna Nezhurina], Jülich Supercomputing Centre / LAION&amp;lt;/br&amp;gt;'''Open Foundation Models: Scaling Laws and Generalization'''&lt;br /&gt;
* [https://huggingface.co/guipenedo Guilherme Penedo], Hugging Face&amp;lt;/br&amp;gt;'''FineWeb2: Creating a Large Multilingual Dataset for LLM Pre-Training'''&lt;br /&gt;
* [https://scholar.google.com/citations?user=f5FSgPwAAAAJ&amp;amp;hl=en Gema Ramírez-Sánchez], Prompsit Language Engineering&amp;lt;/br&amp;gt;'''A look at Pre-Training Data through the Stats Glass'''  &lt;br /&gt;
* [https://annargrs.github.io Anna Rogers], IT University of Copenhagen&amp;lt;/br&amp;gt;'''Large Language Models and Factuality'''&lt;br /&gt;
* [https://portizs.eu Pedro Ortiz Suarez] and [https://commoncrawl.org/team/sebastian-nagel-engineer Sebastian Nagel], Common Crawl&amp;lt;/br&amp;gt;'''Data Quality, Language Coverage and Ethical Considerations in Web Crawling'''&lt;br /&gt;
* [https://scholar.google.com.tr/citations?user=fvotcRIAAAAJ&amp;amp;hl=tr Ahmet Üstün], Cohere AI&amp;lt;/br&amp;gt;'''Recipe for multilingual post-training: How to collect high-quality data and use them?'''&lt;br /&gt;
&lt;br /&gt;
= Schedule =&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Monday, February 3, 2025&lt;br /&gt;
|-&lt;br /&gt;
| 13:00 || 14:00 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 14:00 || 15:30 || '''Session 1''' Pedro Ortiz Suarez &amp;amp; Sebastian Nagel&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 15:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 16:00 || 17:30 || '''Session 2''' Anna Rogers&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 || 17:50 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:50 || 19:20 || '''Session 3''' Alexandra Birch&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Tuesday, February 4, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3 | Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 09:00 || 10:30 || '''Session 4''' Guilherme Penedo&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Free time (Lunch is available between 13:00 and 14:30)&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 || 17:00 || '''Session 5''' Gema Ramírez-Sánchez&lt;br /&gt;
|-&lt;br /&gt;
| 17:00 || 17:20 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 17:20 || 18:50 || '''Session 6''' Jenia Jitsev &amp;amp; Marianna Nezhurina&lt;br /&gt;
|-&lt;br /&gt;
| 19:30 ||  || Dinner&lt;br /&gt;
|-&lt;br /&gt;
| 21:00 || || '''Evening Session: Findings from HPLT'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!colspan=3|Wednesday, February 5, 2025&lt;br /&gt;
|-&lt;br /&gt;
|colspan=3| Breakfast is available from 07:30&lt;br /&gt;
|-&lt;br /&gt;
| 08:30 || 10:00 || '''Session 8''' Ahmet Üstün&lt;br /&gt;
|-&lt;br /&gt;
| 10:00 || 10:30 || Coffee Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:30 || 12:00 || '''Session 9''' Anna Rogers&lt;br /&gt;
|-&lt;br /&gt;
| 12:30 || 13:30 || Lunch&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 || 16:45 || Bus transfer to OSL Airport&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Registration =&lt;br /&gt;
&lt;br /&gt;
In total, we welcome 62 participants to the 2025 winter school.&lt;br /&gt;
The winter school is [https://nettskjema.no/a/381438 over-subscribed] and no longer accepting registrations.&lt;br /&gt;
We have processed requests for participation on a first-come, first-served basis, with an eye toward regional balance.&lt;br /&gt;
Interested parties who submitted the registration form were confirmed in three batches, on '''December 6''', '''December 13''',&lt;br /&gt;
and '''December 20''', which was also the closing date for winter school registration.&lt;br /&gt;
&lt;br /&gt;
Once confirmed by the organizing team, participant names are published&lt;br /&gt;
on this page, and registration establishes a&lt;br /&gt;
''binding agreement'' with the hotel.&lt;br /&gt;
Therefore, a cancellation fee will be incurred (unless we can find someone else to ‘take over’ last-minute&lt;br /&gt;
spaces), and no-shows will be charged the full price for at least one night&lt;br /&gt;
by the hotel.&lt;br /&gt;
&lt;br /&gt;
= Logistics = &lt;br /&gt;
&lt;br /&gt;
With a few exceptions, winter school participants travel to and from the conference hotel&lt;br /&gt;
jointly on a chartered bus (the HPLT shuttle).&lt;br /&gt;
The bus will leave OSL airport no later than 9:45 CET on Monday, February 3.&lt;br /&gt;
Thus, please meet up by 9:30 and make your arrival known to your assigned&lt;br /&gt;
‘tour guide’ (who will introduce themselves to you by email beforehand).&lt;br /&gt;
&lt;br /&gt;
The group will gather near the DNB currency exchange booth in the downstairs&lt;br /&gt;
arrivals area, just outside the international arrivals baggage claim and slightly&lt;br /&gt;
to the left as one exits the customs area:&lt;br /&gt;
the yellow dot numbered (18) on the&lt;br /&gt;
[https://avinor.no/globalassets/_oslo-lufthavn/ankomst-arrivals.pdf OSL arrivals map].&lt;br /&gt;
The group will then walk over to the bus terminal, to leave the airport not long after 9:40.&lt;br /&gt;
The drive to the Skeikampen conference hotel will take us about three hours, and the bus&lt;br /&gt;
will make one stop along the way to stretch our legs and fill up on coffee.&lt;br /&gt;
&lt;br /&gt;
The winter school will end with lunch on Wednesday, February 5, before the group returns&lt;br /&gt;
to OSL airport on the HPLT shuttle.&lt;br /&gt;
The bus will leave Skeikampen at 14:00 CET, with an expected arrival time at OSL&lt;br /&gt;
around 17:00 to 17:30 CET. After stopping at the OSL airport, the bus will continue to central Oslo.&lt;br /&gt;
&lt;br /&gt;
= Organization =&lt;br /&gt;
&lt;br /&gt;
The 2025 Winter School is organized by a team of volunteers at the University&lt;br /&gt;
of Oslo, supported by a programme committee from HPLT, the NLPL network, and beyond;&lt;br /&gt;
please see below.&lt;br /&gt;
For all inquiries regarding registration, the programme, logistics,&lt;br /&gt;
or such, please contact &amp;lt;code&amp;gt;hplt-training@ifi.uio.no&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The programme committee comprises:&lt;br /&gt;
&lt;br /&gt;
* Barry Haddow (University of Edinburgh, UK)&lt;br /&gt;
* Andrey Kutuzov (University of Oslo, Norway)&lt;br /&gt;
* Stephan Oepen (University of Oslo, Norway)&lt;br /&gt;
* Sampo Pyysalo (University of Turku, Finland)&lt;br /&gt;
* Jörg Tiedemann (University of Helsinki, Finland)&lt;br /&gt;
&lt;br /&gt;
= Participants =&lt;br /&gt;
&lt;br /&gt;
# Nikolay Arefev, University of Oslo (Norway)&lt;br /&gt;
# Maria Barrett, Silo AI (Finland)&lt;br /&gt;
# Toms Bergmanis, Tilde (Latvia)&lt;br /&gt;
# Alexandra Birch, University of Edinburgh (UK)&lt;br /&gt;
# Laurie Burchell, University of Edinburgh (UK)&lt;br /&gt;
# Lucas Charpentier, University of Oslo (Norway)&lt;br /&gt;
# Pinzhen (Patrick) Chen, University of Edinburgh (UK)&lt;br /&gt;
# Hannah Clausen, University of Oslo (Norway)&lt;br /&gt;
# Lucia Domenichelli, University of Pisa (Italy)&lt;br /&gt;
# Aleksei Dorkin, University of Tartu (Estonia)&lt;br /&gt;
# Kenneth Enevoldsen, Aarhus University (Denmark)&lt;br /&gt;
# Tita Enstad, National Library (Norway)&lt;br /&gt;
# Mariia Fedorova, University of Oslo (Norway)&lt;br /&gt;
# Yanzhu Guo, INRIA Paris (France)&lt;br /&gt;
# Arzu Burcu Güven, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Barry Haddow, University of Edinburgh (UK)&lt;br /&gt;
# Jan Hajič, Charles University (Czech Republic)&lt;br /&gt;
# Jindřich Helcl, Charles University (Czech Republic)&lt;br /&gt;
# Bertram Højer, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Sekh Mainul Islam, University of Copenhagen (Denmark)&lt;br /&gt;
# Jenia Jitsev, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Márton Kardos, Aarhus University (Denmark)&lt;br /&gt;
# Anastasiia Klimashevskaia, University of Bergen (Norway)&lt;br /&gt;
# Mateusz Klimaszewski, University of Edinburgh (UK)&lt;br /&gt;
# Ville Komulainen, University of Turku (Finland)&lt;br /&gt;
# Markus Koskela, CSC – IT Center for Science (Finland)&lt;br /&gt;
# Martins Kronis, Tilde (Latvia)&lt;br /&gt;
# Vimal Kumar Kumar, University of Limerick (Ireland)&lt;br /&gt;
# Andrey Kutuzov, University of Oslo (Norway)&lt;br /&gt;
# Hengyu Luo, University of Helsinki (Finland)&lt;br /&gt;
# Farrokh Mehryary, University of Turku (Finland)&lt;br /&gt;
# Vladislav Mikhailov, University of Oslo (Norway)&lt;br /&gt;
# Andreas Motzfeldt, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Zain Muhammad Mujahid, University of Copenhagen (Denmark)&lt;br /&gt;
# Sebastian Nagel, Common Crawl Foundation (Germany)&lt;br /&gt;
# Marianna Nezhurina, Jülich Supercomputing Centre / LAION (Germany)&lt;br /&gt;
# Stephan Oepen, University of Oslo (Norway)&lt;br /&gt;
# Guilherme Penedo, Hugging Face (France)&lt;br /&gt;
# Irina Proskurina, University of Lyon (France)&lt;br /&gt;
# Taido Purason, University of Tartu (Estonia)&lt;br /&gt;
# Marie Roald, National Library (Norway)&lt;br /&gt;
# Anna Rogers, IT University of Copenhagen (Denmark)&lt;br /&gt;
# Ismaël Rousseau, Orange (France)&lt;br /&gt;
# David Samuel, University of Oslo (Norway)&lt;br /&gt;
# Gema Ramírez Sánchez, Prompsit Language Engineering (Spain)&lt;br /&gt;
# Marta Sartor, University of Pisa (Italy)&lt;br /&gt;
# Ipek Baris Schlicht, Universitat Politècnica de València (Spain)&lt;br /&gt;
# Hanna Shcharbakova, University of Lorraine (France)&lt;br /&gt;
# Étienne Simon, University of Oslo (Norway)&lt;br /&gt;
# Pavel Stepachev, University of Edinburgh (UK)&lt;br /&gt;
# Pedro Ortiz Suarez, Common Crawl Foundation (France)&lt;br /&gt;
# Otto Tarkka, University of Turku (Finland)&lt;br /&gt;
# Kushal Tatariya, KU Leuven (Belgium)&lt;br /&gt;
# Jörg Tiedemann, University of Helsinki (Finland)&lt;br /&gt;
# Samia Touileb, University of Bergen (Norway)&lt;br /&gt;
# Elke Vandermeerschen, KU Leuven (Belgium)&lt;br /&gt;
# Raul Vazquez, University of Helsinki (Finland)&lt;br /&gt;
# Ramón Carreño Villar, University of Oslo (Norway)&lt;br /&gt;
# Fedor Vitiugin, Aalto University (Finland)&lt;br /&gt;
# Tea Vojtěchová, Charles University (Czech Republic)&lt;br /&gt;
# Artūrs Znotiņš, IMCS at University of Latvia (Latvia)&lt;br /&gt;
# Elaine Zosa, Silo AI (Finland)&lt;/div&gt;</summary>
		<author><name>Andreku</name></author>
		
	</entry>
</feed>