Corpora/notes/plan

From Nordic Language Processing Laboratory

Latest revision as of 12:52, 26 October 2017

Background

This document provides internal notes (which can be ‘rough’ or out-of-date) for Task Force (DE), i.e. the combination of activities (D) on very large corpora and (E) on word embeddings.

Work Plan

According to the initial NLPL work plan (https://wiki.neic.no/wiki/File:20161220_NeIC_NLPL_workplan_approved.pdf), two strands of corpora-related activities were planned: one to enable project-wide access to licensed corpora (e.g. the GigaWord collections from the LDC), and another to simplify the creation of, and access to, large text collections derived from Wikipedia and the Common Crawl.
