Difference between revisions of "Infrastructure/software/nltk"

From Nordic Language Processing Laboratory

Revision as of 18:38, 29 September 2018

Background

The Natural Language Toolkit (NLTK; https://www.nltk.org/) provides a large collection of core NLP utilities (e.g. sentence splitting and tokenization, part-of-speech tagging, various approaches to parsing, and many more) in an integrated Python environment. The NLTK distribution also bundles a broad range of common, freely available data sets, which are accessible through a uniform API. Although it is often neither state of the art nor particularly efficient, NLTK is popular as a teaching environment and as a go-to repository for common ‘basic’ preprocessing tasks, e.g. sentence splitting, stop word removal, or lemmatization (for English, at least).

Usage on Abel

Available Versions

Installation on Abel

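# Start from a clean environment and load the Python 3 module.
module purge
module load python3/3.5.0
# Create the NLTK tree (if missing) and a version-specific virtual environment.
mkdir -p /projects/nlpl/software/nltk
virtualenv /projects/nlpl/software/nltk/3.3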

Next, create a module definition, in this case /projects/nlpl/software/modulefiles/nlpl-nltk/3.3. The module must set the environment variable $NLTK_DATA to the data sub-directory of the NLTK tree (/projects/nlpl/software/nltk/3.3/data), which is populated by the command-line data download below.
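The module definition itself is written in the module system's own syntax and is not reproduced on this page. As a rough sketch of the effect it needs to have, the following plain shell lines do the equivalent by hand. The variable NLTK_ROOT is a hypothetical helper introduced only for this illustration, and the sketch assumes that the virtual environment keeps its executables in the bin sub-directory of the 3.3 tree.

```shell
# Sketch only: the shell-level effect the nlpl-nltk/3.3 module should have.
# NLTK_ROOT is a hypothetical helper variable, not part of the original setup.
NLTK_ROOT=/projects/nlpl/software/nltk/3.3

# Put the virtual environment's python and pip first on the search path.
export PATH="$NLTK_ROOT/bin:$PATH"

# Tell NLTK where to look for its data packages.
export NLTK_DATA="$NLTK_ROOT/data"

echo "NLTK_DATA=$NLTK_DATA"
```

Setting $NLTK_DATA in the module rather than in user code means that every user who loads the module finds the shared data without further configuration.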

module load nlpl-nltk/3.3
pip install --upgrade pip
pip install --upgrade $(pip list | tail -n +3 | gawk '{print $1}')
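The last command above upgrades every package already present in the virtual environment. It relies on pip list printing two header lines followed by one package per line, which is an assumption about the column-style output of reasonably recent pip versions. The sketch below runs the same filtering on a hypothetical sample of that output (the package names and versions are made up), using plain awk in place of gawk.

```shell
# Hypothetical sample of `pip list` output in column format (two header lines).
sample_pip_list() {
  cat <<'EOF'
Package    Version
---------- -------
pip        18.0
setuptools 40.4.3
wheel      0.32.0
EOF
}

# Skip the two header lines and keep only the package-name column.
packages=$(sample_pip_list | tail -n +3 | awk '{print $1}')

# One package name per line; this is the list handed to `pip install --upgrade`.
echo "$packages"
```

If pip prints its list in a different format, the tail offset would need adjusting, so it is worth inspecting the output of pip list once before running the upgrade.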

Finally, install the NLTK code and all data packages.

pip install nltk
python -m nltk.downloader -d /projects/nlpl/software/nltk/3.3/data all