Difference between revisions of "Infrastructure/software/spacy"

From Nordic Language Processing Laboratory

Revision as of 08:07, 1 October 2018

Background

The SpaCy library supports a range of ‘basic’ NLP tasks, including sentence splitting, tokenization, tagging and lemmatization, and dependency parsing, for about half a dozen European languages. On the surface, SpaCy is somewhat similar to the Natural Language Toolkit (NLTK), but it prides itself on both higher-quality analysis and better computational efficiency.


Usage

The module nlpl-spacy provides a SpaCy installation in a Python 3.5 virtual environment.

module use -a /proj*/nlpl/software/modulefiles
module load nlpl-spacy

This installation (like other NLPL-maintained Python virtual environments) can be combined with other Python-based modules, for example the NLPL installations of PyTorch or TensorFlow. To ‘stack’ multiple Python environments, simply load them together, e.g.

module load nlpl-spacy nlpl-tensorflow

Because PyTorch and TensorFlow are ‘special’ in their requirements for dynamic libraries and support for both CPU and GPU nodes, it is important that they be activated last, i.e. on the ‘top’ of a multi-module stack.
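
Once the module is loaded, the tasks listed under Background can be tried from Python. The following is a minimal sketch, not an official NLPL example. It assumes that the English model is reachable under the shortcut name en (as set up by the download loop in the Installation section); the helper name analyze and the sample text are made up for illustration. String formatting avoids f-strings so that the code also runs under Python 3.5.

```python
def analyze(nlp, text):
    """Return one row per token: sentence index, token text, lemma,
    part-of-speech tag, dependency label, and the text of its head."""
    doc = nlp(text)
    rows = []
    for sent_id, sent in enumerate(doc.sents):  # sentence splitting
        for token in sent:                      # tokenization
            rows.append((sent_id, token.text, token.lemma_, token.pos_,
                         token.dep_, token.head.text))
    return rows


if __name__ == "__main__":
    try:
        import spacy
        nlp = spacy.load("en")  # assumption: the shortcut 'en' is installed
    except (ImportError, OSError) as error:
        # ImportError: SpaCy is not on the path (module not loaded);
        # OSError: the requested model could not be found.
        print("SpaCy or its English model is unavailable: {}".format(error))
    else:
        sample = "SpaCy splits text into sentences. It also parses them."
        for row in analyze(nlp, sample):
            print(*row, sep="\t")
```

Passing the loaded pipeline into analyze keeps the expensive spacy.load call in one place, so the same pipeline object can be reused for many texts.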

Versions

As of October 2018, version 2.0.12 is installed.

Installation

After a ‘standard’ virtual environment and module definition have been created:

module load nlpl-spacy
pip install spacy
for i in en de es pt fr it nl xx; do python -m spacy download $i; done
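
To confirm that the download loop above succeeded, each model can be loaded once from Python. This is a sketch under stated assumptions: the helper check_models is hypothetical, and it relies on the shortcut names from the loop resolving through spacy.load, which is how the 2.0.x releases handle them.

```python
MODELS = ["en", "de", "es", "pt", "fr", "it", "nl", "xx"]  # names from the loop above


def check_models(names, load):
    """Try to load every model; map each name to None on success
    or to the error message on failure."""
    status = {}
    for name in names:
        try:
            load(name)
            status[name] = None
        except OSError as error:  # raised by spacy.load for a missing model
            status[name] = str(error)
    return status


if __name__ == "__main__":
    try:
        import spacy
    except ImportError as error:
        print("SpaCy is not importable: {}".format(error))
    else:
        print("SpaCy version:", spacy.__version__)
        results = check_models(MODELS, spacy.load)
        for name in MODELS:
            problem = results[name]
            print(name, "ok" if problem is None else "FAILED: " + problem)
```

The printed version can be compared against the one listed under Versions.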