Difference between revisions of "Infrastructure/software/spacy"
Latest revision as of 20:01, 25 February 2019
Background
The SpaCy library supports a range of ‘basic’ NLP tasks, including sentence splitting, tokenization, tagging and lemmatization, and dependency parsing, for about half a dozen European languages. SpaCy is somewhat similar on the surface to the Natural Language Toolkit (NLTK) but prides itself on both higher-quality analysis and better computational efficiency.
Usage
The module nlpl-spacy provides a SpaCy installation in a Python 3.5 virtual environment.
module use -a /proj*/nlpl/software/modulefiles
module load nlpl-spacy
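Once the module is loaded, a quick smoke test can confirm that SpaCy is usable from Python. The following is a minimal sketch, not part of the NLPL installation: the helper name tokenize is hypothetical, and it relies on spacy.blank(), which builds a tokenizer-only pipeline and therefore needs no downloaded model.

```python
# Smoke test for the SpaCy installation (illustrative sketch only).
# The import is guarded so that the failure mode is a clear message
# rather than a bare ImportError when nlpl-spacy has not been loaded.
try:
    import spacy
except ImportError:
    spacy = None


def tokenize(text, lang="en"):
    """Return the token strings SpaCy produces for `text`.

    `tokenize` is a hypothetical helper name; spacy.blank() creates a
    pipeline with only a tokenizer, so no language model is required.
    """
    if spacy is None:
        raise RuntimeError(
            "SpaCy is not importable; load the nlpl-spacy module first"
        )
    nlp = spacy.blank(lang)
    return [token.text for token in nlp(text)]
```

Calling tokenize("Hello world.") in a shell where the module is active should return the three tokens Hello, world and the final period. Tagging, lemmatization and parsing additionally require one of the downloaded language models.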
This installation (just as other NLPL-maintained Python virtual environments) can be combined with other Python-based modules, for example the NLPL installations of PyTorch or TensorFlow. To ‘stack’ multiple Python environments, they can simply be loaded together, e.g.
module load nlpl-spacy nlpl-tensorflow
Because PyTorch and TensorFlow are ‘special’ in their requirements for dynamic libraries and support for both CPU and GPU nodes, it is important for them to be activated last, i.e. on the ‘top’ of a multi-module stack.
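After stacking modules, it can be useful to confirm that every expected package is visible to the active Python interpreter. The sketch below uses only the standard library; the function name missing_packages is hypothetical, and the package names passed to it are assumptions about what the stacked modules provide.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found for import.

    importlib.util.find_spec() locates a top-level package without
    importing it, so the check does not load large libraries.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the nlpl-spacy and nlpl-tensorflow modules.
    missing = missing_packages(["spacy", "tensorflow"])
    if missing:
        print("not importable:", ", ".join(missing))
    else:
        print("all stacked packages are importable")
```

An empty result only shows that the packages can be located; it does not verify that the dynamic libraries behind PyTorch or TensorFlow load correctly, which is why the load order described above still matters.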
Versions
As of October 2018, version 2.0.12 is installed.
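Since the installed version may change over time, the version that is actually active can be read from Python. This is a small sketch under the assumption that the nlpl-spacy module is loaded; the function name spacy_version is hypothetical.

```python
def spacy_version():
    """Return the active SpaCy version string, or None if SpaCy is absent."""
    try:
        import spacy
    except ImportError:
        # SpaCy is not on the path, e.g. the module has not been loaded.
        return None
    return spacy.__version__


if __name__ == "__main__":
    version = spacy_version()
    print(version if version is not None else "SpaCy is not importable")
```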
Installation
After a ‘standard’ NLPL virtual environment and module definition have been created:
module load nlpl-spacy
pip install spacy
for i in en de es pt fr it nl xx; do python -m spacy download $i; done