Parsing/stanfordnlp
Background
The StanfordNLP pipeline makes available two multilingual parsing systems through a Python interface: one implemented natively on top of PyTorch, the other a client-server interface to the venerable Stanford CoreNLP software (which runs in Java). For example, to parse a sentence with the native pipeline:
module purge; module load nlpl-stanfordnlp
python3 -c 'import stanfordnlp; \
  foo = stanfordnlp.Pipeline(lang="en")("Kim wanted to be heard."); \
  foo.sentences[0].print_dependencies();'
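The CoreNLP side of the toolkit is reached through a client class that talks to a running CoreNLP server. The following is a minimal sketch of that route, assuming the stanfordnlp.server.CoreNLPClient interface and a CORENLP_HOME environment variable pointing at a local CoreNLP installation; neither is set up by the recipe on this page.

# Sketch of the client-server route (assumes CoreNLP is installed locally and
# CORENLP_HOME points at it; not covered by the installation steps below).
from stanfordnlp.server import CoreNLPClient

with CoreNLPClient(annotators=["tokenize", "ssplit", "pos", "depparse"],
                   timeout=60000, memory="4G") as client:
    ann = client.annotate("Kim wanted to be heard.")
    sentence = ann.sentence[0]
    print(sentence.basicDependencies)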
Installation on Abel
module purge; module load python3/3.7.0
/projects/nlpl/operation/python/initialize --version 0.1.1 stanfordnlp
Next, it appears we need to manually patch the default location for model files in .../site-packages/stanfordnlp/utils/resources.py, to point to a shared directory for all users (viz. /projects/nlpl/software/stanfordnlp/0.1.1/resources/).
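The change amounts to something like the following sketch; the variable name DEFAULT_MODEL_DIR is an assumption about how 0.1.1 stores the default, so check resources.py for the actual name before editing.

# resources.py (sketch): point the default model location at the shared
# NLPL directory instead of the per-user home directory.  DEFAULT_MODEL_DIR
# is an assumed name; verify against the installed 0.1.1 sources.
import os

HOME_DIR = os.path.expanduser("~")
# per-user default shipped with the package:
# DEFAULT_MODEL_DIR = os.path.join(HOME_DIR, "stanfordnlp_resources")
# shared default for all users on Abel:
DEFAULT_MODEL_DIR = "/projects/nlpl/software/stanfordnlp/0.1.1/resources"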
To download all available pre-trained models (for the complete UD 2.x set of treebanks):
module purge; module load nlpl-stanfordnlp
yes | python3 /projects/nlpl/software/stanfordnlp/0.1.1/download.py
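Once the models are in place, a quick sanity check is to load one of the non-English models directly from the shared directory. The models_dir keyword below is an assumption about the 0.1.1 Pipeline API (it can be dropped if resources.py has been patched as described above), and the French example presumes the UD French models were among those downloaded.

# Sanity check (sketch): load a French model straight from the shared
# resources directory; models_dir is assumed to override the model location.
import stanfordnlp

nlp = stanfordnlp.Pipeline(
    lang="fr",
    models_dir="/projects/nlpl/software/stanfordnlp/0.1.1/resources")
doc = nlp("Le chat dort sur le canapé.")
doc.sentences[0].print_dependencies()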