Parsing/stanfordnlp
Background
The StanfordNLP pipeline provides Python access to two multilingual parsing systems: one implemented natively on top of PyTorch, and a server–client interface to the venerable Stanford CoreNLP software (which runs in Java).
For example, the native (PyTorch) pipeline can be invoked from the command line as follows:
module purge; module load nlpl-stanfordnlp
python3 -c 'import stanfordnlp; \
foo = stanfordnlp.Pipeline(lang = "en")("Kim wanted to be heard."); \
foo.sentences[0].print_dependencies();'
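To post-process a parse rather than just print it, the words of a parsed sentence can be turned into (index, form, head, relation) rows. The sketch below is hypothetical: the helper name dependency_rows is ours, and it assumes the stanfordnlp 0.1.x Word attributes index, text, governor (0 for the root) and dependency_relation, which should be checked against the installed version.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple

# Assumption: each word exposes the stanfordnlp 0.1.x attributes
# index, text, governor (head index, 0 = root) and dependency_relation.
Row = Tuple[int, str, str, str]


def dependency_rows(words: Iterable) -> List[Row]:
    """Return (index, form, head form or 'ROOT', relation) for each word."""
    words = list(words)
    # Map each word position to its surface form so heads can be looked up.
    form_by_index = {int(w.index): w.text for w in words}
    rows: List[Row] = []
    for w in words:
        head = int(w.governor)
        head_form = "ROOT" if head == 0 else form_by_index[head]
        rows.append((int(w.index), w.text, head_form, w.dependency_relation))
    return rows


if __name__ == "__main__":
    # Hand-made stand-ins for parsed words (not real parser output).
    toy = [
        SimpleNamespace(index="1", text="Kim", governor=2, dependency_relation="nsubj"),
        SimpleNamespace(index="2", text="slept", governor=0, dependency_relation="root"),
    ]
    for row in dependency_rows(toy):
        print(row)
```

With the pipeline from the command above, dependency_rows(foo.sentences[0].words) would then yield one row per word of the first sentence.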
It is possible to combine this module with the NLPL installation of CoreNLP (module nlpl-corenlp) and use the StanfordNLP server interface to offload processing requests to CoreNLP.
In a multi-user environment, it may in principle be necessary to pick a different (non-privileged) port for the server, as each port can only be bound by one process at any point in time (see the initialization in test.py):
module purge; module load nlpl-corenlp nlpl-stanfordnlp
python3 /projects/nlpl/software/stanfordnlp/test.py
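Picking such a port can be automated. The sketch below probes candidate non-privileged ports by trying to bind them and returns the first free one. The helper name find_free_port and the 9000 to 9099 search range are our assumptions (9000 is the usual default port of the CoreNLP server); the result could then be used as the server endpoint, e.g. http://localhost:PORT.

```python
import socket


def find_free_port(start: int = 9000, stop: int = 9100, host: str = "") -> int:
    """Return the first port in [start, stop) that can currently be bound.

    Ports below 1024 are privileged on Unix, so only non-privileged ports
    are accepted. The answer is a snapshot: another user may still claim
    the port before the server actually starts.
    """
    if start < 1024:
        raise ValueError("choose a non-privileged port (>= 1024)")
    for port in range(start, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))  # fails if another process holds the port
            except OSError:
                continue
            return port  # the probe socket is closed when the with-block exits
    raise RuntimeError(f"no free port in range {start}-{stop - 1}")


if __name__ == "__main__":
    print(f"http://localhost:{find_free_port()}")
```

Binding (rather than, say, trying to connect) is used as the test because it fails exactly when the server itself would fail to start on that port.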
Installation on Abel
module purge; module load python3/3.70
/projects/nlpl/operation/python/initialize --version 0.1.1 stanfordnlp
Next, it appears we need to manually patch the default location for model files in .../site-packages/stanfordnlp/utils/resources.py so that it points to a directory shared by all users (viz. /projects/nlpl/software/stanfordnlp/0.1.1/resources/).
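The patch can also be scripted. A minimal sketch, assuming (not verified here) that resources.py sets the default through a single top-level assignment to DEFAULT_MODEL_DIR; the helper names are hypothetical, and the variable name should be checked against the installed file before use:

```python
import re
from pathlib import Path

# Assumption: the installed resources.py contains exactly one top-level line
# of the form  DEFAULT_MODEL_DIR = <expression>
_ASSIGNMENT = re.compile(r"^(DEFAULT_MODEL_DIR[ \t]*=[ \t]*).*$", re.MULTILINE)


def patch_default_model_dir(source: str, new_dir: str) -> str:
    """Return source with the DEFAULT_MODEL_DIR assignment set to new_dir."""
    patched, count = _ASSIGNMENT.subn(lambda m: m.group(1) + repr(new_dir), source)
    if count != 1:
        raise ValueError(f"expected one DEFAULT_MODEL_DIR assignment, found {count}")
    return patched


def patch_file(path: str, new_dir: str) -> None:
    """Rewrite the file in place, saving the original as <name>.orig first."""
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    file.with_name(file.name + ".orig").write_text(text, encoding="utf-8")
    file.write_text(patch_default_model_dir(text, new_dir), encoding="utf-8")
```

For example, patch_file(path, "/projects/nlpl/software/stanfordnlp/0.1.1/resources/"), where path is the resources.py of the actual installation.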
To download all available pre-trained models (for the complete UD 2.x set of treebanks), piping in yes so that any interactive confirmation prompts are answered automatically:
module purge; module load nlpl-stanfordnlp
yes | python3 /projects/nlpl/software/stanfordnlp/0.1.1/download.py