Difference between revisions of "Parsing/stanfordnlp"

From Nordic Language Processing Laboratory
Latest revision as of 22:03, 11 May 2019

Background

The StanfordNLP pipeline (https://stanfordnlp.github.io/stanfordnlp/) makes available two multilingual parsing systems with a Python interface: one implemented natively on top of PyTorch, the other a server–client interface to the venerable Stanford CoreNLP software (https://stanfordnlp.github.io/CoreNLP/), which runs in Java.

module purge; module load nlpl-stanfordnlp
python3 -c 'import stanfordnlp; \
  foo = stanfordnlp.Pipeline(lang = "en")("Kim wanted to be heard."); \
  foo.sentences[0].print_dependencies();'
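The one-liner above can also be written as a small script, which makes it easier to post-process the parse. The following is a minimal sketch, assuming the 0.1.x document model in which each sentence exposes a list of words with text, governor (a 1-based head index, 0 for the root) and dependency_relation attributes; the helper names dependency_triples and parse_and_print are made up for illustration.

```python
def dependency_triples(sentence):
    """Yield (dependent, head, relation) triples for one parsed sentence.

    Assumes each word has .text, .governor (1-based head index, 0 = root)
    and .dependency_relation, as in the stanfordnlp 0.1.x data model.
    """
    for word in sentence.words:
        if word.governor == 0:
            head = "ROOT"
        else:
            head = sentence.words[word.governor - 1].text
        yield (word.text, head, word.dependency_relation)


def parse_and_print(text, lang="en"):
    """Parse text with the native PyTorch pipeline and print its dependencies."""
    # Imported lazily so that the helper above can be used without the library.
    import stanfordnlp

    pipeline = stanfordnlp.Pipeline(lang=lang)
    document = pipeline(text)
    for sentence in document.sentences:
        for dependent, head, relation in dependency_triples(sentence):
            print(f"{dependent}\t{head}\t{relation}")
```

After loading the nlpl-stanfordnlp module, calling parse_and_print("Kim wanted to be heard.") should list one dependency per line.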

It is possible to combine this module with the NLPL installation of CoreNLP and use its server interface to offload processing requests to CoreNLP. In a multi-user environment, it may be necessary to pick a different (non-privileged) port for the server, because each port can be bound by only one process at a time (see the initialization in test.py):

module purge; module load nlpl-corenlp nlpl-stanfordnlp
python3 /projects/nlpl/software/stanfordnlp/test.py
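One way to avoid port clashes on a shared login node is to ask the operating system for a free port before starting the server. The sketch below uses only the Python standard library; the function names find_free_port and corenlp_endpoint are hypothetical and not part of test.py or of StanfordNLP.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the operating system for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the kernel choose a free ephemeral port,
        # which is always in the non-privileged range.
        sock.bind((host, 0))
        return sock.getsockname()[1]


def corenlp_endpoint(port, host="localhost"):
    """Build the URL at which a CoreNLP server on that port would listen."""
    return f"http://{host}:{port}"
```

The resulting URL could then be handed to the CoreNLP client, assuming its constructor accepts an endpoint argument as in the 0.1.x releases. Note that the port is only probed, not reserved: another process may still claim it before the server starts, so a retry on failure remains advisable.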

Installation on Abel

module purge; module load python3/3.7.0
/projects/nlpl/operation/python/initialize --version 0.1.1 stanfordnlp

Next, it appears that the default location for model files must be patched manually in .../site-packages/stanfordnlp/utils/resources.py, so that it points to a directory shared by all users (viz. /projects/nlpl/software/stanfordnlp/0.1.1/resources/). To download all available pre-trained models (for the complete UD 2.x set of treebanks):

module purge; module load nlpl-stanfordnlp
yes | python3 /projects/nlpl/software/stanfordnlp/0.1.1/download.py
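For individual scripts, an alternative to relying on the patched default is to pass the shared directory explicitly. The following is a sketch only, assuming that the Pipeline constructor in the 0.1.x releases accepts a models_dir argument; the environment variable NLPL_STANFORDNLP_RESOURCES and both function names are hypothetical and are not read by StanfordNLP or by the NLPL module.

```python
import os

# Shared model directory named in the installation notes above.
SHARED_RESOURCES = "/projects/nlpl/software/stanfordnlp/0.1.1/resources/"


def resolve_models_dir(environ=None):
    """Return the model directory to use, preferring an explicit override.

    NLPL_STANFORDNLP_RESOURCES is a hypothetical variable used only here.
    """
    environ = os.environ if environ is None else environ
    return environ.get("NLPL_STANFORDNLP_RESOURCES", SHARED_RESOURCES)


def shared_pipeline(lang="en"):
    """Create a pipeline that loads its models from the resolved directory."""
    # Imported lazily; assumes Pipeline accepts models_dir (0.1.x releases).
    import stanfordnlp

    return stanfordnlp.Pipeline(lang=lang, models_dir=resolve_models_dir())
```

This keeps user scripts working even if the patch to resources.py is lost in a later reinstallation.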