Much system development and tuning in NLP is driven by intrinsic evaluation, e.g. the venerable ParsEval measure of labeled bracketings in phrase structure trees, or the labeled attachment score (LAS) in syntactic dependency parsing. Such measures, however, do not directly inform (parser) developers about actual downstream utility, i.e. the effects of different system versions on end-to-end systems in which parser outputs serve as an intermediate, internal representation.
This activity under the NLPL umbrella seeks to make extrinsic (parser) evaluation a ‘commodity’ tool, i.e. to enable (parser) developers to measure the effects of different system versions on a comprehensive selection of relevant downstream applications. In 2017, this activity organized the Extrinsic Parser Evaluation 2017 (EPE) Shared Task, which assembled three state-of-the-art downstream systems together with commonly used end-to-end benchmarks, and which attracted nine participating teams. The EPE infrastructure is now available on the Abel system for all NLPL project members (and associates) to use. Please see the software documentation for instructions on how to evaluate parser outputs within the EPE framework.