How to keep punctuation in the Stanford dependency parser

Posted by mwkjh3gx on 2021-07-03 in Java


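I am using Stanford CoreNLP (the 01.2016 version) and I would like to keep punctuation in the dependency relations. I have found some ways to do this when running it from the command line, but I have not found anything about doing it in the Java code that extracts the dependencies.
Here is my current code. It works, but it does not include punctuation: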

Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.put("parse.model", modelPath);

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);

LexicalizedParser lp = LexicalizedParser.loadModel(modelPath + lexparserNameEn,
        "-maxLength", "200", "-retainTmpSubcategories");
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();

List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    List<CoreLabel> words = sentence.get(CoreAnnotations.TokensAnnotation.class);
    Tree parse = lp.apply(words);
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection<TypedDependency> td = gs.typedDependencies();
    parsedText += td.toString() + "\n";
}

Any kind of dependencies is fine for me: basic, typed, collapsed, and so on. I just want the punctuation to be included.
Thanks in advance,

Java nlp stanford-nlp dependency-parsing

Source: https://stackoverflow.com/questions/37130722/how-to-keep-punctuation-in-stanford-dependency-parser

1 Answer


gmxoilav1#

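You are doing quite a bit of extra work here: you run the parser once through CoreNLP and then a second time by calling lp.apply(words).
The easiest way to get a dependency tree/graph with punctuation is to use the CoreNLP option parse.keepPunct, as follows.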

Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.setProperty("parse.model", modelPath);
props.setProperty("parse.keepPunct", "true");

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

pipeline.annotate(document);

List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
   //Pick whichever representation you want
   SemanticGraph basicDeps = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
   SemanticGraph collapsed = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
   SemanticGraph ccProcessed = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
}

The sentence annotation object stores the dependency tree/graph as a SemanticGraph. If you want a collection of TypedDependency objects, use typedDependencies(). For example,

Collection<TypedDependency> dependencies = basicDeps.typedDependencies();
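
As a small aside on the configuration used in this answer, the property setup can be kept in one place so that punctuation retention is a single switch. The following is only a sketch: the class name KeepPunctProps, the method build, and the placeholder model paths are hypothetical and not part of CoreNLP; only the property keys come from the code above. It uses java.util.Properties alone, so it runs without CoreNLP on the classpath, and the result would be passed to new StanfordCoreNLP(props).

```java
import java.util.Properties;

// Hypothetical helper (not part of CoreNLP): collects the pipeline
// properties used in the answer and toggles parse.keepPunct in one place.
final class KeepPunctProps {

    // posModelPath and parseModelPath are caller-supplied model locations.
    static Properties build(String posModelPath, String parseModelPath, boolean keepPunct) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
        props.setProperty("ssplit.newlineIsSentenceBreak", "always");
        props.setProperty("ssplit.eolonly", "true");
        props.setProperty("pos.model", posModelPath);
        props.setProperty("parse.model", parseModelPath);
        // The option from the answer; CoreNLP reads it as the string "true" or "false".
        props.setProperty("parse.keepPunct", Boolean.toString(keepPunct));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder paths for illustration only.
        Properties props = build("path/to/pos.model", "path/to/parse.model", true);
        System.out.println("parse.keepPunct=" + props.getProperty("parse.keepPunct"));
    }
}
```

With CoreNLP available, the returned object can replace the hand-built props in the snippet above; everything after pipeline.annotate(document) stays the same.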

2021-07-03
