IllegalArgumentException: PTBLexer: Invalid options key in constructor: asciiQuotes (Stanford NLP)

Question

I am trying to run a Hello World test of the [Stanford POS tagger][1] API on French text (I used the same .jar from Python and it worked fine). Here is my code:

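import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TextPreprocessor {

    // Load the French UD model from the standalone tagger download
    private static MaxentTagger tagger = new MaxentTagger("../stanford-tagger-4.1.0/stanford-postagger-full-2020-08-06/models/french-ud.tagger");

    public static void main(String[] args) {
        // tagString tokenizes the raw text internally before tagging it
        String taggedString = tagger.tagString("Salut à tous,je suis coincé");
        System.out.println(taggedString);
    }
}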

But I get the following exception:

Loading POS tagger from C:/Users/_Nprime496_/Downloads/Compressed/stanford-tagger-4.1.0/stanford-postagger-full-2020-08-06/models/french-ud.tagger ... done [0.3 sec].
Exception in thread "main" java.lang.IllegalArgumentException: PTBLexer: Invalid options key in constructor: asciiQuotes
    at edu.stanford.nlp.process.PTBLexer.<init>(PTBLexer.java)
    at edu.stanford.nlp.process.PTBTokenizer.<init>(PTBTokenizer.java:285)
    at edu.stanford.nlp.process.PTBTokenizer$PTBTokenizerFactory.getTokenizer(PTBTokenizer.java:698)
    at edu.stanford.nlp.process.DocumentPreprocessor$PlainTextIterator.<init>(DocumentPreprocessor.java:271)
    at edu.stanford.nlp.process.DocumentPreprocessor.iterator(DocumentPreprocessor.java:226)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.tokenizeText(MaxentTagger.java:1148)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger$TaggerWrapper.apply(MaxentTagger.java:1332)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagString(MaxentTagger.java:999)
    at modules.generation.preprocessing.TextPreprocessor.main(TextPreprocessor.java:19)

Can you help me?

[1]: https://nlp.stanford.edu/software/tagger.shtml

Answer

You can use the following code with the full CoreNLP package:

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;

import java.util.*;


public class PipelineExample {

  public static String text = "Paris est la capitale de la France.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = StringUtils.argsToProperties("-props","french");
    // set the list of annotators to run
    props.setProperty("annotators","tokenize,ssplit,mwt,pos");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = pipeline.processToCoreDocument(text);
    // display tokens
    for (CoreLabel tok : document.tokens()) {
      System.out.println(String.format("%s\t%s",tok.word(),tok.tag()));
    }
  }

}

You can download CoreNLP here: https://stanfordnlp.github.io/CoreNLP/

Make sure to download the latest French models as well.

I'm not sure why your example with the standalone tagger doesn't work. Which jars are you using?
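One way to answer the question about jars is to ask the JVM where it loaded the classes named in the stack trace from. The sketch below is only a diagnostic built on the standard library, not part of CoreNLP; `JarLocator` and `locationOf` are hypothetical names chosen for illustration, and the two class names are copied from the stack trace above. If an older Stanford jar turns out to sit earlier on the classpath than the 4.1.0 one, that might explain an option key the lexer does not recognize, but that is a guess rather than something the question confirms.

```java
import java.security.CodeSource;

public class JarLocator {

  // Returns the jar or directory a class was loaded from.
  // JDK classes typically have no code source, so a note is returned instead.
  public static String locationOf(Class<?> cls) {
    CodeSource source = cls.getProtectionDomain().getCodeSource();
    if (source == null || source.getLocation() == null) {
      return "(no code source, e.g. a JDK class)";
    }
    return source.getLocation().toString();
  }

  // Looks the class up by name so this file compiles without the Stanford jars.
  // The class is loaded but not initialized, so no static initializers run.
  public static String locationOf(String className) {
    try {
      ClassLoader loader = JarLocator.class.getClassLoader();
      return locationOf(Class.forName(className, false, loader));
    } catch (ClassNotFoundException | LinkageError e) {
      return "(not on classpath)";
    }
  }

  public static void main(String[] args) {
    // Class names taken from the stack trace in the question.
    System.out.println(locationOf("edu.stanford.nlp.process.PTBLexer"));
    System.out.println(locationOf("edu.stanford.nlp.tagger.maxent.MaxentTagger"));
  }
}
```

Run it with the same classpath as `TextPreprocessor`. If the two lines print different jars, or a jar from an older release, the classpath is mixing versions.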

french java pos-tagger stanford-nlp