Java: creating a training model in Apache OpenNLP

In my txt file I declared some words that need to be found in a text. It looks like this:

<START:anesthesia> Keine Anästhesie <END>

<START:anesthesia> Lokalanästhesie <END>

<START:anesthesia> Regional-Anästhesie <END>

<START:anesthesia> Allgemeine Anästhesie <END>

<START:anesthesia> Monitorized anesthetic care <END>

<START:releaseType> geheilt <END>

<START:releaseType> gebessert <END>

<START:releaseType> nicht gebessert <END>

<START:releaseType> uverändert <END>

<START:releaseType> verschlechtert <END>

<START:releaseType> nicht beurteilbar <END>

<START:releaseType> exiutus intraoperativ <END>

<START:releaseType> exitus postoperativ <END>

<START:releaseType> exitus ohne Zusammenhang mit OP <END>

<START:releaseType> Austritt <END>

<START:releaseType> Verlegung <END>
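
To make the annotation format concrete, here is a minimal sketch that splits one line of the file into its entity type and the tokens inside the span. The class `AnnotatedLine` and its `parse` method are hypothetical names introduced for illustration; they are not part of OpenNLP and do not reflect how OpenNLP reads its training data.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; this is not OpenNLP's own parser.
class AnnotatedLine {
    // Matches one "<START:type> tokens <END>" span.
    private static final Pattern SPAN =
            Pattern.compile("<START:([^>]+)>\\s+(.+?)\\s+<END>");

    final String type;      // entity type, e.g. "releaseType"
    final String[] tokens;  // whitespace-separated tokens inside the span

    AnnotatedLine(String type, String[] tokens) {
        this.type = type;
        this.tokens = tokens;
    }

    // Returns the first annotated span on the line, or null if there is none.
    static AnnotatedLine parse(String line) {
        Matcher m = SPAN.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new AnnotatedLine(m.group(1), m.group(2).trim().split("\\s+"));
    }

    public static void main(String[] args) {
        AnnotatedLine sample = parse("<START:releaseType> nicht gebessert <END>");
        System.out.println(sample.type + " -> " + Arrays.toString(sample.tokens));
    }
}
```

Every line in the file above yields exactly one such span, and the span covers the whole line, so no line contains tokens outside an entity.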

Based on this txt file, I created a bin model file:

    // setting the parameters for training
    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ITERATIONS_PARAM, 70);
    params.put(TrainingParameters.CUTOFF_PARAM, 1);

    // training the model using TokenNameFinderModel class
    // (sampleStream is the sample stream over the txt file above; its creation is not shown)
    TokenNameFinderModel entityModel = null;
    try {
        entityModel = NameFinderME.train("de", null, sampleStream, params,
                TokenNameFinderFactory.create(null, null, Collections.emptyMap(), new BioCodec()));
    } catch (IOException e) {
        e.printStackTrace();
    }

    // saving the model to "ner-custom-model.bin" file;
    // try-with-resources closes the stream even if serialization fails
    try (FileOutputStream outputStream = new FileOutputStream("ner-custom-model.bin")) {
        entityModel.serialize(outputStream);
    } catch (IOException e) {
        e.printStackTrace();
    }

After that, I test the model as follows:

    // testing the model and printing the types it found in the input sentence
    TokenNameFinder entityFinder = new NameFinderME(entityModel);

    String[] testSentence = { "Helge", "Schneider" , "bekommt" , "eine" , "allgemeine", "Anästesie"};

    System.out.println("Finding types in the test sentence..");
    Span[] entities = entityFinder.find(testSentence);
    // Arrays.toString prints the spans themselves rather than the array's identity hash
    System.out.println(java.util.Arrays.toString(entities));

    for (Span entity : entities) {
        // rebuild the covered text from the tokens in [start, end)
        String foundEntity = "";
        for (int i = entity.getStart(); i < entity.getEnd(); i++) {
            foundEntity += testSentence[i] + " ";
        }
        System.out.println(entity.getType() + " : " + foundEntity + "\t [probability=" + entity.getProb() + "]");
    }
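
The inner loop above rebuilds the text covered by each span from a half-open token range, where the end index is exclusive. As a minimal sketch of the same idea without the trailing space, assuming a hypothetical helper `CoveredText.join` (not an OpenNLP class) whose `start` and `end` arguments stand in for `entity.getStart()` and `entity.getEnd()`:

```java
import java.util.Arrays;

// Hypothetical helper for illustration; start and end stand in for
// the values returned by Span.getStart() and Span.getEnd().
class CoveredText {
    // Joins tokens[start..end) with single spaces; end is exclusive.
    static String join(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        String[] sentence = { "Helge", "Schneider", "bekommt", "eine" };
        // Tokens 0 and 1 are covered; token 2 is excluded.
        System.out.println(join(sentence, 0, 2));
    }
}
```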

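As a result I get:

anesthesia : Helge Schneider [probability=0.17882..]
releaseType : bekommt eine allgemeine Anästesie [probability=0.158…]

But in my txt file I never declared that the word combination "Helge Schneider" is anesthesia, and "allgemeine Anästhesie" should be anesthesia, not releaseType. What do I need to watch out for when training such a model? Is something wrong with the training parameters? What do I have to do to get the correct output?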
