Python StanfordCoreNLP server client: caseless NER does not work

Posted by eblbsuwk 6 months ago in Python


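Looking at the code you posted: annotators is already set in self.props inside __init__, but truecase is listed last, after ner, so it cannot change what NER sees. For truecasing to have any effect on the NER tags, truecase should come before ner in the list:

self.props['annotators'] = 'tokenize, ssplit, pos, lemma, truecase, ner'

The modified code looks like this:

from collections import defaultdict
from stanfordcorenlp import StanfordCoreNLP
import json

class StanfordNLP:
    def __init__(self, host='http://localhost', port=9000):
        # The host needs the http:// prefix when connecting to a running server.
        self.nlp = StanfordCoreNLP(host, port=port, timeout=30000)
        self.props = {
            # truecase runs before ner so that NER sees the re-cased tokens.
            'annotators': 'tokenize, ssplit, pos, lemma, truecase, ner',
            'pipelineLanguage': 'en',
            'truecase.overwriteText': 'true',
            'outputFormat': 'json'
        }

You can then try the following input text:

text = 'rajesh lives in hyderabad'

The result the asker is hoping for is:

NER: [('rajesh', 'PERSON'), ('lives', 'O'), ('in', 'O'), ('hyderabad', 'LOCATION')]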


Source: https://github.com/stanfordnlp/CoreNLP/issues/980

6 answers


syqv5f0l1#

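At a basic level, you are running the truecase annotator after everything else, so it cannot affect the results of the annotators that ran before it. It would be easier to use the caseless models....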

On Tue, Jan 7, 2020 at 11:18 PM, BVSREDDY82 wrote:

Server started with the following command:

    java -mx4g -cp "" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000

Python client:

    from collections import defaultdict
    from stanfordcorenlp import StanfordCoreNLP
    import json

    class StanfordNLP:
        def __init__(self, host='http://localhost', port=9000):
            self.nlp = StanfordCoreNLP(host, port=port, timeout=30000)  # , quiet=False, logging_level=logging.DEBUG)
            self.props = {
                'annotators': 'tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref, relation, truecase',
                'pipelineLanguage': 'en',
                'truecase.overwriteText': 'true',
                'outputFormat': 'json'
            }

        def ner(self, sentence):
            return self.nlp.ner(sentence)

        def annotate(self, sentence):
            return json.loads(self.nlp.annotate(sentence, properties=self.props))

        @staticmethod
        def tokens_to_dict(_tokens):
            tokens = defaultdict(dict)
            for token in _tokens:
                tokens[int(token['index'])] = {'ner': token['ner']}
            return tokens

    if __name__ == '__main__':
        sNLP = StanfordNLP()
        text = 'Rajesh lives in Hyderabad'
        print("NER:", sNLP.ner(text))

With the input text = 'Rajesh lives in Hyderabad':

    Expected output: NER: [('Rajesh', 'PERSON'), ('lives', 'O'), ('in', 'O'), ('Hyderabad', 'LOCATION')]
    Actual output:   NER: [('Rajesh', 'PERSON'), ('lives', 'O'), ('in', 'O'), ('Hyderabad', 'LOCATION')]

With the input text = 'rajesh lives in hyderabad':

    Expected output: NER: [('rajesh', 'PERSON'), ('lives', 'O'), ('in', 'O'), ('hyderabad', 'LOCATION')]
    Actual output:   NER: [('rajesh', 'O'), ('lives', 'O'), ('in', 'O'), ('hyderabad', 'O')]

I was hoping to solve this with the truecase annotator, but my attempt did not work.
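To make the ordering point in this answer concrete, here is a minimal sketch that lists truecase before ner and reads the NER tags out of the JSON returned by annotate(). The annotator order and the helper names (TRUECASE_PROPS, extract_ner, annotate_ner, main) are assumptions for illustration and were not confirmed in the thread. As far as I can tell, the wrapper's ner() helper sends its own annotator list, so custom properties would only take effect through annotate(..., properties=...); treat that as something to verify.

```python
import json

# Assumption: truecase is listed before ner so that NER sees the re-cased
# tokens. The thread only says that truecase placed last has no effect.
TRUECASE_PROPS = {
    'annotators': 'tokenize,ssplit,pos,lemma,truecase,ner',
    'pipelineLanguage': 'en',
    'truecase.overwriteText': 'true',
    'outputFormat': 'json',
}


def extract_ner(annotation):
    """Return (word, tag) pairs from a parsed CoreNLP JSON annotation."""
    return [
        (token['word'], token['ner'])
        for sentence in annotation.get('sentences', [])
        for token in sentence['tokens']
    ]


def annotate_ner(nlp, text, props=TRUECASE_PROPS):
    """Annotate text with the given properties and return NER pairs.

    nlp is expected to behave like stanfordcorenlp.StanfordCoreNLP,
    i.e. annotate() returns the server's JSON response as a string.
    """
    raw = nlp.annotate(text, properties=props)
    return extract_ner(json.loads(raw))


def main():
    # Needs a CoreNLP server listening on port 9000 (hypothetical setup).
    from stanfordcorenlp import StanfordCoreNLP
    nlp = StanfordCoreNLP('http://localhost', port=9000, timeout=30000)
    print('NER:', annotate_ner(nlp, 'rajesh lives in hyderabad'))
```

Calling main() with the server running prints the tags for the lowercase sentence. Whether truecasing recovers PERSON and LOCATION here depends on the truecase model, so the caseless models suggested in this answer may still be the more reliable route.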


n9vozmp42#

Can you provide sample code for that, and also a way to combine all the caseless models via the Python client?


tpgth1q73#

See https://stanfordnlp.github.io/CoreNLP/caseless.html. You should be able to adapt those model paths for your Python code.

On Wed, Jan 8, 2020 at 12:15 AM, BVSREDDY82 wrote: Can you provide sample code for that, and also a way to combine all the caseless models via the Python client?


yqyhoc1h4#

I was hoping to solve this with the truecase annotator. If possible, could you provide a sample?


am46iovg5#

https://stanfordnlp.github.io/CoreNLP/caseless.html
You should be able to adapt those model paths for your Python code.

On Wed, Jan 8, 2020 at 12:15 AM, BVSREDDY82 wrote: Can you provide sample code for that, and also a way to combine all the caseless models via the Python client?

I want to combine multiple caseless models... Here I'm only able to reference a single model.
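On combining several caseless models: the ner.model property takes a comma-separated list of model paths, so one way to reference more than one model from Python is to join the paths into a single string. The sketch below assumes the model paths as I recall them from the caseless page linked above. They can differ between CoreNLP releases and may need the English models jar on the server classpath, so check them against that page. The names CASELESS_NER_MODELS, CASELESS_POS_MODEL and build_caseless_props are hypothetical.

```python
# Assumption: these paths match the CoreNLP release running on the server.
CASELESS_NER_MODELS = [
    'edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz',
    'edu/stanford/nlp/models/ner/english.muc.7class.caseless.distsim.crf.ser.gz',
    'edu/stanford/nlp/models/ner/english.conll.4class.caseless.distsim.crf.ser.gz',
]

# Assumption: caseless POS tagger path; verify against the caseless page.
CASELESS_POS_MODEL = (
    'edu/stanford/nlp/models/pos-tagger/'
    'english-caseless-left3words-distsim.tagger'
)


def build_caseless_props(ner_models=CASELESS_NER_MODELS):
    """Build request properties that point pos and ner at caseless models."""
    return {
        'annotators': 'tokenize,ssplit,pos,lemma,ner',
        'pipelineLanguage': 'en',
        'pos.model': CASELESS_POS_MODEL,
        # Several NER models are combined by joining their paths with commas.
        'ner.model': ','.join(ner_models),
        'outputFormat': 'json',
    }
```

The resulting dictionary can be passed as properties= to the annotate() call already used in the question's client, in place of self.props.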


a11xaf1n6#

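With the caseless models it works fine... However, I'm still curious how to solve this with the truecase annotator, for example: if the POS tag is NNP, convert the token to title case and run NER again.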
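The idea described here (title-case the tokens tagged NNP, then run NER again) can be done on the client side without the truecase annotator. The following is a rough sketch under several assumptions: the helper names are made up for illustration, the first pass relies on the POS tagger labelling lowercase names as NNP (which a cased tagger may not do), and the text is rebuilt by joining tokens with single spaces, which loses the original spacing.

```python
import json

# First pass: POS tags only. Second pass: NER on the re-cased text.
POS_PROPS = {
    'annotators': 'tokenize,ssplit,pos',
    'pipelineLanguage': 'en',
    'outputFormat': 'json',
}
NER_PROPS = {
    'annotators': 'tokenize,ssplit,pos,lemma,ner',
    'pipelineLanguage': 'en',
    'outputFormat': 'json',
}

PROPER_NOUN_TAGS = {'NNP', 'NNPS'}


def titlecase_proper_nouns(tokens):
    """Upper-case the first letter of every token tagged as a proper noun."""
    words = []
    for token in tokens:
        word = token['word']
        if token['pos'] in PROPER_NOUN_TAGS:
            word = word[:1].upper() + word[1:]
        words.append(word)
    return ' '.join(words)


def recase_then_ner(nlp, text):
    """Run POS tagging, re-case proper nouns, then run NER on the result.

    nlp is expected to behave like stanfordcorenlp.StanfordCoreNLP.
    """
    first = json.loads(nlp.annotate(text, properties=POS_PROPS))
    tokens = [t for s in first['sentences'] for t in s['tokens']]
    recased = titlecase_proper_nouns(tokens)
    second = json.loads(nlp.annotate(recased, properties=NER_PROPS))
    return [(t['word'], t['ner']) for s in second['sentences'] for t in s['tokens']]
```

This costs two round trips to the server per sentence, so the caseless models are likely the simpler option when they give acceptable results.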
