Here is some code:
import spacy

trained_nlp = spacy.load("models/output/model-best")
with open('Dog_Breed.csv') as f:
    s = f.read()  # + '\n' add trailing new line character
print('Here is the csv as string')
print(s)
doc = trained_nlp(s)  # run the NER model over the whole CSV text at once
for ent in doc.ents:
    # note: label_ is always a str, so the None check never filters anything
    if ent.label_ is not None and ent.label_ == 'BREED':
        sentence_breed = ' is a '
        print(ent.text + sentence_breed + ent.label_ + ' and ')
    if ent.text is not None and ent.label_ == 'ORIGIN':
        sentence_origin = ' has an '
        print(sentence_origin + ent.label_ + ' of ' + ent.text)
This is what gets printed:
Here is the csv as string
,Dog Breed,Origin
0,Boykin Spaniel,South Carolina
1,German Shepherd,Germany
2,Border Collie,Scotland
has an ORIGIN of ,Dog
Breed, is a BREED and
has an ORIGIN of Origin
0,Boykin Spaniel is a BREED and
has an ORIGIN of ,South
1,German Shepherd is a BREED and
has an ORIGIN of ,
has an ORIGIN of Germany
2,Border Collie is a BREED and
has an ORIGIN of ,
has an ORIGIN of Scotland
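One way to see exactly which spans the model tags in the raw CSV string is to print every entity with its label, character offsets and the repr() of its text, so that stray commas and newlines inside a span become visible. A minimal sketch; describe_entities is a hypothetical helper, and it only relies on the ents, text, label_, start_char and end_char attributes that spaCy Doc and Span objects provide. The stand-in doc below just mirrors the row-0 output above so the sketch runs without the trained model:

```python
from types import SimpleNamespace


def describe_entities(doc):
    """Return one line per entity: label, character offsets and repr of the text."""
    lines = []
    for ent in doc.ents:
        # repr() exposes commas, spaces and newlines that are part of the span
        lines.append(f"{ent.label_:<7} [{ent.start_char}:{ent.end_char}] {ent.text!r}")
    return lines


# With the real model this would simply be: describe_entities(trained_nlp(s))
# Here a stand-in object with the same attributes is used (illustrative values).
row = "0,Boykin Spaniel,South Carolina"


def span(start, end, label):
    return SimpleNamespace(text=row[start:end], label_=label, start_char=start, end_char=end)


fake_doc = SimpleNamespace(ents=[span(0, 16, "BREED"), span(16, 22, "ORIGIN")])
for line in describe_entities(fake_doc):
    print(line)
```

Spans such as ',South' or ',' in this listing would explain both the truncated "South Carolina" and the empty "has an ORIGIN of ," lines.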
Some anomalies:
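- The spaCy model labels always seem to be present, but I want to ignore the first line, ",Dog Breed,Origin", because there is no match there. You can see I have been using the None keyword to try to work around this.
- Note that "South Carolina" gets truncated, but "Boykin Spaniel" does not?!? Even so, the first output is well formed: I get a sentence like the one I want, i.e. "X is a BREED and has an ORIGIN of Y", without duplicates.
- However, for rows 1 and 2 I get dummy lines reading "has an ORIGIN of ,".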
I'm better at understanding language models than tricky inner loops in Python ;-)
I was expecting something like this:
0,Boykin Spaniel is a BREED and has an ORIGIN of South Carolina
1,German Shepherd is a BREED and has an ORIGIN of Germany
2,Border Collie is a BREED and has an ORIGIN of Scotland
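Since the CSV already puts breed and origin in separate columns, one alternative (a different technique from the NER approach in the question, not what the answer below proposes) is to read the file with Python's csv module. csv.DictReader consumes the header row itself, so there is no need for the None checks, which cannot skip it anyway because label_ is always a string. A minimal sketch, assuming the header shown above (an unnamed index column, then Dog Breed and Origin); breed_sentences is a hypothetical helper:

```python
import csv
import io


def breed_sentences(csv_text):
    """Build one sentence per data row; DictReader uses the first row as the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    sentences = []
    for row in reader:
        # The unnamed first column (the index) gets the empty-string key "".
        sentences.append(
            f"{row['']},{row['Dog Breed']} is a BREED and has an ORIGIN of {row['Origin']}"
        )
    return sentences


sample = """,Dog Breed,Origin
0,Boykin Spaniel,South Carolina
1,German Shepherd,Germany
2,Border Collie,Scotland
"""

for sentence in breed_sentences(sample):
    print(sentence)
```

For the real file, open it with open('Dog_Breed.csv', newline='') and pass f.read(). If the model itself must do the tagging, the same row loop could call it on each cell separately instead of on the whole CSV string.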
1 Answer
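Most of the problems come from not understanding how trained_nlp() parses the CSV string. Clearly it does not parse it the way you expect, so you should debug trained_nlp() to find out exactly how it parses the CSV string.
As for the line breaks, this is what I expect is going on:
print("message")
adds a newline by default through a default argument, i.e. by default it is print("message", end="\n"). To change this:
print("message", end='')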
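Applied to the loop in the question, giving the BREED print end='' lets the following ORIGIN print continue on the same line. A small self-contained sketch that writes into an in-memory buffer through print's file= argument so the two behaviours can be compared; the breed/origin strings are just the row-0 values from the question:

```python
import io

breed, origin = "Boykin Spaniel", "South Carolina"
buf = io.StringIO()

# Default end="\n": each print call finishes its line, so the sentence is split in two.
print(breed + " is a BREED and ", file=buf)
print("has an ORIGIN of " + origin, file=buf)

# end='': no newline after the first call, so the second call continues the same line.
print(breed + " is a BREED and ", end='', file=buf)
print("has an ORIGIN of " + origin, file=buf)

lines = buf.getvalue().splitlines()
for line in lines:
    print(line)
```

This only fixes the line breaks; the "has an ORIGIN of ," lines come from the entity spans the model returns, not from print.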