In Python, reading and parsing a CSV and then printing: why does it behave differently from what I expect?

ny6fqffe  posted on 2023-05-04  in  Python

Here is some code:

import spacy

trained_nlp = spacy.load("models/output/model-best")

with open('Dog_Breed.csv') as f:
    s = f.read() # + '\n'  add trailing new line character

print('Here is the csv as string')
print(s)

doc = trained_nlp(s)

for ent in doc.ents:
    if ent.label_ is not None and ent.label_=='BREED' :
        sentence_breed = ' is a '
        print(ent.text + sentence_breed  + ent.label_ + ' and ')
    if ent.text is not None and ent.label_ == 'ORIGIN' :
        sentence_origin = ' has an '
        print ( sentence_origin + ent.label_ + ' of ' + ent.text)

This is what gets printed:

Here is the csv as string
,Dog Breed,Origin
0,Boykin Spaniel,South Carolina
1,German Shepherd,Germany
2,Border Collie,Scotland

 has an ORIGIN of ,Dog
Breed, is a BREED and 
 has an ORIGIN of Origin

0,Boykin Spaniel is a BREED and 
 has an ORIGIN of ,South
1,German Shepherd is a BREED and 
 has an ORIGIN of ,
 has an ORIGIN of Germany
2,Border Collie is a BREED and 
 has an ORIGIN of ,
 has an ORIGIN of Scotland

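Some anomalies:

  1. The spaCy model's labels always seem to be present, but I want to ignore the first line, ",Dog Breed,Origin", because nothing in it should match. You can see I have been using the None keyword to try to work around this.
  2. Note that "South Carolina" is truncated but "Boykin Spaniel" is not?!? Still, the first output is well-formed: I get the sentence I want, like "X is a BREED and has an ORIGIN of Y", without duplicates.
  3. For rows 1 and 2, however, I get a dummy line, "has an ORIGIN of ,".

I am better at understanding language models than tricky inner loops in Python ;-)
I was expecting something like this: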
0,Boykin Spaniel is a BREED and has an ORIGIN of South Carolina
1,German Shepherd is a BREED and has an ORIGIN of Germany
2,Border Collie is a BREED and has an ORIGIN of Scotland

qyyhg6bp1#

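Most of the problems come from not understanding how trained_nlp() parses the CSV string. It clearly splits the CSV string differently from what you expect.
You should debug trained_nlp() to see exactly how it parses the CSV string.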
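One way to do that debugging is to print every entity with repr(), so commas and newlines captured inside a span become visible, together with its character offsets. The sketch below is only an illustration: describe_entities and fake_ent are hypothetical helper names, and the stand-in objects built with SimpleNamespace merely mimic the ents, text, label_, start_char and end_char attributes of a spaCy Doc and its spans so the example runs without the trained model. The two sample entities are copied from the output shown in the question.

```python
from types import SimpleNamespace

# First two lines of the CSV from the question.
csv_text = ",Dog Breed,Origin\n0,Boykin Spaniel,South Carolina\n"


def describe_entities(doc):
    """Return one debug line per entity; repr() exposes stray commas and newlines."""
    return [
        f"{ent.label_} [{ent.start_char}:{ent.end_char}] {ent.text!r}"
        for ent in doc.ents
    ]


def fake_ent(label, text):
    """Hypothetical stand-in for a spaCy entity span (not a real spaCy object)."""
    start = csv_text.index(text)
    return SimpleNamespace(
        label_=label, text=text, start_char=start, end_char=start + len(text)
    )


# Stand-in for `doc = trained_nlp(csv_text)`; the entities mirror the question's output.
fake_doc = SimpleNamespace(
    ents=[fake_ent("ORIGIN", ",Dog"), fake_ent("BREED", "0,Boykin Spaniel")]
)

debug_lines = describe_entities(fake_doc)
for line in debug_lines:
    print(line)
```

With the real model, passing the Doc returned by trained_nlp(s) to the same helper should show where each entity starts and ends in the raw CSV text.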
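As for the output you were expecting:
print("message") appends a newline by default through a default parameter; that is, the default call is equivalent to print("message", end="\n"). This is why "is a BREED and" and "has an ORIGIN of" land on separate lines.
To change this, pass an empty end: print("message", end='')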
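A minimal sketch of the end='' idea, assuming the breed and origin texts for a row are already available as plain strings. print_row is a hypothetical helper, not part of the original code, and the output is captured in a buffer only so the result can be checked.

```python
import io
from contextlib import redirect_stdout


def print_row(breed_text, origin_text):
    """Print both sentence fragments on one line."""
    # end="" suppresses the newline that print() would otherwise append.
    print(breed_text + " is a BREED and", end="")
    # The second print() keeps the default end="\n", which finishes the line.
    print(" has an ORIGIN of " + origin_text)


# Capture stdout so the exact text, including the newline, can be inspected.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print_row("0,Boykin Spaniel", "South Carolina")
captured = buffer.getvalue()

print(captured, end="")
```

This only addresses the line breaks; the stray "," entities still come from how the model splits the CSV text.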
