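As the title says, I have a CSV file with 6 columns. For NLP processing I need to extract the 6th column (a column of review comments) and convert each entry into a list of words. The following code was provided by the instructor: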
def read_twitter(fname):
    """ Read the given dataset into list and clean stop words.
    Args:
        fname (string): filename of Twitter Dataset
    Returns:
        list of list of words: we view each document as a list, including a list of all words
    """
    twitter = []
    with open(fname,encoding="utf-8") as f:
        for line in f:
            tweet = f.readline().split(",")[5]
            # YOUR CLEANING CODE HERE
            # - Clean tweet
            # - Split into list words
            # - Store list in twitter
    return twitter
Then we call the function read_twitter:
twitter = read_twitter('twitter.csv')
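Once completed, it should return a list of lists. Since I have not yet added any code in the section above, I expected it to return an empty list. Instead it gives the following error: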
IndexError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_15784\2512851317.py in read_twitter(fname)
     12         for line in f:
     13
---> 14             tweet = f.readline().split(",")[5]
     15
     16
IndexError: list index out of range
But when I edit the code above and change it to:
def read_twitter(fname):
    """ Read the given dataset into list and clean stop words.
    Args:
        fname (string): filename of Twitter Dataset
    Returns:
        list of list of words: we view each document as a list, including a list of all words
    """
    twitter = []
    with open(fname,encoding="utf-8") as f:
        for line in f:
            print(f.readline().split(",")[5])
    return twitter
twitter = read_twitter('twitter.csv')
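it does produce output, but only for half of the rows in the dataset. I am confused about what readline() is doing here and why it keeps saying the index is out of range. Any help would be appreciated.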
1 Answer
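You are skipping lines by mixing file iteration with readline.
for line in f:
already reads one line on each pass, and then tweet = f.readline().split(",")[5] reads the next one, so every other row is discarded. That is why you only see half of the dataset. It would also explain the IndexError: if the file has an odd number of lines, the last readline() call hits the end of the file and returns an empty string, and "".split(",") is a one-element list with no index 5. A blank or short line picked up by readline() would fail the same way.
Just remove the readline and use line instead.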
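A minimal sketch of the corrected function, assuming the fix is simply to split the loop variable line instead of calling readline() again. The length check and the lowercase-and-split cleaning step are placeholders added for illustration; they are not part of the instructor's template, and the real cleaning code would go where the placeholder is.

```python
def read_twitter(fname):
    """Read the 6th comma-separated field of every line as a list of words."""
    twitter = []
    with open(fname, encoding="utf-8") as f:
        for line in f:
            # Use the line the loop already read; do not call f.readline() again.
            fields = line.rstrip("\n").split(",")
            if len(fields) < 6:
                # Assumption: skip blank or short lines instead of raising IndexError.
                continue
            tweet = fields[5]
            # Placeholder cleaning (an assumption): lowercase, then split on whitespace.
            twitter.append(tweet.lower().split())
    return twitter
```

Calling read_twitter('twitter.csv') as before should then visit every row rather than every other one. Note that a plain split(",") will break apart comments that themselves contain commas; if the file quotes such fields, the standard library's csv.reader is one way to parse them correctly.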