readline prints only half of the rows in a csv file

q8l4jmvw  posted on 2023-02-27  in  Other

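As the title says, I have a csv file with 6 columns. For NLP processing I need to extract the 6th column (a review comment column) and convert it into a list of words. The following code was provided by the instructor: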

def read_twitter(fname):
    """ Read the given dataset into list and clean stop words. 
    
    Args: 
        fname (string): filename of Twitter Dataset
        
    Returns:
        list of list of words: we view each document as a list, including a list of all words 
    """
    twitter = []
    with open(fname,encoding="utf-8") as f:
        for line in f:
            tweet = f.readline().split(",")[5]
            
            # YOUR CLEANING CODE HERE
            #    - Clean tweet
            #    - Split into list words
            #    - Store list in twitter
            
    return twitter

Then we call the function read_twitter:

twitter = read_twitter('twitter.csv')

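It should return a list of lists, as required. Since I have not yet added any code to the section above, I was sure it would return an empty list. Instead it raises the following error:

IndexError                                Traceback (most recent call last)
AppData\Local\Temp\ipykernel_15784\2512851317.py in read_twitter(fname)
     12         for line in f:
     13
---> 14             tweet = f.readline().split(",")[5]
     15
     16

IndexError: list index out of range

But when I edit the code above and change it to: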

def read_twitter(fname):
    """ Read the given dataset into list and clean stop words. 
    
    Args: 
        fname (string): filename of Twitter Dataset
        
    Returns:
        list of list of words: we view each document as a list, including a list of all words 
    """
    twitter = []
    with open(fname,encoding="utf-8") as f:
        for line in f:
            print(f.readline().split(",")[5])
            
    return twitter
twitter = read_twitter('twitter.csv')

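it does produce output, but only for half of the rows in the dataset. I am confused about what readline() is doing here and why it keeps reporting that the index is out of range. Any help would be appreciated.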

qq24tv8q  #1

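You are skipping lines by combining file iteration with readline. for line in f: reads one line, and then tweet = f.readline().split(",")[5] reads the next one, so only every second row is processed. If that readline call lands on the end of the file it returns an empty string, which splits into a single element, and index 5 raises IndexError. Just remove the readline call.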
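The skipping can be reproduced without the real dataset. The following is a minimal sketch, assuming a small in-memory stand-in for twitter.csv; the ROWS data and the helper names mixed and iterate_only are made up for illustration and are not part of the original assignment.

```python
import io

# Hypothetical rows standing in for twitter.csv: six comma-separated columns each.
ROWS = [
    "0,1,2,3,4,first tweet\n",
    "0,1,2,3,4,second tweet\n",
    "0,1,2,3,4,third tweet\n",
    "0,1,2,3,4,fourth tweet\n",
]


def mixed(f):
    """The original pattern: the for loop and readline() each consume a line."""
    out = []
    for line in f:                # takes rows 1, 3, ...
        nxt = f.readline()        # takes rows 2, 4, ... or "" at end of file
        out.append(nxt.split(",")[5].strip())
    return out


def iterate_only(f):
    """The corrected pattern: use the line the for loop already read."""
    return [line.split(",")[5].strip() for line in f]


every_other = mixed(io.StringIO("".join(ROWS)))        # even number of rows
all_rows = iterate_only(io.StringIO("".join(ROWS)))

try:
    mixed(io.StringIO("".join(ROWS[:3])))              # odd number of rows
    hit_index_error = False
except IndexError:
    hit_index_error = True                             # "".split(",") is [""]

print(every_other)
print(all_rows)
print(hit_index_error)
```

With four rows the mixed loop yields only the second and fourth tweets. With three rows the last readline() returns an empty string and the index lookup fails, which is one likely explanation for the error in the question; a row with fewer than six fields would fail the same way.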

def read_twitter(fname):
    """ Read the given dataset into list and clean stop words. 
    
    Args: 
        fname (string): filename of Twitter Dataset
        
    Returns:
        list of list of words: we view each document as a list, including a list of all words 
    """
    twitter = []
    with open(fname,encoding="utf-8") as f:
        for line in f:
            tweet = line.split(",")[5]
            
            # YOUR CLEANING CODE HERE
            #    - Clean tweet
            #    - Split into list words
            #    - Store list in twitter
            
    return twitter
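One further caveat: tweets often contain commas, and line.split(",")[5] then returns only part of the text when the field is quoted. A possible alternative, sketched here on the assumption that the file is standard CSV with the tweet in the sixth column, is the standard library csv module. The function name read_twitter_rows and the lowercase-and-split step are placeholders for the cleaning code, not the instructor's intended solution.

```python
import csv
import io


def read_twitter_rows(f):
    """Return the sixth column of each row as a list of lowercase words."""
    twitter = []
    for row in csv.reader(f):
        if len(row) < 6:
            # Skip blank or short rows instead of raising IndexError.
            continue
        tweet = row[5]
        # Placeholder cleaning step: lowercase, then split on whitespace.
        twitter.append(tweet.lower().split())
    return twitter


# Hypothetical sample: a quoted tweet with a comma, a blank line, a plain tweet.
sample = io.StringIO('0,1,2,3,4,"Hello, world"\n\n0,1,2,3,4,Second tweet\n')
docs = read_twitter_rows(sample)
print(docs)
```

For a real file, the csv documentation recommends opening it with newline="", for example open(fname, encoding="utf-8", newline=""), and passing the file object to the function.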
