Python KeyError when using pandas

Posted by z31licg0 on 2023-06-04 in Python

I am following an NLP tutorial, but I run into a KeyError when I try to group my raw data into good and bad reviews. Here is the tutorial link: https://towardsdatascience.com/detecting-bad-customer-reviews-with-nlp-d8b36134dc7e

#reviews.csv
I am so angry about the service
Nothing was wrong, all good
The bedroom was dirty
The food was great

#nlp.py
import pandas as pd

#read data
reviews_df = pd.read_csv("reviews.csv")
# append the positive and negative text reviews
reviews_df["review"] = reviews_df["Negative_Review"] + reviews_df["Positive_Review"]

reviews_df.columns

I see the following error:

File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Negative_Review'

Why does this happen?

i34xakig #1

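You get this error because your data is not structured the way the code expects.
When you run reviews_df["review"] = reviews_df["Negative_Review"] + reviews_df["Positive_Review"], you are trying to concatenate the values of a negative-review column and a positive-review column, neither of which exists in your data, and store the result in a 'review' column (which does not exist yet either, although pandas would create it on assignment). The KeyError is raised by the lookup of 'Negative_Review', the first missing column.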
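One way to confirm this is to look at the columns pandas actually parsed before indexing into them. The following is a minimal sketch, assuming the file contents are inlined as a string (with the comma-containing line quoted) and using a hypothetical helper named missing_columns:

```python
import io
import pandas as pd

# Inline stand-in for reviews.csv (assumption: the comma-containing line is quoted).
csv_text = '''I am so angry about the service
"Nothing was wrong, all good"
The bedroom was dirty
The food was great
'''

# header=None because the file has no header row; "text" is an illustrative column name.
reviews_df = pd.read_csv(io.StringIO(csv_text), header=None, names=["text"])

def missing_columns(df, required):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [name for name in required if name not in df.columns]

missing = missing_columns(reviews_df, ["Negative_Review", "Positive_Review"])
print(list(reviews_df.columns))  # the columns that really exist
print(missing)                   # the columns the original code tried to read
```

If the second list is not empty, indexing any of those names with reviews_df[...] will raise a KeyError.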
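Your CSV is nothing more than a plain text file with one piece of text per line and no header row. Also, since you are working with text, remember to wrap each string in double quotes ("); otherwise the commas inside a review will create spurious columns.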
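To illustrate the quoting point, here is a small sketch that writes the four reviews with every field quoted and reads them back. It uses an in-memory buffer instead of a real file, and the column name "text" is only an assumption for the example:

```python
import csv
import io
import pandas as pd

reviews = [
    "I am so angry about the service",
    "Nothing was wrong, all good",
    "The bedroom was dirty",
    "The food was great",
]

# Write one review per row, wrapping every field in double quotes.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
for review in reviews:
    writer.writerow([review])

# Read it back: the quoted comma stays inside a single column.
buffer.seek(0)
df = pd.read_csv(buffer, header=None, names=["text"])
print(df.shape)
print(df["text"].tolist() == reviews)
```

With the quotes in place, the second review is parsed as one string rather than being split at the comma.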
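With your approach, it looks like you will still need to label every review by hand (typically, in a machine learning project, you would do the labelling in a separate step outside this script and load the labelled data here).
To get the code working, you need to do something like the following: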

import pandas as pd

df = pd.read_csv('TestFileFolder/57886076.csv', names=['text'])
## Fill with placeholder values
df['Positive_review']=0
df['Negative_review']=1
df.head()

Result:

                              text  Positive_review  Negative_review
0  I am so angry about the service                0                1
1      Nothing was wrong, all good                0                1
2            The bedroom was dirty                0                1
3               The food was great                0                1

However, I would suggest having a single column (is_review_positive) that holds True or False. You can easily encode it later.
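A minimal sketch of that single-column layout follows. The labels are assigned by hand purely for illustration, and the "label" column name is an assumption, not something from the tutorial:

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        "I am so angry about the service",
        "Nothing was wrong, all good",
        "The bedroom was dirty",
        "The food was great",
    ]
})

# Hand-assigned labels (illustrative only; real labels would come from your data).
df["is_review_positive"] = [False, True, False, True]

# Encode the boolean as 0/1 when a numeric target is needed.
df["label"] = df["is_review_positive"].astype(int)

# Split into good and bad reviews with boolean masks.
positive = df[df["is_review_positive"]]
negative = df[~df["is_review_positive"]]
print(df)
```

Keeping one boolean column avoids two columns that must always contradict each other, and the split into good and bad reviews becomes a simple mask.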

t30tvxxf #2

# Check the first few rows of a column for null or empty values.
# If the column (or a row label) is missing, report it and stop.
import pandas as pd

df = pd.read_excel("C://Users//Public//Documents//python exe files//Excel files//AssetSubLocationMinor.XLSX")
print(df)
column_name = "MinorLocation_Name"  # Replace with the actual column name

for i in range(0, 3):
    try:
        # .loc raises KeyError when the column or the row label i is missing
        value = df.loc[i, column_name]
        if pd.isna(value) or value == "":
            print("null value")
        else:
            print("Ok")
    except KeyError:
        print("Column or row does not exist in the DataFrame")
        break
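
The same check can be sketched without a loop or try/except by testing for the column first. This is only an illustration: the small in-memory frame stands in for the Excel file, its values are made up, and null_flags is a hypothetical helper name:

```python
import pandas as pd

# Made-up stand-in for the Excel sheet (assumption for illustration).
df = pd.DataFrame({"MinorLocation_Name": ["Lobby", None, ""]})
column_name = "MinorLocation_Name"

def null_flags(frame, column, n=3):
    """Return a list of booleans marking null or empty values in the first n rows,
    or None when the column does not exist (hypothetical helper)."""
    if column not in frame.columns:
        return None
    values = frame[column].head(n)
    return (values.isna() | (values == "")).tolist()

flags = null_flags(df, column_name)
if flags is None:
    print("Column does not exist in the DataFrame")
else:
    print(flags)
```

Checking membership in frame.columns up front avoids the KeyError entirely, so a missing column and a missing row label are no longer conflated.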
