pandas KeyError: 'date' - I don't know why I keep getting this error

Posted by nkcskrwz on 2023-04-19 in Other

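Can someone help me resolve this error? I am trying to plot high-level charts to understand various developments in the Covid-19 dataset I am working with.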

# Convert date column to datetime type
df['date'] = pd.to_datetime(df.date)

# Group data by quarter and calculate total cases per million
df_total_cases_quarterly = df.groupby(df['date'].dt.quarter).agg({'total_cases_per_million': 'sum'}).reset_index() # df['date'].dt.quarter

# Create bar plot for total cases per million on a quarterly basis
sns.set_style('whitegrid')
sns.barplot(x ='date', y = 'total_cases_per_million', data = df_total_cases_quarterly)
plt.title('Total Cases per Million (Worldwide) - Quarterly')
plt.xlabel('Quarter')
plt.ylabel('Total Cases per Million')
plt.xticks(ticks=[0, 1, 2, 3], labels=['Q1', 'Q2', 'Q3', 'Q4'])  # Update x-axis labels to show quarters
plt.show()

This is the error I get when running the code above:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3628             try:
-> 3629                 return self._engine.get_loc(casted_key)
   3630             except KeyError as err:

/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/var/folders/rf/8yc43r0d13l2gw8m9r17pc7h0000gn/T/ipykernel_942/1110610808.py in <module>
      3 
      4 # Group data by quarter and calculate total cases per million
----> 5 df_total_cases_quarterly = df.groupby(df['date'].dt.quarter).agg({'total_cases_per_million': 'sum'}).reset_index() # df['date'].dt.quarter
      6 
      7 # Create bar plot for total cases per million on a quarterly basis

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3503             if self.columns.nlevels > 1:
   3504                 return self._getitem_multilevel(key)
-> 3505             indexer = self.columns.get_loc(key)
   3506             if is_integer(indexer):
   3507                 indexer = [indexer]

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3629                 return self._engine.get_loc(casted_key)
   3630             except KeyError as err:
-> 3631                 raise KeyError(key) from err
   3632             except TypeError:
   3633                 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'date'

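I grouped the COVID-19 dataset by quarter and summed the total cases per million so that I could display the result with seaborn. The date column is not missing: I use it in the code below, which runs successfully, so I don't understand why I get the error shown above. I would like to know what causes the error and how to fix it. Thanks for your help!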

This code runs successfully:

# Filter data for daily new deaths per million
df_daily_new_deaths = df.groupby('date').agg({'new_deaths_per_million': 'sum'}).reset_index()

# Create line plot for daily new deaths per million
sns.set_style('whitegrid')
sns.lineplot(x = 'date', y = 'new_deaths_per_million', data = df_daily_new_deaths)
plt.title('Daily New Deaths per Million (Worldwide)')
plt.xlabel('Date')
plt.ylabel('Daily New Deaths per Million')
plt.xticks(rotation = 45)
plt.show()

Answer #1 (hmae6n7t)

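Your code should work as written; accessing dt.quarter should not change the column name. You are probably doing something that you have not reported here. Or perhaps you are using an old, buggy version of pandas?
I tested this on Python 3.8 + pandas 1.5.2 and on Python 3.11 + pandas 2.0.0.
Example: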

import pandas as pd
import numpy as np

# set up dummy data
np.random.seed(0)
df = pd.DataFrame({'date': ['2023-01-01', '2023-04-01', '2023-07-01', '2023-10-01']*5,
                   'total_cases_per_million': np.random.random(20),
                  })
df['date'] = pd.to_datetime(df.date)

# running your exact code

import seaborn as sns
import matplotlib.pyplot as plt  # needed for the plt.* calls below

# Group data by quarter and calculate total cases per million
df_total_cases_quarterly = df.groupby(df['date'].dt.quarter).agg({'total_cases_per_million': 'sum'}).reset_index() # df['date'].dt.quarter
df_total_cases_quarterly

sns.set_style('whitegrid')
sns.barplot(x ='date', y = 'total_cases_per_million', data = df_total_cases_quarterly)
plt.title('Total Cases per Million (Worldwide) - Quarterly')
plt.xlabel('Quarter')
plt.ylabel('Total Cases per Million')
plt.xticks(ticks=[0, 1, 2, 3], labels=['Q1', 'Q2', 'Q3', 'Q4'])  # Update x-axis labels to show quarters
plt.show()

Output: (the resulting bar chart is not reproduced here)

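Since the snippet in the question runs fine on clean data, it can help to check where 'date' actually lives in the failing DataFrame. The sketch below shows one scenario that would produce the reported pattern; the question does not confirm that this is what happened. It relies on the fact that df.groupby('date') also accepts an index level name, whereas df['date'] only looks at columns. The sample DataFrame and the names in_columns and in_index are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the question's data, with 'date' moved into the index
# (for example by an earlier set_index call in another notebook cell).
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-04-01", "2023-07-01", "2023-10-01"]),
    "total_cases_per_million": [1.0, 2.0, 3.0, 4.0],
}).set_index("date")

# Check where 'date' lives before indexing with df['date'].
in_columns = "date" in df.columns
in_index = "date" in df.index.names
print(f"in columns: {in_columns}, in index: {in_index}")

# If 'date' is only an index level, df['date'] raises KeyError: 'date'.
# Moving it back to a regular column makes the .dt accessor usable again.
if not in_columns and in_index:
    df = df.reset_index()

quarters = df["date"].dt.quarter
print(quarters.tolist())
```

If both checks come back False, printing df.columns.tolist() is a reasonable next step, since a differently spelled or padded column name would raise the same KeyError.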

Answer #2 (j8yoct9x)

You can overwrite the date column with the quarter before the groupby:

df_total_cases_quarterly = (df.assign(date=df['date'].dt.quarter)
                              .groupby('date')
                              .agg({'total_cases_per_million': 'sum'})
                              .reset_index())

Or set the index name with DataFrame.rename_axis:

df_total_cases_quarterly = (df.groupby(df['date'].dt.quarter)
                              .agg({'total_cases_per_million': 'sum'})
                              .rename_axis('date')
                              .reset_index())
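A related variation, not taken from either answer, is to name the grouping key explicitly so that the resulting column is called 'quarter' rather than 'date'. This is a minimal sketch on made-up data; the column name 'quarter' and the sample values are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up sample data: two rows in Q1, one in Q2, one in Q4.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-15", "2023-02-15", "2023-05-01", "2023-11-30"]),
    "total_cases_per_million": [1.0, 2.0, 3.0, 4.0],
})

# When grouping by a Series, the group index takes that Series' name,
# so renaming the key controls the column name after reset_index().
quarterly = (
    df.groupby(df["date"].dt.quarter.rename("quarter"))
      .agg({"total_cases_per_million": "sum"})
      .reset_index()
)
print(quarterly)
```

The plot call would then use x='quarter'. Note that only quarters present in the data appear in the result, so a fixed list of four tick labels only lines up when all four quarters occur.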
