Python Pandas DataFrame - KeyError: 'date'

kadbb459 posted on 2023-05-27 in Python

I have checked these two questions: "KeyError: 'Date'" and "Pandas DataFrame - KeyError: 'date'". Neither helped. I get KeyError: 'date' with no further explanation.
Here is my code:

import pandas as pd, numpy as np
import csv
import warnings
from bs4 import BeautifulSoup, MarkupResemblesLocatorWarning
from sklearn.impute import SimpleImputer
from sklearn.exceptions import ConvergenceWarning
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LinearRegression, LogisticRegression, Perceptron
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, confusion_matrix, ConfusionMatrixDisplay 
import seaborn as sns
import matplotlib.pyplot as plt

## Reading the data
dtypes = { 'Unnamed: 0': 'int32', 'drugName': 'category', 'condition': 'category', 'review': 'category', 'rating': 'float16', 'date': 'categorical', 'usefulCount': 'int16' }
train_df = pd.read_csv('/content/drugsComTrain_raw.tsv', sep='\t', quoting=2, dtype=dtypes)
# Randomly selecting 80% of the data from the training dataset
train_df = train_df.sample(frac=0.8, random_state=42)
test_df = pd.read_csv('/content/drugsComTest_raw.tsv', sep='\t', quoting=2, dtype=dtypes)

print(train_df.head())
## Converting date column to datetime format
train_df['date'], test_df['date'] = pd.to_datetime(train_df['date'], format='%b %d, %Y'), pd.to_datetime(test_df['date'], format='%b %d, %Y') #This is the line where Im getting the error.

The last line is where I get the error.
Error:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3801             try:
-> 3802                 return self._engine.get_loc(casted_key)
   3803             except KeyError as err:

4 frames
/usr/local/lib/python3.10/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/usr/local/lib/python3.10/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-17-056c9fab2e6c> in <cell line: 24>()
     22 print(train_df.head())
     23 ## Converting date column to datetime format
---> 24 train_df['date'], test_df['date'] = pd.to_datetime(train_df['date'], format='%b %d, %Y'), pd.to_datetime(test_df['date'], format='%b %d, %Y')
     25 
     26 ## Extracting day, month, and year into separate columns

/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   3805             if self.columns.nlevels > 1:
   3806                 return self._getitem_multilevel(key)
-> 3807             indexer = self.columns.get_loc(key)
   3808             if is_integer(indexer):
   3809                 indexer = [indexer]

/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3802                 return self._engine.get_loc(casted_key)
   3803             except KeyError as err:
-> 3804                 raise KeyError(key) from err
   3805             except TypeError:
   3806                 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'date'

5kgi1eie #1

Everything works fine with your dataset. Maybe your local file is corrupted. Try reading it directly from the URL:

# Pandas 1.5.2
import pandas as pd

train_url = 'https://github.com/Rakesh9100/ML-Project-Drug-Review-Dataset/raw/main/datasets/drugsComTrain_raw.tsv'
test_url = 'https://github.com/Rakesh9100/ML-Project-Drug-Review-Dataset/raw/main/datasets/drugsComTest_raw.tsv'

dtypes = { 'Unnamed: 0': 'int32', 'drugName': 'category', 'condition': 'category', 'review': 'category', 'rating': 'float16', 'date': 'string', 'usefulCount': 'int16' }
train_df = pd.read_csv(train_url, sep='\t', quoting=2, dtype=dtypes, parse_dates=['date'])

train_df = train_df.sample(frac=0.8, random_state=42)
test_df = pd.read_csv(test_url, sep='\t', quoting=2, dtype=dtypes, parse_dates=['date'])

Output:

>>> train_df
        Unnamed: 0                           drugName           condition                                             review  rating       date  usefulCount
4792        127888                        Phentermine         Weight Loss  "I started taking Phentermine just a little ov...    10.0 2016-11-26           24
142824      197702                     Desvenlafaxine          Depression  "I have had depression for years due to situat...    10.0 2009-07-25           31
97316        40759                         Leuprolide       Endometriosis  "I was actually surprised to learn I had stage...     8.0 2011-12-21           31
21700       208098                            Zyclara           Keratosis  "Have used this for one week but began to have...     7.0 2013-01-20           17
72063       161657                    Diphenhydramine  Allergic Reactions  "Experienced an allergic reaction during dinne...    10.0 2015-04-11           20
...            ...                                ...                 ...                                                ...     ...        ...          ...
76399       124075                              Skyla       Birth Control  "I have had the Skyla for about a year now and...     9.0 2016-07-04            4
54080        38640                            Liletta       Birth Control  "I LOVE MIRENA.  My doctor (who must work for ...     1.0 2017-01-06            9
112498      103168                          Estarylla       Birth Control  "I&#039;ve been on it for 3 years with 0 side ...     9.0 2016-10-08            8
118396        7016      Aluminum chloride hexahydrate       Hyperhidrosis  "This was like a miracle prescription! Works g...    10.0 2015-02-09           13
73481        14912  Ethinyl estradiol / norethindrone       Birth Control  "I do not recommend this pill for anyone in Ju...     1.0 2017-09-14            1

[129038 rows x 7 columns]

>>> test_df
       Unnamed: 0         drugName                     condition                                             review  rating       date  usefulCount
0          163740      Mirtazapine                    Depression  "I&#039;ve tried a few antidepressants over th...    10.0 2012-02-28           22
1          206473       Mesalamine  Crohn's Disease, Maintenance  "My son has Crohn&#039;s disease and has done ...     8.0 2009-05-17           17
2          159672          Bactrim       Urinary Tract Infection                      "Quick reduction of symptoms"     9.0 2017-09-29            3
3           39293         Contrave                   Weight Loss  "Contrave combines drugs that were used for al...     9.0 2017-03-05           35
4           97768  Cyclafem 1 / 35                 Birth Control  "I have been on this birth control for one cyc...     9.0 2015-10-22            4
...           ...              ...                           ...                                                ...     ...        ...          ...
53761      159999        Tamoxifen     Breast Cancer, Prevention  "I have taken Tamoxifen for 5 years. Side effe...    10.0 2014-09-13           43
53762      140714     Escitalopram                       Anxiety  "I&#039;ve been taking Lexapro (escitaploprgra...     9.0 2016-10-08           11
53763      130945   Levonorgestrel                 Birth Control  "I&#039;m married, 34 years old and I have no ...     8.0 2010-11-15            7
53764       47656       Tapentadol                          Pain  "I was prescribed Nucynta for severe neck/shou...     1.0 2011-11-28           20
53765      113712        Arthrotec                      Sciatica                                      "It works!!!"     9.0 2009-09-13           46

[53766 rows x 7 columns]
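
One way to check whether the local file is the problem is to look at the column labels pandas actually parsed before indexing by 'date'. The following is only a minimal sketch, assuming the cause is a header that differs slightly from 'date' (here, stray whitespace). The sample data is made up for illustration and is not taken from the real file:

```python
import io
import pandas as pd

# Hypothetical one-row sample whose header has stray spaces around "date".
sample_tsv = "drugName\trating\t date \nDrugA\t9.0\tMay 20, 2012\n"

df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# Print the exact labels; repr-style list output makes hidden whitespace visible.
print(df.columns.tolist())
has_date_before = "date" in df.columns

# Normalize the labels by stripping surrounding whitespace, then check again.
df.columns = df.columns.str.strip()
has_date_after = "date" in df.columns
print(has_date_before, has_date_after)
```

If the printed list does not contain exactly 'date', then df['date'] raises the KeyError shown in the question, whatever the dtypes are.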

czq61nw1 #2

The error in your code may come from how the dtypes dictionary is defined. The value for the key 'date' should be 'object', not 'string'.

dtypes = {'Unnamed: 0': 'int32', 'drugName': 'category', 'condition': 'category', 'review': 'category', 'rating': 'float16', 'date': 'object', 'usefulCount': 'int16'}
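
A minimal sketch of this suggestion, assuming the dates look like "May 20, 2012" (the rows below are invented for illustration, not taken from the dataset): read the column as plain objects, then convert it explicitly.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for the real TSV file.
sample_tsv = (
    "drugName\trating\tdate\n"
    "DrugA\t9.0\tMay 20, 2012\n"
    "DrugB\t8.0\tMay 13, 2010\n"
)

# Read the date column as generic Python objects (strings), not as a category.
df = pd.read_csv(io.StringIO(sample_tsv), sep="\t", dtype={"date": "object"})

# Convert explicitly; the format string is an assumption about how the dates are written.
df["date"] = pd.to_datetime(df["date"], format="%b %d, %Y")
print(df.dtypes)
```

If the real file spells out month names in full, the format string would need %B instead of %b.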
