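I am working on an example with a DataFrame of transaction data. The dataset contains the client ID, gross merchandise value (GMV), and revenue. Take this DataFrame as an example: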
import numpy as np
import pandas as pd
from datetime import datetime

num_variables = 100
rng = np.random.default_rng()
df = pd.DataFrame({
    'id': np.random.randint(1, 999999999, num_variables),
    'date': [np.random.choice(pd.date_range(datetime(2022, 6, 1), datetime(2022, 12, 31))) for i in range(num_variables)],
    'gmv': rng.random(num_variables) * 100,
    'revenue': rng.random(num_variables) * 100})
I group this data by client ID and transaction month, and sum the revenue for each group.
clients = df[['id', 'date','revenue']].groupby(['id', df.date.dt.to_period("M")], dropna=False).aggregate({'revenue': 'sum'})
clients.reset_index(inplace=True)
Now I create a crosstab:
CrossTab = pd.crosstab(clients['id'], clients['date'], values=clients['revenue'], rownames=None, colnames=None, aggfunc='sum', margins=True, margins_name='All', dropna=False, normalize=False)
The code above works fine because the revenue column in my sample DataFrame has the dtype "float64". But when I change the dtype to Float64, it no longer works:
num_variables = 100
rng = np.random.default_rng()
df = pd.DataFrame({
'id' : np.random.randint(1,999999999,num_variables),
'date' : [np.random.choice(pd.date_range(datetime(2022,6,1),datetime(2022,12,31))) for i in range(num_variables)],
'gmv' : rng.random(num_variables) * 100,
'revenue' : rng.random(num_variables) * 100})
df = df.astype({'revenue':'Float64'})
clients = df[['id', 'date','revenue']].groupby(['id', df.date.dt.to_period("M")], dropna=False).aggregate({'revenue': 'sum'})
clients.reset_index(inplace=True)
CrossTab = pd.crosstab(clients['id'], clients['date'], values=clients['revenue'], rownames=None, colnames=None, aggfunc='sum', margins=True, margins_name='All', dropna=False, normalize=False)
Output:
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[31], line 1
----> 1 CrossTab = pd.crosstab(clients['id'], clients['date'], values=clients['revenue'], rownames=None, colnames=None, aggfunc='sum', margins=True, margins_name='All', dropna=False, normalize=False)
File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\reshape\pivot.py:691, in crosstab(index, columns, values, rownames, colnames, aggfunc, margins, margins_name, dropna, normalize)
688 df["__dummy__"] = values
689 kwargs = {"aggfunc": aggfunc}
--> 691 table = df.pivot_table(
692 "__dummy__",
693 index=unique_rownames,
694 columns=unique_colnames,
695 margins=margins,
696 margins_name=margins_name,
697 dropna=dropna,
698 **kwargs,
699 )
701 # Post-process
702 if normalize is not False:
File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\frame.py:8728, in DataFrame.pivot_table(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name, observed, sort)
8711 @Substitution("")
8712 @Appender(_shared_docs["pivot_table"])
8713 def pivot_table(
(...)
...
--> 292 raise TypeError(dtype) # pragma: no cover
294 converted = maybe_downcast_numeric(result, dtype, do_round)
295 if converted is not result:
TypeError: Float64
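For context, here is a minimal sketch of how the two dtypes named in the question differ. The series and frame below are made-up illustrations, not the question's data. "float64" is a NumPy dtype that stores missing values as NaN, while "Float64" is a pandas nullable extension dtype that stores them as pd.NA. In recent pandas versions a groupby sum keeps the extension dtype, so the column passed to pd.crosstab as values is still Float64.

```python
import numpy as np
import pandas as pd

# Same values, two different dtypes (illustrative data only).
numpy_backed = pd.Series([1.5, None], dtype="float64")
nullable = pd.Series([1.5, None], dtype="Float64")

# "float64" is a plain NumPy dtype; the missing value is stored as NaN.
print(isinstance(numpy_backed.dtype, np.dtype), numpy_backed[1])

# "Float64" is a pandas extension dtype; the missing value is pd.NA.
print(isinstance(nullable.dtype, pd.api.extensions.ExtensionDtype), nullable[1])

# A groupby sum on a Float64 column returns a Float64 result.
frame = pd.DataFrame({
    "id": [1, 1, 2],
    "revenue": pd.array([1.0, 2.0, 3.0], dtype="Float64"),
})
summed = frame.groupby("id")["revenue"].sum()
print(summed.dtype)
```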
1 Answer
1szpjjfi1#
I have reported this issue on the pandas GitHub repository; it is still being analyzed.
https://github.com/pandas-dev/pandas/issues/50313
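Until the issue is resolved, one possible workaround (a sketch, not something confirmed in the linked issue) is to cast the aggregated revenue column back to NumPy float64 just before calling pd.crosstab. This assumes it is acceptable for any pd.NA values to become NaN. The seeded generator and the rng.integers calls are substitutions made here for reproducibility, and the gmv column is left out because the crosstab does not use it.

```python
from datetime import datetime

import numpy as np
import pandas as pd

num_variables = 100
rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
dates = pd.date_range(datetime(2022, 6, 1), datetime(2022, 12, 31))

df = pd.DataFrame({
    "id": rng.integers(1, 999_999_999, num_variables),
    "date": dates[rng.integers(0, len(dates), num_variables)],
    "revenue": rng.random(num_variables) * 100,
}).astype({"revenue": "Float64"})

# Sum revenue per client and month; the result keeps the Float64 dtype.
clients = (
    df.groupby(["id", df["date"].dt.to_period("M")])["revenue"]
    .sum()
    .reset_index()
)

# Workaround: convert the nullable column to NumPy float64 (pd.NA becomes NaN).
clients["revenue"] = clients["revenue"].astype("float64")

cross_tab = pd.crosstab(
    clients["id"],
    clients["date"],
    values=clients["revenue"],
    aggfunc="sum",
    margins=True,
    margins_name="All",
    dropna=False,
)
print(cross_tab.tail())
```

If the nullable dtype is needed downstream, the resulting table can be converted back afterwards with astype("Float64").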