Pandas crosstab does not support the Float64 (capital F) number format

Posted by hc8w905p on 2022-12-28 in Other

I'm working on an example DataFrame of transaction data. It contains the customer ID, the total transaction value (GMV) and the revenue. Example DF:

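import numpy as np
import pandas as pd
from datetime import datetime

# random client IDs, transaction dates between June and December 2022, GMV and revenue in [0, 100)
num_variables = 100
rng = np.random.default_rng()
df = pd.DataFrame({
    'id' :  np.random.randint(1,999999999,num_variables),
    'date' : [np.random.choice(pd.date_range(datetime(2022,6,1),datetime(2022,12,31))) for i in range(num_variables)],
    'gmv' : rng.random(num_variables) * 100,
    'revenue' : rng.random(num_variables) * 100})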

I group this data by customer ID, cross it with the transaction month, and show the revenue values.

clients = df[['id', 'date','revenue']].groupby(['id', df.date.dt.to_period("M")], dropna=False).aggregate({'revenue': 'sum'})
clients.reset_index(inplace=True)
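As a small worked illustration of this grouping step, the sketch below applies the same `groupby` + `dt.to_period("M")` pattern to four hand-made transactions. The toy values are hypothetical and not taken from the question; each row of the result is one client-month pair with its summed revenue.

```python
import pandas as pd

# Hypothetical toy data: two clients, transactions in June and July 2022
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "date": pd.to_datetime(["2022-06-03", "2022-06-20", "2022-06-05", "2022-07-01"]),
    "revenue": [10.0, 5.0, 7.0, 3.0],
})

# Group by client and calendar month (monthly Period), summing revenue per group
clients = (
    df.groupby(["id", df["date"].dt.to_period("M")], dropna=False)
      .agg({"revenue": "sum"})
      .reset_index()
)
print(clients)
```

Client 1's two June transactions collapse into a single row with revenue 15.0, while client 2 gets one row per month.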

Now create a crosstab:

CrossTab = pd.crosstab(clients['id'], clients['date'], values=clients['revenue'], rownames=None, colnames=None, aggfunc='sum', margins=True, margins_name='All', dropna=False, normalize=False)
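To make the shape of that crosstab concrete, here is a sketch on a tiny, already-aggregated frame (hypothetical values; months are stored as plain strings rather than Periods to keep the example simple). With `margins=True`, pandas appends an `All` row and an `All` column holding the row totals, column totals and grand total, and client-month pairs with no data show up as NaN.

```python
import pandas as pd

# Hypothetical client-month totals, shaped like `clients` after reset_index()
clients = pd.DataFrame({
    "id": [1, 2, 2],
    "date": ["2022-06", "2022-06", "2022-07"],
    "revenue": [15.0, 7.0, 3.0],
})

# Rows: client IDs, columns: months, cells: summed revenue, plus "All" margins
table = pd.crosstab(
    clients["id"], clients["date"],
    values=clients["revenue"], aggfunc="sum",
    margins=True, margins_name="All",
)
print(table)
```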

The code above works fine because the revenue column in my sample DataFrame has dtype `float64`. But once I change the dtype to `Float64`, it no longer works.
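For readers unfamiliar with the distinction: lowercase `float64` is the ordinary NumPy-backed dtype, while capital-F `Float64` is one of pandas' nullable extension dtypes, which represents missing values as `pd.NA` rather than NaN. A minimal sketch of the difference:

```python
import numpy as np
import pandas as pd

s_numpy = pd.Series([1.5, 2.0, np.nan])   # default NumPy-backed dtype: float64
s_nullable = s_numpy.astype("Float64")    # pandas nullable extension dtype: Float64

print(s_numpy.dtype, s_nullable.dtype)    # float64 Float64
print(s_nullable.isna().tolist())         # [False, False, True]
print(pd.api.types.is_extension_array_dtype(s_nullable))  # True
```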

num_variables = 100
rng = np.random.default_rng()
df = pd.DataFrame({
    'id' :  np.random.randint(1,999999999,num_variables),
    'date' : [np.random.choice(pd.date_range(datetime(2022,6,1),datetime(2022,12,31))) for i in range(num_variables)],
    'gmv' : rng.random(num_variables) * 100,
    'revenue' : rng.random(num_variables) * 100})
df = df.astype({'revenue':'Float64'})

clients = df[['id', 'date','revenue']].groupby(['id', df.date.dt.to_period("M")], dropna=False).aggregate({'revenue': 'sum'})
clients.reset_index(inplace=True)

CrossTab = pd.crosstab(clients['id'], clients['date'], values=clients['revenue'], rownames=None, colnames=None, aggfunc='sum', margins=True, margins_name='All', dropna=False, normalize=False)

Output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[31], line 1
----> 1 CrossTab = pd.crosstab(clients['id'], clients['date'], values=clients['revenue'], rownames=None, colnames=None, aggfunc='sum', margins=True, margins_name='All', dropna=False, normalize=False)

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\reshape\pivot.py:691, in crosstab(index, columns, values, rownames, colnames, aggfunc, margins, margins_name, dropna, normalize)
    688     df["__dummy__"] = values
    689     kwargs = {"aggfunc": aggfunc}
--> 691 table = df.pivot_table(
    692     "__dummy__",
    693     index=unique_rownames,
    694     columns=unique_colnames,
    695     margins=margins,
    696     margins_name=margins_name,
    697     dropna=dropna,
    698     **kwargs,
    699 )
    701 # Post-process
    702 if normalize is not False:

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\frame.py:8728, in DataFrame.pivot_table(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name, observed, sort)
   8711 @Substitution("")
   8712 @Appender(_shared_docs["pivot_table"])
   8713 def pivot_table(
   (...)
...
--> 292     raise TypeError(dtype)  # pragma: no cover
    294 converted = maybe_downcast_numeric(result, dtype, do_round)
    295 if converted is not result:

TypeError: Float64
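The traceback shows that `pd.crosstab` copies the `values` Series into a temporary `__dummy__` column and then delegates to `DataFrame.pivot_table`, which is where the `TypeError` is raised. Because of that delegation, a `crosstab` call with `values` and `aggfunc` should produce the same table as the matching `pivot_table` call; the sketch below checks this on hypothetical `float64` data. It also suggests (an assumption, not verified against the linked issue) that calling `pivot_table` directly on a `Float64` column with the same arguments may hit the same error, depending on the pandas version.

```python
import pandas as pd

# Hypothetical client-month totals with an ordinary float64 revenue column
clients = pd.DataFrame({
    "id": [1, 2, 2],
    "date": ["2022-06", "2022-06", "2022-07"],
    "revenue": [15.0, 7.0, 3.0],
})

via_crosstab = pd.crosstab(clients["id"], clients["date"], values=clients["revenue"],
                           aggfunc="sum", margins=True, margins_name="All")

# Equivalent pivot_table call: the values column is pivoted directly
via_pivot = clients.pivot_table(values="revenue", index="id", columns="date",
                                aggfunc="sum", margins=True, margins_name="All")

print(via_crosstab)
print(via_pivot)
```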
Answer #1 (1szpjjfi):

I have reported this issue on the pandas GitHub repository, and it is still being investigated.
https://github.com/pandas-dev/pandas/issues/50313
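Until the issue is resolved, one possible workaround (an assumption on my part, not something confirmed in the linked issue) is to convert the nullable column back to NumPy `float64` just for the crosstab call; `astype("float64")` turns any `pd.NA` into NaN, which `crosstab` handles as in the working example. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical client-month totals stored with the nullable Float64 dtype
clients = pd.DataFrame({
    "id": [1, 2, 2],
    "date": ["2022-06", "2022-06", "2022-07"],
    "revenue": pd.array([15.0, 7.0, 3.0], dtype="Float64"),
})

# Workaround (assumption): hand crosstab a NumPy float64 copy of the values;
# the original Float64 column in `clients` is left unchanged
revenue = clients["revenue"].astype("float64")

table = pd.crosstab(
    clients["id"], clients["date"],
    values=revenue, aggfunc="sum",
    margins=True, margins_name="All",
)
print(table)
```

In the question's code this amounts to passing `values=clients['revenue'].astype('float64')` to `pd.crosstab`.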
