python-3.x: How to drop irrelevant indexes in a multi-index pandas pivot

Posted by zqdjd7g9 on 2023-06-25 in Python


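I have two index levels, say year and month, and I use a pivot to show product counts.
Now suppose my df has no records for July and August 2020. The pivot still shows those months, with a value of 0. I don't want the pivot table to show these irrelevant rows (combinations that do not exist in df), because they make the final pivot table very long. How can I cut them out?
Here is my example df: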

df = pd.DataFrame({'Product Type': ['Fruits', 'Fruits', 'Vegetable', 'Vegetable', 'Vegetable', 'Vegetable', 'Fruits', 'Fruits', 'Vegetables', 'Cars', 'Cars', 'Cars', 'Bikes', 'Bikes'],
                   'Product': ['Apple', 'Banana', 'Apple', 'Apple', 'Brocoli', 'Carrot', 'Apple', 'Banana', 'Brocoli', 'BMW M3', 'BMW M3', 'BMW M3', 'Hayabusa', 'Hayabusa'],
                   'Amount': [4938, 3285, 4947, 1516, 2212, 3778, 1110, 4436, 1049, 494, 2818, 3737, 954, 4074],
                  })

My code so far:

import pandas as pd
import numpy as np
df = pd.read_csv('try.csv')
bins = [0,1000,2000,5000,float(np.inf)]
labels = ['0-1000','1000-2000','2000-5000','5000+']
df['bins'] = pd.cut(df['Amount'],bins=bins, labels=labels, right=True)
pivot = df.pivot_table(index=['Product Type','Product'],columns='bins', aggfunc='size')
pivot.dropna(inplace=True)
pivot

Expected output:

Amount                 0-1000  1000-2000  2000-5000  5000+
Product Type Product                                      
Bikes        Hayabusa       1          0          1      0
Cars         BMW M3         1          0          2      0
Fruits       Apple          0          1          1      0
             Banana         0          0          2      0
Vegetable    Apple          0          1          1      0
             Brocoli        0          0          1      0
             Carrot         0          0          1      0
Vegetables   Brocoli        0          1          0      0

In df, Bikes only contains 'Hayabusa', and that is the only product I want to see under the Bikes category in my pivot. How can I do this?

python-3.x

Source: https://stackoverflow.com/questions/76530445/how-to-drop-irrelevant-indexes-in-multi-index-pivot-pandas

2 Answers


sg24os4d #1

Use cut and crosstab:

bins = [0, 1000, 2000, 5000, np.inf]
labels = ['0-1000', '1000-2000', '2000-5000', '5000+']

out = pd.crosstab([df['Product Type'], df['Product']],
                  pd.cut(df['Amount'], bins=bins, labels=labels)
                 ).reindex(columns=labels, fill_value=0)

Output:

Amount                 0-1000  1000-2000  2000-5000  5000+
Product Type Product                                      
Bikes        Hayabusa       1          0          1      0
Cars         BMW M3         1          0          2      0
Fruits       Apple          0          1          1      0
             Banana         0          0          2      0
Vegetable    Apple          0          1          1      0
             Brocoli        0          0          1      0
             Carrot         0          0          1      0
Vegetables   Brocoli        0          1          0      0

Input used:

df = pd.DataFrame({'Product Type': ['Fruits', 'Fruits', 'Vegetable', 'Vegetable', 'Vegetable', 'Vegetable', 'Fruits', 'Fruits', 'Vegetables', 'Cars', 'Cars', 'Cars', 'Bikes', 'Bikes'],
                   'Product': ['Apple', 'Banana', 'Apple', 'Apple', 'Brocoli', 'Carrot', 'Apple', 'Banana', 'Brocoli', 'BMW M3', 'BMW M3', 'BMW M3', 'Hayabusa', 'Hayabusa'],
                   'Amount': [4938, 3285, 4947, 1516, 2212, 3778, 1110, 4436, 1049, 494, 2818, 3737, 954, 4074],
                  })
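As an alternative to crosstab, here is a minimal sketch that keeps the binned column from the question and groups with observed=True, so that only combinations actually present in df are counted. The variable name counts is just illustrative, and this is not taken from the original answer; it assumes the irrelevant rows come from grouping on the categorical column produced by pd.cut.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product Type': ['Fruits', 'Fruits', 'Vegetable', 'Vegetable', 'Vegetable',
                     'Vegetable', 'Fruits', 'Fruits', 'Vegetables', 'Cars',
                     'Cars', 'Cars', 'Bikes', 'Bikes'],
    'Product': ['Apple', 'Banana', 'Apple', 'Apple', 'Brocoli', 'Carrot',
                'Apple', 'Banana', 'Brocoli', 'BMW M3', 'BMW M3', 'BMW M3',
                'Hayabusa', 'Hayabusa'],
    'Amount': [4938, 3285, 4947, 1516, 2212, 3778, 1110, 4436, 1049, 494,
               2818, 3737, 954, 4074],
})

bins = [0, 1000, 2000, 5000, np.inf]
labels = ['0-1000', '1000-2000', '2000-5000', '5000+']

# pd.cut returns a categorical column; by default each bin is right-closed
df['bins'] = pd.cut(df['Amount'], bins=bins, labels=labels)

counts = (
    # observed=True keeps only group combinations that occur in the data
    df.groupby(['Product Type', 'Product', 'bins'], observed=True)
      .size()
      # move the bins into columns; missing (row, bin) pairs become 0
      .unstack('bins', fill_value=0)
      # add back bins that never occur (here '5000+') as all-zero columns
      .reindex(columns=labels, fill_value=0)
)
print(counts)
```

Whether this or crosstab reads better is largely a matter of taste; the groupby form makes the observed=True choice explicit.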

Answered 2023-06-25

5fjcxozz #2

Without knowing what your data looks like, here is my attempt at an answer.
Here is some sample data:

details = {
    'year':[2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,
            2021,2021,2021,2021,2021,2021,2021,2021,2021,2021,2021,2021,],
    'month':[1,2,3,4,5,6,7,8,9,10,11,12,
             1,2,3,4,5,6,7,8,9,10,11,12,],
    'product_count':[102,67,36,23,7,6,np.nan,np.nan,5,3,3,2,
                     33,36,53,49,42,56,63,39,42,40,54,19,],
    'product_category':['Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars',
                        'Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars',]
}

df = pd.DataFrame(details)

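I have set the data for July 2020 and August 2020 to missing/NaN.
Given that you are using a pivot with year and month as the index, I assume it looks something like this: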

(df
 .pivot(
    index=['year','month'], 
    columns=['product_category'], 
    values=['product_count'])
 .dropna()
)

Note how chaining ".dropna()" at the end of the pivot excludes July 2020 and August 2020 from the output.
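One caveat: .dropna() only removes rows that contain NaN. The question describes the unwanted rows as showing the value 0, and in that case they survive .dropna(). A minimal sketch of filtering such rows follows; the small pivot frame below is made up for illustration and is not the asker's real output.

```python
import pandas as pd

# Hypothetical count pivot; ('Bikes', 'Apple') is a combination
# that does not exist in the source data, so all its counts are 0
pivot = pd.DataFrame(
    {'0-1000': [1, 0, 0], '2000-5000': [1, 0, 2]},
    index=pd.MultiIndex.from_tuples(
        [('Bikes', 'Hayabusa'), ('Bikes', 'Apple'), ('Fruits', 'Banana')],
        names=['Product Type', 'Product'],
    ),
)

# Keep only rows that have at least one non-zero count
trimmed = pivot.loc[(pivot != 0).any(axis=1)]
print(trimmed)
```

This only makes sense when a zero means "no records"; if a real group can legitimately have all-zero values, those rows would be dropped as well.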

Answered 2023-06-25
