pandas: How do I add subtotal columns to a DataFrame with multi-level columns?

Asked by yruzcnhs on 2022-12-17, filed under "Other"

I have a DataFrame whose columns are a three-level MultiIndex:

quarter      Q1                      Q2                      Totals
year         2021        2022        2021        2022
             qty orders  qty orders  qty orders  qty orders  qty orders
month name
January       40      2    5      1    1      2    0      0   46      5
February      20      8    2      3    4      6    0      0   26     17
March          2     10    7      4    3      3    0      0   12     17
Totals        62     20   14      8    8     11    0      0   84     39

After grouping the columns by levels (0, 2), I get the following DataFrame of subtotals:

quarter      Q1          Q2          Totals
             qty orders  qty orders  qty orders
month name
January       45      3    1      2   46      5
February      22     11    4      6   26     17
March          9     14    3      3   12     17
Totals        76     28    8     11   84     39
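The question does not show the grouping code itself. A minimal sketch of one way to group the columns by levels (0, 2) is given here; it assumes the frame has only the Q1 and Q2 columns (the Totals columns are left out), and the name `subtotals` is hypothetical. Transposing before the groupby avoids `groupby(..., axis=1)`, which is deprecated in recent pandas versions.

```python
import pandas as pd

# Same data as the first table, without the Totals columns (assumption).
df = pd.DataFrame(
    [
        [40, 2, 5, 1, 1, 2, 0, 0],
        [20, 8, 2, 3, 4, 6, 0, 0],
        [2, 10, 7, 4, 3, 3, 0, 0],
        [62, 20, 14, 8, 8, 11, 0, 0],
    ],
    columns=pd.MultiIndex.from_product(
        [("Q1", "Q2"), ("2021", "2022"), ("qty", "orders")]
    ),
    index=["January", "February", "March", "Totals"],
)

# Transpose so the column levels become index levels, group by
# levels 0 (quarter) and 2 (qty/orders), sum over the years, and
# transpose back. sort=False keeps the original column order.
subtotals = df.T.groupby(level=[0, 2], sort=False).sum().T

print(subtotals)
```

The result has two column levels (quarter, then qty/orders), which is why it cannot be dropped into the three-level frame without first giving it a middle level.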

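I need to insert the second DataFrame into the first without disturbing the columns, levels, or index, so that I end up with the following DataFrame: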

quarter      Q1                                  Q2                                  Totals
year         2021        2022        Subtotal    2021        2022        Subtotal
             qty orders  qty orders  qty orders  qty orders  qty orders  qty orders  qty orders
month name
January       40      2    5      1   45      3    1      2    0      0    1      2   46      5
February      20      8    2      3   22     11    4      6    0      0    4      6   26     17
March          2     10    7      4    9     14    3      3    0      0    3      3   12     17
Totals        62     20   14      8   76     28    8     11    0      0    8     11   84     39

How can I do this?

Answer #1, by nfs0ujit:

Starting from the initial DataFrame (before the groupby):

import pandas as pd

df = pd.DataFrame(
    [
        [40, 2, 5, 1, 1, 2, 0, 0],
        [20, 8, 2, 3, 4, 6, 0, 0],
        [2, 10, 7, 4, 3, 3, 0, 0],
        [62, 20, 14, 8, 8, 11, 0, 0],
    ],
    columns=pd.MultiIndex.from_product(
        [("Q1", "Q2"), ("2021", "2022"), ("qty", "orders")]
    ),
    index=["January", "February", "March", "Totals"],
)

Here is one way to do it, using product from the itertools module in Python's standard library (nested for loops would work just as well):

from itertools import product

# Add a subtotal column for every (quarter, measure) pair
for level1, level2 in product(["Q1", "Q2"], ["qty", "orders"]):
    df.loc[:, (level1, "subtotal", level2)] = (
        df.loc[:, (level1, "2021", level2)] + df.loc[:, (level1, "2022", level2)]
    )

# Reorder the columns so each subtotal follows its quarter's years
df = df.reindex(
    pd.MultiIndex.from_product(
        [("Q1", "Q2"), ("2021", "2022", "subtotal"), ("qty", "orders")]
    ),
    axis=1,
)
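One caveat on the reindex step: reindex keeps only the columns it is given, so applying it as written to the question's frame would drop the Totals columns. The sketch here shows one way to keep them. It assumes the Totals columns carry an empty string in the year level (the question does not say which label is used), and it builds a separate frame under the hypothetical name `full`, so it does not touch the `df` used in the answer.

```python
from itertools import product

import pandas as pd

quarters, years, measures = ("Q1", "Q2"), ("2021", "2022"), ("qty", "orders")
# Assumption: the year level of the Totals columns is an empty string.
total_cols = [("Totals", "", m) for m in measures]

full = pd.DataFrame(
    [
        [40, 2, 5, 1, 1, 2, 0, 0, 46, 5],
        [20, 8, 2, 3, 4, 6, 0, 0, 26, 17],
        [2, 10, 7, 4, 3, 3, 0, 0, 12, 17],
        [62, 20, 14, 8, 8, 11, 0, 0, 84, 39],
    ],
    columns=pd.MultiIndex.from_tuples(
        list(product(quarters, years, measures)) + total_cols
    ),
    index=["January", "February", "March", "Totals"],
)

# Add the subtotal columns; new columns are appended at the far right.
for quarter, measure in product(quarters, measures):
    full[(quarter, "subtotal", measure)] = (
        full[(quarter, "2021", measure)] + full[(quarter, "2022", measure)]
    )

# List the Totals columns explicitly so that reindex does not drop them.
ordered = list(product(quarters, years + ("subtotal",), measures)) + total_cols
full = full.reindex(columns=pd.MultiIndex.from_tuples(ordered))
```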

Then:

print(df)
# Output
           Q1                                      Q2                     \
         2021        2022        subtotal        2021        2022
          qty orders  qty orders      qty orders  qty orders  qty orders   
January    40      2    5      1       45      3    1      2    0      0   
February   20      8    2      3       22     11    4      6    0      0   
March       2     10    7      4        9     14    3      3    0      0   
Totals     62     20   14      8       76     28    8     11    0      0   

         subtotal
              qty orders  
January         1      2  
February        4      6  
March           3      3  
Totals          8     11
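The loop above hardcodes the quarter, year, and measure labels. As an alternative, here is a sketch that derives them from the frame itself. It computes the subtotals with a groupby over the transposed frame, gives them a "subtotal" middle level, and concatenates them onto the original. This is not part of the original answer, and the names `sub`, `combined`, and `result` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [40, 2, 5, 1, 1, 2, 0, 0],
        [20, 8, 2, 3, 4, 6, 0, 0],
        [2, 10, 7, 4, 3, 3, 0, 0],
        [62, 20, 14, 8, 8, 11, 0, 0],
    ],
    columns=pd.MultiIndex.from_product(
        [("Q1", "Q2"), ("2021", "2022"), ("qty", "orders")]
    ),
    index=["January", "February", "March", "Totals"],
)

# Sum over the year level for every (quarter, measure) pair.
sub = df.T.groupby(level=[0, 2], sort=False).sum().T

# Rebuild the subtotal columns as three-level keys with "subtotal"
# in the middle, so they line up with the original column index.
sub.columns = pd.MultiIndex.from_tuples(
    [(quarter, "subtotal", measure) for quarter, measure in sub.columns]
)

# Put the two frames side by side; subtotals land at the far right.
combined = pd.concat([df, sub], axis=1)

# Walk the quarters in their original order and collect each quarter's
# columns, which places every subtotal right after that quarter's years.
quarters = list(dict.fromkeys(df.columns.get_level_values(0)))
ordered = [col for q in quarters for col in combined.columns if col[0] == q]
result = combined.reindex(columns=pd.MultiIndex.from_tuples(ordered))

print(result)
```

Because the labels are read from the frame, the same code would work unchanged if more quarters or years were added, provided every column belongs to a quarter that should receive a subtotal.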
