pandas: summing a DataFrame over combinations of an index column

pkln4tw6 posted on 2023-01-15 in Other

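I have a DataFrame containing different sections (only 2 sections and 2 speeds here, but a circuit can contain up to 8 sections and 6 measured speeds), as shown below:

| section | speed | Data1 | Data2 |
| --- | --- | --- | --- |
| A | 10 | 1.5 | 2.5 |
| A | 20 | 1.0 | 2.0 |
| B | 10 | 2.5 | 3.5 |
| B | 20 | 2.0 | 3.0 |

I want to sum the data columns for every possible circuit, that is, for every combination of one speed per section:

| A | B | Data1 | Data2 |
| --- | --- | --- | --- |
| 10 | 10 | 4.0 | 6.0 |
| 10 | 20 | 3.5 | 5.5 |
| 20 | 10 | 3.5 | 5.5 |
| 20 | 20 | 3.0 | 5.0 |

How can I do this? I can generate the combinations, but I am not sure how to sum their data columns.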
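
For anyone who wants to reproduce this, the sample table can be built roughly as follows. This is only a sketch: the column names `section`, `speed`, `Data1` and `Data2` are assumed from the code and output in the answers, and the variable name `df` matches what the answers use.

```python
import pandas as pd

# Sample data from the question; column names are assumed from the answers' code.
df = pd.DataFrame(
    {
        "section": ["A", "A", "B", "B"],
        "speed": [10, 20, 10, 20],
        "Data1": [1.5, 1.0, 2.5, 2.0],
        "Data2": [2.5, 2.0, 3.5, 3.0],
    }
)
print(df)
```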

5anewei6 #1

Use itertools.product, then sum each combination:

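from itertools import product

import pandas as pd

# Transpose so that each (section, speed) pair becomes a column.
df2 = df.set_index(['section', 'speed']).T

# Group the columns by section, pick one column per section for every
# combination, and sum the selected columns row-wise.
# Note: groupby(axis=1) is deprecated in recent pandas releases.
out = (pd.concat({k: df2[list(k)].sum(1)
                  for k in product(*(d for _,d in df2.groupby(axis=1, level=0)))})
         .unstack(level=-1)
      )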

Output:

                 Data1  Data2
(A, 10) (B, 10)    4.0    6.0
        (B, 20)    3.5    5.5
(A, 20) (B, 10)    3.5    5.5
        (B, 20)    3.0    5.0
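
To see what `product` enumerates here without any pandas machinery, the same idea can be sketched in plain Python. This is an illustration only, not the code above: the names `rows`, `by_section` and `circuits` are made up for the example, and the data is typed in by hand from the question.

```python
from itertools import product

# Rows of the question's table: (section, speed, Data1, Data2).
rows = [
    ("A", 10, 1.5, 2.5),
    ("A", 20, 1.0, 2.0),
    ("B", 10, 2.5, 3.5),
    ("B", 20, 2.0, 3.0),
]

# Group the rows by section, preserving first-seen order.
by_section = {}
for section, speed, d1, d2 in rows:
    by_section.setdefault(section, []).append((speed, d1, d2))

# One entry per circuit: pick one speed per section and add up the data columns.
circuits = []
for combo in product(*by_section.values()):
    speeds = tuple(speed for speed, _, _ in combo)
    data1 = sum(d1 for _, d1, _ in combo)
    data2 = sum(d2 for _, _, d2 in combo)
    circuits.append((*speeds, data1, data2))

for circuit in circuits:
    print(circuit)
```

With 8 sections and 6 speeds each, this enumerates 6**8 = 1,679,616 circuits, so the size of the result is worth keeping in mind whichever approach is used.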

To get the exact format requested:

df2 = df.set_index(['section', 'speed']).T

sections = df2.columns.get_level_values('section').unique()

out = (pd.concat({tuple(x[1] for x in k):
                  df2[list(k)].sum(1)
                  for k in product(*(d for _,d in df2.groupby(axis=1, level=0)))
                 })
         .unstack(level=-1)
         .rename_axis(sections).reset_index()
      )

Output:

    A   B  Data1  Data2
0  10  10    4.0    6.0
1  10  20    3.5    5.5
2  20  10    3.5    5.5
3  20  20    3.0    5.0

whlutmcx #2

One approach:

from itertools import product

import pandas as pd

groups = [[row for i, row in v.iterrows()] for _, v in df.groupby("section")]
rows = []
for p in product(*groups):
    row = {}
    for e in p:
        d = e.to_dict()
        row[d.pop("section")] = d.pop("speed")
        for k, v in d.items():
            row[k] = row.get(k, 0) + v
    rows.append(row)

res = pd.DataFrame(rows)
print(res)

Output:

    A  Data1  Data2   B
0  10    4.0    6.0  10
1  10    3.5    5.5  20
2  20    3.5    5.5  10
3  20    3.0    5.0  20

Or, more Pythonic:

def build_row(prod):
    row = {}
    for e in prod:
        d = e.to_dict()
        row[d.pop("section")] = d.pop("speed")
        for k, v in d.items():
            row[k] = row.get(k, 0) + v
    return row

groups = [[row for i, row in v.iterrows()] for _, v in df.groupby("section")]
res = pd.DataFrame([build_row(p) for p in product(*groups)])
print(res)

Note that if you want the exact output, you only need to reorder the columns.
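
A minimal sketch of that reordering, assuming the section names are known up front. The frame below is a hand-typed stand-in for `res`, and `section_cols` and `data_cols` are names made up for the example.

```python
import pandas as pd

# Stand-in for `res`, with the columns in the order they were printed above.
res = pd.DataFrame(
    {
        "A": [10, 10, 20, 20],
        "Data1": [4.0, 3.5, 3.5, 3.0],
        "Data2": [6.0, 5.5, 5.5, 5.0],
        "B": [10, 20, 10, 20],
    }
)

# Put the section columns first, then the data columns in their existing order.
section_cols = ["A", "B"]
data_cols = [c for c in res.columns if c not in section_cols]
res = res[section_cols + data_cols]
print(res)
```

With the real data, `section_cols` could presumably be taken from `df["section"].unique()` instead of being hard-coded.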

erhoui1w #3

Using the pandasql package:

df1.sql("""
    select a,b,tb1.data1+tb2.data1 as Data1,tb1.data2+tb2.data2 as Data2
        from
        (select speed as A,data1,data2 from self where section = 'A') tb1
        join
        (select speed as B,data1,data2 from self where section = 'B') tb2
""")

Output:

    A   B  Data1  Data2
0  10  10    4.0    6.0
1  10  20    3.5    5.5
2  20  10    3.5    5.5
3  20  20    3.0    5.0
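
The query is a cross join between the section A rows and the section B rows. As a rough check that does not depend on the `df1.sql(...)` helper or its `self` table, the same join can be run with the standard-library `sqlite3` module. This is a sketch under assumptions: the table name `t` and the hand-inserted rows are made up for the example, and the `ORDER BY` is added only to make the row order deterministic.

```python
import sqlite3

# In-memory database holding the question's sample rows (table name `t` is assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (section TEXT, speed INTEGER, data1 REAL, data2 REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("A", 10, 1.5, 2.5), ("A", 20, 1.0, 2.0), ("B", 10, 2.5, 3.5), ("B", 20, 2.0, 3.0)],
)

# Every section A row paired with every section B row, data columns added up.
query = """
    SELECT tb1.A, tb2.B,
           tb1.data1 + tb2.data1 AS Data1,
           tb1.data2 + tb2.data2 AS Data2
    FROM (SELECT speed AS A, data1, data2 FROM t WHERE section = 'A') AS tb1
    CROSS JOIN
         (SELECT speed AS B, data1, data2 FROM t WHERE section = 'B') AS tb2
    ORDER BY tb1.A, tb2.B
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```

Note that this form needs one subquery per section, so the query has to be extended by hand (or generated) when a circuit has more than two sections.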
