pandas: allocating values between two pandas DataFrames

Posted by yeotifhr on 2023-02-02 in Other

Consider two DataFrames:

>> import pandas as pd
>> df1 = pd.DataFrame({"category": ["foo", "foo", "bar", "bar", "bar"], "quantity": [1,2,1,2,3]})
>> print(df1)

    category    quantity
0   foo         1
1   foo         2
2   bar         1
3   bar         2
4   bar         3

>> df2 = pd.DataFrame({
            "category": ["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "bar", "bar"], 
            "item": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
        })
>> print(df2)
      category item
0      foo      A
1      foo      B
2      foo      C
3      foo      D
4      bar      E
5      bar      F
6      bar      G
7      bar      H
8      bar      I
9      bar      J

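How can I create a new column in df1 (with the new DataFrame called df3) that joins on the category column of df1 and allocates the values of the item column in df2, giving each row as many items as its quantity?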

>> df3 = pd.DataFrame({
           "category": ["foo", "foo", "bar", "bar", "bar"], 
           "quantity": [1,2,1,2,3],
           "item": ["A", "B,C", "E", "F,G", "H,I,J"] 
})

     category  quantity   item
0      foo         1      A
1      foo         2      B,C
2      bar         1      E
3      bar         2      F,G
4      bar         3      H,I,J

7bsow1i6  1#

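You can build a helper DataFrame by repeating each row quantity times with Index.repeat and DataFrame.loc, converting the index to a column so the original indices are not lost. Then create a helper column g in both DataFrames with GroupBy.cumcount, so that rows with duplicated categories can be matched one to one. Finally, use DataFrame.merge and aggregate the items with join: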

df11 = (df1.loc[df1.index.repeat(df1['quantity'])].reset_index()
           .assign(g = lambda x: x.groupby('category').cumcount()))

df22 = df2.assign(g = df2.groupby('category').cumcount())

df = (df11.merge(df22, on=['g','category'], how='left')
          .groupby(['index','category','quantity'])['item']
          .agg(lambda x: ','.join(x.dropna()))
          .droplevel(0)
          .reset_index())
print (df)
  category  quantity   item
0      foo         1      A
1      foo         2    B,C
2      bar         1      E
3      bar         2    F,G
4      bar         3  H,I,J
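
To make the helper step easier to follow, here is a small sketch that builds only the repeated-rows DataFrame and its counter column. It reuses df1 from the question; the variable name df11 follows the answer, and nothing else is assumed.

```python
import pandas as pd

df1 = pd.DataFrame({"category": ["foo", "foo", "bar", "bar", "bar"],
                    "quantity": [1, 2, 1, 2, 3]})

# Repeat each row `quantity` times; reset_index() keeps the original
# row label in a column named "index" so rows can be regrouped later.
df11 = df1.loc[df1.index.repeat(df1["quantity"])].reset_index()

# Number the repeated rows within each category: 0, 1, 2, ...
df11["g"] = df11.groupby("category").cumcount()

print(df11)
```

The g counter runs from 0 to 2 for foo and from 0 to 5 for bar. The same counter on df2 numbers its items per category, so merging on category and g pairs each repeated row with exactly one item.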

vwoqyblh  2#

You can use iterators together with itertools.islice:

from itertools import islice

# aggregate the items as iterator
s = df2.groupby('category')['item'].agg(iter)

# for each category, allocate as many items as needed and join
df1['item'] = (df1.groupby('category', group_keys=False)['quantity']
                  .apply(lambda g:
                         g.map(lambda x: ','.join(list(islice(s[g.name], x)))))
               )

Output:

  category  quantity   item
0      foo         1      A
1      foo         2    B,C
2      bar         1      E
3      bar         2    F,G
4      bar         3  H,I,J

Note that if there are not enough items, this only uses the ones that are available.
Example using a df2 truncated after F as input:

  category  quantity item
0      foo         1    A
1      foo         2  B,C
2      bar         1    E
3      bar         2    F
4      bar         3
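
The behaviour above comes from how a shared iterator is consumed: each islice call continues where the previous one stopped and simply returns fewer elements once the iterator runs out. A minimal sketch in plain Python, without pandas; the names items, rows and allocated are hypothetical stand-ins for the per-category iterators and the rows of df1:

```python
from itertools import islice

# One iterator per category, mimicking df2 truncated after "F".
items = {"foo": iter(["A", "B", "C", "D"]), "bar": iter(["E", "F"])}

# (category, quantity) pairs in the same order as the rows of df1.
rows = [("foo", 1), ("foo", 2), ("bar", 1), ("bar", 2), ("bar", 3)]

# Each islice call takes up to `qty` items and advances the shared iterator.
allocated = [",".join(islice(items[cat], qty)) for cat, qty in rows]

print(allocated)  # ['A', 'B,C', 'E', 'F', '']
```

Because the iterators are consumed in place, they have to be rebuilt before the allocation can be run a second time.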

cxfofazt  3#

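def function1(dd: pd.DataFrame):
    # End offset of each row's slice: running total of quantity within the group.
    col2 = dd.quantity.cumsum()
    # Start offset: the previous row's end offset (0 for the first row).
    col1 = col2.shift(fill_value=0)
    # For each row, slice this category's items in df2 from col1 to col2 and join them.
    return dd.assign(col1=col1, col2=col2).apply(lambda ss: ",".join(
        df2.loc[df2.category == ss.category, "item"].iloc[ss.col1:ss.col2].tolist()
    ), axis=1)

df1.assign(item=df1.groupby('category').apply(function1).droplevel(0))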

Output:

  category  quantity   item
0      foo         1      A
1      foo         2    B,C
2      bar         1      E
3      bar         2    F,G
4      bar         3  H,I,J
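
The function in this answer relies on offset arithmetic: the cumulative sum of quantity gives the end position of each row's slice, and shifting it by one row gives the start position. A rough sketch for the bar group alone, with the item list written out by hand as an assumption instead of being looked up in df2:

```python
import pandas as pd

# Quantities of the "bar" rows in df1 and the "bar" items of df2.
quantity = pd.Series([1, 2, 3])
items = ["E", "F", "G", "H", "I", "J"]

end = quantity.cumsum()            # end offsets:   1, 3, 6
start = end.shift(fill_value=0)    # start offsets: 0, 1, 3

# Slice the item list once per row using the (start, end) pairs.
slices = [",".join(items[a:b]) for a, b in zip(start, end)]

print(slices)  # ['E', 'F,G', 'H,I,J']
```

As with the islice approach, slicing past the end of the list just yields fewer items, so a shortage of items does not raise an error.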
