Find part numbers without an invoice number in a Pandas DataFrame

aij0ehis posted on 2023-03-11 in Other

I have two Pandas DataFrames that look like this:

DataFrame 1:

| customer | sales | invoice no | product sub category | part number |
|---|---|---|---|---|
| A | John | 123 | SubA | P01 |
| B | John | 456 | SubB | P02 |
| C | John | 789 | SubC | P03 |
| D | John | 1011 | SubC | P04 |
| E | John | 1213 | SubA | P05 |

DataFrame 2:

| product sub category | part number |
|---|---|
| SubA | P01 |
| SubA | P05 |
| SubB | P02 |
| SubB | P06 |
| SubC | P03 |
| SubC | P04 |

For each customer, I want to list every part number for which that customer has no invoice no.
I tried this code, but it returns an empty DataFrame:

# merge 2 dataframes
product = pd.merge(df1, df2, on = 'part number', how = 'outer')

# checking part number without invoice number
not_sold = product[product['invoice no'].isna()]

# grouping part number without invoice number per customer
not_sold_per_customer = not_sold.groupby('customer')['part number'].agg(lambda x: ', '.join(x.drop_duplicates())).reset_index()
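Why this comes back empty: after the outer merge, the only row with a missing invoice no is P06, which exists only in DataFrame 2 and therefore has no customer either, and groupby drops NaN keys by default (dropna=True). The sketch below rebuilds the sample data from the tables above (the DataFrame construction is an assumption, not part of the question) and uses merge's indicator=True option to show where that row comes from:

```python
import pandas as pd

# Sample data rebuilt from the question's tables (assumed construction)
df1 = pd.DataFrame({
    'customer': ['A', 'B', 'C', 'D', 'E'],
    'sales': ['John'] * 5,
    'invoice no': [123, 456, 789, 1011, 1213],
    'product sub category': ['SubA', 'SubB', 'SubC', 'SubC', 'SubA'],
    'part number': ['P01', 'P02', 'P03', 'P04', 'P05'],
})
df2 = pd.DataFrame({
    'product sub category': ['SubA', 'SubA', 'SubB', 'SubB', 'SubC', 'SubC'],
    'part number': ['P01', 'P05', 'P02', 'P06', 'P03', 'P04'],
})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
product = pd.merge(df1, df2, on='part number', how='outer', indicator=True)
not_sold = product[product['invoice no'].isna()]
print(not_sold[['customer', 'part number', '_merge']])

# The only unsold row (P06) has customer NaN, and groupby drops NaN keys
print(not_sold.groupby('customer')['part number'].size().empty)
```

In other words, an outer merge on part number only surfaces parts missing from DataFrame 1 altogether; it never pairs a customer with the parts that customer did not buy.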

This is the desired output:

| customer | part number |
|---|---|
| A | P02, P03, P04, P05 |
| B | P01, P03, P04, P05 |
| C | P01, P02, P04, P05 |
| D | P01, P02, P03, P05 |
| E | P01, P02, P03, P04 |

Please help me write the correct code. Thanks in advance!
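A minimal sketch of one way to get the desired output, assuming the candidate parts are the part numbers present in both DataFrames (that is what excludes P06 from the desired output): pair every customer with every candidate part through a cross join (merge(how='cross'), available since pandas 1.2), then discard the pairs that already have an invoice no in DataFrame 1. The DataFrame construction is rebuilt from the tables above.

```python
import pandas as pd

# Sample data rebuilt from the question's tables (assumed construction)
df1 = pd.DataFrame({
    'customer': ['A', 'B', 'C', 'D', 'E'],
    'sales': ['John'] * 5,
    'invoice no': [123, 456, 789, 1011, 1213],
    'product sub category': ['SubA', 'SubB', 'SubC', 'SubC', 'SubA'],
    'part number': ['P01', 'P02', 'P03', 'P04', 'P05'],
})
df2 = pd.DataFrame({
    'product sub category': ['SubA', 'SubA', 'SubB', 'SubB', 'SubC', 'SubC'],
    'part number': ['P01', 'P05', 'P02', 'P06', 'P03', 'P04'],
})

# Assumption: candidate parts = part numbers present in both DataFrames
common = sorted(set(df1['part number']) & set(df2['part number']))
parts = pd.DataFrame({'part number': common})

# Every (customer, candidate part) pair; how='cross' needs pandas >= 1.2
pairs = df1[['customer']].drop_duplicates().merge(parts, how='cross')

# Flag the pairs that already have an invoice in df1 and keep the rest
invoiced = df1.loc[df1['invoice no'].notna(), ['customer', 'part number']]
pairs = pairs.merge(invoiced, on=['customer', 'part number'],
                    how='left', indicator=True)
missing = pairs[pairs['_merge'] == 'left_only']

# One comma-separated string of missing parts per customer
result = (missing.groupby('customer')['part number']
          .agg(', '.join)
          .reset_index())
print(result)
```

One consequence of this design: a customer who already has an invoice for every candidate part has no rows left after the filter and does not appear in the result at all.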


Answer #1 by 4bbkushb

Use:

# get common values between both DataFrames
df = pd.merge(df1, df2, on = 'part number')
# convert values to a set
s = set(df['part number'])

# for each row, join all common values except the row's own part number
df1['part number'] = [', '.join(s - {x}) for x in df1['part number']]
print (df1)
  customer sales  invoice no product sub category         part number
0        A  John         123                 SubA  P05, P03, P02, P04
1        B  John         456                 SubB  P05, P03, P04, P01
2        C  John         789                 SubC  P05, P01, P02, P04
3        D  John        1011                 SubC  P05, P03, P02, P01
4        E  John        1213                 SubA  P03, P02, P04, P01
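Iterating over a Python set gives no guaranteed order (string hashing can even vary between runs), which is why the part numbers above come out shuffled, e.g. P05, P03, P02, P04. A minimal variant of the same idea, assuming the sorted order of the desired output is wanted: it computes the common part numbers with a set intersection (equivalent to the inner merge above), sorts each string, and writes to a new column with the hypothetical name missing parts so the original part numbers are kept. The DataFrame construction is rebuilt from the question's tables.

```python
import pandas as pd

# Sample data rebuilt from the question's tables (assumed construction)
df1 = pd.DataFrame({
    'customer': ['A', 'B', 'C', 'D', 'E'],
    'sales': ['John'] * 5,
    'invoice no': [123, 456, 789, 1011, 1213],
    'product sub category': ['SubA', 'SubB', 'SubC', 'SubC', 'SubA'],
    'part number': ['P01', 'P02', 'P03', 'P04', 'P05'],
})
df2 = pd.DataFrame({
    'product sub category': ['SubA', 'SubA', 'SubB', 'SubB', 'SubC', 'SubC'],
    'part number': ['P01', 'P05', 'P02', 'P06', 'P03', 'P04'],
})

# Part numbers present in both DataFrames (P06 is only in df2, so it is excluded)
common = set(df1['part number']) & set(df2['part number'])

# sorted() fixes the order; 'missing parts' is a hypothetical column name
df1['missing parts'] = [', '.join(sorted(common - {x})) for x in df1['part number']]
print(df1[['customer', 'missing parts']])
```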

Answer #2 by 7vux5j2d

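This might be what you want. For customers/orders without an invoice, it builds a two-column DataFrame containing the customer name and the part numbers.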

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'customer':['a', 'b', 'c', 'd', 'c'],'sale':['John', 'Dan', 'Will', 'Bill', 'John'],'invoices':[123, 234, np.nan, 789, np.nan], 
                    'product category': ['Suba', 'Subb', 'Subc', 'Subd', 'Subx'], 'part number':['Po1', 'Po2', 'Po3', 'Po4', 'Po5']})

df2 = pd.DataFrame({'product category': ['Suba', 'Subb', 'Subc', 'Subd', 'Subx'], 'part number':['Po1', 'Po2', 'Po3', 'Po4', 'Po5']})

df3 = df1.merge(df2, on = ['product category', 'part number'])
columns_to_drop = ['product category', 'sale', 'invoices']
df3 = df3.loc[df3.invoices.isna()]

df3.drop(columns_to_drop, axis=1, inplace=True)
df3 = df3.groupby('customer')['part number'].apply(list)
df3 = df3.to_frame()
df3['customer'] = df3.index
df3 = df3[['customer', 'part number']]
df3=df3.reset_index(drop=True)

def to_string(x):
    return ' '.join(str(e) for e in x)
df3['part number'] = df3['part number'].apply(to_string)

print(df3)

Output

customer part number
0        c     Po3 Po5

Without the final apply(to_string) step, the part number column would contain lists of part numbers instead of strings.
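The drop/apply(list)/to_frame/to_string steps in this answer can likely be collapsed into a single groupby aggregation, because agg(' '.join) produces the joined string directly. A sketch using the same sample data, assuming every (product category, part number) pair in df1 also appears in df2, as it does here, so the merge with df2 is skipped; if df2 is meant to filter rows, keep that merge first.

```python
import numpy as np
import pandas as pd

# Same sample data as in this answer
df1 = pd.DataFrame({'customer': ['a', 'b', 'c', 'd', 'c'],
                    'sale': ['John', 'Dan', 'Will', 'Bill', 'John'],
                    'invoices': [123, 234, np.nan, 789, np.nan],
                    'product category': ['Suba', 'Subb', 'Subc', 'Subd', 'Subx'],
                    'part number': ['Po1', 'Po2', 'Po3', 'Po4', 'Po5']})

# Keep rows with no invoice, then join each customer's part numbers into one string
df3 = (df1.loc[df1['invoices'].isna()]
          .groupby('customer')['part number']
          .agg(' '.join)
          .reset_index())
print(df3)
```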
