numpy Dataframe Pandas:基于另一个 Dataframe 的列通过部分字符串匹配生成新列

wd2eg0qa  于 12个月前  发布在  其他
关注(0)|答案(1)|浏览(82)

我有两个 Dataframe
df1看起来像这样(相关列):

brand               product_line        size    profile
0   winter              hdw2                21.r    LINEN
1   stone               r294                31x5    WAXY
2   han                 th22                3t      NaN
3   winter              trjj                5t      LINEN
4   stone               ael2                2d      LINEN

我有另一个dataframe,它有许多列,下面是相关的列:

df2
NAME                PRODUCT_LINE        SIZE      MAKE 
0   winter+             hdw2                21.r      0.2
1   stoneas             r294                31x5      0.5
2   han                 th22                3t        3
3   winter              trjj                5t        1
4   stone               ael2                2d        34

现在基本上第一个框架中的列分别Map到NAMEPRODUCT_LINESIZE。两个帧中的列都是str.lower()str.replace(" ", "")
现在,我希望MAKE列在第一个嵌套框中,这样这个条件对所有行都应该为真。我可以一行一行地做,但要花很多时间。有没有一种方法可以在整个列上快速完成,而不需要在 Dataframe 中循环?我想要一个有效的解决方案。
下面是我如何在循环中做到这一点:

spec = df2[(df2['NAME'].str.lower().str.replace(' ', '').str.contains(row.brand.lower().replace(' ', '')))
& (df2['PRODUCT_NAME'].str.lower().str.replace(' ', '').str.contains(row.product_line.lower().replace(' ', ''))) 
& (df2['SIZE'].str.lower().str.replace(' ', '').str.contains(row.size.lower().replace(' ', '')))  ]

所以基本上,当上面的条件对一行为真时,它应该从df2中获取相应的MAKE并放入df1中。而且,只有当df1中的profile列不是NaN时,才会发生这种情况。

vlju58qv

vlju58qv1#

为了提高效率,我建议merging水平的两个嵌套框来处理一个嵌套框。
此外,由于您只希望在所有条件都满足的情况下检索MAKE值,如果其中一个条件不满足,那么比较观察的其他项就没有意义了。这可以保存你很多时间。因此,我引入了一个list_indices变量,它将比较限制为仅对这些值进行比较。
下面是完整的代码:

# imports
import pandas as pd

# Create two DataFrames, df1 and df2, with sample data
df1 = pd.DataFrame({
    'brand': ["winter", "stone", "han", "winter", "stone"],
    'product_line': ["hdw2", "r294", "th22", "trjj", "ael2"],
    'size': ["21.r", "31x5", "3t", "5t", "2d"],
    'profile': ["LINEN", "WAXY", "NaN", "LINEN", "LiNEN"]
})

df2 = pd.DataFrame({
    'NAME': ["Winter", "stoneas", "han", "winter", "stone"],
    'PRODUCT_LINE': ["hdw2", "r294", "th22", "trjj", "ael2"],
    'SIZE': ["21.r", "31x5", "3t", "5t", "2d"],
    'MAKE': [0.2, 0.5, 3, 1, 34]
})

def clean_columns(df, columns):
    df[columns] = df[columns].applymap(lambda x: str(x).lower().replace(" ", ""))

# List of columns to be cleaned
columns_to_clean_df1 = ['brand', 'product_line', 'size']
columns_to_clean_df2 = ['NAME', 'PRODUCT_LINE', 'SIZE']

# lowercase characters and remove spaces
clean_columns(df1, columns_to_clean_df1)
clean_columns(df2, columns_to_clean_df2)

# Merge the two DataFrames horizontally
df_merged = pd.concat([df1, df2], axis=1)

# Define a list of conditions as pairs of columns to compare
compare_cols = [('brand', 'NAME'), ('product_line', 'PRODUCT_LINE'), ('size', 'SIZE')]

# Initialize a list of row indices where 'profile' is not "NaN"
list_indices = df_merged[df_merged['profile'] != "NaN"].index.tolist()

# Create a new column 'conditions' and set it to False
df_merged['conditions'] = False

# Iterate through each pair of columns in 'conditions'
for col1, col2 in compare_cols:
    # Check if the values in the first column of the pair are present in the second column
    df_merged.loc[list_indices, 'conditions'] = df_merged.iloc[list_indices].apply(lambda row: row[col1] in row[col2], axis=1)

    # Update the list of indices to include only the rows where the condition is satisfied
    # as we only want to retrieve the MAKE column if all conditions have been met, 
    # if one of them hasn't, we stop comparing for this row
    list_indices = df_merged[df_merged['conditions']].index.tolist()
    
# Set the 'MAKE' column to None for rows where conditions are not satisfied
df_merged.loc[~df_merged.index.isin(list_indices), "MAKE"] = None

# Create a new DataFrame 'df_result' with selected columns from df1 and 'MAKE' from df2
df_result = df_merged[df1.columns.tolist() + ["MAKE"]]

结果如下:
| | 产品线|大小|轮廓|使| MAKE |
| --|--|--|--|--|--|
| 0 |冬季|HDW2| 21.r|亚麻|0.2|
| 1 |石|r294| 31x5|蜡质|0.5|
| 2 |韩|th22| 3t|楠|楠|
| 3 |冬季|特里季|5t|亚麻| 1 |
| 4 |石|AEL2| 2d|亚麻| 34 |
让我知道这是否符合你的要求!

相关问题