我有两个 Dataframe
df1看起来像这样（相关列）：

brand               product_line        size    profile
0   winter              hdw2                21.r    LINEN
1   stone               r294                31x5    WAXY
2   han                 th22                3t      NaN
3   winter              trjj                5t      LINEN
4   stone               ael2                2d      LINEN

我有另一个dataframe，它有许多列，下面是相关的列：

df2

NAME                PRODUCT_LINE        SIZE      MAKE 
0   winter+             hdw2                21.r      0.2
1   stoneas             r294                31x5      0.5
2   han                 th22                3t        3
3   winter              trjj                5t        1
4   stone               ael2                2d        34

现在基本上第一个框架中的列分别Map到NAME、PRODUCT_LINE和SIZE。两个帧中的列都是str.lower()和str.replace(" ", "")
现在，我希望MAKE列在第一个嵌套框中，这样这个条件对所有行都应该为真。我可以一行一行地做，但要花很多时间。有没有一种方法可以在整个列上快速完成，而不需要在 Dataframe 中循环？我想要一个有效的解决方案。
下面是我如何在循环中做到这一点：

spec = df2[(df2['NAME'].str.lower().str.replace(' ', '').str.contains(row.brand.lower().replace(' ', '')))
& (df2['PRODUCT_NAME'].str.lower().str.replace(' ', '').str.contains(row.product_line.lower().replace(' ', ''))) 
& (df2['SIZE'].str.lower().str.replace(' ', '').str.contains(row.size.lower().replace(' ', '')))  ]

所以基本上，当上面的条件对一行为真时，它应该从df2中获取相应的MAKE并放入df1中。而且，只有当df1中的profile列不是NaN时，才会发生这种情况。

# imports import pandas as pd # Create two DataFrames, df1 and df2, with sample data df1 = pd.DataFrame({ 'brand': ["winter", "stone", "han", "winter", "stone"], 'product_line': ["hdw2", "r294", "th22", "trjj", "ael2"], 'size': ["21.r", "31x5", "3t", "5t", "2d"], 'profile': ["LINEN", "WAXY", "NaN", "LINEN", "LiNEN"] }) df2 = pd.DataFrame({ 'NAME': ["Winter", "stoneas", "han", "winter", "stone"], 'PRODUCT_LINE': ["hdw2", "r294", "th22", "trjj", "ael2"], 'SIZE': ["21.r", "31x5", "3t", "5t", "2d"], 'MAKE': [0.2, 0.5, 3, 1, 34] }) def clean_columns(df, columns): df[columns] = df[columns].applymap(lambda x: str(x).lower().replace(" ", "")) # List of columns to be cleaned columns_to_clean_df1 = ['brand', 'product_line', 'size'] columns_to_clean_df2 = ['NAME', 'PRODUCT_LINE', 'SIZE'] # lowercase characters and remove spaces clean_columns(df1, columns_to_clean_df1) clean_columns(df2, columns_to_clean_df2) # Merge the two DataFrames horizontally df_merged = pd.concat([df1, df2], axis=1) # Define a list of conditions as pairs of columns to compare compare_cols = [('brand', 'NAME'), ('product_line', 'PRODUCT_LINE'), ('size', 'SIZE')] # Initialize a list of row indices where 'profile' is not "NaN" list_indices = df_merged[df_merged['profile'] != "NaN"].index.tolist() # Create a new column 'conditions' and set it to False df_merged['conditions'] = False # Iterate through each pair of columns in 'conditions' for col1, col2 in compare_cols: # Check if the values in the first column of the pair are present in the second column df_merged.loc[list_indices, 'conditions'] = df_merged.iloc[list_indices].apply(lambda row: row[col1] in row[col2], axis=1) # Update the list of indices to include only the rows where the condition is satisfied # as we only want to retrieve the MAKE column if all conditions have been met, # if one of them hasn't, we stop comparing for this row list_indices = df_merged[df_merged['conditions']].index.tolist() # Set the 'MAKE' column to None for rows where conditions are not satisfied df_merged.loc[~df_merged.index.isin(list_indices), "MAKE"] = None # Create a new DataFrame 'df_result' with selected columns from df1 and 'MAKE' from df2 df_result = df_merged[df1.columns.tolist() + ["MAKE"]]

1条答案

按热度按时间

vlju58qv1#

为了提高效率，我建议merging水平的两个嵌套框来处理一个嵌套框。
此外，由于您只希望在所有条件都满足的情况下检索MAKE值，如果其中一个条件不满足，那么比较观察的其他项就没有意义了。这可以保存你很多时间。因此，我引入了一个list_indices变量，它将比较限制为仅对这些值进行比较。
下面是完整的代码：

结果如下：
| | 产品线|大小|轮廓|使| MAKE |
| --|--|--|--|--|--|
| 0 |冬季|HDW2| 21.r|亚麻|0.2|
| 1 |石|r294| 31x5|蜡质|0.5|
| 2 |韩|th22| 3t|楠|楠|
| 3 |冬季|特里季|5t|亚麻| 1 |
| 4 |石|AEL2| 2d|亚麻| 34 |
让我知道这是否符合你的要求！

赞(0）回复(0）举报 12个月前

numpy Dataframe Pandas：基于另一个 Dataframe 的列通过部分字符串匹配生成新列

1条答案

相关问题

热门标签

最新问答