Comparing Pandas DataFrame rows against two threshold values

Posted by chy5wohz on 2023-02-11 in Other


I have two dataframes, shown below. The actual dataframes are larger than these samples.

df1
     route_no    cost_h1    cost_h2    cost_h3   cost_h4   cost_h5   max   min   location
0    0010          20         22         21         23       26       26    20    NY
1    0011          30         25         23         31       33       33    23    CA
2    0012          67         68         68         69       65       69    67    GA
3    0013          34         33         31         30       35       35    31    MO
4    0014          44         42         40         39       50       50    39    WA

df2
    route_no    cost_h1    cost_h2    cost_h3   cost_h4   cost_h5    location 
0    0020          19         27         21         24       20         NY
1    0021          31         22         23         30       33         CA
2    0023          66         67         68         70       65         GA
3    0022          34         33         31         30       35         MO
4    0025          41         42         40         39       50         WA
5    0030          19         26         20         24       20         NY
6    0032          37         31         31         20       35         MO
7    0034          40         41         39         39       50         WA

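The idea is to compare each row of df2 against the corresponding max and min values specified in df1. Which thresholds apply to a row depends on the match in the location column. If any value in a row falls outside the range defined by min and max, that row should be placed in a separate dataframe. Note that the number of cost segments varies.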
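To make the example reproducible, the two sample frames could be built roughly as follows. This is only a sketch: storing route_no as strings is an assumption made here to keep the leading zeros, since the post does not state the column dtypes.

```python
import pandas as pd

# Assumption: route_no is stored as a string so the leading zeros survive.
df1 = pd.DataFrame({
    "route_no": ["0010", "0011", "0012", "0013", "0014"],
    "cost_h1": [20, 30, 67, 34, 44],
    "cost_h2": [22, 25, 68, 33, 42],
    "cost_h3": [21, 23, 68, 31, 40],
    "cost_h4": [23, 31, 69, 30, 39],
    "cost_h5": [26, 33, 65, 35, 50],
    "max": [26, 33, 69, 35, 50],
    "min": [20, 23, 67, 31, 39],
    "location": ["NY", "CA", "GA", "MO", "WA"],
})

df2 = pd.DataFrame({
    "route_no": ["0020", "0021", "0023", "0022", "0025", "0030", "0032", "0034"],
    "cost_h1": [19, 31, 66, 34, 41, 19, 37, 40],
    "cost_h2": [27, 22, 67, 33, 42, 26, 31, 41],
    "cost_h3": [21, 23, 68, 31, 40, 20, 31, 39],
    "cost_h4": [24, 30, 70, 30, 39, 24, 20, 39],
    "cost_h5": [20, 33, 65, 35, 50, 20, 35, 50],
    "location": ["NY", "CA", "GA", "MO", "WA", "NY", "MO", "WA"],
})
```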

pandas

Source: https://stackoverflow.com/questions/75413206/comparing-pandas-dataframe-rows-against-two-threshold-values

1 Answer


fnx2tebb1#

Solution

# Merge the dataframes on location to append the min/max columns to df2
df3 = df2.merge(df1[['location', 'max', 'min']], on='location', how='left')

# Select the cost columns (every column whose name contains 'cost')
cost = df3.filter(like='cost')

# Check whether the cost values satisfy the interval condition
mask = cost.ge(df3['min'], axis=0) & cost.le(df3['max'], axis=0)

# Keep the rows where one or more values fall outside the [min, max] interval
df4 = df2[~mask.all(axis=1)]

Result

print(df4)

  route_no  cost_h1  cost_h2  cost_h3  cost_h4  cost_h5 location
0     0020       19       27       21       24       20       NY
1     0021       31       22       23       30       33       CA
2     0023       66       67       68       70       65       GA
3     0022       34       33       31       30       35       MO
5     0030       19       26       20       24       20       NY
6     0032       37       31       31       20       35       MO
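The same approach can be wrapped in a small helper, which may be handy when the frames do not have a default index. The sketch below is an illustration, not part of the original answer: the function name `rows_outside_range` is hypothetical, `validate="many_to_one"` is added on the assumption that each location appears only once in the threshold frame, and the threshold frame is trimmed to the three columns the check actually uses.

```python
import pandas as pd

def rows_outside_range(thresholds, data, key="location"):
    """Return rows of `data` with at least one cost value outside [min, max]."""
    merged = data.merge(
        thresholds[[key, "max", "min"]],
        on=key,
        how="left",
        validate="many_to_one",  # raises MergeError if a key repeats in thresholds
    )
    cost = merged.filter(like="cost")
    in_range = cost.ge(merged["min"], axis=0) & cost.le(merged["max"], axis=0)
    # Index by position so a non-default index on `data` cannot misalign the mask.
    # Rows whose key has no thresholds compare against NaN and are reported too.
    return data[~in_range.all(axis=1).to_numpy()]

# Thresholds from df1 in the question (only the columns the check needs)
df1 = pd.DataFrame({
    "location": ["NY", "CA", "GA", "MO", "WA"],
    "max": [26, 33, 69, 35, 50],
    "min": [20, 23, 67, 31, 39],
})

# df2 from the question
df2 = pd.DataFrame({
    "route_no": ["0020", "0021", "0023", "0022", "0025", "0030", "0032", "0034"],
    "cost_h1": [19, 31, 66, 34, 41, 19, 37, 40],
    "cost_h2": [27, 22, 67, 33, 42, 26, 31, 41],
    "cost_h3": [21, 23, 68, 31, 40, 20, 31, 39],
    "cost_h4": [24, 30, 70, 30, 39, 24, 20, 39],
    "cost_h5": [20, 33, 65, 35, 50, 20, 35, 50],
    "location": ["NY", "CA", "GA", "MO", "WA", "NY", "MO", "WA"],
})

df4 = rows_outside_range(df1, df2)
print(df4)
```

Because the mask is applied by position, the helper should give the same rows even if df2 carries a custom index, as long as the left merge keeps one output row per input row, which is what the `validate` check is meant to guarantee.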

Answered on 2023-02-11
