pandas 根据条件将新数据从另一个数据框添加到数据框

relj7zay  于 2023-01-28  发布在  其他
关注(0)|答案(5)|浏览(155)

我的问题是如何根据另一个数据框的条件将新列中的数据添加到数据框中,这有点难以表述,所以我在这里给出了一个示例

df1

columns  a   b  c
         0   10  1
         10  15  3
         15  20  5

df2
columns  d      e  
         3.3   10   
         5.5   20
         14.5  11
         17.2  5

这里我想做的是在df 2中添加另一列f,它的值来自df 1,这样如果d[i]在a[j]和b[j]之间,则将值c[j]复制到df 2中的新列f[i]。例如:d[1] = 5.5,因此0〈5.5〈10,因此,f[1] = c[0] = 1的值
最终结果应该如下所示

df2
columns  d      e    f
         3.3   10    1 
         5.5   20    1
         14.5  11    3
         17.2  5     5

任何帮助是非常感谢!
此致,
史蒂夫

wtlkbnrh

wtlkbnrh1#

假设df1 a和b中的间隔不重叠,可以将pd.cutpd.IntervalIndex一起使用:

import pandas as pd

# Your dfs here
df1 = pd.read_clipboard()
df2 = pd.read_clipboard()

idx = pd.IntervalIndex.from_arrays(df1["a"], df1["b"])
mapping = df1["c"].set_axis(idx)

df2["f"] = pd.cut(df2["d"], idx).map(mapping)

DF2:

d   e  f
0   3.3  10  1
1   5.5  20  1
2  14.5  11  3
3  17.2   5  5
cgyqldqp

cgyqldqp2#

您可以用途:

result = []
for item in df2['d']:
    for row in df1.iterrows():
        if row[1]['a'] <= item <= row[1]['b']:
            val = (row[1]['c'])
            break
        else:
            val = None
    result.append(val)
            
df2['f'] = result

print(df2)
bwntbbo3

bwntbbo33#

import pandas as pd
df1 = pd.DataFrame({'a':[0,10,15],'b':[10,15,20],'c':[1,3,5]})
df2 = pd.DataFrame({'d':[3.3,5.5,9.5,17.2],'e':[10,20,11,5]})
df2['f']=0
for i in range(df2.shape[0]):
    for j in range(df1.shape[0]):
        if df2.d[i]>=df1.a[j] and df2.d[i]<=df1.b[j]:
            df2.f[i]=df1.c[j]
df2
kgsdhlau

kgsdhlau4#

这个选项怎么样?

# merge the two dfs
df = pd.merge(df2, df1, left_on='d', right_on='b', how='left')
df2['f'] = None
df2['f'] = df.apply(lambda x: x['c'] if x['a_x'] <= x['d'] <= x['b_x'] else None, axis=1)
ecfdbz9o

ecfdbz9o5#

如果没有重叠的间隔,则pd.IntervalIndexaccepted solution非常合适。
另一个选项是使用conditional_join from pyjanitor,它也可以处理重叠的间隔:

# pip install pyjanitor
import pandas as pd
import janitor
(df2
.conditional_join(
    # types have to be same
    # for columns to be compared
    df1.astype({"a":float, "b":float}), 
    ('d', 'a', '>='), 
    ('d', 'b','<='), 
    # depending on the data size,
    # numba may offer more performance
    use_numba=False,
    right_columns = {'c':'f'})
)
      d   e  f
0   3.3  10  1
1   5.5  20  1
2  14.5  11  3
3  17.2   5  5

相关问题