numpy: How to append all possible pairings as columns in pandas

kqqjbcuj  posted on 2023-03-30  in Other

I have a DataFrame like this:

Class  Value1   Value2
A      2        1
B      3        3
C      4        5

I want to generate all possible pairings, so that the output DataFrame looks like this:

Class   Value1   Value2   A_Value1   A_Value2   B_Value1   B_Value2   C_Value1  C_Value2
A       2        1        2          1          3           3         4         5   
B       3        3        2          1          3           3         4         5 
C       4        5        2          1          3           3         4         5

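Please assume there are nearly 1000 such classes. Is there an efficient way to do this? Ultimately, I want to find the difference between every pair of classes (on Value1 and Value2).
Edit: A_B_Value is computed with the formula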

A_B_Value = absolute(ClassA_value1 - ClassB_value1) + absolute(ClassA_value2 - ClassB_value2)

    Class   Value1   Value2   A_B_Value   A_C_Value   B_C_Value
    A         2        1       3            6         3
    B         3        3       3            6         3
    C         4        5       3            6         3
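
As a quick sanity check of the formula above, the following sketch recomputes the three pair values from the sample data in plain Python. The dictionary `values` and the helper `pair_value` are illustrative names, not part of the original question.

```python
from itertools import combinations

# Sample data from the question: class -> (Value1, Value2)
values = {"A": (2, 1), "B": (3, 3), "C": (4, 5)}

def pair_value(a, b):
    """Sum of absolute differences of Value1 and Value2 (hypothetical helper)."""
    return abs(values[a][0] - values[b][0]) + abs(values[a][1] - values[b][1])

# One entry per unordered pair of classes, named like the requested columns
pairs = {f"{a}_{b}_Value": pair_value(a, b) for a, b in combinations(values, 2)}
print(pairs)
# {'A_B_Value': 3, 'A_C_Value': 6, 'B_C_Value': 3}
```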

Thank you

bxjv4tth1#

You can stack and flatten the MultiIndex, then perform a cross merge:

s = df.set_index('Class').stack()
s.index = s.index.map('_'.join)
out = df.merge(s.to_frame().T, how='cross')

Output:

  Class  Value1  Value2  A_Value1  A_Value2  B_Value1  B_Value2  C_Value1  C_Value2
0     A       2       1         2         1         3         3         4         5
1     B       3       3         2         1         3         3         4         5
2     C       4       5         2         1         3         3         4         5
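
For reference, a self-contained version of the snippet above might look like the sketch below. It rebuilds the sample frame from the question by hand (the construction of `df` is an assumption, since the question only shows the printed table) and relies on `how='cross'`, which requires pandas 1.2 or later.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed construction)
df = pd.DataFrame({"Class": ["A", "B", "C"],
                   "Value1": [2, 3, 4],
                   "Value2": [1, 3, 5]})

# Stack to a Series indexed by (Class, column), then flatten to 'A_Value1', ...
s = df.set_index("Class").stack()
s.index = s.index.map("_".join)

# One-row frame holding every flattened value, cross-joined onto each row of df
out = df.merge(s.to_frame().T, how="cross")
print(out.shape)  # (3, 9)
```

Note that this layout adds two columns per class, so with about 1000 classes the result has roughly 2000 extra columns, each repeated on every row.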
Vectorized numeric computation:
from itertools import combinations

import numpy as np

tmp = df.set_index('Class')

cols = list(combinations(tmp.index,2))
idx1, idx2 = map(list, zip(*cols))

v1_1 = tmp.loc[idx1, 'Value1'].to_numpy()
v1_2 = tmp.loc[idx2, 'Value1'].to_numpy()
v2_1 = tmp.loc[idx1, 'Value2'].to_numpy()
v2_2 = tmp.loc[idx2, 'Value2'].to_numpy()

df[[f'{x1}_{x2}_Value' for x1, x2 in cols]
  ] = np.repeat((abs(v1_1-v1_2)+abs(v2_1-v2_2))[None], len(df), axis=0)

print(df)

Output:

  Class  Value1  Value2  A_B_Value  A_C_Value  B_C_Value
0     A       2       1          3          6          3
1     B       3       3          3          6          3
2     C       4       5          3          6          3
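
With nearly 1000 classes there are 1000 * 999 / 2 = 499,500 pairs, so repeating every pair value on every row produces roughly 5 * 10^8 cells. If the repeated layout is not strictly required, one possible alternative (an assumption layered on top of the answer above, not something the question asked for) is to keep each pair value once in a Series indexed by the pair name. The helper name `pairwise_values` is hypothetical.

```python
from itertools import combinations

import numpy as np
import pandas as pd

def pairwise_values(df):
    """Return one value per class pair instead of repeating it on every row."""
    tmp = df.set_index("Class")
    pairs = list(combinations(tmp.index, 2))
    idx1, idx2 = map(list, zip(*pairs))
    # Rows for the first and second member of each pair, shape (n_pairs, 2)
    left = tmp.loc[idx1, ["Value1", "Value2"]].to_numpy()
    right = tmp.loc[idx2, ["Value1", "Value2"]].to_numpy()
    # Sum of absolute differences over Value1 and Value2
    vals = np.abs(left - right).sum(axis=1)
    return pd.Series(vals, index=[f"{a}_{b}_Value" for a, b in pairs])

# Sample frame rebuilt from the question (assumed construction)
df = pd.DataFrame({"Class": ["A", "B", "C"],
                   "Value1": [2, 3, 4],
                   "Value2": [1, 3, 5]})
print(pairwise_values(df))
```

The Series can still be broadcast onto the frame later with `df.assign(**pairwise_values(df))` if the wide layout turns out to be needed.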

vs3odd8k2#

If you need to subtract the columns Value1 and Value2 and append the results as new columns, build a dictionary and add it with DataFrame.assign:

d = dict(zip(df['Class'].add('_diff'), 
             df['Value1'].sub(df['Value2'])))
print (d)
{'A_diff': 1, 'B_diff': 0, 'C_diff': -1}

df = df.assign(**d)
print (df)
  Class  Value1  Value2  A_diff  B_diff  C_diff
0     A       2       1       1       0      -1
1     B       3       3       1       0      -1
2     C       4       5       1       0      -1

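Edit: You can create all combinations with itertools.combinations, compute the differences in a dictionary comprehension, and finally create the new columns with DataFrame.assign: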

from itertools import combinations

df1 = df.set_index('Class')
cols = list(combinations(df1.index,2))

d = {f'{a}_{b}_Value' : abs(df1.loc[a, 'Value1'] - df1.loc[b, 'Value1']) + 
                        abs(df1.loc[a, 'Value2'] - df1.loc[b, 'Value2']) for a, b in cols}
df = df.assign(**d)
print (df)
  Class  Value1  Value2  A_B_Value  A_C_Value  B_C_Value
0     A       2       1          3          6          3
1     B       3       3          3          6          3
2     C       4       5          3          6          3

EDIT1: Because performance matters, here is a vectorized solution inspired by this:

import numpy as np

#convert Class to index
df1 = df.set_index('Class')

#convert DataFrame to 2d array
v = df1.to_numpy()
#get indices of combinations
i, j = np.tril_indices(len(df1.index), -1)

#select array - first column Value1 is 0
out1_1 = v[i, 0]
out1_2 = v[j, 0]

#select array - second column Value2 is 1
out2_1 = v[i, 1]
out2_2 = v[j, 1] 

#new columns names by combinations
cols = [f'{a}_{b}_Value' for a, b in zip(df1.index[j], df1.index[i])]

#new values in array
arr = np.abs(out1_1 - out1_2) + np.abs(out2_1 - out2_2)
 
#appended new columns
df = df.assign(**dict(zip(cols, arr)))
print (df)
  Class  Value1  Value2  A_B_Value  A_C_Value  B_C_Value
0     A       2       1          3          6          3
1     B       3       3          3          6          3
2     C       4       5          3          6          3
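
One detail worth noting: `np.tril_indices` walks the lower triangle row by row, so with more than three classes the column order differs from the order produced by `itertools.combinations` (for four classes it yields A_B, A_C, B_C, A_D, ...). If the `combinations` order matters, a sketch using `np.triu_indices` with broadcasting could look like this. The fourth class `D` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame from the question plus a made-up class D (illustrative only)
df = pd.DataFrame({"Class": ["A", "B", "C", "D"],
                   "Value1": [2, 3, 4, 7],
                   "Value2": [1, 3, 5, 2]})

df1 = df.set_index("Class")
v = df1[["Value1", "Value2"]].to_numpy()

# Full pairwise matrix via broadcasting: dist[i, j] = sum_k |v[i, k] - v[j, k]|
dist = np.abs(v[:, None, :] - v[None, :, :]).sum(axis=2)

# Upper-triangle indices (i < j) come out in the same order as combinations()
i, j = np.triu_indices(len(df1), k=1)
cols = [f"{a}_{b}_Value" for a, b in zip(df1.index[i], df1.index[j])]

# Each scalar is broadcast to every row by assign
out = df.assign(**dict(zip(cols, dist[i, j])))
print(out)
```

The broadcast intermediate has shape (n, n, 2), which stays small for about 1000 classes; the repeated output columns remain the dominant memory cost.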

