Pandas: calculating a score (or other value) relative to the assigned group

pu82cl6c posted on 2023-01-04 in Other

I'm a Python beginner. I have two DataFrames, shown below. The first DataFrame holds each user's vector (x1) and group number.

import pandas as pd

df1 = pd.DataFrame({'user': ['user 1', 'user 2', 'user 3', 'user 4', 'user 5'],
                    'x1': [[0.2, 0.3, 0.5],[0.3, 0.3, 0.4],[0.4, 0.4, 0.2],[0.2, 0.1, 0.7],[0.5,0.3,0.2]],
                    'group': [1, 0, 0, 2, 1]})

df1

Output:

     user               x1  group
0  user 1  [0.2, 0.3, 0.5]      1
1  user 2  [0.3, 0.3, 0.4]      0
2  user 3  [0.4, 0.4, 0.2]      0
3  user 4  [0.2, 0.1, 0.7]      2
4  user 5  [0.5, 0.3, 0.2]      1

The second DataFrame holds each group number together with the group's vector (x2), a variable (p2), and its threshold.

df2 = pd.DataFrame({'group': [0, 1, 2],
                   'x2': [[0.4, 0.2, 0.4],[0.5, 0.1, 0.4], [0.5, 0.1, 0.4]],
                   'p2': [0.231, 0.342, 0.411],
                   'threshold': [0.9, 0.6, 0.8]})
df2

Output:

   group               x2     p2  threshold
0      0  [0.4, 0.2, 0.4]  0.231        0.9
1      1  [0.5, 0.1, 0.4]  0.342        0.6
2      2  [0.5, 0.1, 0.4]  0.411        0.8

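I'm trying to compute a score (S) for each user relative to the group assigned to it, using the following formula (the formula image is missing here; this version is reconstructed from how both answers implement it):

$$S = \frac{1}{k} + \frac{(x_2 - x_1)^{T}(x_2 - x_1)}{p_2}$$

where k is the group size and T denotes the transpose of (x2 - x1).
How can I do this for all users?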
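For a concrete check, here is the score for user 1 worked out by hand, assuming the formula is S = 1/k + (x2 - x1)^T (x2 - x1) / p2 as both answers implement it. User 1 is in group 1, so k = 2 (users 1 and 5) and p2 = 0.342:

```python
import numpy as np

# User 1 is in group 1: x1 comes from df1, x2 and p2 from group 1's row in df2
x1 = np.array([0.2, 0.3, 0.5])
x2 = np.array([0.5, 0.1, 0.4])
p2 = 0.342
k = 2  # group 1 has two members: user 1 and user 5

d = x2 - x1         # approximately [0.3, -0.2, -0.1]
quad = d @ d        # (x2 - x1)^T (x2 - x1) = 0.09 + 0.04 + 0.01 = 0.14
s = 1 / k + quad / p2
print(round(s, 6))  # 0.909357
```

This matches the 0.909357 that both answers report for user 1.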

Answer 1 (zazmityj)

First, count the members of each group to get the k term:

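# Look counts up by group number rather than relying on df2's index matching 'group'
df2['count'] = df2['group'].map(df1.groupby('group')['user'].count())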

Then join df1 and df2 so that each row holds all the parameters needed for that user:

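# join(on='group') matches against the other frame's index, so index df2 by 'group' first
joined = df1.join(df2.set_index('group')[['x2', 'p2', 'threshold', 'count']], on='group')
print(joined)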

     user               x1  group               x2     p2  threshold  count
0  user 1  [0.2, 0.3, 0.5]      1  [0.5, 0.1, 0.4]  0.342        0.6      2
1  user 2  [0.3, 0.3, 0.4]      0  [0.4, 0.2, 0.4]  0.231        0.9      2
2  user 3  [0.4, 0.4, 0.2]      0  [0.4, 0.2, 0.4]  0.231        0.9      2
3  user 4  [0.2, 0.1, 0.7]      2  [0.5, 0.1, 0.4]  0.411        0.8      1
4  user 5  [0.5, 0.3, 0.2]      1  [0.5, 0.1, 0.4]  0.342        0.6      2
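DataFrame.join(..., on='group') looks up the caller's group values in the other frame's index, not in its 'group' column. With the question's data this happens to work because df2's default index 0, 1, 2 equals its group numbers. The frames below (users, groups) are hypothetical and store the groups in a different order, to show what goes wrong and how set_index('group') avoids it:

```python
import pandas as pd

# Hypothetical frames: the group table's row order no longer matches its 'group' values
users = pd.DataFrame({'user': ['user 1', 'user 2', 'user 4'],
                      'group': [1, 0, 2]})
groups = pd.DataFrame({'group': [2, 0, 1],
                       'p2': [0.411, 0.231, 0.342]})

# Looks up users['group'] in groups' default index (0, 1, 2), so the rows are mismatched
wrong = users.join(groups[['p2']], on='group')

# Indexing the group table by 'group' first gives the intended lookup
right = users.join(groups.set_index('group')[['p2']], on='group')

print(wrong['p2'].tolist())  # [0.231, 0.411, 0.342]
print(right['p2'].tolist())  # [0.342, 0.231, 0.411]
```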

Now define the functions that compute the S score:

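def l_delta(z1, z2):
    # Element-wise difference z1 - z2 of two equal-length lists
    return [a1 - a2 for (a1, a2) in zip(z1, z2)]

def inner(z1, z2):
    # Inner (dot) product of two equal-length lists
    return sum(a1 * a2 for (a1, a2) in zip(z1, z2))

def s_score(row):
    # S = 1/k + (x2 - x1)^T (x2 - x1) / p2, with k stored in the 'count' column
    delta = l_delta(row['x2'], row['x1'])
    num = inner(delta, delta)
    return 1 / row['count'] + num / row['p2']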

Finally, apply s_score to each row of the joined frame:

joined['s_score'] = joined.apply(s_score, axis=1)
print(joined[['user', 's_score']])

Result:

     user   s_score
0  user 1  0.909357
1  user 2  0.586580
2  user 3  0.846320
3  user 4  1.437956
4  user 5  0.733918
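Row-wise apply is easy to read but runs Python code per row. As an alternative sketch (not from either answer, and assuming every x1/x2 list has the same length), the list columns can be stacked into 2-D NumPy arrays so all scores are computed at once, keeping df1's original row order:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'user': ['user 1', 'user 2', 'user 3', 'user 4', 'user 5'],
                    'x1': [[0.2, 0.3, 0.5], [0.3, 0.3, 0.4], [0.4, 0.4, 0.2],
                           [0.2, 0.1, 0.7], [0.5, 0.3, 0.2]],
                    'group': [1, 0, 0, 2, 1]})
df2 = pd.DataFrame({'group': [0, 1, 2],
                    'x2': [[0.4, 0.2, 0.4], [0.5, 0.1, 0.4], [0.5, 0.1, 0.4]],
                    'p2': [0.231, 0.342, 0.411],
                    'threshold': [0.9, 0.6, 0.8]})

# One row of group parameters per user, looked up by group number (in df1's order)
params = df2.set_index('group').loc[df1['group']]

# k: group size broadcast back to every user row
k = df1.groupby('group')['user'].transform('count').to_numpy()

# Stack the list columns into (n_users, dim) float arrays
X1 = np.vstack(df1['x1'].to_numpy())
X2 = np.vstack(params['x2'].to_numpy())
D = X2 - X1

# Row-wise (x2 - x1)^T (x2 - x1): sum of D[i, j] * D[i, j] over j for each row i
quad = np.einsum('ij,ij->i', D, D)

df1['S'] = 1 / k + quad / params['p2'].to_numpy()
print(df1[['user', 'S']])
```

The scores match the table above; threshold is carried along in df2 but, as in both answers, not used by the formula.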
Answer 2 (ruarlubt)

This answer is similar to @The Photon's: we (1) merge df1 and df2, (2) compute k with groupby, and (3) compute the inner product of (x2 - x1) with itself.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'user': ['user 1', 'user 2', 'user 3', 'user 4', 'user 5'],
                    'x1': [[0.2, 0.3, 0.5],[0.3, 0.3, 0.4],[0.4, 0.4, 0.2],[0.2, 0.1, 0.7],[0.5,0.3,0.2]],
                    'group': [1, 0, 0, 2, 1]})

df2 = pd.DataFrame({'group': [0, 1, 2],
                    'x2': [[0.4, 0.2, 0.4],[0.5, 0.1, 0.4], [0.5, 0.1, 0.4]],
                    'p2': [0.231, 0.342, 0.411],
                    'threshold': [0.9, 0.6, 0.8]})

#merge df1 and df2 into a single table
merged_df = df1.merge(df2)

#calculate the number of unique users per group (k)
merged_df['k'] = merged_df.groupby('group')['user'].transform('nunique')

#calculate x2-x1 for each user (convert to numpy array for vectorized subtraction)
x2_sub_x1 = merged_df['x2'].apply(np.array)-merged_df['x1'].apply(np.array)

#calculate (x2-x1)T(x2-x1) for each user (same as squaring each term and summing)
numerator = x2_sub_x1.pow(2).apply(sum)

#calculate S from your formula and add it as a column to the merged table
merged_df['S'] = (1/merged_df['k'])+(numerator/merged_df['p2'])
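The two answers obtain k differently. A plain groupby reduction returns one value per group (indexed by group number), which then has to be joined back onto the users; transform returns the same group-level value broadcast to every row of the original frame, so it can be assigned as a column directly. A small illustration on the merged table's user and group columns (the frame name merged is hypothetical):

```python
import pandas as pd

# User and group columns of the merged table, in the order shown below
merged = pd.DataFrame({'user': ['user 1', 'user 5', 'user 2', 'user 3', 'user 4'],
                       'group': [1, 1, 0, 0, 2]})

# Reduction: one value per group, indexed by group number
per_group = merged.groupby('group')['user'].nunique()

# Transform: the group-level value repeated for every row, aligned to merged's index
per_row = merged.groupby('group')['user'].transform('nunique')

print(per_group.tolist())  # [2, 2, 1]
print(per_row.tolist())    # [2, 2, 2, 2, 1]
```

Note that nunique counts distinct user names, so it differs from a plain count only if the same user appears more than once in a group.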

Final merged table:

     user               x1  group               x2     p2  threshold  k         S
0  user 1  [0.2, 0.3, 0.5]      1  [0.5, 0.1, 0.4]  0.342        0.6  2  0.909357
1  user 5  [0.5, 0.3, 0.2]      1  [0.5, 0.1, 0.4]  0.342        0.6  2  0.733918
2  user 2  [0.3, 0.3, 0.4]      0  [0.4, 0.2, 0.4]  0.231        0.9  2  0.586580
3  user 3  [0.4, 0.4, 0.2]      0  [0.4, 0.2, 0.4]  0.231        0.9  2  0.846320
4  user 4  [0.2, 0.1, 0.7]      2  [0.5, 0.1, 0.4]  0.411        0.8  1  1.437956
