Python/NumPy: get the average of an array based on index

Posted by zlwx9yxi on 2022-12-26 in Python


I have two NumPy arrays: the first is values and the second is indexes. I want to average the entries of the values array according to the indexes array.
For example:

values = [1,2,3,4,5]
indexes = [0,0,1,1,2]
get_indexed_avg(values, indexes)
# should give me 
#   [1.5,    3.5,    5]

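Here, each value in the indexes array is the position in the final array that the corresponding entry of values contributes to.

The first two items of the values array are averaged to form index 0 of the final array.
The 3rd and 4th items of the values array are averaged to form index 1 of the final array.
Finally, the last item is used for index 2 of the final array.
I do have a pure Python solution, but it is ugly and very slow. Is there a better one, perhaps using NumPy or a similar library?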

numpy

Source: https://stackoverflow.com/questions/71329884/python-numpy-get-average-of-array-based-on-index

3 Answers


6g8kf2rb #1

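import pandas as pd

values = [1, 2, 3, 4, 5]
indexes = [0, 0, 1, 1, 2]

# Group the values by their target index and average each group.
pd.Series(values).groupby(indexes).mean()
# OR, to get a plain list:
# pd.Series(values).groupby(indexes).mean().to_list()
# Output of the first form:
# 0    1.5
# 1    3.5
# 2    5.0
# dtype: float64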

Posted 2022-12-26

erhoui1w #2

I wanted to avoid pandas, so I spent quite a while working this out. The approach uses one-hot encoding.
One-hot encoding the indexes gives a 2-D array with 1s exactly where we want them.

indexes = np.array([0,0,1,1,2])
# one_hot = array(
#    [[1., 0., 0.],
#    [1., 0., 0.],
#    [0., 1., 0.],
#    [0., 1., 0.],
#    [0., 0., 1.]]
# )

We only need to build the one-hot encoding of the index array and then multiply it with the values to get the result we want.

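import numpy as np

values = np.array([1,2,3,4,5])
indexes = np.array([0,0,1,1,2])

# Row i of the identity matrix is the one-hot vector for index i.
one_hot = np.eye(np.max(indexes) + 1)[indexes]

# Column sums give the number of values per index.
counts = np.sum(one_hot, axis=0)
# Mask the values per index, sum each row, then divide by the counts.
average = np.sum((one_hot.T * values), axis=1) / counts

print(average) # [1.5 3.5 5. ]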
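The multiplication described above can also be written as an actual matrix product. This is a minimal sketch of that variant, not the answer's own code: the name `sums` is introduced here for illustration, and it assumes every index from 0 to `np.max(indexes)` occurs at least once (otherwise `counts` contains zeros and the division produces nan).

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])
indexes = np.array([0, 0, 1, 1, 2])

# One row per value, one column per output index.
one_hot = np.eye(np.max(indexes) + 1)[indexes]

# (n,) @ (n, k) -> (k,): the per-index sums of the values.
sums = values @ one_hot
# Column sums of the one-hot matrix are the per-index counts.
counts = one_hot.sum(axis=0)

average = sums / counts
print(average)
```

Note that the one-hot matrix has len(values) rows and max(indexes) + 1 columns, so memory use grows with the product of the two.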

Posted 2022-12-26

icnyk63a #3

The simplest solution:

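import numpy as np

values = np.array([1,2,3,4,5])
indexes = np.array([0,0,1,1,2])
# np.unique returns the distinct indexes in sorted order;
# a plain set() does not guarantee any iteration order.
index_set = np.unique(indexes) # index_set = [0 1 2]

# Now get values based on the index that we saved in index_set 
# and then take an average
avg = [float(np.mean(values[indexes==k])) for k in index_set]

print(avg) # [1.5, 3.5, 5.0]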
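The list comprehension above scans `values` once per distinct index. For larger inputs, one possible vectorized alternative is `np.bincount`. This is a different technique from the one in this answer, sketched under the assumption that `indexes` holds non-negative integers. The function name `get_indexed_avg` comes from the question; its body here is only an illustration.

```python
import numpy as np

def get_indexed_avg(values, indexes):
    """Average `values` grouped by the non-negative integer labels in `indexes`."""
    values = np.asarray(values, dtype=float)
    indexes = np.asarray(indexes)
    # Sum of the values that fall on each output index.
    sums = np.bincount(indexes, weights=values)
    # Number of values that fall on each output index.
    counts = np.bincount(indexes)
    # An index that never occurs gives 0/0, which is returned as nan.
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts

print(get_indexed_avg([1, 2, 3, 4, 5], [0, 0, 1, 1, 2]))
```

Unlike the loop, this returns one entry for every index from 0 to max(indexes), so gaps in `indexes` show up as nan rather than being skipped.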

Posted 2022-12-26
