Restructure a 2D numpy array based on matching column values

Posted by yquaqz18 on 2023-05-17 in Other


I am working with a dataset of roughly 30 million entries. Each entry has a timestamp, an ID, a description, and a value. The whole numpy array looks like this:

[
[Time 1, ID 1, D 1_1, V 1_1],
[Time 1, ID 1, D 1_2, V 1_2],
...
[Time 2, ID 1, D 2_1, V 2_1],
[Time 2, ID 1, D 2_2, V 2_2],
...
[Time X, ID 2, D X_1, V X_1],
...
]

I want to condense the array into the following format:

[
[Time 1, ID 1, D 1_1, V 1_1, D 1_2, V 1_2, ...],
[Time 2, ID 1, D 2_1, V 2_1, D 2_2, V 2_2, ...],
[Time X, ID 2, D X_1, V X_1, ...],
...
]

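Every subarray in the original array has the same length and column order, but the number of subarrays sharing a timestamp varies, as does the number of subarrays sharing an ID. Is there a way to restructure the array in a reasonable amount of time? The time, ID, and description columns are strings, while the value column holds floats (if that matters).
Ideally, I would end up with an array of dictionaries, i.e.: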

[{'time': Time1, 'ID': ID1, 'D1_1': V1_1, 'D1_2': V1_2, ...}...]

However, given how long my attempt with dictionaries took (>100 hours), I assume that building the dictionaries is too slow.
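For reference, here is a minimal sketch of a single-pass dictionary build, assuming each row unpacks into (time, ID, description, value) and that rows may arrive in any order. The helper name `rows_to_records` and the sample data are hypothetical, not from the original post. Each row costs one dictionary lookup, so the work grows linearly with the number of rows; if an earlier attempt took far longer, the cause may lie elsewhere (for example, rescanning the array for every key), though that is only a guess.

```python
import numpy as np

def rows_to_records(rows):
    """Group (time, id, description, value) rows into one dict per (time, id)."""
    records = {}  # (time, id) -> record dict; dicts keep insertion order
    for time, id_, desc, value in rows:
        key = (time, id_)
        rec = records.get(key)
        if rec is None:
            # First row seen for this (time, id) pair: start a new record.
            rec = {"time": time, "ID": id_}
            records[key] = rec
        rec[desc] = float(value)  # description becomes the key, value the float
    return list(records.values())

# Hypothetical sample data in the layout described in the question.
sample = np.array([
    ["T1", "A", "d1", "1.0"],
    ["T1", "A", "d2", "2.0"],
    ["T2", "A", "d1", "3.0"],
    ["T3", "B", "d1", "4.0"],
])

# tolist() converts the rows to plain Python strings before iterating.
records = rows_to_records(sample.tolist())
```

If the same description appears twice for one (time, ID) pair, the later value overwrites the earlier one in this sketch.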

numpy

Source: https://stackoverflow.com/questions/76222461/restructure-a-2d-numpy-array-based-on-matching-column-values

1 answer


ffx8fchx1#

I think you can achieve this easily with Pandas:

import pandas as pd

# your dataframe
df = pd.DataFrame(data=your_np_array, columns=['Time', 'ID', 'Description', 'Value'])
# groupby time and ID, and aggregate the descriptions and values into lists
grouped = df.groupby(['Time', 'ID']).agg({'Description': list, 'Value': list})
# reset the index to get the time and ID as columns rather than indices
result = grouped.reset_index()
# join each list into a single comma-separated string per row
# (note: this yields two string columns, not one column per description)
result['Description'] = result['Description'].apply(lambda x: ','.join(x))
result['Value'] = result['Value'].apply(lambda x: ','.join(map(str, x)))

# convert the result to a numpy array
my_new_numpy_array = result.to_numpy()
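The snippet above leaves the descriptions and values as joined strings. If the goal is the list of dictionaries from the question, one possible alternative is to pivot the descriptions into columns instead. This is a sketch under assumptions, not the answerer's method: the sample data is hypothetical, values are assumed to parse as floats, and `aggfunc="first"` is an arbitrary choice for handling a description repeated within one (Time, ID) pair.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the layout described in the question.
sample = np.array([
    ["T1", "A", "d1", "1.0"],
    ["T1", "A", "d2", "2.0"],
    ["T2", "A", "d1", "3.0"],
    ["T3", "B", "d1", "4.0"],
])

df = pd.DataFrame(sample, columns=["Time", "ID", "Description", "Value"])
df["Value"] = df["Value"].astype(float)  # the array stores values as strings

# One row per (Time, ID), one column per distinct description.
wide = df.pivot_table(
    index=["Time", "ID"],
    columns="Description",
    values="Value",
    aggfunc="first",  # assumption: keep the first value if a description repeats
)

# Convert to dictionaries, dropping descriptions a given row does not have (NaN).
records = [
    {key: val for key, val in rec.items() if pd.notna(val)}
    for rec in wide.reset_index().to_dict("records")
]
```

Note that `pivot_table` sorts the (Time, ID) keys by default, so the output order may differ from the input order. With many distinct descriptions the wide table is mostly NaN, which can use a lot of memory on 30 million rows.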

Answered on 2023-05-17
