Rearranging a 2D NumPy array based on two lists of rearranged labels

9rygscc1  posted on 2023-05-22  in Other

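I have a function that takes a 1D list of numeric labels and returns a 2D array (a set of 1D arrays, one for each label). The problem is that the returned data is ordered according to the sorted list of labels. I need to restore it to the original order of the labels.
For example, I have the following: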

import numpy as np

labels = [20,12,11]         # desired order
labels_sorted = [11,12,20]  # sorted label order

#              labels:    11     12     20                          20     12     11
data_sorted = np.array([[345.3, 361.8 ,347.6],       # reverted: [[347.6, 361.8, 345.3]
                        [383.6, 402.0, 386.2 ],      #            [386.2, 402.0, 383.6]
                        [422.0, 442.2, 424.9 ],      #            [424.9, 442.2, 422.0]
                        [460.4, 482.4, 463.5 ],      #            [463.5, 482.4, 460.4]
                        [498.7, 522.5, 502.1 ]])     #            [502.1, 522.5, 498.7]]

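In this case, the output I want has columns 1 and 3 swapped. I managed to find a solution that gives the result I want, but it relies mostly on list operations, and I worry it will be slow for larger arrays (e.g., 1000x1000). Is it possible to do this more efficiently with NumPy functions?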

data_sorted_T = np.transpose(data_sorted)  # transpose array so it can be zipped correctly
combined_sorted = zip(labels_sorted, data_sorted_T)  # pair the labels with each data set
combined_reverted = sorted(combined_sorted, key=lambda s: labels.index(s[0]))  # rearrange order
#data_T = np.fromiter( [label[1] for label in combined_reverted], float)  # doesn't work
data_T = np.array([label[1] for label in combined_reverted])  # unzip
data = np.transpose(data_T)

print(labels_sorted)
print(data_sorted)
print(labels)
print(data)

twh00eeo  #1

Get an array of sort indices:

In [64]: idx=np.argsort(labels); idx
Out[64]: array([2, 1, 0], dtype=int64)

Apply it to the sorted labels:

In [65]: np.array(labels_sorted)[idx]
Out[65]: array([20, 12, 11])

and to the columns of the data:

In [66]: data_sorted[:, idx]
Out[66]: 
array([[347.6, 361.8, 345.3],
       [386.2, 402. , 383.6],
       [424.9, 442.2, 422. ],
       [463.5, 482.4, 460.4],
       [502.1, 522.5, 498.7]])

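You don't have to use argsort; anything that specifies the label and column order will do. Note that `np.argsort(labels)` gives the right column order here only because this particular permutation (a reversal) is its own inverse; in general, the index you need is the position of each label within `labels_sorted`.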
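
To illustrate the general case: the sketch below uses a label order that is not a simple reversal, and computes the position of each desired label within the sorted labels with `np.searchsorted`. It assumes the labels are unique and that `labels_sorted` is in ascending order; the helper name `reorder_columns` and the small sample data are made up for illustration.

```python
import numpy as np

def reorder_columns(data_sorted, labels_sorted, labels):
    """Rearrange columns from labels_sorted order into labels order.

    Hypothetical helper; assumes unique labels and ascending labels_sorted.
    """
    # Position of each desired label within the sorted label list.
    idx = np.searchsorted(labels_sorted, labels)
    return data_sorted[:, idx]

labels = [20, 11, 12]            # desired order (not a simple reversal)
labels_sorted = sorted(labels)   # [11, 12, 20]
data_sorted = np.array([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])  # columns follow labels_sorted

idx = np.searchsorted(labels_sorted, labels)
# Equivalent index: the rank of each label, i.e. the inverse of argsort.
ranks = np.argsort(np.argsort(labels))
# A single argsort is a different permutation for this label order.
single_argsort = np.argsort(labels)

data = reorder_columns(data_sorted, labels_sorted, labels)
print(idx, ranks, single_argsort)
print(data)
```

Both `np.searchsorted` and the double `argsort` yield the same index array; `searchsorted` additionally works when `labels` is only a subset of `labels_sorted`, as long as every requested label is actually present.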


70gysomp  #2

What you can do is pass an index list to index the columns of the 2D array.
Below is a one-line solution that needs no other libraries.

import numpy as np

labels = [20,12,11]         # desired order
labels_sorted = [11,12,20]  # sorted label order

#              labels:    11     12     20                             20     12     11
data_sorted = np.array([[345.3, 361.8 ,347.6],          # reverted: [[347.6, 361.8, 345.3]
                        [383.6, 402.0, 386.2 ],         #            [386.2, 402.0, 383.6]
                        [422.0, 442.2, 424.9 ],         #            [424.9, 442.2, 422.0]
                        [460.4, 482.4, 463.5 ],         #            [463.5, 482.4, 460.4]
                        [498.7, 522.5, 502.1 ]])        #            [502.1, 522.5, 498.7]]

# # idea (description of the below answer)
# index_list = []
# for label in labels:
#     index_list.append(labels_sorted.index(label))
# data_sorted[:, index_list]

# ----------------
#     Solution
# ----------------
data_sorted[:, [labels_sorted.index(label) for label in labels]]
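
One thing to keep in mind for wide arrays: `labels_sorted.index(label)` rescans the list for every label, so building the index list takes time quadratic in the number of labels. A dictionary lookup is one possible way around that. This is a sketch assuming the labels are unique; `position` and `index_list` are illustrative names, and the data is truncated to two rows for brevity.

```python
import numpy as np

labels = [20, 12, 11]         # desired order
labels_sorted = [11, 12, 20]  # sorted label order
data_sorted = np.array([[345.3, 361.8, 347.6],
                        [383.6, 402.0, 386.2]])

# Map each label to its column position in the sorted data (single pass).
position = {label: i for i, label in enumerate(labels_sorted)}

# Look up the column of each desired label in constant time.
index_list = [position[label] for label in labels]

data = data_sorted[:, index_list]
print(index_list)
print(data)
```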

laawzig2  #3

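I don't think the concept of "labels" makes much sense in the context of NumPy, so I think the best idea is simply to use the right tool for the job, which in this case is pandas: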

import numpy as np
import pandas as pd

labels = [20,12,11]         # desired order
labels_sorted = [11,12,20]  # sorted label order

#              labels:    11     12     20
data_sorted = np.array([[345.3, 361.8 ,347.6],          
                        [383.6, 402.0, 386.2 ],         
                        [422.0, 442.2, 424.9 ],         
                        [460.4, 482.4, 463.5 ],         
                        [498.7, 522.5, 502.1 ]]) 

# Parentheses are needed so the method chain can span several lines.
res = (pd.DataFrame(data_sorted, columns=[str(i) for i in labels_sorted])
         .reindex(columns=[str(i) for i in labels])
         .values)

Out:
array([[347.6, 361.8, 345.3],
       [386.2, 402. , 383.6],
       [424.9, 442.2, 422. ],
       [463.5, 482.4, 460.4],
       [502.1, 522.5, 498.7]])

Similarly, for performance that should be comparable to NumPy, you can use Polars:

import polars as pl

# Parentheses allow the multi-line chain; orient="row" makes the 2D layout explicit.
res = (pl.DataFrame(data_sorted, schema=[str(i) for i in labels_sorted], orient="row")
         .select([str(i) for i in labels])
         .to_numpy())
