python-3.x: Encoding categorical variables in a numpy array through a loop

Posted by ccgok5k5 on 2023-03-13 in Python


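I have a list of numpy arrays containing categorical variables. I used label encoding to encode the variables in these arrays, but the encoded result is not what I expected, as shown below: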

import numpy as np
from sklearn.preprocessing import LabelEncoder

bin_ids = [np.array(['Horses', 'Cats', 'Dogs']), np.array(['Blue', 'Green', 'Red'])]

encoder = LabelEncoder()

bin_ids_arr = []

for bin_id in bin_ids:
    # flatten the array and encode the categorical values
    bin_ids_enc = encoder.fit_transform(bin_id.flatten()).reshape(bin_id.shape)
    bin_ids_arr.append(bin_ids_enc)

bin_ids_arr = np.array(bin_ids_arr)

print(bin_ids_arr)

Actual Output : [[2 0 1]
                 [0 1 2]]

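However, I want the output to look like the following, where each distinct categorical value is assigned a different number, numbered consecutively across the arrays.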

Expected Output : [[0 1 2]
                   [3 4 5]]

Is there a way to get an encoding like the output above?


Source: https://stackoverflow.com/questions/75713206/encoding-categorical-variables-in-a-numpy-array-through-loop

1 Answer


myzjeezk1#

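If you want to treat bin_ids as one continuous list of categories, it needs to be flattened and passed to a single call of LabelEncoder.fit_transform. Calling fit_transform inside the loop refits the encoder on each array, so the numbering restarts at 0 every time. To get the categories/labels back in the original array shape, use numpy.reshape: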

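bin_ids = np.array(bin_ids)  # stack the list of arrays into one 2-D array; a plain list has no .ravel() or .shape
bin_cats = encoder.fit_transform(bin_ids.ravel()).reshape(bin_ids.shape)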
print(bin_cats)

[[4 1 2]
 [0 3 5]]

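Note that the values above do not follow the order in which the labels appear. When encoding non-numeric labels, LabelEncoder requires the input labels to be hashable and comparable, and it assigns codes according to their sorted (lexicographic) order: Blue=0, Cats=1, Dogs=2, Green=3, Horses=4, Red=5.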
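If the codes should instead follow the order of first appearance, as in the question's expected output, one possible approach is sketched below. It swaps LabelEncoder for numpy.unique and assumes all arrays in the list have the same shape so that they can be stacked. The names stacked, rank, and codes are illustrative only.

```python
import numpy as np

bin_ids = [np.array(['Horses', 'Cats', 'Dogs']), np.array(['Blue', 'Green', 'Red'])]

# Assumption: every array has the same shape, so the list stacks into a 2-D array.
stacked = np.array(bin_ids)
flat = stacked.ravel()

# np.unique returns the labels in sorted order.
# first_idx[k] is the position where sorted label k first occurs in flat;
# inverse[i] is the sorted-label index of flat[i].
_, first_idx, inverse = np.unique(flat, return_index=True, return_inverse=True)

# rank[k] is the position of sorted label k when labels are ordered by first appearance.
rank = np.empty_like(first_idx)
rank[np.argsort(first_idx)] = np.arange(first_idx.size)

# Map every element to its first-appearance code and restore the 2-D shape.
codes = rank[inverse].reshape(stacked.shape)
print(codes)
```

Because the ranking is based on first occurrence, a label that repeats later in the data reuses the code it was given the first time.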

Answered 2023-03-13
