Loading a numpy array into a single DataFrame column

Posted by vybvopom on 2021-05-24 in Spark


I am using pyspark and trying to store my data as CSV. I converted my numpy array into a DataFrame with the following format:

label   |     0    1     2     4    ...    768
---------------------------------------
  1     |   0.12  0.23  0.31  0.72  ...   0.91

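And so on, with each value of a "row vector" in the array split into its own column. This format is not compatible with Spark, which needs the features all in one column. Is there a way to load the array into a DataFrame in that format? For example: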

label   |     Features
------------------------------------------
  1     |   [0.12,0.23,0.31,0.72,...,0.91]

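I tried following the advice in this thread, which details merging the columns using the Spark API, but when loading the labels I get an error because the label becomes part of the vector instead of staying a separate `string` or `int` value.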

python apache-spark pandas Arrays numpy

Source: https://stackoverflow.com/questions/64146610/loading-numpy-array-to-single-pandas-dataframe-colums

2 Answers


rdrgkggo1#

I don't know anything about Spark, but if you want a DataFrame with a column of lists, just assign a 2D list of lists to the column: `df['features'] = SOME_2D_LIST_OF_LISTS`

```
import numpy as np
import pandas as pd

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame()
df['Features'] = data  # now you have a column of lists
```

If for whatever reason you want each row value to itself be a numpy array, add:

```
df['Features'] = df['Features'].map(np.array)
```

If the data is already a numpy array, use `df['Features'] = data.tolist()`.
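
As a minimal sketch of the numpy case, assuming a hypothetical 2D array `X` (one row vector per sample) and a 1D array `y` of labels, neither of which is named in the question, the two-column layout can be built directly without first spreading the values across columns:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3 samples with 4 features each, plus one label per sample.
X = np.array([[0.12, 0.23, 0.31, 0.72],
              [0.05, 0.44, 0.18, 0.91],
              [0.67, 0.09, 0.53, 0.26]])
y = np.array([1, 0, 1])

# X.tolist() yields one Python list per row, so each cell of
# "Features" holds a whole row vector while "label" stays scalar.
df = pd.DataFrame({"label": y, "Features": X.tolist()})

print(df.shape)               # (3, 2)
print(df.loc[0, "Features"])  # [0.12, 0.23, 0.31, 0.72]
```

Using `X.tolist()` rather than assigning `X` itself matters here: a 2D array is not accepted as a single column, whereas a list of lists is stored as one object per row.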

Posted 2021-05-25

gijlo24d2#

Here is one way to do this. Note that I chose integers over floats for readability:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(20, 30, size=30).reshape(3, 10))
df.insert(0, "label", [1,2,3])

print(df)

   label   0   1   2   3   4   5   6   7   8   9
0      1  26  27  25  29  20  23  26  25  22  23
1      2  20  20  26  25  23  23  26  24  27  23
2      3  24  22  24  22  26  23  27  22  26  23

Select all the feature columns (I used `iloc` here) and convert them to a list of lists.

features = df.iloc[:, 1:].to_numpy().tolist()

print(features)
[[26, 27, 25, 29, 20, 23, 26, 25, 22, 23],
 [20, 20, 26, 25, 23, 23, 26, 24, 27, 23],
 [24, 22, 24, 22, 26, 23, 27, 22, 26, 23]]

Then create a new DataFrame from the labels and the new features:

new_df = pd.DataFrame({
    "label": df["label"],
    "features": features
})

print(new_df)

   label                                  features
0      1  [26, 27, 25, 29, 20, 23, 26, 25, 22, 23]
1      2  [20, 20, 26, 25, 23, 23, 26, 24, 27, 23]
2      3  [24, 22, 24, 22, 26, 23, 27, 22, 26, 23]
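
Neither answer covers the Spark side of the question, so here is a hedged sketch of one possible next step. It assumes the target is `pyspark.ml`, which generally expects a vector-typed features column, and that a `SparkSession` named `spark` already exists. The helper names `to_label_vector_rows` and `to_spark_df` are hypothetical, and the small `new_df` below is a shortened stand-in for the frame built above. The label is kept as a plain `int` in its own column, so it never becomes part of the vector:

```python
import pandas as pd


def to_label_vector_rows(pdf):
    """Turn a pandas frame with 'label' and 'features' columns into
    (int, list[float]) tuples, using plain Python types only."""
    return [
        (int(label), [float(v) for v in feats])
        for label, feats in zip(pdf["label"], pdf["features"])
    ]


def to_spark_df(spark, pdf):
    """Build a Spark DataFrame whose 'features' column holds dense vectors.

    Assumes `spark` is an existing SparkSession. pyspark is imported
    lazily so the helper above can be used without Spark installed.
    """
    from pyspark.ml.linalg import Vectors

    rows = [
        (label, Vectors.dense(feats))
        for label, feats in to_label_vector_rows(pdf)
    ]
    return spark.createDataFrame(rows, ["label", "features"])


# Shortened stand-in for the label/features frame.
new_df = pd.DataFrame({
    "label": [1, 2, 3],
    "features": [[26, 27, 25], [20, 20, 26], [24, 22, 24]],
})

rows = to_label_vector_rows(new_df)
print(rows[0])  # (1, [26.0, 27.0, 25.0])
```

With a running session, `to_spark_df(spark, new_df)` would be the call to try. Whether a vector column survives a round trip through CSV is a separate concern that this sketch does not address.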

Posted 2021-05-25
