pandas: create a column of one-hot encodings, adding empty columns for missing categories

Posted by aamkag61 on 2023-04-28 in Other


I'm trying to fill values from one df into another.
df1 looks like this, with shape (52, 2):

First 5 rows of df1:

id           months
71911200001      22
71911200002      27
71911200004      30
71911200003      23
41911200003      35
41911200003          35

df2 looks like this, with shape (52, 49):

First 5 rows and columns of df2:

id           M0 M1 M2 M3 M4.....M49
71911200001  0  0  0  0  0       0
71911200002  0  0  0  0  0       0
71911200004  0  0  0  0  0       0
71911200003  0  0  0  0  0       0
41911200003  0  0  0  0  0       0
Note: id is set as row index for this df.

Now I want to fill df2 like this:

id           M0 M1....M22 M23...M27...M30...M35..M49
71911200001  0  0      1   0    0     0      0    0
71911200002  0  0      0   0    1     0      0    0
71911200004  0  0      0   0    0     1      0    0
71911200003  0  0      0   1    0     0      0    0
41911200003  0  0      0   0    0     0      1    0

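The ids are the same in both dfs.

Basically, for each id in df2, I want to put a 1 in the column whose numeric suffix matches that id's value in the months column of df1.
Note: all ids are unique; there are no duplicates.
Any help with the above would be greatly appreciated.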
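To make the question reproducible, here is a minimal sketch that rebuilds only the five sample rows shown above, so the shapes are (5, 2) and (5, 50) rather than the full 52 rows. The zero-filled df2 with columns M0 through M49 indexed by id is an assumption based on the printed output.

```python
import pandas as pd

# The five sample rows of df1 shown in the question
df1 = pd.DataFrame({
    "id": [71911200001, 71911200002, 71911200004, 71911200003, 41911200003],
    "months": [22, 27, 30, 23, 35],
})

# Assumed df2: all zeros, one column per month M0..M49, indexed by id
df2 = pd.DataFrame(
    0,
    index=pd.Index(df1["id"], name="id"),
    columns=[f"M{m}" for m in range(50)],
)

print(df1.shape, df2.shape)  # (5, 2) (5, 50)
```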

pandas

Source: https://stackoverflow.com/questions/76111095/create-a-column-of-one-hot-encodings-add-empty-columns-for-missing-categories

4 answers

hc2pp10m1#

Do you really need df2? I think you can build df2 directly from df1 by calling pd.get_dummies on a categorical column. Try this:

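import numpy as np
import pandas as pd

# df is the question's df1; declare all 50 possible months as categories
df['months'] = pd.Categorical(
    df['months'], categories=range(50), ordered=True)

# get_dummies emits one column per category, including months that never occur
df2 = (pd.get_dummies(df.set_index('id')['months'], dtype=np.int8)
         .add_prefix('M'))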

df2.shape
# (5, 50)

df2.loc[:, 'M22':'M27']

             M22  M23  M24  M25  M26  M27
id                                       
71911200001    1    0    0    0    0    0
71911200002    0    0    0    0    0    1
71911200004    0    0    0    0    0    0
71911200003    0    1    0    0    0    0
41911200003    0    0    0    0    0    0

df2 is now a one-hot-encoded DataFrame. Every category from M0 to M49 is represented.
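One consequence of the categorical approach, sketched below with a hypothetical out-of-range month (55): any value that is not among the categories 0 to 49 becomes NaN in the Categorical, and get_dummies then encodes that row as all zeros instead of raising an error.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the second month (55) lies outside the 0..49 categories
df = pd.DataFrame({"id": [71911200001, 71911200002], "months": [22, 55]})
df["months"] = pd.Categorical(df["months"], categories=range(50), ordered=True)

df2 = (pd.get_dummies(df.set_index("id")["months"], dtype=np.int8)
         .add_prefix("M"))

# 55 was turned into NaN, so its row has no 1 anywhere
print(df2.shape)                  # (2, 50)
print(df2.sum(axis=1).tolist())   # [1, 0]
```

If such values can occur, checking `df["months"].isna()` right after the conversion is one way to catch them before encoding.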

Answered 2023-04-28

nhhxz33t2#

I think all you need to do is update the columns of your df2 that correspond to the values in df1's months column:

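import pandas as pd

# One-hot encode months, indexed by id so the rows line up with df2
dummies = pd.get_dummies(df1.set_index("id")["months"], prefix="M", prefix_sep="", dtype=int)
# Overwrite only the matching M columns; all other columns of df2 stay 0
df2[dummies.columns] = dummies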

This assigns the values produced by get_dummies on df1["months"] to those columns.

Answered 2023-04-28

ndasle7k3#

Code

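import pandas as pd

# dtype=int gives 0/1 instead of booleans (the default in pandas 2.x)
out = (pd.get_dummies(df.set_index('id')['months'], dtype=int)
       .reindex(columns=range(50), fill_value=0)  # add missing months 0..49 as zero columns
       .add_prefix('M'))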

Check

out.loc[:, 'M22':'M27']

             M22  M23  M24  M25  M26  M27
id                                       
71911200001    1    0    0    0    0    0
71911200002    0    0    0    0    0    1
71911200004    0    0    0    0    0    0
71911200003    0    1    0    0    0    0
41911200003    0    0    0    0    0    0

Answered 2023-04-28

wmvff8tz4#

You can do this by iterating over the rows of df1 and updating the corresponding row of df2 based on the id and months values. Try this:

# Assumes df2 is indexed by id and already contains every M column
for _, row in df1.iterrows():
    id_value = row['id']
    month_value = row['months']
    df2.loc[id_value, f'M{month_value}'] = 1
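For larger frames, the same per-row update can be done without a Python loop. A minimal sketch, assuming df2 is indexed by id and already has every M column (the sample rows are the first three from the question): look up each id's row position and each month's column position with `Index.get_indexer`, then set all the cells at once on a NumPy copy. `get_indexer` returns -1 for labels it cannot find, which the assertion guards against.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"id": [71911200001, 71911200002, 71911200004],
                    "months": [22, 27, 30]})
df2 = pd.DataFrame(0, index=pd.Index(df1["id"], name="id"),
                   columns=[f"M{m}" for m in range(50)])

# Positions of each id in df2's index and of each "M<months>" column
rows = df2.index.get_indexer(df1["id"])
cols = df2.columns.get_indexer("M" + df1["months"].astype(str))
assert (rows >= 0).all() and (cols >= 0).all(), "unknown id or month column"

# Pointwise assignment: arr[rows[i], cols[i]] = 1 for every i
arr = df2.to_numpy(copy=True)
arr[rows, cols] = 1
df2 = pd.DataFrame(arr, index=df2.index, columns=df2.columns)
```

Paired integer arrays index NumPy elementwise, whereas `df2.iloc[rows, cols]` would select the full cross product of rows and columns, which is why the assignment goes through the array.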

Answered 2023-04-28
