Flatten a pandas index

Posted by uqjltbpv on 2022-11-20 in Other


I have a DataFrame that looks like this:

|   | id | label |
| 0 | 1  | foo   |
| 1 | 2  | baa   |
| 2 | 1  | baa   |

I want to turn it into this structure:

|   | id | foo | baa |
| 0 | 1  | 1   | 1   |
| 1 | 2  | 0   | 1   |

I tried:

import pandas as pd

df = pd.DataFrame({'id':[1,2,4,1,1,2], 'label':['foo', 'ba', 'foo', 'baa','coo','coo']})
# count how often each label occurs for each id
df = pd.crosstab(df.id, df.label)

But it gives a DataFrame with a strange index.

1 Answer

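You can use a pivot table. The only catch is that it needs a column to act as the values column.
To get one, add a 'count' column that is simply 1 on every row, then use the count aggfunc in the pivot table.
Like this: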

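df['count'] = 1
# aggfunc='count' counts the rows per id/label pair; missing pairs become NaN, then 0
pd.pivot_table(df,index='id',columns='label', values='count', aggfunc='count').fillna(0)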

Output:

label   baa foo
id      
1       1.0 1.0
2       1.0 0.0
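
As a hedged, self-contained sketch of the same approach, the snippet below rebuilds the three-row frame from the question and pivots it. The `aggfunc='sum'`, `fill_value=0` and `.astype(int)` arguments are choices made for this illustration rather than part of the answer's code: summing the column of ones counts repeated id/label pairs, and the other two keep the result as whole numbers instead of floats.

```python
import pandas as pd

# Three-row frame from the question
df = pd.DataFrame({"id": [1, 2, 1], "label": ["foo", "baa", "baa"]})

# Helper column of ones so the pivot table has something to aggregate
df["count"] = 1

table = pd.pivot_table(
    df,
    index="id",
    columns="label",
    values="count",
    aggfunc="sum",   # summing the ones counts rows per id/label pair
    fill_value=0,    # pairs that never occur become 0 instead of NaN
).astype(int)        # make the integer dtype explicit

print(table)
```

Here `id` is still the index of `table`, just as in the output above.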

Edit: sorry, I just realized it looks like you want id as a column rather than as the index.
You can chain .reset_index() to move id into its own column:

pd.pivot_table(df,index='id',columns='label', values='count', aggfunc='count').fillna(0).reset_index()

This gives the output you want:

label   id  baa foo
0   1   1.0 1.0
1   2   1.0 0.0
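
The "label" that still appears in the header row is the name of the columns axis, which is probably the "strange index" from the question. A minimal sketch of one way to remove it, assuming the three-row frame from the question and going through `pd.crosstab` (so no helper column is needed):

```python
import pandas as pd

# Three-row frame from the question
df = pd.DataFrame({"id": [1, 2, 1], "label": ["foo", "baa", "baa"]})

flat = (
    pd.crosstab(df["id"], df["label"])  # count each id/label pair
    .reset_index()                      # move id from the index into a column
    .rename_axis(columns=None)          # drop the leftover "label" axis name
)

print(flat)
```

The same `.rename_axis(columns=None)` call can be chained after `.reset_index()` on the pivot table result as well.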