基于列名Pandas的转置

tvmytwxo  于 2023-05-05  发布在  其他
关注(0)|答案(4)|浏览(153)

我有一个dataframe,最初是非结构化的JSON格式。我规范化了JSON,但在基于规范化的列结构转置 Dataframe 时遇到了问题
下面是dataframe的样子
| 身份证|名称|attributes_0_color|属性_0_pet|attributes_1_color|属性_1_pet|
| --------------|--------------|--------------|--------------|--------------|--------------|
| 1|约翰|红色|狗|紫色|鹦鹉|
| 二|比尔|蓝色|猫|橙子|仓鼠|
我怎样才能让dataframe看起来像这样?
| 身份证|名称|颜色|宠物|
| --------------|--------------|--------------|--------------|
| 1|约翰|红色|狗|
| 1|约翰|紫色|鹦鹉|
| 二|比尔|蓝色|猫|
| 二|比尔|橙子|仓鼠|

xbp102n0

xbp102n01#

lreshape

out = (pd.lreshape(df, {"color": ["attributes_0_color", "attributes_1_color"],
                       "pet": ["attributes_0_pet", "attributes_1_pet"]})
            .sort_values(by="id")
)

输出: 打印

id  name   color      pet
0   1  John     red      dog
2   1  John  purple   parrot
1   2  Bill    blue      cat
3   2  Bill  orange  hamster
mqkwyuun

mqkwyuun2#

这是一个与pd.wide_to_long非常相似的用例:

temp = df.set_index(['id','name'])

# split the column names by `_`, ignore `attribute` part
# print `temp` after this command to see what it does
temp.columns = pd.MultiIndex.from_tuples(list(map(lambda x: x.split('_')[1:], temp.columns)))

# expected output
out = temp.stack(level=0).reset_index(level=-1, drop=True).reset_index()

输出:

id   name    color      pet
0   1  John      red       dog 
1   1  John   purple    parrot
2   2  Bill     blue       cat 
3   2  Bill   orange   hamster
ghg1uchk

ghg1uchk3#

另一种可能的解决方案:

df.columns = df.columns.str.replace(r'attributes_\d_', '', regex=True)
pd.concat([df.iloc[:, :4], df.iloc[:, [0, 1, 4, 5]]]).sort_values('id')

输出:

id  name   color      pet
0   1  John     red      dog
0   1  John  purple   parrot
1   2  Bill    blue      cat
1   2  Bill  orange  hamster
ki0zmccv

ki0zmccv4#

一个选项是janitor.pivot_longer:

# pip install pyjanitor
import pandas as pd
import janitor

names_pattern = ['color', 'pet']
df.pivot_longer(index = ['id', 'name'], 
                names_to = names_pattern, 
                names_pattern = names_pattern, 
                sort_by_appearance=True)
   id  name   color      pet
0   1  John     red      dog
1   1  John  purple   parrot
2   2  Bill    blue      cat
3   2  Bill  orange  hamster

您也可以使用.value占位符:

df.pivot_longer(index = ['id', 'name'], 
                names_to = '.value', 
                names_pattern = r".+_(.+)", 
                sort_by_appearance=True)
   id  name   color      pet
0   1  John     red      dog
1   1  John  purple   parrot
2   2  Bill    blue      cat
3   2  Bill  orange  hamster

相关问题