pandas: transform one row into a data frame with multiple rows

Posted by du7egjpx on 2022-12-28 in Other


I have a data frame with a single row:

import pandas as pd

df_1D = pd.DataFrame({'Day1': [5],
                      'Day2': [6],
                      'Day3': [7],
                      'ID': ['AB12'],
                      'Country': ['US'],
                      'Destination_A': ['Miami'],
                      'Destination_B': ['New York'],
                      'Destination_C': ['Chicago'],
                      'First_Agent': ['Jim'],
                      'Second_Agent': ['Ron'],
                      'Third_Agent': ['Cynthia']})

  Day1  Day2  Day3    ID  ... Destination_C First_Agent Second_Agent Third_Agent
0     5     6     7  AB12  ...       Chicago         Jim          Ron     Cynthia

Is there a simple way to convert it into a DataFrame with three rows, like this:

   Day    ID Country Destination    Agent
0    5  AB12      US       Miami      Jim
1    6  AB12      US    New York      Ron
2    7  AB12      US     Chicago  Cynthia

pandas

Source: https://stackoverflow.com/questions/74885555/transform-one-row-to-a-data-frame-with-multiple-rows

5 Answers


nnsrf1az1#

Have you tried pivoting it with the .pivot function? https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
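For what it's worth, .pivot probably cannot take the one-row frame as it stands, because there is no column yet that says which field (Day, Destination, Agent) each value belongs to. A minimal sketch of one way to get there, assuming the frame from the question: melt to long form first, derive two helper columns (the names `field` and `pos` are made up for this example), then pivot.

```python
import pandas as pd

df_1D = pd.DataFrame({
    'Day1': [5], 'Day2': [6], 'Day3': [7],
    'ID': ['AB12'], 'Country': ['US'],
    'Destination_A': ['Miami'], 'Destination_B': ['New York'],
    'Destination_C': ['Chicago'],
    'First_Agent': ['Jim'], 'Second_Agent': ['Ron'], 'Third_Agent': ['Cynthia'],
})

id_cols = ['ID', 'Country']

# wide -> long: one row per original non-id column
long = df_1D.melt(id_vars=id_cols, var_name='name', value_name='value')

# helper columns (hypothetical names): which field a value belongs to,
# and its position within that field
long['field'] = long['name'].str.extract(r'(Day|Destination|Agent)', expand=False)
long['pos'] = long.groupby('field').cumcount()

# long -> one column per field, one row per position
out = long.pivot(index=id_cols + ['pos'], columns='field', values='value')
out = out.reset_index(level=id_cols)
out.index.name = None
out.columns.name = None
out = out[['Day', 'ID', 'Country', 'Destination', 'Agent']]
print(out)
```

Passing a list as `index` to .pivot needs a reasonably recent pandas (1.1 or later, as far as I know). The resulting columns have object dtype because melt mixes numbers and strings in one column.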

2022-12-28

iecba09b2#

One option is reshaping, which only requires knowing the final columns:

# define final columns
cols = ['Day', 'ID', 'Destination', 'Country', 'Agent']

# the part below is automatic
# ------
# extract the keywords
pattern = f"({'|'.join(cols)})"
new = df_1D.columns.str.extract(pattern)[0]

# and reshape
out = (df_1D
 .set_axis(pd.MultiIndex.from_arrays([new, new.groupby(new).cumcount()]), axis=1)
 .loc[0].unstack(0).ffill()[cols]
)

Output:

   Day    ID Destination Country    Agent
0    5  AB12       Miami      US      Jim
1    6  AB12    New York      US      Ron
2    7  AB12     Chicago      US  Cynthia
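To make the "automatic" part of the reshape less opaque, here is a small sketch that only computes the two labels that end up in the column MultiIndex: the keyword extracted from each column name and a running counter per keyword. The variable name `counter` is introduced here for illustration; everything else follows the answer's code, assuming the frame from the question.

```python
import pandas as pd

df_1D = pd.DataFrame({
    'Day1': [5], 'Day2': [6], 'Day3': [7],
    'ID': ['AB12'], 'Country': ['US'],
    'Destination_A': ['Miami'], 'Destination_B': ['New York'],
    'Destination_C': ['Chicago'],
    'First_Agent': ['Jim'], 'Second_Agent': ['Ron'], 'Third_Agent': ['Cynthia'],
})

cols = ['Day', 'ID', 'Destination', 'Country', 'Agent']

# regex with one capture group: (Day|ID|Destination|Country|Agent)
pattern = f"({'|'.join(cols)})"

# keyword found in each original column name
new = df_1D.columns.str.extract(pattern)[0]

# position of each column within its keyword group
counter = new.groupby(new).cumcount()

for name, key, pos in zip(df_1D.columns, new, counter):
    print(f"{name:15} -> ({key}, {pos})")
```

ID and Country only get position 0, which is why the unstacked frame has gaps in rows 1 and 2 for those columns and why the answer calls ffill() afterwards.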

Alternative that defines idx/cols separately:

idx = ['ID', 'Country']
cols = ['Day', 'Destination', 'Agent']

df2 = df_1D.set_index(idx)

pattern = f"({'|'.join(cols)})"
new = df2.columns.str.extract(pattern)[0]

out = (df2
 .set_axis(pd.MultiIndex.from_arrays([new, new.groupby(new).cumcount().astype(str)],
                                     names=[None, None]),
           axis=1)
 .stack().reset_index(idx)
)

2022-12-28

0wi1tuuw3#

# collect the source columns for each output column
columns_day = [col for col in df_1D if col.startswith('Day')]
columns_dest = [col for col in df_1D if col.startswith('Destination')]
columns_agent = [col for col in df_1D if 'Agent' in col]

new_df = pd.DataFrame()
# the first assignment fixes the number of rows
new_df['Day'] = df_1D[columns_day].values.tolist()[0]
# repeat the shared values once per row
new_df['ID'] = list(df_1D['ID']) * len(new_df)
new_df['Country'] = list(df_1D['Country']) * len(new_df)
new_df['Destination'] = df_1D[columns_dest].values.tolist()[0]
new_df['Agent'] = df_1D[columns_agent].values.tolist()[0]

Output:

   Day    ID Country Destination    Agent
0    5  AB12      US       Miami      Jim
1    6  AB12      US    New York      Ron
2    7  AB12      US     Chicago  Cynthia

This works regardless of what the destinations are.
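If you need this for more than one frame, the same idea can be wrapped in a small helper. This is only a sketch: the function name `row_to_rows` and its parameters are made up here, and it assumes the frame has exactly one row and that every group matches the same number of columns.

```python
import pandas as pd

def row_to_rows(df, id_cols, groups):
    """Spread a one-row frame over several rows.

    groups maps each output column name to a substring that
    identifies its source columns (hypothetical convention).
    """
    row = df.iloc[0]
    # one list of values per output column, in original column order
    data = {out_name: [row[c] for c in df.columns if key in c]
            for out_name, key in groups.items()}
    n_rows = len(next(iter(data.values())))
    # repeat the shared values once per output row
    for c in id_cols:
        data[c] = [row[c]] * n_rows
    return pd.DataFrame(data)

df_1D = pd.DataFrame({
    'Day1': [5], 'Day2': [6], 'Day3': [7],
    'ID': ['AB12'], 'Country': ['US'],
    'Destination_A': ['Miami'], 'Destination_B': ['New York'],
    'Destination_C': ['Chicago'],
    'First_Agent': ['Jim'], 'Second_Agent': ['Ron'], 'Third_Agent': ['Cynthia'],
})

out = row_to_rows(df_1D, ['ID', 'Country'],
                  {'Day': 'Day', 'Destination': 'Destination', 'Agent': 'Agent'})
print(out)
```

The columns come out in the order Day, Destination, Agent, ID, Country; reorder with out[[...]] if you need the layout from the question.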

2022-12-28

3htmauhk4#

One option is pivot_longer from pyjanitor; in this case you pass a list of regular expressions to names_pattern and the new column names to names_to:

# pip install pyjanitor
import janitor
import pandas as pd
(df_1D
.pivot_longer(
    index=['ID','Country'], 
    names_to = ['Day','Destination','Agent'], 
    names_pattern=['Day','Destination','Agent'])
)
     ID Country  Day Destination    Agent
0  AB12      US    5       Miami      Jim
1  AB12      US    6    New York      Ron
2  AB12      US    7     Chicago  Cynthia

2022-12-28

bgtovc5b5#

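I don't think there is a fully automated way to handle this; it needs some manual work. This is the shortest code I could come up with. Feel free to comment: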

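# d is the dict of lists behind the one-row frame
d = df_1D.to_dict('list')

d1 = {}

# collect the value of every column whose name contains the keyword
for k in ['Day', 'Destination', 'Agent']:
    d1[k] = [d[i][0] for i in d.keys() if k in i]

# repeat the shared values once per output row
for k in ['ID', 'Country']:
    d1[k] = d[k] * len(d1['Day'])

d1 = pd.DataFrame(d1)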

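The output has the same three rows as the desired frame, with the columns in the order Day, Destination, Agent, ID, Country.

Hope this helps.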

2022-12-28
