pandas: convert a single row into a DataFrame with multiple rows

Posted by du7egjpx on 2022-12-28 in Other

I have a DataFrame that contains a single row:

import pandas as pd

df_1D = pd.DataFrame({'Day1': [5],
                      'Day2': [6],
                      'Day3': [7],
                      'ID': ['AB12'],
                      'Country': ['US'],
                      'Destination_A': ['Miami'],
                      'Destination_B': ['New York'],
                      'Destination_C': ['Chicago'],
                      'First_Agent': ['Jim'],
                      'Second_Agent': ['Ron'],
                      'Third_Agent': ['Cynthia']})

   Day1  Day2  Day3    ID  ... Destination_C First_Agent Second_Agent Third_Agent
0     5     6     7  AB12  ...       Chicago         Jim          Ron     Cynthia

Is there a simple way to convert it into a DataFrame with three rows, like this:

   Day    ID Country Destination    Agent
0    5  AB12      US       Miami      Jim
1    6  AB12      US    New York      Ron
2    7  AB12      US     Chicago  Cynthia

iecba09b  #2

One option is a reshape; the only thing it needs to know is the final column names:

# define final columns
cols = ['Day', 'ID', 'Destination', 'Country', 'Agent']

# the part below is automatic
# ------
# extract the keywords
pattern = f"({'|'.join(cols)})"
new = df_1D.columns.str.extract(pattern)[0]

# and reshape
out = (df_1D
 .set_axis(pd.MultiIndex.from_arrays([new, new.groupby(new).cumcount()]), axis=1)
 .loc[0].unstack(0).ffill()[cols]
)

Output:

   Day    ID Destination Country    Agent
0    5  AB12       Miami      US      Jim
1    6  AB12    New York      US      Ron
2    7  AB12     Chicago      US  Cynthia

An alternative that defines idx/cols separately:

idx = ['ID', 'Country']
cols = ['Day', 'Destination', 'Agent']

df2 = df_1D.set_index(idx)

pattern = f"({'|'.join(cols)})"
new = df2.columns.str.extract(pattern)[0]

out = (df2
 .set_axis(pd.MultiIndex.from_arrays([new, new.groupby(new).cumcount().astype(str)],
                                     names=[None, None]),
           axis=1)
 .stack().reset_index(idx)
)
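
To make the column-labelling step used by both variants above easier to follow, here is a minimal sketch that only prints the intermediate labels. The names `counter` and `labels` are introduced here for illustration and do not appear in the answer; `df_1D` is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

df_1D = pd.DataFrame({'Day1': [5], 'Day2': [6], 'Day3': [7],
                      'ID': ['AB12'], 'Country': ['US'],
                      'Destination_A': ['Miami'],
                      'Destination_B': ['New York'],
                      'Destination_C': ['Chicago'],
                      'First_Agent': ['Jim'],
                      'Second_Agent': ['Ron'],
                      'Third_Agent': ['Cynthia']})

cols = ['Day', 'ID', 'Destination', 'Country', 'Agent']
pattern = f"({'|'.join(cols)})"

# keyword found in each original column name, e.g. 'Day1' -> 'Day'
new = df_1D.columns.str.extract(pattern)[0]

# running counter within each keyword: 0, 1, 2 for the repeated columns
counter = new.groupby(new).cumcount()

# (keyword, counter) pairs: these become the two levels of the MultiIndex
labels = [(key, int(i)) for key, i in zip(new, counter)]
print(labels)
```

Columns that share a keyword get counters 0, 1, 2, which end up as the row labels after `unstack`/`stack`; `ID` and `Country` only ever get counter 0, which is why the first variant needs `ffill()` to copy their value down.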

0wi1tuuw  #3

# collect the column names of each repeated group
columns_day = [col for col in df_1D if col.startswith('Day')]
columns_dest = [col for col in df_1D if col.startswith('Destination')]
columns_agent = [col for col in df_1D if 'Agent' in col]

new_df = pd.DataFrame()
# the first assignment fixes the number of rows (one per Day column)
new_df['Day'] = df_1D[columns_day].values.tolist()[0]
# repeat the single ID / Country value on every row
new_df['ID'] = list(df_1D['ID']) * len(new_df)
new_df['Country'] = list(df_1D['Country']) * len(new_df)
new_df['Destination'] = df_1D[columns_dest].values.tolist()[0]
new_df['Agent'] = df_1D[columns_agent].values.tolist()[0]

Output:

   Day    ID Country Destination    Agent
0    5  AB12      US       Miami      Jim
1    6  AB12      US    New York      Ron
2    7  AB12      US     Chicago  Cynthia

You can use this regardless of what the destinations are.
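
As a rough check of that claim, the same idea can be wrapped in a small function and run on a row with different values. This is only a sketch: `spread_row` is a hypothetical helper name, and the sample row (`ZZ99`, `Toronto`, and so on) is made up for illustration. It assumes a one-row frame and that every keyword matches the same number of columns.

```python
import pandas as pd


def spread_row(row_df, keywords, id_cols):
    """Hypothetical helper: one output row per repeated column of a one-row frame."""
    data = {}
    for key in keywords:
        # columns whose name contains the keyword, in their original order
        matching = [c for c in row_df.columns if key in c]
        data[key] = row_df[matching].iloc[0].tolist()
    # pandas raises ValueError here if the groups differ in size
    out = pd.DataFrame(data)
    for c in id_cols:
        # the single value is broadcast to every row
        out[c] = row_df[c].iloc[0]
    return out


# made-up sample row with two repetitions instead of three
other = pd.DataFrame({'Day1': [1], 'Day2': [2],
                      'ID': ['ZZ99'], 'Country': ['CA'],
                      'Destination_A': ['Toronto'],
                      'Destination_B': ['Vancouver'],
                      'First_Agent': ['Ann'],
                      'Second_Agent': ['Bo']})

result = spread_row(other, ['Day', 'Destination', 'Agent'], ['ID', 'Country'])
print(result)
```

Matching on a substring keeps the helper short, but it would also pick up unrelated columns whose names happen to contain a keyword, so stricter matching may be needed on wider frames.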


3htmauhk  #4

One option is pivot_longer from pyjanitor; in this case you pass a list of regular expressions to names_pattern and the new column names to names_to:

# pip install pyjanitor
import pandas as pd
import janitor

(df_1D
 .pivot_longer(
     index=['ID', 'Country'],
     names_to=['Day', 'Destination', 'Agent'],
     names_pattern=['Day', 'Destination', 'Agent'])
)

     ID Country  Day Destination    Agent
0  AB12      US    5       Miami      Jim
1  AB12      US    6    New York      Ron
2  AB12      US    7     Chicago  Cynthia
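
The pairing between names_pattern and names_to can be pictured with plain `re`. The sketch below is not pyjanitor's implementation; it assumes each regular expression is searched for anywhere in the column name and that matching columns are collected under the name at the same position in names_to. The variable `mapping` is introduced here for illustration.

```python
import re

# columns left over once the index columns ('ID', 'Country') are set aside
columns = ['Day1', 'Day2', 'Day3',
           'Destination_A', 'Destination_B', 'Destination_C',
           'First_Agent', 'Second_Agent', 'Third_Agent']

names_to = ['Day', 'Destination', 'Agent']
names_pattern = ['Day', 'Destination', 'Agent']

# new column name -> original columns whose name matches the paired regex
mapping = {
    name: [col for col in columns if re.search(pat, col)]
    for name, pat in zip(names_to, names_pattern)
}

for name, matched in mapping.items():
    print(name, matched)
```

Each new column is then filled from its matched originals in order, which is how `Day1`, `Destination_A` and `First_Agent` end up on the same output row.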

bgtovc5b  #5

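I don't think there is a way to handle this fully automatically; it needs some manual work. This is the shortest code I could come up with. Feel free to comment: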

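# d is the dict of lists behind df_1D; it is rebuilt here on the
# assumption that the original dict was not kept around
d = df_1D.to_dict('list')

d1 = {}

# repeated groups: gather the single value of every matching column
for k in ['Day', 'Destination', 'Agent']:
    d1[k] = [d[i][0] for i in d.keys() if k in i]

# constant columns: repeat the value once per row
for k in ['ID', 'Country']:
    d1[k] = d[k] * len(d1['Day'])

d1 = pd.DataFrame(d1)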

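Output:

   Day Destination    Agent    ID Country
0    5       Miami      Jim  AB12      US
1    6    New York      Ron  AB12      US
2    7     Chicago  Cynthia  AB12      US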

Hope this helps.
