python-3.x: Convert a pandas DataFrame column into multiple columns based on groupby

piv4azn7  posted on 2022-11-19 in Python

I have a pandas DataFrame with two columns, ride_ID and person_ID, as shown below:

ride_ID  person_ID  
 ride_1    person1   
 ride_1    person2    
 ride_1    person3    
 ride_2    person1    
 ride_2    person4    
 ride_3    person1    
 ride_3    person5    
 ride_3    person2    
 ride_3    person3  
 .....     ......
 .....     ......

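For each unique ride_ID, the number of person_ID values can be anything: 2, 20, or 100. In short, I want to group by the ride_ID column so that the person_ID values are spread across multiple columns named person_ID1 through person_IDn. The expected output is: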

ride_ID  person_ID1 person_ID2   person_ID3   person_ID4   person_ID5 ....... person_IDn 

 ride_1   person1    person2      person3      NaN         NaN        ......                           
 ride_2   person1    NaN          NaN          person4     NaN        ......     
 ride_3   person1    person2      person3      NaN         person5

kpbwa7wx  #1

You can use pivot(). To do so, first create a helper column person_IDx that numbers the rows within each ride_ID, so that its values run in the order person_ID1, person_ID2, ..., person_IDn:

import pandas as pd

df = pd.DataFrame(data=[["ride_1","person1"],["ride_1","person2"],["ride_1","person3"],["ride_2","person1"],["ride_2","person4"],["ride_3","person1"],["ride_3","person5"],["ride_3","person2"],["ride_3","person3"]], columns=["ride_ID","person_ID"])

# Number each row within its ride (1, 2, 3, ...) in order of appearance.
df["person_IDx"] = df.groupby("ride_ID").cumcount() + 1

# Pivot on the integer position so the columns sort numerically (..., 9, 10, 11)
# rather than as text ("person_ID10" before "person_ID2"), then add the prefix.
df = df.pivot(index="ride_ID", columns="person_IDx", values="person_ID").add_prefix("person_ID").rename_axis(columns=None).reset_index()

[Out]:
  ride_ID person_ID1 person_ID2 person_ID3 person_ID4
0  ride_1    person1    person2    person3        NaN
1  ride_2    person1    person4        NaN        NaN
2  ride_3    person1    person5    person2    person3
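
Note that the output above fills the columns by order of appearance within each ride, whereas the expected output in the question seems to place each person in the column matching their own number (person4 under person_ID4). If that is what is wanted, one possible sketch is to pivot on the number at the end of each person_ID. This assumes every person_ID ends in digits and that no person appears twice in the same ride (pivot() raises an error on duplicates); the names person_num and wide are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ["ride_1", "person1"], ["ride_1", "person2"], ["ride_1", "person3"],
        ["ride_2", "person1"], ["ride_2", "person4"],
        ["ride_3", "person1"], ["ride_3", "person5"],
        ["ride_3", "person2"], ["ride_3", "person3"],
    ],
    columns=["ride_ID", "person_ID"],
)

# Assumption: each person_ID ends in an integer, e.g. "person12" -> 12.
df["person_num"] = df["person_ID"].str.extract(r"(\d+)$", expand=False).astype(int)

# One column per distinct person number; rides without that person get NaN.
wide = (
    df.pivot(index="ride_ID", columns="person_num", values="person_ID")
      .add_prefix("person_ID")
      .rename_axis(columns=None)
      .reset_index()
)
print(wide)
```

With the sample data this yields the columns person_ID1 through person_ID5, with person4 appearing only in the person_ID4 column of ride_2.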
