透视/透视表如何对具有多个列值的 Dataframe 进行Pandas运算

b5lpy0ml  于 2022-11-27  发布在  其他
关注(0)|答案(3)|浏览(147)

我有数据框。“

data = pd.DataFrame([['Benz', 'MinSpeed', 0, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'TopSpeed', 200, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'ChasisNum', 654121, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'Seats', 5, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'AirBags', 5, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'VehicleType', np.nan, 'Sedan', 'USA', '2022-08-12'],
                     ['Benz', 'Color', np.nan, 'Black','USA', '2022-08-12'],
                     ['Benz', 'InternetInside', np.nan, 'Yes','USA', '2022-08-12'],
                     
                     ['Ferrari', 'MinSpeed', 0, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'TopSpeed', 250, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'ChasisNum', 781121, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'Seats', 4, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'AirBags', 2, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'VehicleType', np.nan, 'SUV', 'France', '2022-12-25'],
                     ['Ferrari', 'Color', np.nan, 'Red','France', '2022-12-25'],
                     ['Ferrari', 'InternetInside', np.nan, 'No','France', '2022-12-25'],
                     ], 
                    columns= ['CarModel', 'Features', 'NumericalValues', 'CategoricalValues','Country', 'DeliveryDate'])

`
我尝试使用透视函数透视数据,但得到的“NumericalValues”和“CategoricalValues”值列重复
代码:

data.pivot(index='CarModel', columns='Features', values=['NumericalValues','CategoricalValues' ]).reset_index()

我需要的预期输出为:

output_data = pd.DataFrame([['Benz', 0, 200, 654121, 5, 5, 'Sedan', 'Black', 'Yes', 'USA', '2022-08-12'],
                         ['Ferrari', 0, 250, 781121, 4, 2, 'SUV', 'Red', 'No', 'France', '2022-12-25']
                     ],
                    columns=['CarModel', 'MinSpeed', 'TopSpeed', 'ChasisNum','Seats', 'AirBags', 'VehicleType', 'Color', 'InternetInside', 'Country', 'DeliveryDate'])

'我也尝试使用数据透视表,但无法获得此输出。

ee7vknir

ee7vknir1#

您可以执行pivot,然后在列上运行groupby.first以删除不需要的列:

out = (data
  .pivot(index=['CarModel', 'Country', 'DeliveryDate'],
         columns='Features'
        )
  .groupby(level='Features', axis=1).first()
  .reset_index()
)

输出量:

Features CarModel Country DeliveryDate  AirBags  ChasisNum  Color InternetInside  MinSpeed  Seats  TopSpeed VehicleType
0            Benz     USA   2022-08-12      5.0   654121.0  Black            Yes       0.0    5.0     200.0       Sedan
1         Ferrari  France   2022-12-25      2.0   781121.0    Red             No       0.0    4.0     250.0         SUV

其优点是可以维护数据类型:

Features
CarModel           object
Country            object
DeliveryDate       object
AirBags           float64
ChasisNum         float64
Color              object
InternetInside     object
MinSpeed          float64
Seats             float64
TopSpeed          float64
VehicleType        object
dtype: object
uurv41yg

uurv41yg2#

从您的数据中,因为您在'NumericalValues'或'CategoricalValues'中有一个值,所以您可以创建一个列,用fillna合并这两个列中的所有信息,然后像以前那样使用pivot

res = (
    data.assign(Values=lambda x: x['NumericalValues'].fillna(x['CategoricalValues']))
      .pivot(index='CarModel', columns='Features', values='Values')
      .reset_index().rename_axis(columns=None)
)
print(res)
#   CarModel AirBags ChasisNum  Color InternetInside MinSpeed Seats TopSpeed  \
# 0     Benz     5.0  654121.0  Black            Yes      0.0   5.0    200.0   
# 1  Ferrari     2.0  781121.0    Red             No      0.0   4.0    250.0   

#   VehicleType  
# 0       Sedan  
# 1         SUV
ovfsdjhp

ovfsdjhp3#

另一种可能的解决方案是使用pandas.pivot_table

out = (data.pivot_table(
    index=['CarModel',  'Country', 'DeliveryDate'], 
    columns='Features', values=['NumericalValues', 'CategoricalValues'],
    aggfunc=max)
       .droplevel(0, axis=1)
       .rename_axis(None, axis=1)
       .reset_index())

输出量:

CarModel Country DeliveryDate  Color InternetInside VehicleType  AirBags  \
0     Benz     USA   2022-08-12  Black            Yes       Sedan      5.0   
1  Ferrari  France   2022-12-25    Red             No         SUV      2.0   

   ChasisNum  MinSpeed  Seats  TopSpeed  
0   654121.0       0.0    5.0     200.0  
1   781121.0       0.0    4.0     250.0

数据类型:

CarModel           object
Country            object
DeliveryDate       object
Color              object
InternetInside     object
VehicleType        object
AirBags           float64
ChasisNum         float64
MinSpeed          float64
Seats             float64
TopSpeed          float64

相关问题