如何在pandas groupby中自定义按值排序列

vybvopom  于 2023-06-20  发布在  其他
关注(0)|答案(1)|浏览(78)

我有一个dataframe,我尝试从Groupby导出到Excel的结果由自定义排序值一个,两个或更多列:

  • id排序方式['bv','cl','an']
  • 颜色按['blue','red','orange','green']排序
import pandas as pd
data = {'color': ['green', 'red', ' orange', 'red','blue',' orange', 'red','blue', 'red', ' orange', 'red','red','green'],
        'Name': ['Tom', 'nick', 'krish', 'jack','bob','Tom', 'nick', 'krish', 'nick', 'krish', 'jack','nick','Tom'],
        'form': [ 'a', 'b', 'c', 'd','e','b', 'c', 'd','d','e','b', 'c', 'b'],
        'id': [ 'an', 'bv', 'cl', 'cl','an','bv', 'an', 'cl','cl', 'an','bv','bv','an'],
        }

df = pd.DataFrame(data)

print(df)

df_1 = df.groupby(['color','Name','id'])['form'].agg('count')
print(df_1)

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')    
df_1.to_excel(writer, sheet_name='Sheet1')
writer.save()

print(' I try the df to excel have custom sort values in columns')
print('id sort by [bv,cl,an]')
print('color sort by [blue,red,orange,green]')
print('how can make it?')
b1zrtrql

b1zrtrql1#

你可以先使用Categorical Data,然后使用sort_index

ids = pd.CategoricalDtype(["bv", "cl", "an"], ordered=True)
colors = pd.CategoricalDtype(["blue", "red", "orange", "green"], ordered=True)
#add here more categories if needed

out = (
    df.astype({"id": ids, "color": colors})
        .groupby(["color", "Name", "id"], observed=True)["form"].agg("count")
        .sort_index(level=[0, 2]) #in case, add here more levels
)

with pd.ExcelWriter("test.xlsx", engine="xlsxwriter") as writer:
    out.to_excel(writer, sheet_name="Sheet1")
    # add here a custom styling/formatting if needed

输出:

print(out)

color   Name   id
blue    krish  cl    1
        bob    an    1
red     jack   bv    1
        nick   bv    2
        jack   cl    1
        nick   cl    1
               an    1
orange  Tom    bv    1
        krish  cl    1
               an    1
green   Tom    an    2
Name: form, dtype: int64

相关问题