Pandas: group by a variable-length list of columns, depending on whether a column name is in the list

tv6aics1  posted 12 months ago  in  Other

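The user can choose whether or not to include each column in the groupby statement. If they choose to include a column, its name is appended to a list, so the list can hold anywhere from 1 to 6 values. Three additional columns are always used in the group by statement, but the number of user-selected columns varies.
So far I have tried all of the following, each of which raises an error: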

categorical_fields = []
card_name_match=input("Do you want to include card name first name matches passenger first name: y/n")
if card_name_match=="y":
    categorical_fields.append("name_match")
fare_class_1_cat=input("Do you want to include Fare Class 1 Category: y/n")
if fare_class_1_cat=="y":
    categorical_fields.append("FARECLASS1_cat")
fare_class_2_cat=input("Do you want to include Fare Class 2 Category: y/n")
if fare_class_2_cat=="y":
    categorical_fields.append("FARECLASS2_cat")
distance_cat=input("Do you want to include Distance: y/n")
if distance_cat=="y":
    categorical_fields.append("distance_category")
int_or_domestic=input("Do you want to include if flight was international or domestic: y/n")
if int_or_domestic=="y":
    categorical_fields.append("international_or_domestic")
journey_type=input("Do you want to include journey type of one way, round trip, or different 2nd arrival destination: y/n")
if journey_type=="y":
    categorical_fields.append("dep_to_arr")
airline_score = airline.groupby([categorical_fields,'category','score','mop']).agg(count=('fs_sham','count'),dollars=('fs_dollars','sum')).reset_index()
ValueError: Grouper and axis must be same length

categorical_fields.extend(['category','score','mop'])
group_columns = airline.groupby(categorical_fields)
airline_score = airline.groupby(group_columns).agg(count=('fs_sham','count'),dollars=('fs_dollars','sum')).reset_index()
ValueError: Grouper for '<class 'pandas.core.groupby.generic.DataFrameGroupBy'>' not 1-dimensional

categorical_fields.extend(['category','score','mop'])
airline_score = airline.groupby(airline.columns.isin([categorical_fields])).agg(count=('fs_sham','count'),dollars=('fs_dollars','sum')).reset_index()
ValueError: Grouper and axis must be same length

z9smfwbn1#

Look at the grouper in your airline_score line:

[categorical_fields,'category','score','mop']

What shape does it have? Something like this:

[ ['categorical_field1', 'categorical_field2'], 'category','score','mop']

That is not a 1D list: the first element is itself a list. Instead, build an actual 1D grouper:

grouper = []
grouper.extend(categorical_fields)
grouper.extend(['category','score','mop'])

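# same aggregation as in the question, now with a flat list of column names
airline_score = airline.groupby(by=grouper).agg(count=('fs_sham','count'),dollars=('fs_dollars','sum')).reset_index()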
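
To see the difference between the nested and the flat list concretely, here is a minimal sketch on a made-up DataFrame. Only the column names come from the question; the rows and values are assumptions for illustration, and the exact error text may differ between pandas versions.

```python
import pandas as pd

# Toy stand-in for the `airline` DataFrame; the values are made up.
airline = pd.DataFrame({
    "name_match": ["y", "y", "n", "n", "n"],
    "category": ["a", "a", "a", "b", "b"],
    "score": [1, 1, 1, 2, 2],
    "mop": ["cc", "cc", "cc", "dc", "dc"],
    "fs_sham": [10, 11, 12, 13, 14],
    "fs_dollars": [5.0, 7.0, 1.0, 2.0, 3.0],
})

categorical_fields = ["name_match"]  # pretend the user answered "y" once

# Nested: the inner list is not a column label, so pandas treats it as an
# array of group values; its length (1) does not match the 5 rows.
try:
    airline.groupby([categorical_fields, "category", "score", "mop"])
    nested_error = None
except ValueError as exc:
    nested_error = str(exc)

# Flat: every element is a column label.
grouper = categorical_fields + ["category", "score", "mop"]
airline_score = (
    airline.groupby(grouper)
    .agg(count=("fs_sham", "count"), dollars=("fs_dollars", "sum"))
    .reset_index()
)

print(nested_error)
print(airline_score)
```

Concatenating with `+` builds a new list each time, so `categorical_fields` itself is left unchanged.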

aelbi1ox2#

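IIUC, the main problem is that the user-selected columns are passed to groupby as the grouper itself rather than as a flat list of column labels, and, even after extending, that object's length does not match the airline DataFrame. That is why the ValueError is raised.
Here is one possible fix, which also tries to make your code cleaner: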

d = {
    "name_match": "Do you want to include card name first name matches passenger first name? (y/n): ",
    "FARECLASS1_cat": "Do you want to include Fare Class 1 Category? (y/n): ",
    "FARECLASS2_cat": "Do you want to include Fare Class 2 Category? (y/n): ",
    "distance_category": "Do you want to include Distance? (y/n): ",
    "international_or_domestic": "Do you want to include if flight was international or domestic? (y/n): ",
    "dep_to_arr": "Do you want to include journey type of one way, round trip, or different 2nd arrival destination? (y/n): "
}

categorical_fields = [c for c, p in d.items() if input(p).lower() == "y"]

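# Keep only the selected names that actually exist in `airline`, then add the
# three fixed keys. fs_sham and fs_dollars are aggregated below, so they are
# left out of the grouper.
grouper = (
    airline.columns.intersection(categorical_fields)
        .union(["category", "score", "mop"], sort=False)
)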

airline_score = (
    airline.groupby(list(grouper), as_index=False)
        .agg(count=("fs_sham","count"), dollars=("fs_dollars", "sum"))
)
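
As a variation on the same idea, the key list can be built by a small function so it is easy to check without calling `input()`. This is only a sketch: `build_grouper` and `FIXED_KEYS` are hypothetical names, not part of pandas or of the original code, and the rows in the sample DataFrame are made up.

```python
import pandas as pd

FIXED_KEYS = ["category", "score", "mop"]  # always grouped on, per the question


def build_grouper(selected, available_columns, fixed=FIXED_KEYS):
    """Hypothetical helper: flat list of group keys, order kept, no duplicates."""
    # dict.fromkeys drops repeated names while preserving insertion order
    keys = list(dict.fromkeys([*selected, *fixed]))
    missing = [k for k in keys if k not in available_columns]
    if missing:
        raise KeyError(f"not in DataFrame: {missing}")
    return keys


# Made-up rows; only the column names are taken from the question.
airline = pd.DataFrame({
    "distance_category": ["short", "short", "long"],
    "category": ["a", "a", "b"],
    "score": [1, 1, 2],
    "mop": ["cc", "cc", "dc"],
    "fs_sham": [10, 11, 12],
    "fs_dollars": [5.0, 7.0, 1.0],
})

grouper = build_grouper(["distance_category"], airline.columns)
airline_score = airline.groupby(grouper, as_index=False).agg(
    count=("fs_sham", "count"), dollars=("fs_dollars", "sum")
)
print(airline_score)
```

Returning a fresh list on every call means the three fixed keys cannot pile up if the code is re-run, which can happen when `extend` is called repeatedly on the same list.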
