Filtering data in pandas

Posted by tquggr8v on 2023-01-01 in Other

I have data of the following form:

import pandas as pd

df = pd.DataFrame({'Election Yr.':[2000,2000,2000,2000,2000,2000,2005,2005,2005,2005,2005,2005],
                   'Party':['A','A','B','B','C','C','A','A','B','B','C','C',],
                   'Votes':[50,30,40,50,30,40,50,30,40,50,30,40],
                   'Odd':['aa','bb','cc','dd','ee','gg','ff', 'hh', 'jj', 'kk', 'll', 'yy']})

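I want to group by 'Election Yr.' and sum the values in 'Votes', while keeping the 'Party' value of the row with the maximum 'Votes' in each election year, together with the first value of 'Odd'.
I came up with the following partial solution: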

df = pd.DataFrame(df.groupby(['Election Yr.'], as_index=False).agg({'Votes':'sum', 'Party':'idxmax', 'Odd':'first'}))

'Party': 'idxmax' does not seem right, because the values in that column are strings.

Answer 1 (kqqjbcuj)

def get_max_votes_info(group):
    # Find the first row with the maximum votes
    max_row = group[group['Votes'] == group['Votes'].max()].iloc[0]
    # Return a named Series so that apply() builds one column per field
    # (returning a plain tuple would produce a single column of tuples).
    # Note: 'Odd' is taken from the max-votes row, not from the first row of the group.
    return pd.Series({'Votes': group['Votes'].sum(), 'Party': max_row['Party'], 'Odd': max_row['Odd']})

# Group the DataFrame by 'Election Yr.' and apply the custom function to the remaining columns
df_grouped = df.groupby('Election Yr.')[['Votes', 'Party', 'Odd']].apply(get_max_votes_info)

# The result is indexed by election year
# You can reset the index to turn it into a regular column if desired
df_result = df_grouped.reset_index()

# df_result has four columns: 'Election Yr.', 'Votes', 'Party' and 'Odd'
print(df_result)

Or try:

df_grouped = df.groupby('Election Yr.').agg({
    'Votes': 'sum',
    # x holds only the 'Party' values of one group, so the votes for that group
    # are looked up in df through x.index (this assumes df has a unique index)
    'Party': lambda x: x.loc[df.loc[x.index, 'Votes'].idxmax()],
    'Odd': 'first'  # returns the first value of 'Odd'
})

# The result is indexed by election year
# You can reset the index to turn it into a regular column if desired
df_result = df_grouped.reset_index()

# df_result has four columns: 'Election Yr.', 'Votes', 'Party' and 'Odd'
print(df_result)

Alternatively, call idxmax on the 'Votes' column and use the resulting index label to select the corresponding 'Party' value:

def get_max_votes_info(group):
    # Find the index label of the first row with the maximum votes
    max_index = group['Votes'].idxmax()
    # Return a named Series so that apply() builds one column per field
    # (note: 'Odd' is taken from the max-votes row, not from the first row of the group)
    return pd.Series({'Votes': group['Votes'].sum(),
                      'Party': group.loc[max_index, 'Party'],
                      'Odd': group.loc[max_index, 'Odd']})

df_grouped = df.groupby('Election Yr.')[['Votes', 'Party', 'Odd']].apply(get_max_votes_info)
df_result = df_grouped.reset_index()

# df_result has four columns: 'Election Yr.', 'Votes', 'Party' and 'Odd'
print(df_result)
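
The same idea can probably be written without apply at all. The following is only a sketch, not part of the original answer: it takes the per-year idxmax of 'Votes' and uses those labels to look up 'Party' in the original frame. The names summary, max_idx and result are made up for illustration, and 'Odd' here is the first value in each group, as the question asks.

```python
import pandas as pd

df = pd.DataFrame({
    'Election Yr.': [2000] * 6 + [2005] * 6,
    'Party': ['A', 'A', 'B', 'B', 'C', 'C'] * 2,
    'Votes': [50, 30, 40, 50, 30, 40] * 2,
    'Odd': ['aa', 'bb', 'cc', 'dd', 'ee', 'gg', 'ff', 'hh', 'jj', 'kk', 'll', 'yy'],
})

grouped = df.groupby('Election Yr.')

# Total votes and first 'Odd' value per year (named aggregation)
summary = grouped.agg(Votes=('Votes', 'sum'), Odd=('Odd', 'first'))

# Index label of the first max-votes row in each year
max_idx = grouped['Votes'].idxmax()

# Look up 'Party' at those labels; .to_numpy() drops the row labels so the
# values are assigned by position (summary and max_idx are both ordered by year)
summary['Party'] = df.loc[max_idx, 'Party'].to_numpy()

result = summary.reset_index()
print(result)
```

When several parties tie for the maximum, idxmax returns the first matching row, so only one party per year is kept.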
Answer 2 (xwbd5t1u)

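You can create two DataFrames:
1. The total votes per year and the first 'Odd' value
2. The party (or parties, since there can be more than one) whose votes equal the maximum for that year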

df_sum_odd = df.groupby(['Election Yr.'], as_index=False).agg({'Votes':'sum', 'Odd':'first'})
df_party = df[df.groupby('Election Yr.').Votes.transform('max') == df.Votes][['Election Yr.', 'Party']]
df_sum_odd.merge(df_party, on = 'Election Yr.', how='right')

After merging the two, you get the following result:

   Election Yr.  Votes Odd Party
0          2000    240  aa     A
1          2000    240  aa     B
2          2005    240  ff     A
3          2005    240  ff     B

Note that there are two rows per election year, because two different parties match each year's maximum vote value (50).
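
If a single row per year is preferred, the tied parties can be collapsed before merging. The sketch below is an assumption about what might be wanted and is not part of the original answer: it shows two options, keeping only the first tied party or joining all tied parties into one string. The names is_max, first_party, all_parties and result are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Election Yr.': [2000] * 6 + [2005] * 6,
    'Party': ['A', 'A', 'B', 'B', 'C', 'C'] * 2,
    'Votes': [50, 30, 40, 50, 30, 40] * 2,
    'Odd': ['aa', 'bb', 'cc', 'dd', 'ee', 'gg', 'ff', 'hh', 'jj', 'kk', 'll', 'yy'],
})

# Rows whose votes equal the per-year maximum (there may be several per year)
is_max = df['Votes'] == df.groupby('Election Yr.')['Votes'].transform('max')
df_party = df.loc[is_max, ['Election Yr.', 'Party']]

# Option 1: keep only the first tied party in each year
first_party = df_party.drop_duplicates(subset='Election Yr.', keep='first')

# Option 2: join all tied parties of a year into a single string
all_parties = df_party.groupby('Election Yr.')['Party'].agg(', '.join).reset_index()

# Per-year totals and first 'Odd' value, as in the answer above
df_sum_odd = df.groupby('Election Yr.', as_index=False).agg({'Votes': 'sum', 'Odd': 'first'})

# One row per year, with every tied party listed
result = df_sum_odd.merge(all_parties, on='Election Yr.')
print(result)
```

Merging first_party instead of all_parties would give one row per year with just the first tied party.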
