pandas: counting ids by change type generates wrong values

Posted by 0lvr5msh on 2023-06-20 in Other

I have a df that looks like this:

import pandas as pd

data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR']
}

df = pd.DataFrame(data)

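I am trying to count four cases per api_spec_id: the ids where every row has type BR, the ids where at least one row has type BR, and the same two cases for NBR.
This is the code I am using, but it seems to be wrong, because it generates the same output for the last two: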

import pandas as pd

at_least_one_breaking_change = df[df['type'] == 'BR']['api_spec_id'].nunique()

all_commits_including_breaking = df.groupby('api_spec_id').apply(lambda x: 'NBR' not in x['type'].unique()) \
                                .sum()

at_least_one_non_breaking_change = df[df['type'] == 'NBR']['api_spec_id'].nunique()

all_commits_including_non_breaking = df.groupby('api_spec_id').apply(lambda x: 'BR' not in x['type'].unique()) \
                                    .sum()

The expected output for the sample df above would be:

at_least_one_breaking_change = 3
all_commits_including_breaking = 3
at_least_one_non_breaking_change = 2
all_commits_including_non_breaking = 1

I am a bit stuck on this; any suggestions or ideas would be much appreciated.

q35jwt9p #1

You can use pd.crosstab:

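# One row per api_spec_id, one boolean column per type:
# True means the id has at least one row of that type
m = pd.crosstab(df['api_spec_id'], df['type']).astype(bool)

at_least_one_breaking_change = sum(m['BR'])
# Only BR rows: BR is present and NBR is absent
all_commits_including_breaking = sum(m['BR'] & ~m['NBR'])

at_least_one_non_breaking_change = sum(m['NBR'])
# Only NBR rows: NBR is present and BR is absent
all_commits_including_non_breaking = sum(m['NBR'] & ~m['BR'])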

Output:

>>> at_least_one_breaking_change
3

>>> all_commits_including_breaking
2

>>> at_least_one_non_breaking_change
2

>>> all_commits_including_non_breaking
1

>>> m
type            BR    NBR
api_spec_id              
123           True  False
213           True   True
345          False   True
678           True  False
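
To see why the boolean table is enough, it can help to look at the counts before the cast to bool. The following is a minimal sketch that rebuilds the sample df from the question; the names counts, present and only_br are illustrative and not part of the answer above.

```python
import pandas as pd

data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR']
}
df = pd.DataFrame(data)

# crosstab counts the rows for each (api_spec_id, type) pair
counts = pd.crosstab(df['api_spec_id'], df['type'])

# A non-zero count means the id has at least one row of that type
present = counts.astype(bool)

# "Every row is BR" is equivalent to "BR present and NBR absent"
only_br = present['BR'] & ~present['NBR']

print(counts)
print(int(only_br.sum()))
```

This equivalence relies on type taking only the two values BR and NBR. With a third type, the "all rows" conditions would need to exclude every other column as well.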

smdnsysy #2

I looked at your code and ran it; its output is:

The conditions in this code are slightly wrong.
Take a look at the update:

import pandas as pd

at_least_one_breaking_change = df[df['type'] == 'BR']['api_spec_id'].nunique()

all_commits_including_breaking = df.groupby('api_spec_id').apply(lambda x: 'NBR' in x['type'].unique()) \
                                .sum()

at_least_one_non_breaking_change = df[df['type'] == 'NBR']['api_spec_id'].nunique()

all_commits_including_non_breaking = df.groupby('api_spec_id').apply(lambda x: 'Breaking' in x['type'].unique()) \
                                    .sum()

Also, there is no "Breaking" type.
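
For comparison, here is a minimal sketch of the four counts with a groupby, using only the 'BR' and 'NBR' labels that exist in the sample data. It assumes that the all_commits_* variables are meant to count ids where every row has the given type; the names is_br and is_nbr are illustrative.

```python
import pandas as pd

data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR']
}
df = pd.DataFrame(data)

# Row-level boolean flags, grouped by api_spec_id
is_br = df['type'].eq('BR').groupby(df['api_spec_id'])
is_nbr = df['type'].eq('NBR').groupby(df['api_spec_id'])

# any(): at least one row of the id matches; all(): every row matches
at_least_one_breaking_change = int(is_br.any().sum())
all_commits_including_breaking = int(is_br.all().sum())
at_least_one_non_breaking_change = int(is_nbr.any().sum())
all_commits_including_non_breaking = int(is_nbr.all().sum())

print(at_least_one_breaking_change, all_commits_including_breaking,
      at_least_one_non_breaking_change, all_commits_including_non_breaking)
```

On the sample data this gives 3, 2, 2 and 1, the same values as the pd.crosstab answer.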
