Feeding the result of a Pandas groupby() back into the original DataFrame? [duplicate]

nmpmafwu  posted on 2023-02-14  in  Other
This question already has answers here:

How to assign a name to the size() column? (5 answers)
Closed 4 hours ago.
How can I use groupby() to get the count of each employee type on a given day, and feed the result back into the original DataFrame?
Here is the data:

import numpy as np
import pandas as pd

shifts = [("Cashier", "Thursday"), ("Cashier", "Thursday"),
          ("Cashier", "Thursday"), ("Cook", "Thursday"),
          ("Cashier", "Friday"), ("Cashier", "Friday"),
          ("Cook", "Friday"), ("Cook", "Friday"),
          ("Cashier", "Saturday"), ("Cook", "Saturday"),
          ("Cook", "Saturday")]
labels = ["JOB_TITLE", "DAY"]
df = pd.DataFrame.from_records(shifts, columns=labels)

This use of value_counts() produces the correct counts:

shifts_series = df.groupby('DAY')['JOB_TITLE'].value_counts()

So how do I feed those values back into the original DataFrame, to get this:

   JOB_TITLE       DAY  TYPE
0    Cashier  Thursday     3
1    Cashier  Thursday     3
2    Cashier  Thursday     3
3       Cook  Thursday     1
4    Cashier    Friday     2
5    Cashier    Friday     2
6       Cook    Friday     2
7       Cook    Friday     2
8    Cashier  Saturday     1
9       Cook  Saturday     2
10      Cook  Saturday     2

I found some answers suggesting transform(), but the result only counts the number of rows per 'DAY':

df.groupby('DAY')['JOB_TITLE'].transform('count')
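
To illustrate why this falls short: grouping on 'DAY' alone puts every row of a day into one group, so the transform returns the number of shifts per day regardless of job title. A minimal sketch on the data from the question (the name per_day is introduced here only for illustration):

```python
import pandas as pd

shifts = [("Cashier", "Thursday"), ("Cashier", "Thursday"),
          ("Cashier", "Thursday"), ("Cook", "Thursday"),
          ("Cashier", "Friday"), ("Cashier", "Friday"),
          ("Cook", "Friday"), ("Cook", "Friday"),
          ("Cashier", "Saturday"), ("Cook", "Saturday"),
          ("Cook", "Saturday")]
df = pd.DataFrame.from_records(shifts, columns=["JOB_TITLE", "DAY"])

# Grouping by DAY only: each row receives the total number of shifts
# on its day (4 on Thursday, 4 on Friday, 3 on Saturday).
per_day = df.groupby("DAY")["JOB_TITLE"].transform("count")
print(per_day.tolist())  # [4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3]
```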

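Using the answer to a different question, I managed to create a nasty little Pandas anti-pattern, in which I try to loop over the results and flag the pairs [('Saturday', 'Cashier'), ('Thursday', 'Cook')]: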

shift_filter1 = shifts_series[shifts_series == 1].index.tolist()
df['WORKED_SOLO'] = np.nan
for workday, title in shift_filter1:
    df['WORKED_SOLO'] = (np.where(((df['WORKED_SOLO'].isna()) & (df['DAY'] == workday) & (df['JOB_TITLE'] == title)), True, np.nan))

However, each pass through the loop overwrites the results of the previous one, despite the isna() test.
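
A likely cause is that np.where(condition, True, np.nan) writes np.nan to every row where the condition is false, including rows that an earlier iteration had already set, so only the last pair survives. The sketch below keeps the loop but assigns only to the matching rows with .loc; the names solo_pairs and mask are illustrative, and this is one possible fix rather than the asker's or the answerer's code:

```python
import pandas as pd

shifts = [("Cashier", "Thursday"), ("Cashier", "Thursday"),
          ("Cashier", "Thursday"), ("Cook", "Thursday"),
          ("Cashier", "Friday"), ("Cashier", "Friday"),
          ("Cook", "Friday"), ("Cook", "Friday"),
          ("Cashier", "Saturday"), ("Cook", "Saturday"),
          ("Cook", "Saturday")]
df = pd.DataFrame.from_records(shifts, columns=["JOB_TITLE", "DAY"])

# (DAY, JOB_TITLE) pairs that occur exactly once.
shifts_series = df.groupby("DAY")["JOB_TITLE"].value_counts()
solo_pairs = shifts_series[shifts_series == 1].index.tolist()

# Start from False everywhere, then flip only the matching rows,
# so earlier iterations are never overwritten.
df["WORKED_SOLO"] = False
for workday, title in solo_pairs:
    mask = (df["DAY"] == workday) & (df["JOB_TITLE"] == title)
    df.loc[mask, "WORKED_SOLO"] = True

print(df[df["WORKED_SOLO"]])
```

Only the rows for the Thursday cook and the Saturday cashier end up flagged.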

Answer #1 (63lcw9qa)

You can do the following:

import pandas as pd

shifts = [("Cashier", "Thursday"), ("Cashier", "Thursday"),
        ("Cashier", "Thursday"), ("Cook", "Thursday"),
        ("Cashier", "Friday"), ("Cashier", "Friday"),
        ("Cook", "Friday"), ("Cook", "Friday"),
        ("Cashier", "Saturday"), ("Cook", "Saturday"),
        ("Cook", "Saturday")]
labels = ["JOB_TITLE", "DAY"]
df = pd.DataFrame.from_records(shifts, columns=labels)

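# Count each JOB_TITLE within each DAY, then turn the resulting
# MultiIndex Series into a DataFrame whose count column is named TYPE.
shifts_series = df.groupby('DAY')['JOB_TITLE'].value_counts()
shifts_series = shifts_series.reset_index(name='TYPE')

# Join the counts back onto every original row via the two key columns.
df = pd.merge(df, shifts_series, on=['JOB_TITLE', 'DAY'])
print(df)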

This gives:

   JOB_TITLE       DAY  TYPE
0    Cashier  Thursday     3
1    Cashier  Thursday     3
2    Cashier  Thursday     3
3       Cook  Thursday     1
4    Cashier    Friday     2
5    Cashier    Friday     2
6       Cook    Friday     2
7       Cook    Friday     2
8    Cashier  Saturday     1
9       Cook  Saturday     2
10      Cook  Saturday     2
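
As an alternative to the merge, and not part of the answer above, the transform() approach from the question can be made to work by grouping on both columns, so that each (DAY, JOB_TITLE) pair forms its own group. A minimal sketch, assuming the columns contain no missing values (transform("count") counts only non-null entries):

```python
import pandas as pd

shifts = [("Cashier", "Thursday"), ("Cashier", "Thursday"),
          ("Cashier", "Thursday"), ("Cook", "Thursday"),
          ("Cashier", "Friday"), ("Cashier", "Friday"),
          ("Cook", "Friday"), ("Cook", "Friday"),
          ("Cashier", "Saturday"), ("Cook", "Saturday"),
          ("Cook", "Saturday")]
df = pd.DataFrame.from_records(shifts, columns=["JOB_TITLE", "DAY"])

# Group on both keys; transform broadcasts each group's count back
# onto the original rows, keeping the original index and row order.
df["TYPE"] = df.groupby(["DAY", "JOB_TITLE"])["JOB_TITLE"].transform("count")
print(df)
```

Because transform() returns a result aligned to the original index, no separate join step is needed.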
