我有一个 Dataframe 如下:
ID GROUP DATE_MIN DATE_MAX
1 L1 02/12/2022 6:30AM 02/12/2022 6:35AM
2 L1 02/12/2022 6:33AM 02/12/2022 6:40AM
3 L1 02/12/2022 6:37AM 02/12/2022 6:40AM
4 L2 02/12/2022 7:30AM 02/12/2022 7:35AM
5 L2 02/12/2022 7:33AM 02/12/2022 7:35AM
6 L2 02/12/2022 7:34AM 02/12/2022 7:38AM
我想计算时间范围(DATE_MIN、DATE_MAX)之间每组(GROUP列)的行数,预期输出为
ID GROUP DATE_MIN DATE_MAX NumberOfRows
1 L1 02/12/2022 6:30AM 02/12/2022 6:35AM 2 <<because of ID 1 and 2>>
2 L1 02/12/2022 6:33AM 02/12/2022 6:40AM 3 <<because of ID 1, 2 and 3>>
3 L1 02/12/2022 6:37AM 02/12/2022 6:40AM 2 <<because of ID 3 and 2>>
4 L2 02/12/2022 7:30AM 02/12/2022 7:35AM 1 << because of 4 only>>
5 L2 02/12/2022 7:36AM 02/12/2022 7:40AM 2 <<because of 5 and 6>>
6 L2 02/12/2022 7:37AM 02/12/2022 7:40AM 2 <<because of 5 and 6>>
2条答案
按热度按时间ao218c7q1#
假设您的组不是很大(因为此解决方案为每个组构建一个方阵),您可以使用numpy:
输出:
bpzcxfmw2#
df [“行数”] = df.分组依据(“分组”)[“日期_分钟”].转换(λ x:x.介于(df [“日期最小值”],df [“日期最大值”]).sum())打印(df)