Pandas: Using groupby to filter groups of multiple column values with start/end times, and counting groups with consecutive date ranges

Posted by qmelpv7a on 2023-08-01 in Other

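Struggling with this one…
I have a large DataFrame with three grouping columns (A, B, C), two datetime columns that define a time range (StartTime, EndTime), and a final value column named Blocks.
I need to group by the three columns A, B, C and count the number of "cycles", where a cycle continues when the next row's StartTime is within 1 day of the current row's EndTime.
So after grouping by A-B-C, if we have two blocks, one ending on 7/19/23 and the other starting on 7/20/23, we count them as 1 "cycle" in the output. (I'm trying to filter out the matches so that later I can simply count the values.)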
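A minimal sketch of the "within 1 day" check described above, assuming a block continues the current cycle when its StartTime is at most one day after the previous block's EndTime in the same A-B-C group. The sample data and the helper columns `Gap` and `Continues` are hypothetical. Comparing each block with the previous block of its own group through `groupby(...).shift(1)` means rows are never compared across group boundaries:

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are made up.
df = pd.DataFrame({
    "A": ["x", "x", "x", "y"],
    "B": ["p", "p", "p", "q"],
    "C": [1, 1, 1, 2],
    "StartTime": pd.to_datetime(["2023-07-17", "2023-07-20", "2023-07-25", "2023-07-01"]),
    "EndTime": pd.to_datetime(["2023-07-19", "2023-07-22", "2023-07-26", "2023-07-03"]),
    "Blocks": [10, 11, 12, 20],
})

keys = ["A", "B", "C"]
df = df.sort_values(keys + ["StartTime"]).reset_index(drop=True)

# EndTime of the previous block in the same A-B-C group (NaT on each group's first row).
prev_end = df.groupby(keys)["EndTime"].shift(1)

# Gap between this block's start and the previous block's end.
df["Gap"] = df["StartTime"] - prev_end

# Assumption: a block continues the cycle if it starts within 1 day of the previous end.
# Comparisons against NaT evaluate to False, so a group's first block never "continues".
df["Continues"] = df["Gap"] <= pd.Timedelta(days=1)

print(df[keys + ["StartTime", "EndTime", "Gap", "Continues"]])
```

Here the block starting 2023-07-20 continues the one ending 2023-07-19, matching the 7/19 to 7/20 example in the question.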
Attempt:

import pandas as pd

df.sort_values(by=['A', 'B', 'C', 'StartTime'], inplace=True)

df['DaysBetween'] = (df['EndTime'] - df['StartTime']).dt.days

# Keep rows whose EndTime differs from the next row's StartTime + 1 day
# and whose next row belongs to the same A-B-C group
mask = (
    (df['EndTime'] != df['StartTime'].shift(-1) + pd.Timedelta(days=1))
    & (df['A'] == df['A'].shift(-1))
    & (df['B'] == df['B'].shift(-1))
    & (df['C'] == df['C'].shift(-1))
)

filter_df = df[mask]  # Filtering out the matching Blocks to later count the number of cycles
filter_df = filter_df.reset_index(drop=True)
cycle_count = filter_df.groupby(['A', 'B', 'C'])['Blocks'].nunique().reset_index(name="CountCycles")

This approach works, but I lose the groups that have only one block, and I don't know how to keep them.
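The single-block groups disappear because the mask compares each row with the *next* row (`shift(-1)`) and requires A, B and C to match it; the last row of every group fails that test, so a group with one block has no row left after filtering. A minimal sketch of an alternative, assuming a "cycle" is a run of blocks in which each block starts within 1 day of the previous block's end: flag the blocks that *open* a new cycle (comparing with the previous block of the same group) and sum the flags per group, so no rows are dropped. The sample data and the `NewCycle` column are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; the (y, q, 2) group contains a single block.
df = pd.DataFrame({
    "A": ["x", "x", "x", "y"],
    "B": ["p", "p", "p", "q"],
    "C": [1, 1, 1, 2],
    "StartTime": pd.to_datetime(["2023-07-17", "2023-07-20", "2023-07-25", "2023-07-01"]),
    "EndTime": pd.to_datetime(["2023-07-19", "2023-07-22", "2023-07-26", "2023-07-03"]),
    "Blocks": [10, 11, 12, 20],
})

keys = ["A", "B", "C"]
df = df.sort_values(keys + ["StartTime"]).reset_index(drop=True)

# Previous block's EndTime within the same group; NaT on each group's first row.
prev_end = df.groupby(keys)["EndTime"].shift(1)

# Assumption: a block opens a new cycle unless it starts within 1 day of the previous end.
# NaT comparisons are False, so the first block of every group (including single-block
# groups) always opens a cycle instead of being filtered away.
df["NewCycle"] = ~((df["StartTime"] - prev_end) <= pd.Timedelta(days=1))

# Cycles per group = number of blocks that open a new cycle.
cycle_count = (
    df.groupby(keys)["NewCycle"]
    .sum()
    .reset_index(name="CountCycles")
)
print(cycle_count)
```

Because the count comes from a per-row flag rather than from filtering rows out, every group keeps at least one row and therefore reports at least one cycle.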

Answer 1 (dwbf0jvd):

import pandas as pd

# Sample DataFrame
data = {
    'Date': ['2010-01-01', '2010-02-01', '2010-03-01', '2010-04-01', '2010-05-01', '2010-06-01'],
    'HDD': [3000, 2500, 2000, 1500, 1000, 500],
    'CDD': [0, 500, 1000, 1500, 2000, 2500]
}

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

# Extract Year and Month columns
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month_name()

# Pivot the DataFrame to get the desired format
pivot_df = df.pivot_table(index='Year', columns='Month', values=['HDD', 'CDD']).reset_index()

# Flatten the column names
pivot_df.columns = [' '.join(col).strip() for col in pivot_df.columns.values]

# Optional: both steps are no-ops here, since reset_index() was already called above
# and strip() has already turned 'Year ' into 'Year'
pivot_df.reset_index(drop=True, inplace=True)
pivot_df.rename(columns={'Year ': 'Year'}, inplace=True)

print(pivot_df)

Output:

   Year  CDD April  CDD February  CDD January  CDD June  CDD March  CDD May  \
0  2010       1500           500            0      2500       1000     2000   

   HDD April  HDD February  HDD January  HDD June  HDD March  HDD May  
0       1500          2500         3000       500       2000     1000
