pandas: Merging records based on consecutive dates in Python

Posted by 83qze16e on 2023-08-01 in Python

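I want to merge the records of my DataFrame when their dates are the same or consecutive. In the example below, I want to merge the dates (13, 14, 15), (25, 26), and (30, 31) together, because they are consecutive. The merge should break whenever there is a gap of even a single day.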

   cust                 date         description
   CUST123              2020-06-13   observed increased loss rate
   CUST123              2020-06-13   cut performed job
   CUST123              2020-06-14   working tight area
   CUST123              2020-06-15   production shut neighbouring app
   CUST123              2020-07-17   loss pressure slow gain trip
   CUST123              2020-08-25   established circulation load
   CUST123              2020-08-26   performed sticky test
   CUST123              2020-08-28   job meeting prior low energy
   CUST123              2020-08-30   performed maintenance service
   CUST123              2020-08-31   reconnected control line

Expected output:

   cust                 date         description
   CUST123              2020-06-13   observed increased loss rate cut performed job 
                                      working tight area production shut neighbouring app
   CUST123              2020-07-17   loss pressure slow gain trip
   CUST123              2020-08-25   established circulation load performed sticky test
   CUST123              2020-08-28   job meeting prior low energy
   CUST123              2020-08-30   performed maintenance service reconnected control line

b4wnujal #1

To merge the DataFrame records that share the same date, you can do the following:

merged_df = df.groupby(['cust', 'date'])['description'].apply(' '.join).reset_index()

This outputs:

      cust       date                                     description
0  CUST123 2020-06-13  observed increased loss rate cut performed job
1  CUST123 2020-06-14                              working tight area
2  CUST123 2020-06-15                production shut neighbouring app
3  CUST123 2020-07-17                    loss pressure slow gain trip
4  CUST123 2020-08-25                    established circulation load
5  CUST123 2020-08-26                           performed sticky test
6  CUST123 2020-08-28                    job meeting prior low energy
7  CUST123 2020-08-30                   performed maintenance service
8  CUST123 2020-08-31                        reconnected control line
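
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the variable name `same_day` is only illustrative. It shows that this step merges rows sharing exactly the same date, but does not yet merge consecutive dates.

```python
import pandas as pd

# Sample data rebuilt by hand from the question's table
df = pd.DataFrame({
    'cust': ['CUST123'] * 10,
    'date': pd.to_datetime([
        '2020-06-13', '2020-06-13', '2020-06-14', '2020-06-15', '2020-07-17',
        '2020-08-25', '2020-08-26', '2020-08-28', '2020-08-30', '2020-08-31',
    ]),
    'description': [
        'observed increased loss rate',
        'cut performed job',
        'working tight area',
        'production shut neighbouring app',
        'loss pressure slow gain trip',
        'established circulation load',
        'performed sticky test',
        'job meeting prior low energy',
        'performed maintenance service',
        'reconnected control line',
    ],
})

# Join the descriptions of rows that share the same customer and date
same_day = (
    df.groupby(['cust', 'date'])['description']
      .apply(' '.join)
      .reset_index()
)

print(same_day)
```

Only the two 2020-06-13 rows collapse here, so ten input rows become nine.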


Edit: If you want to merge consecutive dates, keeping the first date of each consecutive range, you can do this:

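import pandas as pd

# Make sure 'date' is a datetime column so that date differences have a .days attribute
df['date'] = pd.to_datetime(df['date'])

# Sort by customer, then date, so each customer's rows are contiguous and in date order
df = df.sort_values(['cust', 'date'])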

# Initialize variables
merged_data = []
prev_row = None

# Loop through the rows
for _, row in df.iterrows():
    if prev_row is None or row['cust'] != prev_row['cust'] or (row['date'] - prev_row['date']).days > 1:
        merged_data.append({'cust': row['cust'], 'date': row['date'], 'description': row['description']})
    else:
        merged_data[-1]['description'] += ' ' + row['description']
    prev_row = row

# Create merged DataFrame
merged_df = pd.DataFrame(merged_data)

print(merged_df)


Output:

      cust       date                                        description
0  CUST123 2020-06-13  observed increased loss rate cut performed job...
1  CUST123 2020-07-17                       loss pressure slow gain trip
2  CUST123 2020-08-25  established circulation load performed sticky ...
3  CUST123 2020-08-28                       job meeting prior low energy
4  CUST123 2020-08-30  performed maintenance service reconnected cont...
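
As an alternative to the row-by-row loop, the same grouping can probably be done in a vectorized way with `diff` and `cumsum`. This is a sketch rather than part of the original answer: the sample data is rebuilt by hand from the question, and the helper names `gap`, `new_run`, `run`, and `merged` are my own. A new run starts whenever the gap to the previous row of the same customer is more than one day.

```python
import pandas as pd

# Sample data rebuilt by hand from the question's table
df = pd.DataFrame({
    'cust': ['CUST123'] * 10,
    'date': pd.to_datetime([
        '2020-06-13', '2020-06-13', '2020-06-14', '2020-06-15', '2020-07-17',
        '2020-08-25', '2020-08-26', '2020-08-28', '2020-08-30', '2020-08-31',
    ]),
    'description': [
        'observed increased loss rate',
        'cut performed job',
        'working tight area',
        'production shut neighbouring app',
        'loss pressure slow gain trip',
        'established circulation load',
        'performed sticky test',
        'job meeting prior low energy',
        'performed maintenance service',
        'reconnected control line',
    ],
})

# Sort so that each customer's rows are contiguous and in date order
df = df.sort_values(['cust', 'date'])

# Gap to the previous row of the same customer (NaT for a customer's first row)
gap = df.groupby('cust')['date'].diff()

# A run starts at a customer's first row or after a gap of more than one day
new_run = gap.isna() | (gap > pd.Timedelta(days=1))

# The cumulative sum turns the start flags into a run label per row
df = df.assign(run=new_run.cumsum())

# Keep the first date of each run and join its descriptions
merged = (
    df.groupby(['cust', 'run'], as_index=False)
      .agg(date=('date', 'first'), description=('description', ' '.join))
      .drop(columns='run')
)

print(merged)
```

On the sample data this gives five rows, starting on 2020-06-13, 2020-07-17, 2020-08-25, 2020-08-28, and 2020-08-30, which matches the expected output in the question. Rows with the same date have a gap of zero days, so they fall into the same run as well.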
