pandas/Python: find multiple sets in a DataFrame based on a condition and read the first and second row of each set

Posted by 91zkwejq on 2023-06-28 in Python

I have a CSV file (two different samples are shown below).
Sample 1:

Date-Time              Func    Byte
2023/01/01 16:45:13       APP      0 
2023/01/01 16:57:08       APP      1  
2023/01/01 17:10:44       APP      5 
2023/01/01 17:11:04       APP      0  
2023/01/01 17:12:24       APP      0 
2023/01/01 17:15:27       APP      4
2023/01/01 17:16:08       APP      4

Sample 2:

Date-Time              Func    Byte
2023/01/01 16:45:13       APP      4 
2023/01/01 16:57:08       APP      1  
2023/01/01 17:10:44       APP      5 
2023/01/01 17:11:04       APP      0  
2023/01/01 17:12:24       APP      0 
2023/01/01 17:15:27       APP      4
2023/01/01 17:16:08       APP      4 
2023/01/01 17:16:20       APP      0

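I want to get the start time (the row where Byte becomes non-zero) and the end time (the last row where Byte is still non-zero). If the end time falls on the last row of the file, it should be left empty.
Expected output for sample 1: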

Date/Time(Start)         Func         Date/Time(End)
2023/01/01 16:57:08      APP         2023/01/01 17:10:44
2023/01/01 17:15:27      APP

Expected output for sample 2:

Date/Time(Start)         Func         Date/Time(End)
2023/01/01 16:45:13      APP         2023/01/01 17:10:44
2023/01/01 17:15:27      APP         2023/01/01 17:16:08
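
To reproduce the two samples without a file on disk, here is a minimal sketch. The helper name `load_sample` and the choice of "two or more spaces" as the column separator are assumptions for illustration, not part of the original post; a real CSV would normally be read with a plain `pd.read_csv(path)`.

```python
import io

import pandas as pd

# Sample 1 from the question, pasted as text (no trailing spaces).
SAMPLE_1 = """\
Date-Time              Func    Byte
2023/01/01 16:45:13       APP      0
2023/01/01 16:57:08       APP      1
2023/01/01 17:10:44       APP      5
2023/01/01 17:11:04       APP      0
2023/01/01 17:12:24       APP      0
2023/01/01 17:15:27       APP      4
2023/01/01 17:16:08       APP      4
"""

# Sample 2 from the question.
SAMPLE_2 = """\
Date-Time              Func    Byte
2023/01/01 16:45:13       APP      4
2023/01/01 16:57:08       APP      1
2023/01/01 17:10:44       APP      5
2023/01/01 17:11:04       APP      0
2023/01/01 17:12:24       APP      0
2023/01/01 17:15:27       APP      4
2023/01/01 17:16:08       APP      4
2023/01/01 17:16:20       APP      0
"""


def load_sample(text: str) -> pd.DataFrame:
    """Hypothetical helper: parse the whitespace-aligned text into a DataFrame.

    Splitting on runs of two or more spaces keeps the single space
    inside each timestamp intact.
    """
    return pd.read_csv(io.StringIO(text), sep=r"\s{2,}", engine="python")


df1_sample = load_sample(SAMPLE_1)
df2_sample = load_sample(SAMPLE_2)
print(df1_sample)
print(df2_sample)
```

Either frame can then be assigned to `df` to try the answers below.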

Answer #1 (gblwokeq)

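First keep only the non-zero rows with boolean indexing, and use Series.cumsum on the zero mask to build a group label for each run of consecutive non-zero values. Then aggregate with GroupBy.agg to get the first and last Date-Time of each run, plus the last index value of each run, which is used to test whether the run ends on the last row of the file: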

#create default index if necessary
#df = df.reset_index(drop=True)

m = df['Byte'].eq(0)

out = (df.reset_index()[~m]
        .groupby([m.cumsum(), 'Func'])
        .agg(**{'Date/Time(Start)':('Date-Time', 'first'),
                'Date/Time(End)':('Date-Time', 'last'),
                'last':('index','last')})
        .droplevel(0)
        .reset_index())

out.loc[out.pop('last').eq(df.index[-1]), 'Date/Time(End)'] = ''
print (out)
  Func     Date/Time(Start)       Date/Time(End)
0  APP  2023/01/01 16:57:08  2023/01/01 17:10:44
1  APP  2023/01/01 17:15:27                     

# output for sample 2
print (out)
  Func     Date/Time(Start)       Date/Time(End)
0  APP  2023/01/01 16:45:13  2023/01/01 17:10:44
1  APP  2023/01/01 17:15:27  2023/01/01 17:16:08
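
To see why the cumulative sum of the zero mask works as a group key, the intermediate columns can be printed for sample 1. This is only an illustrative sketch; the column names `is_zero` and `key` are made up for the printout and do not appear in the answer.

```python
import pandas as pd

# Sample 1 from the question.
df = pd.DataFrame({
    "Date-Time": [
        "2023/01/01 16:45:13", "2023/01/01 16:57:08", "2023/01/01 17:10:44",
        "2023/01/01 17:11:04", "2023/01/01 17:12:24", "2023/01/01 17:15:27",
        "2023/01/01 17:16:08",
    ],
    "Func": ["APP"] * 7,
    "Byte": [0, 1, 5, 0, 0, 4, 4],
})

m = df["Byte"].eq(0)   # True on rows where Byte is 0
key = m.cumsum()       # increases by 1 at every zero row

# Show the mask and the key next to the data.
print(df.assign(is_zero=m, key=key)[["Byte", "is_zero", "key"]])

# Non-zero rows that share a key belong to the same run, so the first and
# last Date-Time per key are the start and end of that run.
runs = df[~m].groupby(key[~m])["Date-Time"].agg(["first", "last"])
print(runs)
```

Because the key only changes on zero rows, every stretch of consecutive non-zero rows keeps one constant key, and dropping the zero rows before grouping leaves exactly one group per run.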

If you need a solution that handles each group of the Func column separately:

print (df)
              Date-Time  Func  Byte
0   2023/01/01 16:45:13   APP     0
1   2023/01/01 16:57:08   APP     1
2   2023/01/01 17:10:44   APP     5
3   2023/01/01 17:11:04   APP     0
4   2023/01/01 17:12:24   APP     0
5   2023/01/01 17:15:27   APP     4
6   2023/01/01 17:16:08   APP     4
7   2023/01/01 16:45:13  APP1     4
8   2023/01/01 16:57:08  APP1     1
9   2023/01/01 17:10:44  APP1     5
10  2023/01/01 17:11:04  APP1     0
11  2023/01/01 17:12:24  APP1     0
12  2023/01/01 17:15:27  APP1     4
13  2023/01/01 17:16:08  APP1     4
14  2023/01/01 17:16:20  APP1     0

m = df['Byte'].eq(0)

df1 = df.reset_index()
out = (df1[~m]
        .groupby([m.cumsum(), 'Func'])
        .agg(**{'Date/Time(Start)':('Date-Time', 'first'),
                'Date/Time(End)':('Date-Time', 'last'),
                'last':('index','last')})
        .droplevel(0)
        .reset_index())

last = df1.groupby('Func')['index'].last()
out.loc[out.pop('last').isin(last), 'Date/Time(End)'] = ''
print (out)
   Func     Date/Time(Start)       Date/Time(End)
0   APP  2023/01/01 16:57:08  2023/01/01 17:10:44
1   APP  2023/01/01 17:15:27                     
2  APP1  2023/01/01 16:45:13  2023/01/01 17:10:44
3  APP1  2023/01/01 17:15:27  2023/01/01 17:16:08
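
The per-Func version differs only in the "last row" test: instead of comparing against the last index of the whole frame, it compares against the last index of each Func. A small sketch of that check on the combined data (only Func and Byte are kept here; the names `run_end` and `open_ended` are illustrative assumptions):

```python
import pandas as pd

# Byte values of the combined APP / APP1 frame shown above.
df = pd.DataFrame({
    "Func": ["APP"] * 7 + ["APP1"] * 8,
    "Byte": [0, 1, 5, 0, 0, 4, 4, 4, 1, 5, 0, 0, 4, 4, 0],
})

m = df["Byte"].eq(0)
key = m.cumsum()
df1 = df.reset_index()

# Position of the last row of every non-zero run, per (key, Func).
run_end = df1[~m].groupby([key[~m], "Func"])["index"].last()

# Position of the last row of every Func.
last = df1.groupby("Func")["index"].last()
print(last)

# A run is "open-ended" when it stops on the last row of its Func.
open_ended = run_end.isin(last)
print(open_ended)
```

Here only the second APP run ends on the last APP row (index 6), so only that run gets an empty end time. The last APP1 row (index 14) is a zero row, so no APP1 run is flagged.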

Answer #2 (7lrncoxx)

You can use mask, duplicated, and bfill together:

# is Byte 0?
m = df['Byte'].eq(0)
# form groups of the values following a 0
group = m.cumsum()
# identify first non-0 of each group
m2 = ~group.mask(m).duplicated(keep='first').mask(m, True)

# assign last value and filter
out = df[m2].assign(**{'Date/Time(End)':
                       df['Date-Time']
                       .mask(group.duplicated(keep='last')
                             |(df.index==df.index[-1]) # ignore last row
                            ).bfill()
                      })

Output:

# first example
            Date-Time Func  Byte      Date/Time(End)
1 2023-01-01 16:57:08  APP     1 2023-01-01 17:10:44
5 2023-01-01 17:15:27  APP     4                 NaT

# second example
            Date-Time Func  Byte      Date/Time(End)
0 2023-01-01 16:45:13  APP     4 2023-01-01 17:10:44
5 2023-01-01 17:15:27  APP     4 2023-01-01 17:16:08

Intermediates:

  • Rows marked with * are the ones kept in the final output.
            Date-Time Func  Byte      m  group     m2               bfill
0 2023-01-01 16:45:13  APP     0   True      1  False 2023-01-01 17:10:44
1 2023-01-01 16:57:08  APP     1  False      1   True 2023-01-01 17:10:44 *
2 2023-01-01 17:10:44  APP     5  False      1  False 2023-01-01 17:10:44
3 2023-01-01 17:11:04  APP     0   True      2  False 2023-01-01 17:11:04
4 2023-01-01 17:12:24  APP     0   True      3  False                 NaT
5 2023-01-01 17:15:27  APP     4  False      3   True                 NaT *
6 2023-01-01 17:16:08  APP     4  False      3  False                 NaT
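
The outputs above show NaT and ISO-formatted timestamps, which suggests Date-Time was parsed as datetime before running the code. A self-contained sketch under that assumption, using sample 1, with a final rename and drop (an addition for illustration, not part of the answer) to match the column layout requested in the question:

```python
import pandas as pd

# Sample 1, with Date-Time parsed to datetime (assumed, to match the NaT output).
df = pd.DataFrame({
    "Date-Time": pd.to_datetime(
        [
            "2023/01/01 16:45:13", "2023/01/01 16:57:08", "2023/01/01 17:10:44",
            "2023/01/01 17:11:04", "2023/01/01 17:12:24", "2023/01/01 17:15:27",
            "2023/01/01 17:16:08",
        ],
        format="%Y/%m/%d %H:%M:%S",
    ),
    "Func": "APP",
    "Byte": [0, 1, 5, 0, 0, 4, 4],
})

m = df["Byte"].eq(0)    # is Byte 0?
group = m.cumsum()      # a new group starts at every 0
# first non-zero row of each group
m2 = ~group.mask(m).duplicated(keep="first").mask(m, True)

# keep only the last timestamp of each group (never the file's last row),
# then back-fill it onto the earlier rows of the group
end = (
    df["Date-Time"]
    .mask(group.duplicated(keep="last") | (df.index == df.index[-1]))
    .bfill()
)

out = (
    df[m2]
    .assign(**{"Date/Time(End)": end})
    .rename(columns={"Date-Time": "Date/Time(Start)"})
    .drop(columns="Byte")
)
print(out)
```

With datetime columns the open-ended run shows up as NaT rather than an empty string; `out["Date/Time(End)"].dt.strftime("%Y/%m/%d %H:%M:%S").fillna("")` is one way to get strings back if that is preferred.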
