Isolating sequences of positive numbers in a Pandas DataFrame

webghufk  posted on 2023-01-04  in Other

I want to identify "periods" in data stored in a Pandas DataFrame.
Suppose I have these values:

values
1    0
2    8
3    1
4    0
5    5
6    6
7    4
8    7
9    0
10   2
11   9
12   1
13   0
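
For anyone who wants to run the snippets in the answers, the sample can be rebuilt roughly like this. It is a minimal sketch: the name `df` and the 1-based index are assumptions read off the listing above.

```python
import pandas as pd

# Sample data from the question; the 1-based index mirrors the listing above
df = pd.DataFrame(
    {"values": [0, 8, 1, 0, 5, 6, 4, 7, 0, 2, 9, 1, 0]},
    index=range(1, 14),
)
print(df)
```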

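I want to find sequences of strictly positive numbers whose length is greater than or equal to 3. Any value that is not strictly positive ends the ongoing sequence.
This would give: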

values  period
1    0      None
2    8      None
3    1      None
4    0      None
5    5       1
6    6       1
7    4       1
8    7       1
9    0      None
10   2       2
11   9       2
12   1       2
13   0      None

8oomwypt1#

Using boolean arithmetic:

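N = 3
# m1 marks the non-positive rows, each of which ends a sequence
m1 = df['values'].le(0)
# In the sample every group opens with its non-positive row, so a count above N means at least N positives follow it
m2 = df.groupby(m1.cumsum())['values'].transform('count').gt(N)
# Number the qualifying groups, then keep the label only on their positive rows
df['period'] = (m1&m2).cumsum().where((~m1)&m2)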

Output:

    values  period
1        0     NaN
2        8     NaN
3        1     NaN
4        0     NaN
5        5     1.0
6        6     1.0
7        4     1.0
8        7     1.0
9        0     NaN
10       2     2.0
11       9     2.0
12       1     2.0
13       0     NaN

Intermediates (CS = cumulative sum):

    values     m1     m2  CS(m1)  m1&m2  CS(m1&m2)  (~m1)&m2  period
1        0   True  False       1  False          0     False     NaN
2        8  False  False       1  False          0     False     NaN
3        1  False  False       1  False          0     False     NaN
4        0   True   True       2   True          1     False     NaN
5        5  False   True       2  False          1      True     1.0
6        6  False   True       2  False          1      True     1.0
7        4  False   True       2  False          1      True     1.0
8        7  False   True       2  False          1      True     1.0
9        0   True   True       3   True          2     False     NaN
10       2  False   True       3  False          2      True     2.0
11       9  False   True       3  False          2      True     2.0
12       1  False   True       3  False          2      True     2.0
13       0   True  False       4  False          2     False     NaN
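
One caveat with the group count: it relies on every group opening with a non-positive row. If the data began with a run of exactly N positive values, that first group would have no such row and `.gt(N)` would miss it. The following is a sketch of one possible variant that measures the runs of positive values directly; `label_periods` is a hypothetical helper name, not part of the answer above.

```python
import pandas as pd

def label_periods(values: pd.Series, n: int = 3) -> pd.Series:
    """Number each run of at least n strictly positive values; NaN elsewhere."""
    pos = values.gt(0)
    # A new run id starts whenever positivity flips
    run_id = pos.ne(pos.shift()).cumsum()
    # Length of the run each row belongs to
    run_len = pos.groupby(run_id).transform("size")
    keep = pos & run_len.ge(n)
    # First row of every kept run
    start = keep & run_id.ne(run_id.shift())
    return start.cumsum().where(keep)

# Data that starts with a positive run and has no leading non-positive row
s = pd.Series([5, 6, 4, 0, 1, 2, 0, 3, 3, 3])
print(label_periods(s).tolist())
```

Here the first three rows get label 1, the short run `1, 2` stays NaN, and the last three rows get label 2.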

w9apscun2#

You can try:

import numpy as np

sign = np.sign(df['values'])
m = sign.ne(sign.shift()).cumsum()  # run id: increments whenever the sign changes

df['period'] = (df[sign.eq(1)]      # Exclude non-positive numbers
                .groupby(m)
                ['values'].filter(lambda col: len(col) >= 3)
                .groupby(m)
                .ngroup() + 1
                )
print(df)

    values  period
1        0     NaN
2        8     NaN
3        1     NaN
4        0     NaN
5        5     1.0
6        6     1.0
7        4     1.0
8        7     1.0
9        0     NaN
10       2     2.0
11       9     2.0
12       1     2.0
13       0     NaN
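
To see what the grouping key `m` looks like, here is a small sketch on the first nine sample values. The names `values` and `run_id` are chosen for illustration only.

```python
import numpy as np
import pandas as pd

values = pd.Series([0, 8, 1, 0, 5, 6, 4, 7, 0])
sign = np.sign(values)
# True wherever the sign differs from the previous row; cumsum turns that into run ids
run_id = sign.ne(sign.shift()).cumsum()
print(pd.DataFrame({"values": values, "sign": sign, "run_id": run_id}))

# Size of each run; only runs of positive values with size >= 3 survive the filter
print(sign.groupby(run_id).size())
```

Rows that share a `run_id` form one run of identical sign, which is what lets `filter` drop the short runs as a whole.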

b4lqfgs43#

A simple solution:

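count = 0      # length of the current run of positive values
n_groups = 0   # number of qualifying runs found so far
seq_idx = [None]*len(df)

for i, value in enumerate(df['values'].to_numpy()):
    if value > 0:
        count += 1
    else:
        # A non-positive value closes the current run
        if count >= 3:
            n_groups += 1
            seq_idx[i-count: i] = [n_groups]*count
        count = 0

# Close a run that reaches the end of the data
if count >= 3:
    n_groups += 1
    seq_idx[len(df)-count:] = [n_groups]*count

df['period'] = seq_idx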

Output:

values  period
0   0   NaN
1   8   NaN
2   1   NaN
3   0   NaN
4   5   1.0
5   6   1.0
6   4   1.0
7   7   1.0
8   0   NaN
9   2   2.0
10  9   2.0
11  1   2.0
12  0   NaN

6za6bjd04#

A simple way to find plateaus (runs of consecutive positive integers) of size at least 3 is to use find_peaks:

import numpy as np
import pandas as pd

from scipy.signal import find_peaks

df = pd.DataFrame.from_dict({'values': {0: 0, 1: 8, 2: 1, 3: 0, 4: 5, 5: 6, 6: 4, 7: 7, 8: 0, 9: 2, 10: 9, 11: 1, 12: 0}})

_, plateaus = find_peaks((df["values"] > 0).to_numpy(), plateau_size=3)
indices = np.arange(len(df["values"]))[:, None]
indices = (indices >= plateaus["left_edges"]) & (indices <= plateaus["right_edges"])
res = (indices * (np.arange(indices.shape[1]) + 1)).sum(axis=1)
df["periods"] = res

print(df)
Output:

    values  periods
0        0        0
1        8        0
2        1        0
3        0        0
4        5        1
5        6        1
6        4        1
7        7        1
8        0        0
9        2        2
10       9        2
11       1        2
12       0        0
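
The broadcasting step that turns plateau edges into labels can be looked at in isolation. In this sketch the edge arrays are hard-coded assumptions, chosen to be consistent with the output above, so that it runs without SciPy.

```python
import numpy as np

# Assumed edges, consistent with the two plateaus in the output above
left_edges = np.array([4, 9])
right_edges = np.array([7, 11])

rows = np.arange(13)[:, None]                          # shape (13, 1)
inside = (rows >= left_edges) & (rows <= right_edges)  # shape (13, 2), one column per plateau
# Multiply column k by k + 1, then collapse the columns into a single label per row
labels = (inside * (np.arange(inside.shape[1]) + 1)).sum(axis=1)
print(labels)
```

Rows outside every plateau end up with label 0 rather than NaN, which matches the `periods` column printed above.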

zynd9foi5#

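def function1(dd: pd.DataFrame):
    dd.loc[:, 'period'] = None
    # A group holds its leading non-positive row plus the positives after it,
    # so at least 4 rows means a run of 3 or more
    if len(dd) >= 4:
        # Label every row after the first with the group id stored in col1
        dd.iloc[1:, 2] = dd.iloc[1:, 1]
    return dd

# col1 counts the non-positive rows seen so far, starting from 0.
# The label is this group id, so it is not renumbered when a group is skipped.
df1.assign(col1=df1.le(0).cumsum().sub(1)).groupby('col1', group_keys=False).apply(function1)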

Output:

    values  col1 period
0        0     0   None
1        8     0   None
2        1     0   None
3        0     1   None
4        5     1      1
5        6     1      1
6        4     1      1
7        7     1      1
8        0     2   None
9        2     2      2
10       9     2      2
11       1     2      2
12       0     3   None
