Compare a row with its previous row under a condition, and drop rows conditionally, in Python pandas

Posted by tzcvj98z on 2023-10-14 in Python

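I have an idea of what I need to do, but I can't write code that runs correctly. Please take a look and give me some advice.
Step 1. Find the rows that contain a value in the second column
Step 2. For those rows, compare the value in the first column with that of the previous row
Step 3. Drop the row with the larger first-column value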

|missing | diff |
|--------|------|
| 0      | nan  |
| 1      | 60   |
| 1      | nan  |
| 0      | nan  |
| 0      | nan  |
| 1      | 180  |
| 1      | nan  |
| 0      | 120  |

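For example, I want to compare the `missing` value of each row whose `diff` is one of [120, 180, 60] with the `missing` value of the row before it. In the end, the desired frame looks like this: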

|missing | diff |
|--------|------|
| 0      | nan  |
| 1      | nan  |
| 0      | nan  |
| 0      | nan  |
| 0      | 120  |

Update based on the answers: I end up with a DataFrame identical to the original one.

import pandas as pd
import numpy as np
data={'missing':[0,1,1,0,0,1,1,0],'diff':[np.nan,60,np.nan,np.nan,np.nan,180,np.nan,120]}
df=pd.DataFrame(data)
df

Output:

   missing   diff
0        0    NaN
1        1   60.0
2        1    NaN
3        0    NaN
4        0    NaN
5        1  180.0
6        1    NaN
7        0  120.0

if df['diff'][ind]!=np.nan:
    if ind!=0:
        if df['missing'][ind]>df['missing'][ind-1]:
            df=df.drop(ind,0)
        else:
            df=df.drop(ind-1,0)
df

Output:

   missing   diff
0        0    NaN
1        1   60.0
2        1    NaN
3        0    NaN
4        0    NaN
5        1  180.0
6        1    NaN
7        0  120.0

Answer #1 (fnatzsnv)

IIUC, you can try:

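m = df['diff'].notna()                # rows that have a diff value
prev = m.shift(-1, fill_value=False)  # the row just before each of those rows
df = (
    pd.concat([
        df[df['diff'].isna()],        # rows without a diff value are kept as they are
        # rows with a diff value are kept only if the previous row's `missing` is larger
        df[m][df[prev]['missing'].values >
              df[m]['missing'].values]
    ])
)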

Output:

   missing  diff
1        0  <NA>
3        1  <NA>
4        0  <NA>
5        0  <NA>
7        1  <NA>
8        0   120
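
To see how the two masks in this answer line up each row with the one before it, here is a small sketch on the question's data (0-based index, so the labels differ from the output above). The names `prev`, `current_rows`, `previous_rows` and `pairs` are only illustrative, and the sketch assumes the first row has no `diff` value; otherwise the two selections would have different lengths. The shift is written with `fill_value=False` so the mask stays boolean.

```python
import numpy as np
import pandas as pd

# The question's data
df = pd.DataFrame({
    'missing': [0, 1, 1, 0, 0, 1, 1, 0],
    'diff': [np.nan, 60, np.nan, np.nan, np.nan, 180, np.nan, 120],
})

m = df['diff'].notna()                # True on rows that have a diff value
prev = m.shift(-1, fill_value=False)  # True on the row just before each of them

current_rows = df.index[m.to_numpy()].tolist()
previous_rows = df.index[prev.to_numpy()].tolist()

# Each tuple is (previous row label, row label with a diff value)
pairs = list(zip(previous_rows, current_rows))
print(pairs)  # [(0, 1), (4, 5), (6, 7)]
```

Because the selections are compared position by position through `.values`, the comparison only makes sense when both masks select the same number of rows.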

Answer #2 (qcuzuvrc)

This will definitely work:

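for ind in df.index:
    # only rows that have a diff value, skipping the first row (it has no previous row)
    if not np.isnan(df['diff'][ind]) and ind != 0:
        # assumes a default integer index and that row ind-1 has not been dropped yet
        if df['missing'][ind] > df['missing'][ind-1]:
            df = df.drop(index=ind)    # the current row has the larger value
        else:
            df = df.drop(index=ind-1)  # otherwise drop the previous row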
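
For larger frames, the same rule can probably be written without a Python loop. The following is only a sketch under a few assumptions that the question does not state: rows are compared with their neighbours in the original frame, the first row is never compared, and on a tie the previous row is dropped (which is what the `else` branch of the loop does). The names `has_diff`, `larger`, `cur_larger`, `prev_not_smaller` and `drop_mask` are made up for illustration.

```python
import numpy as np
import pandas as pd

# The question's data
df = pd.DataFrame({
    'missing': [0, 1, 1, 0, 0, 1, 1, 0],
    'diff': [np.nan, 60, np.nan, np.nan, np.nan, 180, np.nan, 120],
})

# Step 1: rows with a diff value; the first row has no previous row to compare with
has_diff = df['diff'].notna()
has_diff.iloc[0] = False

# Step 2: is this row's `missing` larger than the previous row's?
larger = df['missing'] > df['missing'].shift(1)

# Step 3: drop the current row where it is larger,
# otherwise drop the previous row (assumption: ties drop the previous row)
cur_larger = has_diff & larger
prev_not_smaller = has_diff & ~larger
drop_mask = cur_larger | prev_not_smaller.shift(-1, fill_value=False)

result = df[~drop_mask]
print(result)
```

On the question's data this keeps rows 0, 2, 3, 4 and 7, which matches the desired frame. Unlike the loop, it never looks up a row that has already been dropped, so the two approaches can differ when consecutive rows both have a `diff` value.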

Answer #3 (5w9g7ksd)

This will work:

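for ind in df.index:
    # pd.notna tests for a real value; comparing with the string "nan" is always True
    if pd.notna(df['diff'][ind]):
        if ind != 0:
            # assumes a default integer index and that row ind-1 has not been dropped yet
            if df['missing'][ind] > df['missing'][ind-1]:
                df = df.drop(index=ind)
            else:
                df = df.drop(index=ind-1)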
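
A note on the missing-value test used in loops like this one: comparisons such as `!= np.nan` (as in the question's attempt) or `!= "nan"` are true for every value, because NaN is not equal to anything, including itself. A minimal sketch of the difference, with illustrative variable names:

```python
import numpy as np
import pandas as pd

value = np.nan

# NaN never compares equal, so both of these are True even though value is NaN
differs_from_nan = value != np.nan
differs_from_string = value != "nan"

# Explicit missing-value checks give the intended answer
is_missing = pd.isna(value)   # True for NaN
has_value = pd.notna(60.0)    # True for an ordinary number

print(differs_from_nan, differs_from_string, is_missing, has_value)
```

With such a comparison the condition lets every row through, so the loop ends up comparing and dropping rows that have no `diff` value at all.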

Answer #4 (6uxekuva)

import numpy as np   # for np.nan
import pandas as pd  # import pandas

# define the dictionary
data = {'missing': [0, 1, 1, 0, 0, 1, 1, 0],
        'diff': [np.nan, 60, np.nan, np.nan, np.nan, 180, np.nan, 120]}

# dictionary to DataFrame
df = pd.DataFrame(data)
print(df)

# for each row in the DataFrame
for ind in df.index:
    # only rows whose diff value is a number, skipping the first row
    if pd.notna(df['diff'][ind]) and ind != 0:
        # compare the first-column value with that of the previous row
        # (assumes a default integer index and that row ind-1 has not been dropped yet)
        if df['missing'][ind] > df['missing'][ind-1]:
            # drop the row with the larger first-column value
            df = df.drop(index=ind)
        else:
            df = df.drop(index=ind-1)

print(df)
