pandas: how to delete all rows between two delimiters

k3fezbri  posted 11 months ago  in  Other

Please find my input and desired output below:

Input:

Col1
Green
Purple
Start delimiter
abchd
oeitms
End delimiter
Yellow
Start delimiter
kdkfkd
dldldldl
mdmdmdm
End delimiter
Red
Brown
Rose
White

Output (desired):

Col1
Green
Purple
Yellow
Red
Brown
Rose
White


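Basically, I am trying to delete all rows between "Start delimiter" and "End delimiter", including the delimiter rows themselves. The catch is that the number of rows between the two is not fixed!
I tried the following code, without success: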

m1 = df['Col1'] == 'Start delimiter'
m2 = df['Col1'] == 'End delimiter'

df.loc[~m1&~m2]


Do you have any suggestions?
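For anyone who wants to reproduce the problem, here is a minimal sketch. The way `df` is built is an assumption (the question only shows the column values), and `attempt` is just a name chosen for the result. It shows why the attempt falls short: the two masks match only the delimiter rows themselves, so everything between them is kept.

```python
import pandas as pd

# Assumed construction of the input; only the column values come from the question.
df = pd.DataFrame({"Col1": [
    "Green", "Purple",
    "Start delimiter", "abchd", "oeitms", "End delimiter",
    "Yellow",
    "Start delimiter", "kdkfkd", "dldldldl", "mdmdmdm", "End delimiter",
    "Red", "Brown", "Rose", "White",
]})

m1 = df["Col1"] == "Start delimiter"
m2 = df["Col1"] == "End delimiter"

# Only the four delimiter rows are removed; the rows between them survive.
attempt = df.loc[~m1 & ~m2]
print(attempt["Col1"].tolist())
```

The printed list still contains "abchd", "oeitms" and the other rows that were supposed to go.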

yeotifhr #1

For a vectorized approach, you can use:

d = {'Start delimiter': False,
     'End delimiter': True}

# assign False after Start
# True after End
# True otherwise
# (astype(bool) turns the object-dtype result into a proper boolean mask)
m1 = df['Col1'].map(d).ffill().fillna(True).astype(bool)
# ensure not an End
m2 = df['Col1'].ne('End delimiter')

# if both conditions are met, keep the rows
out = df.loc[m1&m2]

Output:

Col1
0    Green
1   Purple
6   Yellow
12     Red
13   Brown
14    Rose
15   White
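To make the idea easier to follow, here is a sketch that wraps the same map/ffill logic in a helper. The name `drop_between` and its parameters are hypothetical, and the small frame is made up for illustration.

```python
import pandas as pd

def drop_between(df, col, start, end):
    """Hypothetical helper: drop rows from `start` through `end`, inclusive."""
    # False on a start row, True on an end row, NaN everywhere else.
    flags = df[col].map({start: False, end: True})
    # Carry the last flag forward; rows before the first delimiter default to True.
    keep = flags.ffill().fillna(True).astype(bool)
    # An end row carries True itself, so it has to be excluded explicitly.
    return df.loc[keep & df[col].ne(end)]

df = pd.DataFrame({"Col1": ["Green", "Start delimiter", "abchd",
                            "End delimiter", "Yellow"]})
out = drop_between(df, "Col1", "Start delimiter", "End delimiter")
print(out)
```

Because the result is selected with a boolean mask, the surviving rows keep their original index labels (0 and 4 here).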

irtuqstp #2

Try:

df
    Col1
0   Green
1   Purple
2   Start delimiter
3   abchd
4   oeitms
5   End delimiter
6   Yellow
7   Start delimiter
8   kdkfkd
9   dldldldl
10  mdmdmdm
11  End delimiter
12  Red
13  Brown
14  Rose
15  White

inside = False
rows_to_drop = []
# Collect index labels rather than values, so that a value which also
# appears outside a delimited block is not removed by accident.
for idx, value in df['Col1'].items():
    if value == 'Start delimiter':
        inside = True
        rows_to_drop.append(idx)
    elif value == 'End delimiter':
        inside = False
        rows_to_drop.append(idx)
    elif inside:
        rows_to_drop.append(idx)

# rows_to_drop --> [2, 3, 4, 5, 7, 8, 9, 10, 11]
df = df.drop(rows_to_drop)

df
    Col1
0   Green
1   Purple
6   Yellow
12  Red
13  Brown
14  Rose
15  White
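Whether a loop like this collects values or index labels makes a difference as soon as the same value occurs both inside and outside a delimited block. The sketch below uses made-up data in which "Red" appears twice, and compares the two ways of filtering; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical data: "Red" appears both outside and inside the delimited block.
df = pd.DataFrame({"Col1": ["Red", "Start delimiter", "Red",
                            "End delimiter", "Blue"]})

inside = False
drop_values, drop_labels = [], []
for label, value in df["Col1"].items():
    if value == "Start delimiter":
        inside = True
    if inside:
        drop_values.append(value)
        drop_labels.append(label)
    if value == "End delimiter":
        inside = False

by_value = df[~df["Col1"].isin(drop_values)]  # also removes the "Red" in row 0
by_label = df.drop(drop_labels)               # removes only rows 1 to 3

print(by_value["Col1"].tolist())
print(by_label["Col1"].tolist())
```

Filtering by value loses the first "Red", while dropping by label keeps it.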

yrefmtwq #3

You can try:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {"col1":["red","blue","Start delimiter","black", "white", "End delimiter", "green", "blue",
             "Start delimiter", "red", "green", "anothercolor", "End delimiter", "again_anothercolor"]}
)

idx_start = df.index[df.col1 == "Start delimiter"]  
idx_end = df.index[df.col1 == "End delimiter"]

rows2drop = []

for start, end in zip(idx_start, idx_end):
    rows2drop.extend(np.arange(start, end+1))
    
df.drop(rows2drop, axis=0)

Initial dataframe

+----+--------------------+
|    | col1               |
|----+--------------------|
|  0 | red                |
|  1 | blue               |
|  2 | Start delimiter    |
|  3 | black              |
|  4 | white              |
|  5 | End delimiter      |
|  6 | green              |
|  7 | blue               |
|  8 | Start delimiter    |
|  9 | red                |
| 10 | green              |
| 11 | anothercolor       |
| 12 | End delimiter      |
| 13 | again_anothercolor |
+----+--------------------+

Dataframe after drop()

+----+--------------------+
|    | col1               |
|----+--------------------|
|  0 | red                |
|  1 | blue               |
|  6 | green              |
|  7 | blue               |
| 13 | again_anothercolor |
+----+--------------------+
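Note that `zip` stops at the shorter of the two index lists, so an unmatched delimiter would go unnoticed. If that is a concern, a check along these lines could run first. This is only a sketch: `paired_positions` is a hypothetical helper, and it assumes blocks are never nested.

```python
import numpy as np
import pandas as pd

def paired_positions(col, start, end):
    """Hypothetical helper: (start, end) row positions, or ValueError if unbalanced."""
    starts = np.flatnonzero(col.eq(start).to_numpy())
    ends = np.flatnonzero(col.eq(end).to_numpy())
    if len(starts) != len(ends):
        raise ValueError("different number of start and end delimiters")
    # Each start must precede its end, and each end must precede the next start.
    if np.any(starts >= ends) or np.any(ends[:-1] >= starts[1:]):
        raise ValueError("delimiters are not properly paired")
    return list(zip(starts.tolist(), ends.tolist()))

df = pd.DataFrame({"col1": ["red", "Start delimiter", "black",
                            "End delimiter", "green"]})
pairs = paired_positions(df["col1"], "Start delimiter", "End delimiter")
rows2drop = [i for s, e in pairs for i in range(s, e + 1)]
# Translate positions to labels so this also works with a non-default index.
out = df.drop(df.index[rows2drop])
print(out)
```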

kyks70gy #4

Here is a function that does this.

def cleanlist(list_to_clean, start_clean, end_clean):
    clean_list = []
    clean_status = False  # True while inside a delimited block
    for x in list_to_clean:
        if x == start_clean:
            clean_status = True
        elif not clean_status:
            clean_list.append(x)
        elif x == end_clean:
            clean_status = False
    return clean_list

# pass the column values as a plain list
print(cleanlist(df['Col1'].tolist(), 'Start delimiter', 'End delimiter'))
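The function above returns a plain list, so the DataFrame's index and any other columns are lost. If those matter, a similar loop can build a boolean mask instead. A sketch, with the hypothetical name `keep_mask` and made-up data:

```python
import pandas as pd

def keep_mask(values, start, end):
    """Hypothetical variant: True for rows outside every start..end block."""
    mask = []
    inside = False
    for x in values:
        if x == start:
            inside = True
        mask.append(not inside)   # delimiter rows themselves are marked False
        if x == end:
            inside = False
    return mask

df = pd.DataFrame({"Col1": ["Green", "Start delimiter", "abchd",
                            "End delimiter", "Yellow"]})
mask = keep_mask(df["Col1"], "Start delimiter", "End delimiter")
out = df[mask]
print(out)
```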

wvyml7n5 #5

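One option is to build the list of index positions that do not fall between a start and an end, and use it to index df. This assumes that every start delimiter has a corresponding end delimiter: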

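import numpy as np

# positions of the delimiter rows (end is made exclusive by adding 1)
start = df.Col1.eq('Start delimiter').to_numpy().nonzero()[0]
end = df.Col1.eq('End delimiter').to_numpy().nonzero()[0] + 1
# all positions from each start through its matching end
combo = [np.arange(s, e) for s, e in zip(start, end)]
combo = np.concatenate(combo)
# positions coincide with labels only for the default RangeIndex
keep = df.index.difference(combo)
df.loc[keep]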

      Col1
0    Green
1   Purple
6   Yellow
12     Red
13   Brown
14    Rose
15   White
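The snippet above takes positions from `nonzero()` and then treats them as labels in `df.index.difference`, which lines up only when df has the default 0..n-1 index. For other indexes, a purely positional mask is one way around that. A sketch, using a made-up string index for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Col1": ["Green", "Start delimiter", "abchd", "End delimiter", "Yellow"]},
    index=list("abcde"),  # hypothetical non-default index
)

start = np.flatnonzero(df["Col1"].eq("Start delimiter").to_numpy())
end = np.flatnonzero(df["Col1"].eq("End delimiter").to_numpy()) + 1

keep = np.ones(len(df), dtype=bool)
for s, e in zip(start, end):
    keep[s:e] = False  # positions s .. e-1, i.e. start row through end row

out = df.iloc[keep]
print(out)
```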

mwg9r5ms #6

You can also use mod():

s = 'Start delimiter'
e = 'End delimiter'

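# an even running count of delimiters means the row is outside a block;
# .eq(0) yields a real boolean mask instead of relying on ~ applied to integers
df.loc[df['Col1'].isin([s, e]).cumsum().mod(2).eq(0) & df['Col1'].ne(e)]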

Or:

s = df['Col1'].eq('Start delimiter')
e = df['Col1'].eq('End delimiter')

# True from each start row up to (not including) its end row
inside = s.where(s|e).ffill().fillna(False).astype(bool)
df.loc[~inside & ~e]

Or:

s = df['Col1'].eq('Start delimiter')
e = df['Col1'].eq('End delimiter')

df.loc[~e.iloc[::-1].groupby(s.cumsum()).cummax()]


Output:

Col1
0    Green
1   Purple
6   Yellow
12     Red
13   Brown
14    Rose
15   White
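The parity trick behind the mod() version may be easier to see with the intermediate columns spelled out. The sketch below uses a shortened, made-up frame; the `steps` frame and its column names are purely illustrative. It assumes start and end delimiters strictly alternate.

```python
import pandas as pd

df = pd.DataFrame({"Col1": ["Green", "Start delimiter", "abchd",
                            "End delimiter", "Yellow"]})
s, e = "Start delimiter", "End delimiter"

steps = pd.DataFrame({"Col1": df["Col1"]})
steps["is_delim"] = df["Col1"].isin([s, e])
steps["count"] = steps["is_delim"].cumsum()         # delimiters seen so far
steps["odd"] = steps["count"].mod(2).eq(1)          # odd count: inside a block
steps["keep"] = ~steps["odd"] & df["Col1"].ne(e)    # end rows have an even count
print(steps)

out = df.loc[steps["keep"]]
print(out)
```

A start row brings the count to an odd number and the matching end row brings it back to even, which is why the end row needs the extra `ne(e)` condition.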
