How to drop rows containing sequential digits in a pandas DataFrame

m528fe3b  posted on 2023-04-04  in  Other

I'm trying to drop rows that contain sequential digits, such as 1234 or 6789, from a DataFrame…

# Define regular expression pattern for sequential digits

pattern = r'\{3,}'
mask = df1['prix'].str.contains(pattern)
df_filtered = df1[mask]

print(df_filtered)

I tried this code, but it returns the same data. Before that, I tried this one:

pattern = r'\d{3,}'
df1['prix'] = df1['prix'].astype(str) 
mask = df1['prix'].str.contains(pattern)
if mask.any(): 
    print("There are rows with sequential digits.")
else:
    print("There are no rows with sequential digits.")

It reports that the data contains sequential digits.
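A quick check with the standard `re` module suggests why this second attempt flags every row: `\d{3,}` matches any run of three or more digits, whether or not the digits are sequential. This is only an illustrative sketch; the sample values are the ones that appear in this thread, and the variable names are made up for the example.

```python
import re

# \d{3,} means "three or more digits in a row", not "digits that count up or down".
pattern = re.compile(r'\d{3,}')

values = ['1234', '6789', '4563', '1379']

# Every value matches, including 1379, which contains no sequential run.
matches = {v: bool(pattern.search(v)) for v in values}
print(matches)
```

Since every four-digit value satisfies the pattern, `mask.any()` is true for any column that holds such numbers.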

h7appiyu (answer #1)

You can use the following regular expression:

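# Ascending runs (012 ... 789) followed by descending runs (987 ... 210)
pattern = '|'.join([f'{i}{i+1}{i+2}' for i in range(0, 8, 1)]
                 + [f'{i}{i-1}{i-2}' for i in range(9, 1, -1)])
# True for rows whose 'prix' contains any of those three-digit runs
mask = df['prix'].astype(str).str.contains(pattern)

# Keep only the rows without a sequential run
out = df[~mask]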

Output:

>>> out
   prix
3  1379

>>> pattern
'012|123|234|345|456|567|678|789|987|876|765|654|543|432|321|210'

>>> mask
0     True
1     True
2     True
3    False
Name: prix, dtype: bool

>>> df
   prix
0  1234  # drop (123)
1  6789  # drop (678)
2  4563  # drop (456)
3  1379  # keep
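
If runs longer or shorter than three digits ever need to be detected, the same idea can be wrapped in a small helper. This is a minimal sketch rather than part of the answer above: `sequential_pattern` and `has_sequential_run` are hypothetical names, and the `length` parameter is an assumption added for illustration. With `length=3` it builds the same alternation shown in the output above.

```python
import re


def sequential_pattern(length=3):
    """Build an alternation of all ascending and descending digit runs of `length`."""
    if not 2 <= length <= 10:
        raise ValueError("length must be between 2 and 10")
    digits = '0123456789'
    # Ascending runs: every window of `length` consecutive digits (012, 123, ...).
    ascending = [digits[i:i + length] for i in range(10 - length + 1)]
    # Descending runs: the same windows reversed, starting from the highest (987, 876, ...).
    descending = [run[::-1] for run in reversed(ascending)]
    return '|'.join(ascending + descending)


def has_sequential_run(value, length=3):
    """Return True if the string form of `value` contains a sequential run of `length` digits."""
    return re.search(sequential_pattern(length), str(value)) is not None


print(sequential_pattern(3))
print([v for v in (1234, 6789, 4563, 1379) if not has_sequential_run(v)])
```

With pandas, the helper could be applied row by row, for example `df[~df['prix'].map(has_sequential_run)]`, although the vectorized `str.contains` call above is likely the simpler choice for the three-digit case.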
