pandas: how to remove rows containing consecutive digits from a DataFrame

eqoofvh9 posted on 2023-04-10 in Other

I want to remove the rows of df whose value is a run of consecutive digits followed by a trailing 0, for example 12340 or 45670.

rows_to_remove = []
for i, row in df.iterrows():
    digits = [int(d) for d in str(row['prix'])]
    if all([digits[j+1]-digits[j] == 1 for j in range(len(digits)-1)]) and digits[-1] == 0:
        rows_to_remove.append(i)

df = df.drop(rows_to_remove)
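
Note that the loop above does not flag values such as 12340: the trailing 0 is included in the difference check, and the step from 4 to 0 is never +1. A minimal sketch of a corrected check, assuming the trailing 0 should be excluded from the run test and that at least two digits must precede it (both are assumptions, as is the helper name `is_run_then_zero`):

```python
import pandas as pd

df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670]})

def is_run_then_zero(value):
    """True if value ends in 0 and the digits before it each increase by 1."""
    digits = [int(d) for d in str(value)]
    body = digits[:-1]  # leave the trailing 0 out of the run check
    return (
        digits[-1] == 0
        and len(body) > 1  # assumption: require at least two digits before the 0
        and all(b - a == 1 for a, b in zip(body, body[1:]))
    )

rows_to_remove = [i for i, v in df['prix'].items() if is_run_then_zero(v)]
result = df.drop(rows_to_remove)
```

Unlike the answers below, this sketch strips only a single trailing 0, so a value like 12300 would be kept.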
agyaoht7 #1

You can use a list comprehension:

consecutive = '123456789'

m = [not (s.endswith('0') and s.rstrip('0') in consecutive)
     for s in df['prix'].astype(str)]

out = df[m]

Output:

    prix
1  12378
2  12345

How it works:

consecutive = '123456789'

df['keep'] = [not(s.endswith('0') and s.rstrip('0') in consecutive)
              for s in df['prix'].astype(str)]

print(df)

Output:

     prix   keep
0  123450  False
1   12378   True
2   12345   True
3   45670  False
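
To make the `keep` column easier to follow, here is a small sketch that exposes the intermediate values for each string. The helper name `explain` and the dictionary layout are made up for illustration; only `endswith`, `rstrip` and the substring test come from the answer.

```python
consecutive = '123456789'

def explain(s):
    """Return the intermediate values the mask is built from (illustrative helper)."""
    stripped = s.rstrip('0')  # e.g. '123450' -> '12345'
    return {
        'ends_with_0': s.endswith('0'),
        'stripped': stripped,
        'is_run': stripped in consecutive,  # substring test against '123456789'
        'keep': not (s.endswith('0') and stripped in consecutive),
    }

steps = {s: explain(s) for s in ['123450', '12378', '12345', '45670']}

# '12345' is a run but does not end in 0, so it is kept;
# '45670' ends in 0 and '4567' is a run, so it is dropped.
```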

Reproducible input:

import pandas as pd

df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670]})

Two-digit numbers

If you want to keep two-digit numbers such as 20:

import numpy as np

consecutive = '123456789'

m1 = np.array([not(s.endswith('0') and s.rstrip('0') in consecutive)
               for s in df['prix'].astype(str)])
m2 = df['prix'].lt(100)

out = df[m1|m2]

Or:

m = [not (s.endswith('0') and len(s) > 2 and s.rstrip('0') in consecutive)
     for s in df['prix'].astype(str)]

out = df[m]

Output:

    prix
1  12378
2  12345
4     20

Input used:

df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670, 20]})
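
As a quick sanity check, the two variants above can be compared on this input. This is only a sketch: the names `two_masks`, `one_mask` and `same` are introduced here, and the comparison covers just these five values, not every possible input.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670, 20]})
consecutive = '123456789'
strings = df['prix'].astype(str)

# Variant 1: general mask OR "value below 100"
m1 = np.array([not (s.endswith('0') and s.rstrip('0') in consecutive)
               for s in strings])
m2 = df['prix'].lt(100)
two_masks = m1 | m2.to_numpy()

# Variant 2: a single mask with a length condition
one_mask = np.array([not (s.endswith('0') and len(s) > 2 and s.rstrip('0') in consecutive)
                     for s in strings])

# True when both variants keep exactly the same rows
same = bool((two_masks == one_mask).all())
```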
sg3maiej #2

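If I understand correctly, you want to filter out stringified ints that end in '0' (that is, multiples of 10) and that, once the trailing 0s are stripped, have a length of 2 or more and are a substring of '123456789'?
If so, I believe this will work: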

from pandas import DataFrame, Series

# Some test data
df = DataFrame({
    'i': [
        1,
        10,
        11,
        12,
        120, # Dropped
        122,
        123,
        1230, # Dropped
        12300, # Dropped
        1240,
        13,
        130,
        134,
        1340,
        2,
        20,
        21,
        210,
        2120,
        23,
        230, # Dropped
        2340, # Dropped
        234000, # Dropped
        2350,
        123456789,
        1234567890 # Dropped
    ]})

filt = Series(s.endswith('0') and len(s.rstrip('0')) > 1 and s.rstrip('0') in '123456789' for s in df['i'].astype(str))

filtered_df = df.loc[~filt]
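
One caveat: the `Series` built above gets a fresh 0..n-1 index, which matches `df` here only because `df` also uses the default index. A sketch of a safer variant, assuming a hypothetical frame whose index labels are not 0..n-1, passes `index=df.index` so the mask lines up with the rows by label:

```python
import pandas as pd

# Hypothetical frame with a non-default index (e.g. left over from an earlier filter)
df = pd.DataFrame({'i': [120, 122, 1230, 20]}, index=[10, 11, 12, 13])

stringified = df['i'].astype(str)

# Reuse df.index so the boolean mask aligns with df label by label
filt = pd.Series(
    [s.endswith('0') and len(s.rstrip('0')) > 1 and s.rstrip('0') in '123456789'
     for s in stringified],
    index=df.index,
)

filtered_df = df.loc[~filt]
```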

You can split the logic up to make it more readable and combine the string filters with the & operator:

stringified = df['i'].astype(str)

filt_1 = Series(s.endswith('0') for s in stringified)
filt_2 = Series(len(s.rstrip('0')) > 1 for s in stringified)
filt_3 = Series(s.rstrip('0') in '123456789' for s in stringified)

filtered_df = df.loc[~(filt_1 & filt_2 & filt_3)]

(There may also be ways to make the filtering more efficient.)
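
One possible direction, sketched here and not benchmarked, is to replace the Python-level loops with pandas string methods: precompute every substring of '123456789' that has at least two characters, then test membership with `isin`. The names `runs` and `drop` and the shortened test data are introduced for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'i': [1, 10, 12, 120, 1230, 12300, 1240, 20,
                         230, 2350, 123456789, 1234567890]})

digits = '123456789'
# All substrings of '123456789' with length >= 2: '12', '123', ..., '89'
runs = {digits[a:b]
        for a in range(len(digits))
        for b in range(a + 2, len(digits) + 1)}

s = df['i'].astype(str)
# Drop values that end in 0 and whose zero-stripped form is one of the runs
drop = s.str.endswith('0') & s.str.rstrip('0').isin(runs)

filtered_df = df.loc[~drop]
```

Because `drop` is derived from `df['i']`, it keeps the frame's index and needs no separate alignment step.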
