pandas: how to remove consecutive digits from a df

Posted by eqoofvh9 on 2023-04-10 in Other


I want to remove the rows of df whose value is made up of consecutive digits followed by a trailing 0, for example 12340 or 45670.

rows_to_remove = []
for i, row in df.iterrows():
    digits = [int(d) for d in str(row['prix'])]
    if all([digits[j+1]-digits[j] == 1 for j in range(len(digits)-1)]) and digits[-1] == 0:
        rows_to_remove.append(i)

df = df.drop(rows_to_remove)
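
A note on why the loop above finds nothing to remove: the difference check also covers the last pair of digits, so for 12340 it compares 4 with 0 and fails. The following is a minimal sketch of one way to adjust it, assuming non-negative integers in `prix` and assuming at least two digits are required before the trailing zeros. The sample data and the names `body`, `ascending` and `result` are illustrative only.

```python
import pandas as pd

# Sample data (an assumption for illustration)
df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670]})

rows_to_remove = []
for i, value in df['prix'].items():
    text = str(value)
    body = text.rstrip('0')            # digits left after removing trailing zeros
    digits = [int(d) for d in body]
    # True when every digit is exactly one more than the previous digit
    ascending = all(b - a == 1 for a, b in zip(digits, digits[1:]))
    # Assumed intent: a trailing 0 plus a run of at least two ascending digits
    if text.endswith('0') and len(digits) > 1 and ascending:
        rows_to_remove.append(i)

result = df.drop(rows_to_remove)
```

With this sample, `rows_to_remove` holds the index labels of 123450 and 45670.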

pandas

Source: https://stackoverflow.com/questions/75928974/how-to-remove-consecutive-digits-from-a-df

2 Answers


agyaoht7 #1

You can use a list comprehension:

consecutive = '123456789'

m = [not (s.endswith('0') and s.rstrip('0') in consecutive)
     for s in df['prix'].astype(str)]

out = df[m]

Output:

    prix
1  12378
2  12345

How it works:

consecutive = '123456789'

df['keep'] = [not(s.endswith('0') and s.rstrip('0') in consecutive)
              for s in df['prix'].astype(str)]

print(df)

Output:

     prix   keep
0  123450  False
1   12378   True
2   12345   True
3   45670  False
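
To make the mask easier to follow, the condition can be split into its parts. This is only a sketch on the reproducible input from this answer. The `steps` frame and its column names are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670]})
consecutive = '123456789'

s = df['prix'].astype(str)
steps = pd.DataFrame({
    'prix': df['prix'],
    'ends_with_0': s.str.endswith('0'),   # does the value end in 0?
    'stripped': s.str.rstrip('0'),        # value with trailing zeros removed
})
# Substring test: '4567' is in '123456789', '12378' is not
steps['is_run'] = [t in consecutive for t in steps['stripped']]
# A row is dropped only when both conditions hold
steps['keep'] = ~(steps['ends_with_0'] & steps['is_run'])
```

Note that 12345 is itself a run of consecutive digits, but it is kept because it does not end in 0.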

Reproducible input:

import pandas as pd

df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670]})

2-digit numbers

If you want to keep 2-digit numbers such as 20:

import numpy as np

consecutive = '123456789'

m1 = np.array([not(s.endswith('0') and s.rstrip('0') in consecutive)
               for s in df['prix'].astype(str)])
m2 = df['prix'].lt(100)

out = df[m1|m2]

Or:

m = [not (s.endswith('0') and len(s) > 2 and s.rstrip('0') in consecutive)
     for s in df['prix'].astype(str)]

out = df[m]

Output:

    prix
1  12378
2  12345
4     20

Input used:

df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670, 20]})
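
One edge case worth checking: both variants above still drop values such as 100 or 3000, because stripping the zeros leaves a single digit, and a single digit is also a substring of '123456789'. The question does not say whether that is desired. The sketch below assumes such values should be kept and tests the length of the stripped string instead. The helper name `is_run_with_trailing_zero` and the extra sample values are hypothetical.

```python
import pandas as pd

# Sample data extended with 100 and 3000 (an assumption for illustration)
df = pd.DataFrame({'prix': [123450, 12378, 45670, 20, 100, 3000]})
consecutive = '123456789'

def is_run_with_trailing_zero(s):
    """True for strings like '12340': 2+ ascending digits followed by zeros."""
    core = s.rstrip('0')
    return s.endswith('0') and len(core) > 1 and core in consecutive

m = [not is_run_with_trailing_zero(s) for s in df['prix'].astype(str)]
out = df[m]
```

Here 20, 100 and 3000 are all kept, while 123450 and 45670 are removed.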

Posted on 2023-04-10

sg3maiej #2

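If I understand correctly, you want to filter out stringified ints that end in '0' (that is, ints that are multiples of 10) and that, once the trailing zeros are stripped, have a length of 2 or more and are a substring of '123456789'?
If so, I believe this will work: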

from pandas import DataFrame, Series

# Some test data
df = DataFrame({
    'i': [
        1,
        10,
        11,
        12,
        120, # Dropped
        122,
        123,
        1230, # Dropped
        12300, # Dropped
        1240,
        13,
        130,
        134,
        1340,
        2,
        20,
        21,
        210,
        2120,
        23,
        230, # Dropped
        2340, # Dropped
        234000, # Dropped
        2350,
        123456789,
        1234567890 # Dropped
    ]})

filt = Series(s.endswith('0') and len(s.rstrip('0')) > 1 and s.rstrip('0') in '123456789' for s in df['i'].astype(str))

filtered_df = df.loc[~filt]

You can split the logic up to make it more readable and combine the individual filters with the & operator:

stringified = df['i'].astype(str)

filt_1 = Series(s.endswith('0') for s in stringified)
filt_2 = Series(len(s.rstrip('0')) > 1 for s in stringified)
filt_3 = Series(s.rstrip('0') in '123456789' for s in stringified)

filtered_df = df.loc[~(filt_1 & filt_2 & filt_3)]

(There are probably also ways to make the filtering more efficient.)
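
As one possible way to do that, the per-row Python checks can be replaced with pandas string methods and a set lookup. This sketch is not from the answer: `runs` and `drop` are illustrative names, the test data is a shortened version of the list above, and no timings were measured.

```python
import pandas as pd

# Shortened version of the test data above
df = pd.DataFrame({'i': [1, 10, 12, 120, 1230, 12300, 1240, 20,
                         230, 2350, 123456789, 1234567890]})

digits = '123456789'
# Every run of 2+ consecutive ascending digits: '12', '123', ..., '89'
runs = {digits[a:b]
        for a in range(len(digits))
        for b in range(a + 2, len(digits) + 1)}

s = df['i'].astype(str)
# Drop when the value ends in 0 and, without trailing zeros, is one of the runs
drop = s.str.endswith('0') & s.str.rstrip('0').isin(runs)

filtered_df = df.loc[~drop]
```

Building `runs` with a minimum length of 2 folds the length check into the lookup, so values such as 10 and 20 are kept without a separate condition.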

Posted on 2023-04-10
