pandas: how to determine whether a value in a df contains a run of 7 or more repeated digits

9wbgstp7  posted on 2022-12-16  in  Other

I have a DataFrame:

import pandas as pd

df1 = pd.DataFrame({'number': ['1111112357896', '45226212354444', '150000000064', '5485329999999', '4589622567431']})

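Problem: find the values that contain the same digit repeated 7 or more times in a row. Expected result:

| number | repeat |
| --- | --- |
| 1111112357896 | 0 |
| 45226212354444 | 0 |
| 150000000064 | 1 |
| 5485329999999 | 1 |
| 4589622567431 | 0 |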

8ulbf1ek1#

Use a regular expression with str.contains:

df1['repeat'] = df1['number'].str.contains(r'(\d)\1{6}').astype(int)

The regular expression:

(\d)     # match and capture a digit
\1{6}    # match the captured digit 6 more times
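
To try the pattern outside of pandas, here is a minimal sketch using Python's standard `re` module. The helper name `has_long_run` and the `samples` list are assumptions made for illustration (the values are copied from the question); they are not part of the answer.

```python
import re

# Same pattern as above: one captured digit followed by 6 more copies of it,
# i.e. the same digit at least 7 times in a row.
PATTERN = re.compile(r'(\d)\1{6}')

def has_long_run(value: str) -> bool:
    """Return True if `value` contains one digit repeated 7+ times in a row."""
    return PATTERN.search(value) is not None

# Sample values taken from the question's DataFrame.
samples = ['1111112357896', '45226212354444', '150000000064',
           '5485329999999', '4589622567431']
flags = [int(has_long_run(s)) for s in samples]
print(flags)
```

The first value has only six consecutive `1`s, so it is not flagged. Only the values with eight `0`s and seven `9`s match.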

Output:

           number  repeat
0   1111112357896       0
1  45226212354444       0
2    150000000064       1
3   5485329999999       1
4   4589622567431       0

46qrfjad2#

Here is one approach:

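def find_repeats(numbers, cutoff=7):
    """Count the runs of one repeated character that are at least `cutoff` long."""
    repeated_numbers = []
    curr_n = None
    count = 0
    for n in str(numbers):
        # same character as the previous one: extend the current run
        if n == curr_n:
            count += 1
            continue

        # the previous run just ended: record it if it was long enough
        if count >= cutoff:
            repeated_numbers.append(curr_n)
        curr_n = n
        count = 1

    # check the end of the string as well
    if count >= cutoff:
        repeated_numbers.append(curr_n)

    return len(repeated_numbers)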

df1 = pd.DataFrame({'number': ['1111112357896', '45226212354444', '150000000064', '5485329999999', '4589622567431']})
df1['repeat'] = df1.number.apply(find_repeats)
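
Note that `find_repeats` returns the number of qualifying runs rather than a 0/1 flag, so a value with two long runs would get 2. The same counting can be sketched more compactly with `itertools.groupby` from the standard library. This is an alternative written for illustration, not the answer author's code, and the name `count_long_runs` is hypothetical.

```python
from itertools import groupby

def count_long_runs(value, cutoff=7):
    """Count runs of one repeated character that are at least `cutoff` long."""
    # groupby yields one group per run of consecutive equal characters
    return sum(1 for _, run in groupby(str(value)) if len(list(run)) >= cutoff)

# Sample values taken from the question's DataFrame.
samples = ['1111112357896', '45226212354444', '150000000064',
           '5485329999999', '4589622567431']
counts = [count_long_runs(s) for s in samples]

# Collapse the counts to a 0/1 flag if only presence matters.
flags = [int(c > 0) for c in counts]
print(counts, flags)
```

For the question's data the counts and the flags coincide, because no value contains more than one long run.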
