How to find all occurrences of a substring in a numpy string array

Posted by jchrr9hc on 2022-12-18 in Other


I'm trying to find all occurrences of a substring in a numpy string array.

myArray = np.array(['Time', 'utc_sec', 'UTC_day', 'Utc_Hour'])
sub = 'utc'

The search should be case-insensitive, so the method should return [1, 2, 3].

numpy

Source: https://stackoverflow.com/questions/74783571/how-to-find-all-occurences-of-a-substring-in-a-numpy-string-array

3 Answers


k4ymrczo1#

A vectorized approach using np.char.lower and np.char.find:

import numpy as np
myArray = np.array(['Time', 'utc_sec', 'UTC_day', 'Utc_Hour'])
res = np.where(np.char.find(np.char.lower(myArray), 'utc') > -1)[0]
print(res)

Output

[1 2 3]

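The idea is to lowercase every element with np.char.lower so that np.char.find becomes case-insensitive. np.char.find returns -1 for elements that do not contain the substring, so np.where on the "> -1" mask yields the indices of the elements that do.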
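The same idea can be wrapped in a small helper. This is only a sketch: the function name find_substring_indices is made up for illustration, and it lowercases the substring as well, on the assumption that callers may pass mixed-case input.

```python
import numpy as np

def find_substring_indices(arr, sub):
    """Return indices of elements of `arr` that contain `sub`, ignoring case.

    Hypothetical helper name; not part of numpy.
    """
    # Lowercase the array elements so the comparison ignores case.
    lowered = np.char.lower(arr)
    # np.char.find gives the first match position per element, or -1 if absent.
    positions = np.char.find(lowered, sub.lower())
    # flatnonzero returns the indices where the condition is True.
    return np.flatnonzero(positions >= 0)

myArray = np.array(['Time', 'utc_sec', 'UTC_day', 'Utc_Hour'])
print(find_substring_indices(myArray, 'UTC'))  # [1 2 3]
```

np.flatnonzero is used here in place of np.where(...)[0]; for a 1-D array the two give the same indices.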

Answered 2022-12-18

r55awzrz2#

You can check each item with "if sub in string" and record the index of every match.

import numpy as np

myArray = np.array(['Time', 'utc_sec', 'UTC_day', 'Utc_Hour'])
sub = 'utc'

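found = []
# enumerate yields the real position of each item, so the result stays
# correct even when the matching items are not contiguous
for idx, item in enumerate(myArray):
    if sub in item.lower():
        found.append(idx)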

print(found)

Output:

[1, 2, 3]

Answered 2022-12-18

vmdwslir3#

We can use a list comprehension to get the correct indices:

occ = [i for i in range(len(myArray)) if 'utc' in myArray[i].lower()]

Output

>>> print(occ)
[1, 2, 3]

Let's generalize from this question: we will build a function that returns the indices of the items in a numpy string array that contain any given substring.

def get_occ_idx(sub, np_array):
    """ Occurrences index of substring in a numpy string array
    """

    # The substring must already be lower case, since items are lowercased before comparison
    assert sub.islower(), f"Your substring '{sub}' must be lower case (should be : {sub.lower()})"
    # Every item must be a string (np.str_ is a subclass of str)
    assert all(isinstance(x, str) for x in np_array), "All items in the array must be strings"
    # At least one item must contain the substring
    assert any(sub in x.lower() for x in np_array), f"There is no occurrence of substring :'{sub}'"

    occ = [i for i in range(len(np_array)) if sub in np_array[i].lower()]

    return occ
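A function built on assertions raises an error when nothing matches or when the substring is not lower case. If an empty result is preferable, a variant along the following lines may be easier to call. It is a sketch only: the name get_occ_idx_safe is hypothetical, and normalizing the substring instead of rejecting it is a design assumption, not something the original answer does.

```python
import numpy as np

def get_occ_idx_safe(sub, np_array):
    """Hypothetical variant: indices of items containing `sub`, ignoring case.

    Returns an empty list when there is no match instead of raising.
    """
    # Normalize the substring rather than asserting on the caller's input.
    sub = sub.lower()
    # str(item) lets non-string items be compared by their text form.
    return [i for i, item in enumerate(np_array) if sub in str(item).lower()]

myArray = np.array(['Time', 'utc_sec', 'UTC_day', 'Utc_Hour'])
print(get_occ_idx_safe('UTC', myArray))  # [1, 2, 3]
print(get_occ_idx_safe('gps', myArray))  # []
```

Whether to raise or return an empty list depends on the caller: raising surfaces typos in the substring early, while an empty list is simpler to use inside a larger pipeline.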


Answered 2022-12-18
