Select elements from a matrix with numpy

Posted by wwwo4jvm 11 months ago in Other


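I am trying to use numpy for fast text analysis, specifically collocation analysis. Suppose I have the following string, which I have converted to a numpy array: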

text = np.array(['a', 'b', 'c', 'd', 'e', 'b', 'f', 'g'])

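Suppose I want to get the left and right context of the letter 'b' from this array, say 1 element to the left and 2 elements to the right. So I would like to end up with something like this: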

['a', 'c', 'd'] +  ['e', 'f', 'g']

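Is it possible to do all of this with numpy broadcasting? Right now I just loop over the text, but that is very time consuming.
I have tried np.select, np.where and np.mask.
Thanks for your help :)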

numpy

Source: https://stackoverflow.com/questions/77359733/select-elements-from-a-matrix-with-numpy

3 Answers


ttcibm8c1#

One possible approach is to find the indices of the 'b' values (using np.where(arr == 'b')) and then use them to index the neighboring values:

import numpy as np

arr = np.array(['a', 'b', 'c', 'd', 'e', 'b', 'f', 'g'])
# Assumes every 'b' has at least 1 neighbor to its left and 2 to its right.
lr_contexts = [arr[[i-1, i+1, i+2]] for i in np.where(arr == 'b')[0]]
print(lr_contexts)
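
Since the question asks about broadcasting, the list comprehension above could probably be replaced by a single fancy-indexing step. The following is only a sketch under that assumption: the names offsets, hits, valid and contexts are illustrative and not from the original answer, and the bounds check is an addition that drops any 'b' too close to either end of the array.

```python
import numpy as np

arr = np.array(['a', 'b', 'c', 'd', 'e', 'b', 'f', 'g'])

# Relative positions of the context: 1 to the left, 2 to the right.
offsets = np.array([-1, 1, 2])

# Positions of every 'b' in the array.
hits = np.flatnonzero(arr == 'b')

# Keep only hits whose whole context lies inside the array, so that
# negative indices do not silently wrap around to the end.
valid = (hits + offsets.min() >= 0) & (hits + offsets.max() < arr.size)
hits = hits[valid]

# Broadcasting: (n_hits, 1) + (3,) gives an (n_hits, 3) index matrix.
contexts = arr[hits[:, None] + offsets]
print(contexts)
```

For the example array this yields one row per 'b', holding its left and right neighbors, without a Python-level loop.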


11 months ago

ntjbwcob2#

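I believe the previous answer is the way to go if you really want to use numpy. But if it is applicable, I would suggest trying regular expressions for your text-pattern task. For this task, the function below solves it using the re package.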

import re

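def get_text_around_char(text, char, n_left, n_right):
    """Return the n_left characters before and n_right after each match of char."""
    matches = []
    # re.escape makes regex metacharacters in char match literally.
    for match in re.finditer(re.escape(char), text):
        s, e = match.start(), match.end()
        # max(0, ...) stops a negative start index from wrapping around.
        matches.append(text[max(0, s - n_left):s] + text[e:e + n_right])
    return matches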

print(get_text_around_char("abcdebfg", "b", 1, 2))

['acd', 'efg']

11 months ago

iugsix8n3#

Perhaps you could look at every window of 4 letters?

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv

text = np.array(['a', 'b', 'c', 'd', 'e', 'b', 'f', 'g'])

arr = swv(text, 4)
out = arr[ np.ix_(      # Take from the array,
    arr[:, 1] == 'b',   # for each row where the 2nd value is a b,
    [0, 2, 3]           # the 1st, 3rd and 4th column.
)]

out:

array([['a', 'c', 'd'],
       ['e', 'f', 'g']], dtype='<U1')

arr:

array([['a', 'b', 'c', 'd'],
       ['b', 'c', 'd', 'e'],
       ['c', 'd', 'e', 'b'],
       ['d', 'e', 'b', 'f'],
       ['e', 'b', 'f', 'g']], dtype='<U1')
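
The window approach above hard-codes 1 element on the left and 2 on the right. A possible generalization is sketched below; contexts_around and its parameters are hypothetical names introduced here, not part of the original answer. It assumes NumPy 1.20 or later (where sliding_window_view is available) and an input at least n_left + 1 + n_right elements long, and any occurrence of the target too close to either end is simply not reported.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def contexts_around(tokens, target, n_left, n_right):
    """Hypothetical helper: context rows for each in-bounds occurrence of target."""
    # Each window holds n_left items, the candidate item, then n_right items.
    windows = sliding_window_view(tokens, n_left + 1 + n_right)
    # Rows whose candidate position (column n_left) equals the target.
    rows = windows[:, n_left] == target
    # Every column except the candidate position itself.
    cols = np.r_[0:n_left, n_left + 1:n_left + 1 + n_right]
    return windows[np.ix_(rows, cols)]


text = np.array(['a', 'b', 'c', 'd', 'e', 'b', 'f', 'g'])
print(contexts_around(text, 'b', 1, 2))
```

Because sliding_window_view returns a view, the windows themselves are not copied; only the final selection allocates a new array.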


11 months ago
