使用numpy排除重叠区间

xn1cxnb4  于 2023-06-06  发布在  其他
关注(0)|答案(2)|浏览(110)

我有两个间隔列表。我想删除list1中已经存在于list2中的所有范围。示例:列表1:
[(0,10),(15,20)]
列表2:
[(2,3),(5,6)]
输出:
[(0,2),(3,5),(6,10),(15,20)]
下面是同样的问题,询问如何在Java Exclude overlapping intervals中做到这一点,但我想用numpy来做。

yrefmtwq

yrefmtwq1#

list1 = [(0,10),(15,20)]
list2 = [(2,3),(5,6)]

s = min(list1[0][0], list2[0][0])
e = max(list1[-1][-1], list2[-1][-1])
# by default no slot is in an interval (0)
arr = np.full((e - s + 3,), 0)
# mark intervals from list1
for a, b in list1:
    arr[a-s+1:b-s+1] = 1
# unmark intervals from list2
for a, b in list2:
    arr[a-s+1:b-s+1] = 0
# detect edges (start and end points)
starts = np.where((arr[1:] == 1) & (arr[:-1] == 0))[0] + s
ends = np.where((arr[:-1] == 1) & (arr[1:] == 0))[0] + s

print(list(zip(starts, ends)))

印刷品
[(0、(2)、(3、5)、(6、10)、(15、20)]
注意:这假设每个区间都有最小的数字,每个列表都有升序排列的区间,并且只有整数条目。

kyks70gy

kyks70gy2#

下面是非整数项的答案。很好。还有更优雅的方法吗?

def locateInIntervals(list1, x):
    """
    find the interval in list1 x belongs to, and the interval before x
    assume list1 is sorted and contains intervals that do not overlap
    return: [ibefore, i_in]
    """
    starts = [x[0] for x in list1]
    if x < starts[0]:
        return [-42, -42] # x lies before the first interval
    else:
        indx = np.where(x >= np.array(starts))[0][-1]
        if x < list1[indx][1]:
            if indx==0: return [-42, indx] # x in list1[0]
            else: return [indx-1, indx] # x in list1[indx] for indx>0
        else: return [indx, -42] # x follows list1[indx] but not in any interval

def excludeIntervals(list1, list2):
    """
    exclude intervals in list2 from list1.
    assume intervals do not overlap 
    """
    list1 = [list(interval) for interval in list1]
    list2 = [list(interval) for interval in list2]
    list1.sort()
    list2.sort()
    res = list1
    for interval in list2:
        ind_start_before, ind_start = locateInIntervals(res, interval[0])
        ind_end_before, ind_end = locateInIntervals(res, interval[1])
        if ((ind_start_before < 0) & (ind_start < 0) & (ind_end >= 0)):
            res[ind_end] = [interval[1], res[ind_end][1]]
            for i in range(ind_end): del res[I]
        elif ((ind_start_before < 0) & (ind_start < 0) & (ind_end_before >= 0) & (ind_end < 0)):
            for i in range(ind_end_before+1): del res[I]
        elif ((ind_start >= 0) & (ind_end >= 0)):
            if ind_start==ind_end:
                res.insert(ind_start+1, [interval[1], res[ind_end][1]])
                res[ind_start] = [res[ind_start][0], interval[0]]
            else:
                res[ind_start] = [res[ind_start][0], interval[0]]
                res[ind_end] = [interval[1], res[ind_end][1]]
        elif ((ind_start >= 0) & (ind_end_before >= 0) & (ind_end < 0)): 
            res[ind_start] = [res[ind_start][0], interval[0]]
            for i in range(ind_start+1, ind_end_before+1): del res[I]
        elif ((ind_start_before >= 0) & (ind_start < 0) & (ind_end >= 0)):
            res[ind_end] = [interval[1], res[ind_end][1]]
            for i in range(ind_start_before+1, ind_end): del res[I]
        elif ((ind_start_before >= 0) & (ind_start < 0) & (ind_end < 0)):
            for i in range(ind_start_before+1, ind_end_before+1): del res[I]
    return res

excludeIntervals([[0.1,10],[15.5,20]],[[2,3.3],[5,6]])
返回[[0.1,2],[3.3,5],[6,10],[15.5,20]]

相关问题