python 在两个没有集合的范围之间减去重叠

5cg8jx4n  于 12个月前  发布在  Python
关注(0)|答案(9)|浏览(123)

没有集合!

我无法使用集合,因为:

  • 范围将太长。
  • 它们会占用太多内存
  • 集本身的创建将花费太长时间。

如果只使用范围的端点,是否有最佳方法来减去两个范围列表?

示例:

r1 = (1, 1000), (1100, 1200)  
r2 = (30, 50), (60, 200), (1150, 1300)

r1 - r2 = (1, 29), (51, 59), (201, 1000), (1100, 1149)

字符串

其他信息:

  • r2不必与r1重叠
  • r1和r2不会有重叠的配对。例如,r1不会同时有(0,30)和(10,25)

谢谢.

relj7zay

relj7zay1#

interval包可以提供您所需要的一切。

from interval import Interval, IntervalSet
r1 = IntervalSet([Interval(1, 1000), Interval(1100, 1200)])
r2 = IntervalSet([Interval(30, 50), Interval(60, 200), Interval(1150, 1300)])
print(r1 - r2)

>>> [1..30),(50..60),(200..1000],[1100..1150)

字符串

k7fdbhmy

k7fdbhmy2#

这是一个有趣的问题!
我认为这是正确的,而且它相当紧凑。它应该适用于所有类型的重叠范围,但它假设格式良好的范围(即[x, y),其中x < y)。为了简单起见,它使用[x, y)样式的范围。它基于实际上只有六种可能的排列(结果在()中)的观察:

编辑:我找到了一个更紧凑的表示:

(s1 e1)  s2 e2
(s1 s2)  e1 e2
(s1 s2) (e2 e1)

 s2 e2  (s1 e1)
 s2 s1  (e2 e1)
 s2 s1   e1 e2 ()

字符串
给定一个排序的端点列表,如果endpoints[0] == s1,那么前两个端点应该在结果中。如果endpoints[3] == e1,那么最后两个端点应该在结果中。如果都没有,那么应该没有结果。
“我还没怎么测试过,完全有可能出错,如果发现错误请告诉我!”

import itertools

def range_diff(r1, r2):
    s1, e1 = r1
    s2, e2 = r2
    endpoints = sorted((s1, s2, e1, e2))
    result = []
    if endpoints[0] == s1 and endpoints[1] != s1:
        result.append((endpoints[0], endpoints[1]))
    if endpoints[3] == e1 and endpoints[2] != e1:
        result.append((endpoints[2], endpoints[3]))
    return result

def multirange_diff(r1_list, r2_list):
    for r2 in r2_list:
        r1_list = list(itertools.chain(*[range_diff(r1, r2) for r1 in r1_list]))
    return r1_list


测试:

>>> r1_list = [(1, 1001), (1100, 1201)]
>>> r2_list = [(30, 51), (60, 201), (1150, 1301)]
>>> print multirange_diff(r1_list, r2_list)
[(1, 30), (51, 60), (201, 1001), (1100, 1150)]

nkkqxpd9

nkkqxpd93#

一个解决方案(除了这里介绍的所有其他不同的解决方案之外)是使用区间/段树(它们实际上是一样的):
http://en.wikipedia.org/wiki/Segment_tree
http://en.wikipedia.org/wiki/Interval_tree
这样做的一个很大的好处是,它是微不足道的做任意布尔运算(不仅仅是减法)使用同一段代码。在de贝格中有对这种数据结构的标准处理。要对一对区间树执行任何布尔运算,(包括减法)你只要把它们合并在一起。这里有一些(诚然幼稚)Python代码,用于对不平衡的范围树执行此操作。它们不平衡的事实对合并树所需的时间没有影响,然而,这里的树构造是真正愚蠢的部分,最终是二次的(除非reduce是通过分区执行的,我有点怀疑)。无论如何,你去了:

class IntervalTree:
    def __init__(self, h, left, right):
        self.h = h
        self.left = left
        self.right = right

def merge(A, B, op, l=-float("inf"), u=float("inf")):
    if l > u:
        return None
    if not isinstance(A, IntervalTree):
        if isinstance(B, IntervalTree):
            opT = op
            A, B, op = B, A, (lambda x, y : opT(y,x))
        else:
            return op(A, B)
    left = merge(A.left, B, op, l, min(A.h, u))
    right = merge(A.right, B, op, max(A.h, l), u)
    if left is None:
        return right
    elif right is None or left == right:
        return left
    return IntervalTree(A.h, left, right)

def to_range_list(T, l=-float("inf"), u=float("inf")):
    if isinstance(T, IntervalTree):
        return to_range_list(T.left, l, T.h) + to_range_list(T.right, T.h, u)
    return [(l, u-1)] if T else []

def range_list_to_tree(L):
    return reduce(lambda x, y : merge(x, y, lambda a, b: a or b), 
        [ IntervalTree(R[0], False, IntervalTree(R[1]+1, True, False)) for R in L ])

字符串
我写得很快,没有测试太多,所以可能会有bug。(你只需在merge中将它们作为参数传递给op)。(这也是结果中的间隔数)。作为一个例子,我在你提供的案例上运行了它:

#Example:
r1 = range_list_to_tree([ (1, 1000), (1100, 1200) ])
r2 = range_list_to_tree([ (30, 50), (60, 200), (1150, 1300) ])
diff = merge(r1, r2, lambda a, b : a and not b)
print to_range_list(diff)


我得到了以下输出:
[(1,29),(51,59),(201,1000),(1100,1149)]
这似乎与你所期望的一致。现在,如果你想做一些其他的布尔运算,这里是它如何使用相同的函数工作:

#Intersection
merge(r1, r2, lambda a, b : a and b)

#Union
merge(r1, r2, lambda a, b : a or b)

#Xor
merge(r1, r2, lambda a, b : a != b)

rryofs0p

rryofs0p4#

我想我误解了这个问题,但如果r2r1的子集,则此代码有效

class RangeSet:
    def __init__(self, elements):
        self.ranges = list(elements)

    def __iter__(self):
        return iter(self.ranges)

    def __repr__(self):
        return 'RangeSet: %r' % self.ranges

    def has(self, tup):
        for pos, i in enumerate(self.ranges):
            if i[0] <= tup[0] and i[1] >= tup[1]:
                return pos, i
        raise ValueError('Invalid range or overlapping range')

    def minus(self, tup):
        pos, (x,y) = self.has(tup)
        out = []
        if x < tup[0]:
            out.append((x, tup[0]-1))
        if y > tup[1]:
            out.append((tup[1]+1, y))
        self.ranges[pos:pos+1] = out

    def __sub__(self, r):
        r1 = RangeSet(self)
        for i in r: r1.minus(i)
        return r1

    def sub(self, r): #inplace subtraction
        for i in r:
            self.minus(i)

字符串
然后,你做:

更新:注意r2的最后一个间隔与我的意思不同。

>>> r1 = RangeSet(((1, 1000), (1100, 1200)))
>>> r2 = RangeSet([(30, 50), (60, 200), (1150, 1200)])
>>> r1 - r2
RangeSet: [(1, 29), (51, 59), (201, 1000), (1100, 1149)]
>>> r1.sub(r2)
>>> r1
RangeSet: [(1, 29), (51, 59), (201, 1000), (1100, 1149)]

wpcxdonn

wpcxdonn5#

这里有一个快速的python函数,它可以做减法,不管初始列表是否格式良好(即在做减法之前,将列表转换为最小的等效范围列表,排序):

def condense(l):
    l = sorted(l)
    temp = [l.pop(0)]
    for t in l:
        if t[0] <= temp[-1][1]:
            t2 = temp.pop()
            temp.append((t2[0], max(t[1], t2[1])))
        else:
            temp.append(t)
    return temp

def setSubtract(l1, l2):
    l1 = condense(l1)
    l2 = condense(l2)
    i = 0
    for t in l2:
        while t[0] > l1[i][1]:
            i += 1
            if i >= len(l1):
                break
        if t[1] < l1[i][1] and t[0] > l1[i][0]:
            #t cuts l1[i] in 2 pieces
            l1 = l1[:i] + [(l1[i][0], t[0] - 1), (t[1] + 1, l1[i][1])] + l1[i + 1:]
        elif t[1] >= l1[i][1] and t[0] <= l1[i][0]:
            #t eliminates l1[i]
            l1.pop(i)
        elif t[1] >= l1[i][1]:
            #t cuts off the top end of l1[i]
            l1[i] = (l1[i][0], t[0] - 1)
        elif t[0] <= l1[i][0]:
            #t cuts off the bottom end of l1[i]
            l1[i] = (t[1] + 1, l1[i][1])
        else:
            print "This shouldn't happen..."
            exit()
    return l1

r1 = (1, 1000), (1100, 1200)
r2 = (30, 50), (60, 200), (1150, 1300)
setSubtract(r1, r2) #yields [(1, 29), (51, 59), (201, 1000), (1100, 1149)]

字符串

vcirk6k6

vcirk6k66#

有趣的问题!另一个实现,虽然你已经有很多了。这很有趣!包括一些额外的“装饰”,使我正在做的更明确。

import itertools

def flatten_range_to_labeled_points(input_range,label):
    range_with_labels = [((start,'start_%s'%label),(end,'end_%s'%label)) for (start,end) in input_range]
    flattened_range = list(reduce(itertools.chain,range_with_labels))
    return flattened_range 

def unflatten_range_remove_labels(input_range):
    without_labels = [x for (x,y) in input_range]
    grouped_into_pairs = itertools.izip(without_labels[::2], without_labels[1::2])
    return grouped_into_pairs

def subtract_ranges(range1, range2):
    range1_labeled = flatten_range_to_labeled_points(range1,1)
    range2_labeled = flatten_range_to_labeled_points(range2,2)
    all_starts_ends_together = sorted(range1_labeled + range2_labeled)
    in_range1, in_range2 = False, False
    new_starts_ends = []
    for (position,label) in all_starts_ends_together:
        if label=='start_1':
            in_range1 = True
            if not in_range2:
                new_starts_ends.append((position,'start'))
        elif label=='end_1':
            in_range1 = False
            if not in_range2:
                new_starts_ends.append((position,'end'))
        elif label=='start_2':
            in_range2 = True
            if in_range1:
                new_starts_ends.append((position-1,'end'))
        elif label=='end_2':
            in_range2 = False
            if in_range1:
                new_starts_ends.append((position+1,'start'))
    # strip the start/end labels, they're not used, I just appended them for clarity
    return unflatten_range_remove_labels(new_starts_ends)

字符串
我得到正确的输出:

r1 = (1, 1000), (1100, 1200)
r2 = (30, 50), (60, 200), (1150, 1300)
>>> subtract_ranges(r1,r2)
[(1, 29), (51, 59), (201, 1000), (1100, 1149)]

hmmo2u0o

hmmo2u0o7#

而不是https://pypi.org/project/interval/1.0.0/使用https://pypi.org/project/python-ranges/

from ranges import Range, RangeSet
r1 = RangeSet([Range(1, 1000), Range(1100, 1200)])
r2 = RangeSet([Range(30, 50), Range(60, 200), Range(1150, 1300)])
print(r1 - r2)

字符串

nr9pn0ug

nr9pn0ug8#

这是相当丑陋的,但它确实适用于给定的示例

def minus1(a,b):
    if (b[0] < a[0] and b[1] < a[0]) or (a[1] < b[0] and a[1] < b[1]):
        return [a] # doesn't overlap
    if a[0]==b[0] and a[1]==b[1]:
        return [] # overlaps exactly
    if b[0] < a[0] and a[1] < b[1]:
        return [] # overlaps completely
    if a[0]==b[0]:
        return [(b[1]+1,a[1])] # overlaps exactly on the left
    if a[1]==b[1]:
        return [(a[0],b[0]-1)] # overlaps exactly on the right 
    if a[0] < b[0] and b[0] < a[1] and a[1] < b[1]:
        return [(a[0],b[0]-1)] # overlaps the end
    if a[0] < b[1] and b[1] < a[1] and b[0] < a[0]:
        return [(b[1]+1,a[1])] # overlaps the start
    else:
        return [(a[0],b[0]-1),(b[1]+1,a[1])] # somewhere in the middle

def minus(r1, r2):
    # assume r1 and r2 are already sorted
    r1 = r1[:]
    r2 = r2[:]
    l = []
    v = r1.pop(0)
    b = r2.pop(0)
    while True:
        r = minus1(v,b)
        if r:
            if len(r)==1:
                if r[0] == v:
                    if v[1] < b[0] and v[1] < b[1]:
                        l.append(r[0])
                        if r1:
                            v = r1.pop(0)
                        else:
                            break
                    else:
                        if r2:
                            b = r2.pop(0)
                        else:
                            break
                else:
                    v = r[0]
            else:
                l.append(r[0])
                v = r[1]
                if r2:
                    b = r2.pop(0)
                else:
                    l.append(v)
                    break
        else:
            if r1:
                v = r1.pop(0)
            else:
                break
            if r2:
                b = r2.pop(0)
            else:
                l.append(v)
                l.extend(r1)
                break
    return l

r1 = [(1, 1000), (1100, 1200)]
r2 = [(30, 50), (60, 200), (1150, 1300)]

print minus(r1,r2)

字符串
打印:

[(1, 29), (51, 59), (201, 1000), (1100, 1149)]

gz5pxeao

gz5pxeao9#

另一个选择是使用portion,它支持python3

import portion as P

r1 = P.closedopen(1, 1000) |  P.closedopen(1100, 1200)
r2 = P.closedopen(30, 50) | P.closedopen(60, 200) | P.closedopen(1150, 1300)
print(r1 - r2)

字符串

相关问题