这是一个在Knuth Volume 3(Exercise 5.2,问题18,我认为)中指定的优化。其中down()
在一个特定位置(在堆弹出或初始化期间)被调用。它表示将该元素移动到叶子节点,然后向上移动到适当的位置是有益的,而不是一旦我们找到一个项目小于等于其两个子项的位置就立即中断。
Python实现了这个优化,而heapq
模块中关于这个方法的注解比我能够表达得更好(还包括比较数字)。
# The child indices of heap index pos are already heaps, and we want to make
# a heap at index pos too. We do this by bubbling the smaller child of
# pos up (and so on with that child's children, etc) until hitting a leaf,
# then using _siftdown to move the oddball originally at index pos into place.
#
# We *could* break out of the loop as soon as we find a pos where newitem <=
# both its children, but turns out that's not a good idea, and despite that
# many books write the algorithm that way. During a heap pop, the last array
# element is sifted in, and that tends to be large, so that comparing it
# against values starting from the root usually doesn't pay (= usually doesn't
# get us out of the loop early). See Knuth, Volume 3, where this is
# explained and quantified in an exercise.
#
# Cutting the # of comparisons is important, since these routines have no
# way to extract "the priority" from an array element, so that intelligence
# is likely to be hiding in custom comparison methods, or in array elements
# storing (priority, record) tuples. Comparisons are thus potentially
# expensive.
#
# On random arrays of length 1000, making this change cut the number of
# comparisons made by heapify() a little, and those made by exhaustive
# heappop() a lot, in accord with theory. Here are typical results from 3
# runs (3 just to demonstrate how small the variance is):
#
# Compares needed by heapify Compares needed by 1000 heappops
# -------------------------- --------------------------------
# 1837 cut to 1663 14996 cut to 8680
# 1855 cut to 1659 14966 cut to 8678
# 1847 cut to 1660 15024 cut to 8703
#
# Building the heap by using heappush() 1000 times instead required
# 2198, 2148, and 2219 compares: heapify() is more efficient, when
# you can use it.
#
# The total compares needed by list.sort() on the same lists were 8627,
# 8627, and 8632 (this should be compared to the sum of heapify() and
# heappop() compares): list.sort() is (unsurprisingly!) more efficient
# for sorting.
def _siftup(heap, pos):
endpos = len(heap)
startpos = pos
newitem = heap[pos]
# Bubble up the smaller child until hitting a leaf.
childpos = 2*pos + 1 # leftmost child position
while childpos < endpos:
# Set childpos to index of smaller child.
rightpos = childpos + 1
if rightpos < endpos and not heap[childpos] < heap[rightpos]:
childpos = rightpos
# Move the smaller child up.
heap[pos] = heap[childpos]
pos = childpos
childpos = 2*pos + 1
# The leaf at pos is empty now. Put newitem there, and bubble it up
# to its final resting place (by sifting its parents down).
heap[pos] = newitem
_siftdown(heap, startpos, pos)
是否有对此优化的兴趣?我刚接触Go,但如果这是你想看到的内容,我很乐意为此做出贡献。谢谢!
7条答案
按热度按时间tgabmvqs1#
cc @griesemer
9w11ddsr2#
听起来合理。你需要一些基准测试来证明你的改变是合理的。这样的改变可能会使Pop()操作对于无序的键产生不同的顺序。这已经接近于“破坏go1兼容性保证”的领域,尽管我认为它可能是可以接受的。
bfrts1fy3#
@randall77所说。请随意尝试。请注意,这不是一个高优先级的问题,所以可能需要一段时间才能彻底审查。
j91ykkif4#
太好了,谢谢!我很快就会在Gerrit上发布一篇评论,并尝试进行一些基准测试。
x6h2sr285#
https://golang.org/cl/56070提到了这个问题:
WIP: container/heap: Optimize heap to reduce comparisons
kokeuurv6#
@rajathagasthya 你还在继续做吗?
rta7y2nd7#
是的,PR需要审查。