NumPy: how to shuffle an array so that no two consecutive elements are the same?

Posted by lb3vh1jj on 2022-12-29 in Other

I am currently trying to get a randomly shuffled version of an array of numbers like this:

label_array = np.repeat(np.arange(6), 12)

The only restriction is that no two consecutive elements of the shuffle may be the same number. The code I am currently using for this is:

# Check if there are any occurrences of two consecutive 
# elements being of the same category (same number)
num_occurrences = np.sum(np.diff(label_array) == 0)

# While there are any occurrences of this...
while num_occurrences != 0:
    # ...shuffle the array...
    np.random.shuffle(label_array)

    # ...create a flag for occurrences...
    flag = np.hstack(([False], np.diff(label_array) == 0))
    flag_array = label_array[flag]

    # ...and shuffle them.
    np.random.shuffle(flag_array)

    # Then re-assign them to the original array...
    label_array[flag] = flag_array

    # ...and check the number of occurrences again.
    num_occurrences = np.sum(np.diff(label_array) == 0)

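While this works for an array of this size, I don't know whether it would work for much larger arrays, and even if it does, it may take a lot of time.
So, is there a better way of doing this?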

g52tjvyc #1

This may not be the technically best answer, but hopefully it meets your requirements.

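import numpy as np

def generate_random_array(block_length, block_count):
    # Concatenate `block_count` independently shuffled copies of
    # 0..block_length-1; requires block_length >= 2.
    randoms_array = []
    for _ in range(block_count):
        nums = np.arange(block_length)
        np.random.shuffle(nums)
        # If this block would start with the value that ended the previous
        # block, swap its first and last elements to avoid a repeat.
        if randoms_array and nums[0] == randoms_array[-1]:
            nums[0], nums[-1] = nums[-1], nums[0]
        randoms_array.extend(nums)
    return randoms_array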

generate_random_array(block_length=1000, block_count=1000)
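
To confirm that an output like this really has no adjacent repeats and is still a rearrangement of the input, a small check can help. This is only a sketch; the helper names `has_adjacent_repeats` and `same_multiset` are made up for illustration and are not part of the answer.

```python
import numpy as np


def has_adjacent_repeats(values):
    """Return True if any two neighbouring entries are equal."""
    arr = np.asarray(values)
    # Compare every element with its right-hand neighbour.
    return bool(np.any(arr[1:] == arr[:-1]))


def same_multiset(a, b):
    """Return True if a and b hold the same values with the same counts."""
    return bool(np.array_equal(np.sort(np.asarray(a)), np.sort(np.asarray(b))))
```

For example, `has_adjacent_repeats(result)` should be False for a valid shuffle, and `same_multiset(result, label_array)` should be True.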

d7v8vwbk #2

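Here is an approach for Python 3.6+, using random.choices, which allows choosing from a population with weights.
The idea is to generate the numbers one by one. Each time we generate a new number, we exclude the previous one by temporarily setting its weight to zero, and then we decrement the weight of the chosen number.
As @roganjosh duly pointed out, we have a problem when all that remains is several instances of the last value. This can happen quite frequently, especially with a small number of values and a large number of repeats.
The solution I used is to insert these values back into the list at positions where they cause no conflict, using the short send_back function.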

import random

def send_back(value, number, lst):
    idx = len(lst)-2
    for _ in range(number):
        while lst[idx] == value or lst[idx-1] == value:
            idx -= 1
        lst.insert(idx, value)

def shuffle_without_doubles(nb_values, repeats):
    population = list(range(nb_values))
    weights = [repeats] * nb_values
    out = []
    prev = None
    for i in range(nb_values * repeats):
        if prev is not None:
            # remove prev from the list of possible choices
            # by turning its weight temporarily to zero
            old_weight = weights[prev]
            weights[prev] = 0    

        try:
            chosen = random.choices(population, weights)[0]
        except (IndexError, ValueError):
            # We are here because all of our weights are 0
            # (IndexError before Python 3.9, ValueError from 3.9 on),
            # which means that all that is left to choose from
            # is old_weight times the previous value
            send_back(prev, old_weight, out)
            break

        out.append(chosen)
        weights[chosen] -= 1
        if prev is not None:
            # restore weight
            weights[prev] = old_weight
        prev = chosen
    return out
print(shuffle_without_doubles(6, 12))

[5, 1, 3, 4, 3, 2, 1, 5, 3, 5, 2, 0, 5, 4, 3, 4, 5,
 3, 4, 0, 4, 1, 0, 1, 5, 3, 0, 2, 3, 4, 1, 2, 4, 1,
 0, 2, 0, 2, 5, 0, 2, 1, 0, 5, 2, 0, 5, 0, 3, 2, 1,
 2, 1, 5, 1, 3, 5, 4, 2, 4, 0, 4, 2, 4, 0, 1, 3, 4,
 5, 3, 1, 3]

Some rough timing: generating shuffle_without_doubles(600, 1200) (that is, 720000 values) takes about 30 seconds.

zzlelutf #3

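I came here from "Creating a list without back-to-back repetitions from multiple repeating elements" (call it "question A") and am writing up my notes. Neither question A nor the current question has a correct answer, and the two questions seem to be different, since question A requires the same elements.
What you are asking is basically an algorithm problem (link) in which randomness is not required. But when, say, almost half of all the numbers are the same, the result can only look like "ABACADAEA...", where "ABCDE" are numbers. The most voted answer to that problem uses a priority queue, so the time complexity is O(n log m), where n is the length of the output and m is the number of distinct options.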
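
The priority-queue idea can be sketched with `heapq`. This is a minimal sketch and not the code from the linked answer: the function name `rearrange_no_adjacent` and the random tie-breaking key are my own assumptions, and the result is a valid arrangement but not a uniformly random one.

```python
import heapq
import random
from collections import Counter


def rearrange_no_adjacent(labels):
    """Greedy sketch: always place the most frequent remaining label
    that differs from the label placed just before it."""
    counts = Counter(labels)
    n = len(labels)
    # An arrangement exists only if no label needs more than
    # half of the slots (rounded up).
    if counts and max(counts.values()) > (n + 1) // 2:
        raise ValueError("no arrangement without adjacent repeats exists")

    # heapq is a min-heap, so counts are negated.
    # The random number only breaks ties between equal counts.
    heap = [(-c, random.random(), label) for label, c in counts.items()]
    heapq.heapify(heap)

    out = []
    held = None  # entry of the label just placed, kept out for one step
    while heap:
        neg_count, _, label = heapq.heappop(heap)
        out.append(label)
        if held is not None:
            # The previously placed label may be chosen again next time.
            heapq.heappush(heap, held)
        neg_count += 1  # one copy of `label` has been used
        held = (neg_count, random.random(), label) if neg_count < 0 else None
    return out
```

Each output element costs one pop and at most one push on a heap of at most m entries, which matches the O(n log m) bound mentioned above. For the array in the question, the input would be `np.repeat(np.arange(6), 12).tolist()`.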
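For this question, a simpler approach is to use itertools.permutations and randomly pick some of the permutations such that each one starts with a different element than the previous one ended with, so the result looks "random".
I wrote some draft code here, and it works.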

from itertools import permutations
from random import choice

def no_dup_shuffle(ele_count: int, repeat: int):
    """
    Return a shuffle of `ele_count` elements repeating `repeat` times.
    """

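    # Materialize the permutations once: `permutations` returns a one-shot
    # iterator, so calling list(p) a second time would yield an empty list.
    # Note: this holds ele_count! tuples in memory, so keep ele_count small.
    perms = list(permutations(range(ele_count)))
    res = []
    last = None
    for _ in range(repeat):
        # Draw a fresh permutation each round; redraw while its first
        # element equals the last element of the previous block
        # (needs ele_count >= 2 to terminate).
        curr = choice(perms)
        while last is not None and curr[0] == last[-1]:
            curr = choice(perms)
        res.extend(curr)
        last = curr
    return res

def test_no_dup_shuffle(count, rep):
    r = no_dup_shuffle(count, rep)
    assert len(r) == count * rep  # check result length
    assert len(set(r)) == count  # check all elements are used and in `range(count)`
    # check no adjacent duplicates (compare neighbours only, no wrap-around)
    for prev, curr in zip(r, r[1:]):
        assert prev != curr
    print(r)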

if __name__ == "__main__":
    test_no_dup_shuffle(5, 3)
    test_no_dup_shuffle(3, 17)
