numpy: Split a list of numbers by sum and count

juud5qan posted on 2023-11-18 in Other

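I want to split a list of numbers into two subsets that satisfy two conditions:
1. The sums of the two subsets should be in a specific ratio (for example, 45%-55%).
2. The two subsets should contain almost the same number of elements ("almost" meaning the counts differ by at most 5%).
What I have so far is a probabilistic approach rather than a deterministic one. It could be used for small lists after wrapping it in a while loop that enforces the count condition. However, I would like a more robust method than running the code for 100 iterations in the hope of finding the best split.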

import numpy as np
from scipy.stats import norm

def split_into_normal_distributions(numbers, target_percentage):
    # Calculate the means and standard deviation of the two normal distributions
    mean_1 = np.mean(numbers) * target_percentage
    mean_2 = np.mean(numbers) *(1-target_percentage)
    std = np.std(numbers)

    # Initialize subsets
    subset_1 = []
    subset_2 = []

    for num in numbers:
        # Calculate probability densities for each number in both distributions
        pdf_1 = norm.pdf(num, loc=mean_1, scale=std)
        pdf_2 = norm.pdf(num, loc=mean_2, scale=std)

        # Calculate the ratio of probabilities for assignment
        pdf_sum = pdf_1 + pdf_2
        ratio = pdf_1 / pdf_sum

        # Assign numbers to subsets based on the calculated ratio
        if np.random.rand() < ratio:
            subset_1.append(num)
        else:
            subset_2.append(num)

    return subset_1, subset_2

# Sample list of numbers
numbers = [10, 20, 30, 40, 50, 60, 70, 80,10,20,25,20,21,26,65,95,84,65,2,3,6,198,16,651,984,651,35,61,651,16,56,651,651,651,2,32,615,651,984,615,351,651,651,3,5]

# Split into two normal distributions with specified means and standard deviation
subset_1, subset_2 = split_into_normal_distributions(numbers, 0.4)

print("Subset 1 (40% mean):", subset_1, sum(subset_1)/sum(numbers), len(subset_1))
print("Subset 2 (60% mean):", subset_2, sum(subset_2)/sum(numbers), len(subset_2))
len(numbers)
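
To judge any candidate split against the two conditions stated in the question, a small checker can help. This is only a sketch: the name `check_split` and the tolerance parameters `ratio_tol` and `count_tol` are hypothetical, and it assumes "differ by at most 5%" means the difference in counts divided by the total number of elements.

```python
def check_split(subset_1, subset_2, target_ratio, ratio_tol=0.05, count_tol=0.05):
    """Report how well a split meets the sum-ratio and count conditions."""
    total = sum(subset_1) + sum(subset_2)
    n = len(subset_1) + len(subset_2)
    # Share of the overall sum that ends up in subset_1
    sum_ratio = sum(subset_1) / total
    # Difference in element counts, relative to the list length (assumption)
    count_gap = abs(len(subset_1) - len(subset_2)) / n
    return {
        "sum_ratio": sum_ratio,
        "count_gap": count_gap,
        "sum_ok": abs(sum_ratio - target_ratio) <= ratio_tol,
        "count_ok": count_gap <= count_tol,
    }
```

Calling `check_split(subset_1, subset_2, 0.4)` on the output of the function above shows at a glance whether a given random run satisfied both conditions.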

Thank you.

368yc8dk #1

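Here is what I came up with. It is a simple greedy approach that produces a split with roughly equal totals. It is not exactly what you asked for, but hopefully it helps.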

import bisect

class Cluster:
    def __init__(self, nums):
        self.nums = nums
        self.total = sum(nums)

def main():
    nums = [10, 20, 30, 40, 50, 60, 70, 80, 10, 20, 25, 20, 21, 26, 65, 95, 84, 65, 2, 3, 6, 198, 16, 651, 984, 651, 35, 61, 651, 16, 56, 651, 651, 651, 2, 32, 615, 651, 984, 615, 351, 651, 651, 3, 5]
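    # Keep the clusters sorted by total so that neighbours have the closest totals
    # and bisect.insort (key= needs Python 3.10+) can keep the order
    clusters = sorted((Cluster([n]) for n in nums), key=lambda c: c.total)
    while len(clusters) > 1:
        pairs = list(zip(clusters, clusters[1:]))
        # Pick the adjacent pair whose totals differ the least
        best_pair_index = min(
            range(len(pairs)), key=lambda i: abs(pairs[i][0].total - pairs[i][1].total)
        )
        best_pair = pairs[best_pair_index]
        # Order the pair explicitly, then subtract the smaller cluster from the
        # larger one (negated numbers mark the opposite side of the split)
        smaller, larger = sorted(best_pair, key=lambda c: c.total)
        combined = Cluster(larger.nums + [-n for n in smaller.nums])
        del clusters[best_pair_index : best_pair_index + 2]
        bisect.insort(clusters, combined, key=lambda c: c.total)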
    [cluster] = clusters
    left = [n for n in cluster.nums if n > 0]
    right = [-n for n in cluster.nums if n < 0]
    print(sum(left))
    print(sum(right))
    print(len(left))
    print(len(right))

main()
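
For comparison, the answer above merges the adjacent pair of clusters with the closest totals. A better-known relative is the Karmarkar-Karp largest differencing heuristic, which always merges the two largest remaining differences. The sketch below is a swapped-in illustration of that heuristic, not the answerer's code: the function name `largest_differencing_split` is hypothetical, and it assumes non-negative numbers. Like the answer, it only balances the two sums and ignores the count condition and the target ratio.

```python
import heapq
from itertools import count


def largest_differencing_split(nums):
    """Split nums into two lists with close sums (largest differencing heuristic)."""
    tie = count()  # tie-breaker so the heap never has to compare lists
    # Each entry is (-difference, tie, left, right) with
    # sum(left) - sum(right) == difference >= 0.
    heap = [(-n, next(tie), [n], []) for n in nums]
    heapq.heapify(heap)
    while len(heap) > 1:
        d1, _, left_1, right_1 = heapq.heappop(heap)  # largest difference
        d2, _, left_2, right_2 = heapq.heappop(heap)  # second largest
        # Put the heavy side of the second entry opposite the heavy side of the
        # first, so the new difference is the difference of the two differences.
        heapq.heappush(
            heap, (d1 - d2, next(tie), left_1 + right_2, right_1 + left_2)
        )
    if not heap:
        return [], []
    _, _, left, right = heap[0]
    return left, right
```

On short lists it is easy to compare the totals and lengths of the two returned lists with those printed by `main()` above.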


dvtswwa3 #2

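I'll post my attempt here for future reference. I think Alex Hall's answer is superb, but a bit complex. This function does a good job of splitting the list according to the criteria I wanted, and it handles lists like the one in @RomanPerekhrest's comment.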

def find_optimal_batch(numbers, ratio):
    # Calculate the sum of the original list
    total_sum = sum(numbers)

    # Track the best split found so far and how far its ratio is from the target
    batch_1, batch_2 = [], list(numbers)
    optimal_difference = float('inf')  # Initialize to positive infinity

    # Precompute the range of batch sizes
    num_numbers = len(numbers)
    batch_sizes = range(1, num_numbers + 1)

    # Iterate through the batch sizes
    for batch_size in batch_sizes:

        # Slide a contiguous window of this size over the list
        for i in range(num_numbers - batch_size + 1):

            batch_sum = sum(numbers[i:i+batch_size])
            batch_ratio = batch_sum / total_sum
            difference = abs(batch_ratio - ratio)

            # Check if the current batch has a closer ratio to the desired ratio
            if difference < optimal_difference:

                batch_1 = numbers[i:i+batch_size]
                batch_2 = numbers[0:i] + numbers[i+batch_size:]
                optimal_difference = difference

    return batch_1, batch_2
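
Note that the function above only considers contiguous slices of the list and does not check the count condition. As a complement, here is a hedged sketch of a deterministic search that handles both conditions with dynamic programming over (count, sum) states. It is not taken from either answer. The names `split_by_sum_and_count` and `max_count_gap` are hypothetical, the numbers are assumed to be non-negative integers, and "differ by at most 5%" is interpreted as a count difference of at most 5% of the list length (a difference of 1 is always allowed so that odd-length lists can still be split).

```python
def split_by_sum_and_count(numbers, ratio, max_count_gap=0.05):
    """Find a subset with an allowed size whose share of the sum is closest to ratio."""
    n, total = len(numbers), sum(numbers)
    if n == 0 or total == 0:
        raise ValueError("need a non-empty list with a non-zero sum")

    # Subset sizes k that keep the two counts close enough (assumed rule)
    allowed = [k for k in range(n + 1) if abs(2 * k - n) <= max(1, max_count_gap * n)]
    max_k = max(allowed)

    # reachable[k][s] = index of the last item added to reach sum s with k items
    reachable = [dict() for _ in range(max_k + 1)]
    reachable[0][0] = None
    for i, x in enumerate(numbers):
        # Go through k in descending order so item i is used at most once
        for k in range(min(i + 1, max_k), 0, -1):
            level = reachable[k]
            for s in reachable[k - 1]:
                if s + x not in level:
                    level[s + x] = i

    # Pick the reachable (size, sum) state closest to the target ratio
    best_k, best_s = min(
        ((k, s) for k in allowed for s in reachable[k]),
        key=lambda ks: abs(ks[1] / total - ratio),
    )

    # Walk back through the stored indices to recover the chosen items
    chosen = set()
    k, s = best_k, best_s
    while k > 0:
        i = reachable[k][s]
        chosen.add(i)
        s -= numbers[i]
        k -= 1

    subset_1 = [numbers[i] for i in range(n) if i in chosen]
    subset_2 = [numbers[i] for i in range(n) if i not in chosen]
    return subset_1, subset_2
```

The work grows with the list length, the allowed subset size, and the number of distinct sums, so this is best suited to lists of moderate size with integer values, such as the sample list in the question.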
