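I want to split a list of numbers into two subsets that satisfy two conditions:
1. The sums of the two subsets should be in a specific ratio (for example, 45%-55% of the total).
2. The number of items in the two subsets should be almost the same ("almost" meaning a difference of at most 5%).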
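To make the two conditions concrete, they could be written as a small check. This is only a sketch: the name `check_split`, the tolerance on the sum share (`share_tol`), and reading "5%" as 5% of the total item count are all assumptions, not something stated in the question.

```python
def check_split(subset_1, subset_2, target_share=0.45, share_tol=0.01, count_tol=0.05):
    """Return (share_ok, count_ok) for a candidate split.

    Assumptions: target_share is the fraction of the total sum that
    subset_1 should hold, share_tol is an illustrative tolerance on that
    fraction, and count_tol is measured against the total number of items.
    """
    total = sum(subset_1) + sum(subset_2)
    n_items = len(subset_1) + len(subset_2)

    # Condition 1: subset_1 holds roughly target_share of the total sum.
    share = sum(subset_1) / total
    share_ok = abs(share - target_share) <= share_tol

    # Condition 2: the two subsets have almost the same number of items.
    count_ok = abs(len(subset_1) - len(subset_2)) <= count_tol * n_items

    return share_ok, count_ok
```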
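So far, all I have is a probabilistic approach rather than a deterministic one. With a while loop added to enforce the count condition, it can work for small lists. However, I would like a more robust method than running the code for 100 iterations in the hope of (possibly) finding the best split.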
import numpy as np
from scipy.stats import norm


def split_into_normal_distributions(numbers, target_percentage):
    # Calculate the means and standard deviation of the two normal distributions
    mean_1 = np.mean(numbers) * target_percentage
    mean_2 = np.mean(numbers) * (1 - target_percentage)
    std = np.std(numbers)
    # Initialize subsets
    subset_1 = []
    subset_2 = []
    for num in numbers:
        # Calculate probability densities for each number in both distributions
        pdf_1 = norm.pdf(num, loc=mean_1, scale=std)
        pdf_2 = norm.pdf(num, loc=mean_2, scale=std)
        # Calculate the ratio of probabilities for assignment
        pdf_sum = pdf_1 + pdf_2
        ratio = pdf_1 / pdf_sum
        # Randomly assign the number: subset_1 with probability `ratio`, else subset_2
        if np.random.rand() < ratio:
            subset_1.append(num)
        else:
            subset_2.append(num)
    return subset_1, subset_2


# Sample list of numbers
numbers = [10, 20, 30, 40, 50, 60, 70, 80, 10, 20, 25, 20, 21, 26, 65, 95, 84, 65, 2, 3, 6, 198, 16, 651, 984, 651, 35, 61, 651, 16, 56, 651, 651, 651, 2, 32, 615, 651, 984, 615, 351, 651, 651, 3, 5]
# Split into two normal distributions with specified means and standard deviation
subset_1, subset_2 = split_into_normal_distributions(numbers, 0.4)
# Print each subset, its share of the total sum, and its item count
print("Subset 1 (40% mean):", subset_1, sum(subset_1) / sum(numbers), len(subset_1))
print("Subset 2 (60% mean):", subset_2, sum(subset_2) / sum(numbers), len(subset_2))
print("Total count:", len(numbers))
Thank you.
2 Answers
368yc8dk #1
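Here's what I came up with. It's a simple greedy approach that produces a split with roughly equal totals. It's not exactly what you asked for, but hopefully it helps.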
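The code from this answer is not preserved here, so the following is only a minimal sketch of one common greedy scheme for equal totals, not necessarily the answerer's own code. The function name `greedy_even_split` is hypothetical. It sorts the numbers in descending order and always adds the next number to the subset whose sum is currently smaller.

```python
def greedy_even_split(numbers):
    """Greedy split aiming at equal sums (a sketch, not the original answer).

    Assumes non-negative numbers. It ignores the item-count condition and
    the 45%-55% target from the question; it only balances the totals.
    """
    subset_1, subset_2 = [], []
    sum_1 = sum_2 = 0
    # Place the largest numbers first so the smaller ones can even things out.
    for num in sorted(numbers, reverse=True):
        if sum_1 <= sum_2:
            subset_1.append(num)
            sum_1 += num
        else:
            subset_2.append(num)
            sum_2 += num
    return subset_1, subset_2
```

For non-negative input, the two sums end up differing by no more than the largest element, but the item counts can drift apart, which is why this does not fully answer the question.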
dvtswwa3 #2
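I'll post my attempt here for future reference. I think Alex Hall's answer is superb, but a bit complex. This function does a good job of splitting the list according to the criteria I wanted, and it also handles lists like the one in @RomanPerekhrest's comment.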
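The function itself is not preserved here. As a stand-in, here is a hedged sketch of one deterministic way to respect both criteria. It is not the poster's code, and the name `split_by_share` is hypothetical. It fixes the counts first by dealing the sorted numbers out alternately, then repeatedly swaps the one pair of elements that brings the sum of `subset_1` closest to the target. Swaps never change the counts, so they stay within one item of each other.

```python
def split_by_share(numbers, target_share=0.45):
    """Deterministic split: near-equal counts, subset_1 sum near target_share.

    A hill-climbing heuristic (an assumption of this sketch), so it may stop
    at a local optimum instead of the best possible split.
    """
    ordered = sorted(numbers)
    subset_1 = ordered[0::2]  # every other element, so counts differ by at most 1
    subset_2 = ordered[1::2]
    target_sum = target_share * sum(numbers)
    diff = sum(subset_1) - target_sum  # positive means subset_1 is too heavy

    while True:
        best_pair, best_abs = None, abs(diff)
        # Look for the single swap that moves subset_1's sum closest to the target.
        for i, a in enumerate(subset_1):
            for j, b in enumerate(subset_2):
                new_abs = abs(diff - a + b)
                if new_abs < best_abs:
                    best_pair, best_abs = (i, j), new_abs
        if best_pair is None:
            break  # no swap improves the split any further
        i, j = best_pair
        diff += subset_2[j] - subset_1[i]
        subset_1[i], subset_2[j] = subset_2[j], subset_1[i]

    return subset_1, subset_2
```

Each pass scans every pair, so this is only practical for fairly small lists like the one in the question. The loop stops because the distance to the target strictly decreases with every swap.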