numpy: How to split a list into chunks whose sizes follow a normal distribution

Posted by pinkon5k on 2023-02-04 in Other

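I want to take a list with thousands of entries and group them into 12 chunks, where the number of entries in each chunk follows a normal distribution (bell curve), with no duplicates across chunks. The list must be fully exhausted.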

The input data looks like this:

['6355ab76f70c5c59749f2018',
 '6355c797f70c5c5974a1cb15',
 '6355d256f70c5c5974a36a6c',
 '6355d270f70c5c5974a37356',
 '6355d29bf70c5c5974a3810a',
 '6355d300f70c5c5974a3a202',
 '6355d31af70c5c5974a3ab03',
 '6355d36cf70c5c5974a3c103',
 '6355d371f70c5c5974a3c236',
 '6355d389f70c5c5974a3c828',
 '6355d94df70c5c5974a55450',
 '6355d956f70c5c5974a556c1',
 '6355d987f70c5c5974a5626d',
 '6355d99df70c5c5974a566d9',
 '6355d9b1f70c5c5974a56b5c',
 '6355d9bbf70c5c5974a56d50',
 '6355d9d3f70c5c5974a572e1',
 '6355d9fdf70c5c5974a57c53',
 '6355da0cf70c5c5974a57f8f',
 '6355da11f70c5c5974a58065',
 '6355da19f70c5c5974a58261',
 '6355da68f70c5c5974a592ca',
 '6355da6cf70c5c5974a593ab',
 '6355da80f70c5c5974a597de',
 '6355da8af70c5c5974a599fa',
 '6355da93f70c5c5974a59c09',
 '6355da98f70c5c5974a59d20',
 '6355daa1f70c5c5974a59ec9',
 '6355daa7f70c5c5974a59fec',
 '6355dac5f70c5c5974a5a6dd',
 '6355dadaf70c5c5974a5ab75',
 '6355dafcf70c5c5974a5b2dc',
 '6355db6df70c5c5974a5d24b',
 '6355dba0f70c5c5974a5dfea',
 '6355dc16f70c5c5974a5fe14',
 '6355dc31f70c5c5974a6059d',
 '6355dc37f70c5c5974a60782',
 '6355dc3cf70c5c5974a608eb',
 '6355dc41f70c5c5974a60a99',
 '6355dc47f70c5c5974a60bb9',
 '6355dc5cf70c5c5974a611ef',
 '6355dc67f70c5c5974a61578',
 '6355dcaaf70c5c5974a62831',
 '6355dcb4f70c5c5974a62b2c',
 '6355dcbff70c5c5974a62e73',
 '6355dcc8f70c5c5974a63113',
 '6355dcd7f70c5c5974a6355c',
 '6355dcf3f70c5c5974a63c91',
 '6355dcf7f70c5c5974a63de9',
 '6355dd04f70c5c5974a64144',
 '6355dd0ef70c5c5974a64438',
 '6355dd53f70c5c5974a65902',
 '6355dd61f70c5c5974a65cf6',
 '6355dd6bf70c5c5974a66010',
 '6355dd70f70c5c5974a66195',
 '6355dd74f70c5c5974a662f9',
 '6355dd98f70c5c5974a66d4e',
 '6355dd9df70c5c5974a66e99',
 '6355dda2f70c5c5974a66fbd',
 '6355ddb0f70c5c5974a673e4',
 '6355ddbaf70c5c5974a67638',
 '6355ddc5f70c5c5974a6796b',
 '6355ddcef70c5c5974a67bcf',
 '6355de01f70c5c5974a6892c',
 '6355de15f70c5c5974a68ecf',
 '6355de1bf70c5c5974a69023',
 '6355de3df70c5c5974a699ad',
 '6355de58f70c5c5974a6a1ab',
 '6355de62f70c5c5974a6a4df',
 '6355de6bf70c5c5974a6a787',
 '6355de9cf70c5c5974a6b5a8',
 '6355dea0f70c5c5974a6b6ed',
 '6355deccf70c5c5974a6c3dc',
 '6355ded4f70c5c5974a6c602',
 '6355dee8f70c5c5974a6cbd2',
 '6355e8f1f70c5c5974a9db18',
 '6355e924f70c5c5974a9ec85',
 '6355e9dbf70c5c5974aa2b37',
 '6355eaaef70c5c5974aa7348',
 '6355ead5f70c5c5974aa81ac',
 '6355ec02f70c5c5974aaefaa',
 '6355ec64f70c5c5974ab135d',
 '6355ec8df70c5c5974ab2157',
 '6355ecb2f70c5c5974ab2ce7',
 '6355eccaf70c5c5974ab346f',
 '6355eccff70c5c5974ab3691',
 '6355ecd3f70c5c5974ab376b',
 '6355ece2f70c5c5974ab3ba0',
 '6355eceef70c5c5974ab3efb',
 '6355ecfef70c5c5974ab4384',
 '6355ed03f70c5c5974ab44c3',
 '6355ed24f70c5c5974ab4f4f',
 '6355ed4cf70c5c5974ab5b39',
 '6355ed78f70c5c5974ab6840',
 '6355ed9ff70c5c5974ab7388',
 '6355edb1f70c5c5974ab7888',
 '6355edb3f70c5c5974ab790b']

What the output should look like...

I'm looking for output like this: a list of objects, each with a numeric key from 0 to 11 and the chunked list items as its value:

[
    { 0: ['6355ab76f70c5c59749f2018', '6355c797f70c5c5974a1cb15', '6355d256f70c5c5974a36a6c' ] },
    { 1: ['6355d270f70c5c5974a37356',
 '6355d29bf70c5c5974a3810a',
 '6355d300f70c5c5974a3a202',
 '6355d31af70c5c5974a3ab03',
 '6355d36cf70c5c5974a3c103',
 '6355d371f70c5c5974a3c236',
 '6355d389f70c5c5974a3c828'] },
    ...
]
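
The output chunks should follow the same gradient as this image, even on both sides and larger toward the center, for a list of size n...

[Image: bell curve]

*It should split the input list into chunks that are even on both sides and grow along a mathematical gradient, with more items per chunk the closer the chunk is to the center of the output list.*

I want the incoming list split so that the most items are grouped in the middle (roughly chunks 4-8), while fewer items are grouped together toward the "edges" of the result list (chunks 0-3 and 9-12). All items of the input list must be used up, so that the items are fully distributed this way.

I tried to solve this with numpy, but so far I haven't been able to get the output I want.

My current code (two different functions):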

def divide_list_normal(lst):
    normal_dist = np.random.normal(size=len(lst)) # Generate a normal distribution of numbers
    sorted_list = [x for _,x in sorted(zip(normal_dist,lst))] # Sort the list according to the normal distribution
    chunk_size = int(len(lst)/len(normal_dist)) # Divide the list into chunks
    chunks = [sorted_list[i:i+chunk_size] for i in range(0, len(sorted_list), chunk_size)]
    return chunks 

def divide_list_normal_define_chunk_size(lst, n):
    normal_dist = np.random.normal(size=len(lst)) # Generate a normal distribution of numbers
    sorted_list = [x for _,x in sorted(zip(normal_dist,lst))] # Sort the list according to the normal distribution
    chunk_size = int(len(lst)/len(normal_dist)) # Divide the list into chunks
    chunks = [sorted_list[i:i+chunk_size] for i in range(0, n, chunk_size)]
    return chunks

The output of the first function looks like this:

[['63a8d83336756fd65d455c77'],
 ['6355f7c6f70c5c5974adfbce'],
 ['635629c6f70c5c5974bbab53'],
 ['6355fa8bf70c5c5974aeb70f'],
 ['6355dcd7f70c5c5974a6355c'],
 ['63a96dae36756fd65d549333'],
 ['639245927eeb4e9fd025e397'],
 ['63562463f70c5c5974ba3b5c'],
 ['63a8e04736756fd65d4635cf'],
 ['635629a5f70c5c5974bba1c1'],
 ['6355f74ef70c5c5974addd2c'],...]

The output of the second looks like this:

[['63aa1a9d36756fd65d7566cf'],
 ['6355ed78f70c5c5974ab6840'],
 ['63a94e1836756fd65d500d5d'],
 ['63a8e23e36756fd65d4667ec'],
 ['63a96c6536756fd65d5463db'],
 ['63d39021d34efb9c0983d64a'],
 ['635627a9f70c5c5974bb1573'],
 ['63b3a4c236756fd65d33750a'],
 ['63562320f70c5c5974b9e50b'],
 ['63aa1aec36756fd65d758676'],
 ['63a9551636756fd65d5111fb'],
 ['63562443f70c5c5974ba31ed']]

Is there a way to split a list into chunks whose sizes follow a normal distribution? If you know how, please share. Thanks!

nbnkbykc #1

This works, though it may be slow depending on your requirements:

import numpy as np
from itertools import islice

testList = ['6355d29bf70c5c5974a3810a',
 '6355d300f70c5c5974a3a202',
 '6355d31af70c5c5974a3ab03',
 '6355d36cf70c5c5974a3c103',
 '6355d300f70c5c5974a3a202',
 '6355d31af70c5c5974a3ab03',
 '6355d36cf70c5c5974a3c103',
 '6355d300f70c5c5974a3a202',
 '6355d31af70c5c5974a3ab03',
 '6355d36cf70c5c5974a3c103',
 '6355d300f70c5c5974a3a202',
 '6355d31af70c5c5974a3ab03',
 '6355d36cf70c5c5974a3c103',
 '6355d371f70c5c5974a3c236',
 '6355d389f70c5c5974a3c828']

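# Draw one slice length per item from a normal distribution (mean 10, std 4)
normal_dist = np.random.normal(size=len(testList), loc=10, scale=4)
# Each slice starts at the beginning of testList, so the slices overlap;
# max(..., 0) guards against negative draws, which islice rejects
sorted_list = [list(islice(testList, max(int(x), 0))) for x in normal_dist]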

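One thing to watch out for: because these are slices of a list, the values drawn from the normal distribution must not go out of bounds, i.e. the distribution set by `loc` and `scale` should stay between 0 and `len(testList)`.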
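
Because `islice` applied to a list always starts from the first element, the slices in the snippet above overlap and do not exhaust the list. A minimal sketch of a variant that consumes the list once, by slicing a shared iterator, might look like the following. The function name `random_normal_chunks` and its `seed` parameter are hypothetical, and note that the chunk sizes here are random draws, so the number of chunks is not fixed at 12 and the sizes are not arranged in a bell shape by position.

```python
import numpy as np
from itertools import islice


def random_normal_chunks(items, loc=10, scale=4, seed=None):
    """Split items into consecutive, non-overlapping chunks of random size.

    Hypothetical helper: each chunk length is drawn from a normal
    distribution with mean `loc` and standard deviation `scale`.
    """
    rng = np.random.default_rng(seed)
    it = iter(items)  # shared iterator: each slice continues where the last stopped
    chunks = []
    while True:
        # Clamp to at least 1 so every chunk makes progress through the list
        size = max(int(rng.normal(loc, scale)), 1)
        chunk = list(islice(it, size))
        if not chunk:  # iterator exhausted, every item has been used once
            break
        chunks.append(chunk)
    return chunks
```

Joining the returned chunks back together reproduces the input list in its original order, so no item is repeated or dropped.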

sqxo8psd #2

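For each index i, take the CDF at i + 0.5 and subtract the CDF at i - 0.5; the result is the fraction of the list that belongs in that index. For the first index, use just the CDF at i + 0.5 without subtracting the CDF at i - 0.5. For the last index, take the CDF at i - 0.5 and subtract it from 1 rather than from the CDF at i + 0.5. You probably want the mean to sit in the middle of all the indices, and you can choose the standard deviation according to the distribution you want (you probably want it to be about a quarter of the number of indices, but that's up to you).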
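
The CDF recipe above can be sketched roughly as follows, using only the standard library. The names `normal_cdf`, `chunk_sizes`, and `split_normal` are hypothetical, and the largest-remainder rounding used to turn fractions into whole chunk sizes is an assumption added here, since the answer does not say how to round.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a normal distribution, computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def chunk_sizes(total, n_chunks=12, std=None):
    """Integer chunk sizes that follow a bell curve and sum to `total`."""
    mean = (n_chunks - 1) / 2  # middle of the indices 0..n_chunks-1
    if std is None:
        std = n_chunks / 4  # "about a quarter of the number of indices"
    # Bin edges at i - 0.5 and i + 0.5; the outermost edges are replaced by
    # -inf / +inf so the first and last chunk absorb the tails.
    edges = [-math.inf] + [i + 0.5 for i in range(n_chunks - 1)] + [math.inf]
    cdf = [normal_cdf(e, mean, std) for e in edges]
    exact = [(cdf[i + 1] - cdf[i]) * total for i in range(n_chunks)]
    # Assumption: round down, then hand the leftover items to the chunks
    # with the largest fractional parts so the sizes sum exactly to `total`.
    sizes = [math.floor(e) for e in exact]
    leftover = total - sum(sizes)
    by_remainder = sorted(range(n_chunks),
                          key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


def split_normal(items, n_chunks=12, std=None):
    """Return [{0: [...]}, {1: [...]}, ...] using consecutive slices of items."""
    chunks, start = [], 0
    for key, size in enumerate(chunk_sizes(len(items), n_chunks, std)):
        chunks.append({key: items[start:start + size]})
        start += size
    return chunks
```

Calling `split_normal(my_list)` returns 12 single-key dictionaries whose lists, joined in order, reproduce `my_list`. One thing to be aware of: because the first and last chunks absorb the tails, with `std = n_chunks / 4` they may come out slightly larger than their immediate neighbours; a smaller value, such as `std=2` for 12 chunks, should make the sizes rise steadily toward the center.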
