numpy: combine two DataFrames to create an artificial dataset

Posted by 1hdlvixo on 2023-10-19 in Other
def _simulate_Wks(self, X: pd.DataFrame, K: int, n_iterations: int) -> np.ndarray:
    
    cat = X[[3,4,5,6,7]]
    cont = X[[0,1,2,8,9,10,11]]

    Z_prime = np.zeros(X.shape)

    for col in cat:
        Z_prime[:,col] = np.random.randint(low=np.min(cat[:,col]), high=np.max(cat[:,col]), size=cat.shape[0])
    for col in cont:
        Z_prime[:,col] = np.random.uniform(low=np.min(cont[:,col]), high=np.max(cont[:,col]), size=cont.shape[0])

    print(Z_prime)

    simulated_Wks = []     

    for i in range(n_iterations):
        
        sampled_X = Z_prime

        Wks_star = self._calculate_Wks(K=K, X=sampled_X)
        simulated_Wks.append(Wks_star)  

    sim_Wks = np.array(simulated_Wks)
    return sim_Wks

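How do I generate Z_prime as an artificial dataset? Z_prime should contain values drawn from a discrete uniform distribution for the categorical features and from a continuous uniform distribution for the continuous features (which columns are which is hard-coded).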


Answer #1 by vtwuwzda

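You don't need those intermediate arrays. Assuming you want the generated columns to stay interleaved in the same order as the original columns, they wouldn't be particularly useful anyway, since the column positions inside cat and cont no longer match the positions in X.
I believe this produces an array containing the values you asked for: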

import numpy as np

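# Demo data: 20 rows, 11 integer-valued columns
# (if X is a pandas DataFrame, convert it first with X.to_numpy())
X = np.random.randint(0, 50, (20, 11))

# Hard-coded positional indices of the categorical and continuous columns
cat = [1, 2, 8, 9, 10]
cont = [0, 3, 4, 5, 6, 7]

# Fill each column of Z_prime in place, so the original column order is kept
Z_prime = np.zeros(X.shape)
for col in cat:
    # Discrete uniform: randint's `high` is exclusive, so add 1 to make the max reachable
    Z_prime[:, col] = np.random.randint(low=np.min(X[:, col]), high=np.max(X[:, col]) + 1, size=X.shape[0])
for col in cont:
    # Continuous uniform on [min, max)
    Z_prime[:, col] = np.random.uniform(low=np.min(X[:, col]), high=np.max(X[:, col]), size=X.shape[0])

print(Z_prime)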

Output:

[[ 4.939638    8.         33.          6.06462632 29.34803376 27.92047673
  10.10907069 45.26071181 37.         34.         40.        ]
 [27.58962248  6.         37.         37.13471058 46.72725805 21.4282369
  39.35005473 16.08329763 40.         38.         10.        ]
 [35.05757352 14.         26.         39.07555817 29.80102443 12.57649644
  35.86069164 47.78790148 12.         11.         40.        ]
 [22.08677496 30.          4.         16.8086066   9.44826919 38.93000123
  35.39630508 20.954704   22.          2.          8.        ]
 [23.73564517 24.         11.         13.44406199 45.65689637 43.90891735
  23.63212734 22.48692891  5.         33.         30.        ]
 [33.70795657 32.         29.         47.6598066  27.65806364 43.07126968
   7.78273694 41.03914817  5.         25.         35.        ]
 [21.93462027 42.         27.         32.97683542 14.82010876 21.18449914
  19.13962883 42.59547862 34.          6.         34.        ]
 [10.75974571 40.         17.         22.13315452 29.06890015  9.12119008
  44.634315   41.13814525 20.         17.         24.        ]
 [ 5.58219042 41.         32.         45.66144365  3.26794523 21.74679317
  43.93397764 20.537597    7.         23.          7.        ]
 [ 2.80288277 13.          9.         43.00648324 25.8418341  42.99789302
  13.6064614  41.55250455 43.         22.         29.        ]
 [19.02350055 38.         34.         36.11615068 16.95271812 29.0093509
   8.77225362  2.25397079 45.          7.         46.        ]
 [ 4.26870036 23.          3.         44.71218094 29.38466236 40.57681454
  15.9364613   2.30036409 30.          4.          9.        ]
 [34.78900513  6.         28.         42.58560791 35.67444186 17.53246451
   9.05995079  3.89850696  4.         22.         42.        ]
 [38.58772118 22.         17.         30.44193758 39.52956127 15.84244933
  29.09113352 22.14571376 42.          4.         28.        ]
 [24.47055993 27.         36.         38.67598599 12.06930315 30.34116524
  19.61116091 16.02624312  9.         22.         12.        ]
 [34.57653156 33.         30.         19.59014599 11.38882781 44.94414761
   7.2274945  21.58703101 41.         18.         28.        ]
 [31.0677403  32.         29.         14.51909232  6.71566188 14.30741017
  43.07708398 13.93551836 32.         38.         37.        ]
 [25.87035468  9.         43.         17.70409586 22.95970004 16.5385711
  38.96364134 31.35512652 39.         11.         29.        ]
 [38.45347072 15.         37.         42.52780401  4.59910299 17.08431454
  21.37944371  5.96820414 15.         43.         12.        ]
 [24.64579903 37.         18.         20.67133629 15.61432415 32.48991493
  31.78827919  3.28247892 42.         25.          5.        ]]
