IPython Notebook kernel dies while running KMeans

Posted by zzwlnbp8 on 2022-10-23 in Python

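I am running K-means clustering on about 400K observations with 12 variables. Initially, when I ran the cell containing the KMeans code, a message popped up after about 2 minutes saying the kernel was interrupted and would restart. After that it takes ages, as if the kernel had died, and the code no longer runs.
So I tried with 125k observations and the same number of variables, but I still got the same message.
What does this mean? Does it mean the IPython notebook cannot run KMeans on 125k observations and kills the kernel?
How can I solve this? It is really important for me.
Please advise.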
The code I am using:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


# Initialize the clusterer with the n_clusters value and a random generator
# seed of 10 for reproducibility (random_state=10 is added here to match
# this comment; the original call did not set a seed).
kmeans = KMeans(n_clusters=2, init='k-means++', n_init=10, max_iter=100, random_state=10)
# Use every column except the first as features. .iloc replaces the .ix
# indexer, which has been removed from pandas.
X = Data_sampled.iloc[:, 1:]
kmeans.fit(X)
cluster_labels = kmeans.labels_
# The silhouette_score gives the average value for all the samples.
# This gives a perspective into the density and separation of the formed
# clusters.
silhouette_avg = silhouette_score(X, cluster_labels)

Source: https://stackoverflow.com/questions/32573948/ipython-notebook-kernel-getting-dead-while-running-kmeans

2 Answers

bsxbgnwa1#

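From some investigation, this likely has nothing to do with IPython Notebook / Jupyter. It appears to be an issue in sklearn that traces back to an issue in numpy. See the related GitHub issues for sklearn and the underlying numpy issue.
Ultimately, calculating the silhouette score requires computing a very large distance matrix, and for a large number of rows that matrix takes up more memory than the system has. For instance, look at the memory pressure on my system (OSX, 8GB RAM) during two runs of a similar calculation. The first spike is a silhouette score calculation with 10k records; the second, more of a plateau, is with 40k records: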

[Image: memory pressure graph for the two runs]
According to a related SO answer, your kernel process is probably being killed by the OS because it is using too much memory.
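To put rough numbers on the memory claim, here is a back-of-the-envelope sketch. It assumes the full n x n distance matrix is held in memory at once as dense 8-byte floats; whether a particular scikit-learn version actually does this is an assumption, and `pairwise_matrix_gib` is a hypothetical helper written only for this illustration.

```python
# Assumption: one dense n x n matrix of 8-byte (float64) values is
# materialized in a single allocation. Real peak usage may differ.
BYTES_PER_FLOAT64 = 8


def pairwise_matrix_gib(n_rows: int) -> float:
    """Approximate size in GiB of a dense n_rows x n_rows distance matrix."""
    return n_rows * n_rows * BYTES_PER_FLOAT64 / 2**30


# Row counts mentioned in the question and in this answer.
for n in (10_000, 40_000, 125_000, 400_000):
    print(f"{n:>7} rows -> {pairwise_matrix_gib(n):8.1f} GiB")
```

Under that assumption, 10k rows need roughly 0.7 GiB and 40k rows roughly 12 GiB, which would already exceed an 8GB machine. Memory grows with the square of the row count, so 125k or 400k rows are far out of reach for a dense matrix.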
Ultimately, this will require some fixes in the underlying codebase of sklearn and/or numpy. In the meantime, here are some options you can try: