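I want to find the X values that split a bimodal distribution into 3 groups. For example, in my code below, judging from the bimodal plot, the approximate groups are x less than 5, x between 5 and 90, and x greater than 90.
But I am not getting these values. Here is my code: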
import numpy as np
import matplotlib.pyplot as plt  # needed for the plt.* calls below
from scipy.stats import gaussian_kde
from scipy.signal import find_peaks
# Generate bimodal data
dist1 = np.random.normal(loc=0, scale=1, size=100)
dist2 = np.random.normal(loc=90, scale=1, size=100)
bimodal = np.concatenate((dist1, dist2))
# Fit KDE
kde = gaussian_kde(bimodal)
xgrid = np.linspace(0,100)
pdf = kde.evaluate(xgrid)
plt.hist(bimodal, bins=100, density=True)
plt.title("Bimodal distribution")
# Find peaks
peaks, _ = find_peaks(pdf)
peak1, peak2 = xgrid[peaks]
# Find valley
pdf_min = np.min(pdf[(xgrid > peak1) & (xgrid < peak2)])
valley = xgrid[(pdf == pdf_min) & (xgrid > peak1) & (xgrid < peak2)]
# Create group labels
groups = np.ones(len(bimodal), dtype=int)
groups[bimodal < valley] = 0
groups[(bimodal >= valley) & (bimodal <= peak1)] = 1
groups[bimodal > peak1] = 2
# Plot
plt.hist(bimodal)
plt.vlines([valley, peak1], 0, 100, colors='r')
plt.title("Bimodal distribution clustered into 3 groups")
plt.show()
print(groups)
1 Answer
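This is essentially a Gaussian mixture model. There are many sophisticated ways to fit one; I'll show a very simple example. If your data resemble the example parameters you wrote, it will work well.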
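As a rough sketch of that idea, assuming exactly two well-separated modes: fit a two-component Gaussian mixture with a few EM iterations in plain NumPy, then place the cuts k standard deviations inward from each fitted mean. The helper names `fit_two_gaussians` and `cut_points`, the midrange initialisation, and the choice `k = 3` are illustrative assumptions, not something taken from the question or from any library.

```python
import numpy as np


def fit_two_gaussians(x, n_iter=50):
    """Fit a 2-component 1-D Gaussian mixture with plain EM (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    # Crude initialisation (assumption): split the data at its midrange.
    split = 0.5 * (x.min() + x.max())
    left, right = x[x <= split], x[x > split]
    mu = np.array([left.mean(), right.mean()])
    sigma = np.array([left.std(), right.std()])
    weight = np.array([left.size, right.size]) / x.size
    for _ in range(n_iter):
        # E-step in log space, so far-apart modes do not underflow to 0/0.
        z = (x[:, None] - mu) / sigma
        log_dens = np.log(weight) - np.log(sigma) - 0.5 * z ** 2
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means, spreads and mixing weights.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        weight = nk / x.size
    return mu, sigma, weight


def cut_points(mu, sigma, k=3.0):
    """Cuts k standard deviations inward from each mode (k is an assumption)."""
    lo_idx, hi_idx = np.argsort(mu)
    return mu[lo_idx] + k * sigma[lo_idx], mu[hi_idx] - k * sigma[hi_idx]


# Same kind of data as in the question, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
bimodal = np.concatenate((rng.normal(0, 1, 100), rng.normal(90, 1, 100)))

mu, sigma, weight = fit_two_gaussians(bimodal)
lo, hi = cut_points(mu, sigma)

# 0: below the lower cut, 1: between the cuts, 2: at or above the upper cut.
groups = np.digitize(bimodal, [lo, hi])
print(lo, hi, np.bincount(groups, minlength=3))
```

With the example parameters the cuts land roughly at 3 and 87, so group 0 is the left mode, group 2 is the right mode, and group 1 is whatever falls in the gap between them. For messier data, `sklearn.mixture.GaussianMixture` would be the more usual tool than a hand-written EM loop.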