numpy histogram cumulative density does not sum to 1

Posted by 3htmauhk 12 months ago in Other


Taking a tip from another thread (@EnricoGiampieri's answer to "cumulative distribution plots python"), I wrote:

# plot cumulative density function of nearest nbr distances
# evaluate the histogram
values, base = np.histogram(nearest, bins=20, density=1)
#evaluate the cumulative
cumulative = np.cumsum(values)
# plot the cumulative function
plt.plot(base[:-1], cumulative, label='data')

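I added density=1 based on the np.histogram documentation, which says:
"Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function."
Well, indeed, when plotted, they don't sum to 1. But I do not understand "bins of unity width". When I set the bins to 1, of course, I get an empty chart; when I set them to the population size, I don't get a sum of 1 (more like 0.2). When I use the 40 bins suggested, they sum to about 0.006.
Can anybody give me some guidance? Thanks!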

numpy

Source: https://stackoverflow.com/questions/21532667/numpy-histogram-cumulative-density-does-not-sum-to-1

4 Answers


9bfwbjaz #1

You can simply normalize your values variable like so:
unity_values = values / values.sum()
A full example would look something like this:

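import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(size=37)
# density=True alone is enough; the deprecated `normed` argument is dropped
# (recent NumPy versions no longer accept it)
density, bins = np.histogram(x, density=True)
# dividing by the sum gives per-bin probabilities (the bins here have equal width)
unity_density = density / density.sum()

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, sharex=True, figsize=(8,4))
widths = np.diff(bins)  # positive bin widths
# align='edge' draws each bar starting at its left bin edge
ax1.bar(bins[:-1], density, width=widths, align='edge')
ax2.bar(bins[:-1], density.cumsum(), width=widths, align='edge')

ax3.bar(bins[:-1], unity_density, width=widths, align='edge')
ax4.bar(bins[:-1], unity_density.cumsum(), width=widths, align='edge')

ax1.set_ylabel('Not normalized')
ax3.set_ylabel('Normalized')
ax3.set_xlabel('PDFs')
ax4.set_xlabel('CDFs')
fig.tight_layout()
plt.show()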
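
As a quick sanity check on the normalization above, the following sketch compares three ways of turning a density histogram into per-bin probabilities. The seed, the sample, and the names `by_sum`, `by_width`, and `by_count` are illustrative assumptions, not part of the original answer. The three results agree only because the bins have equal width; with unequal widths, dividing by the sum would no longer match the other two.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.normal(size=37)

# Density histogram with 10 equal-width bins, plus raw counts on the same edges
density, bins = np.histogram(x, bins=10, density=True)
counts, _ = np.histogram(x, bins=bins)

by_sum = density / density.sum()     # the normalization used in this answer
by_width = density * np.diff(bins)   # density times bin width
by_count = counts / counts.sum()     # fraction of samples in each bin

print(np.allclose(by_sum, by_width), np.allclose(by_sum, by_count))
print(by_sum.cumsum()[-1])           # last value of the cumulative sum
```

Any of the three arrays can be passed to `cumsum()` to get a cumulative curve that ends at 1.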


mfuanj7w #2

You need to make sure your bins are all of width 1. That is:

np.all(np.diff(base)==1)

To achieve this, you have to specify your bins manually:

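# +1 so the last edge reaches ceil(max); np.arange excludes its stop value,
# and samples above the last edge would otherwise be ignored
bins = np.arange(np.floor(nearest.min()), np.ceil(nearest.max()) + 1)
values, base = np.histogram(nearest, bins=bins, density=True)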

You will get:

In [18]: np.all(np.diff(base)==1)
Out[18]: True

In [19]: np.sum(values)
Out[19]: 0.99999999999999989


xienkqul #3

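Actually, the statement
"Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function."
means that the output we get is the value of the probability density function (PDF) for each bin. In a PDF, the probability of falling between two values, say 'a' and 'b', is the area under the PDF curve between 'a' and 'b'. So, to get the probability for a given bin, we have to multiply the PDF value of that bin by its bin width. The resulting sequence of probabilities can then be used directly to calculate the cumulative probabilities (as they are now normalized).
Note that the newly calculated probabilities sum to 1, as total probability must. In other words, our probabilities are normalized.
See the code below. Here I have used bins of different widths, some of width 1 and some of width 2: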

import numpy as np
import math
rng = np.random.RandomState(10)   # deterministic random data
a = np.hstack((rng.normal(size=1000),
               rng.normal(loc=5, scale=2, size=1000))) # 'a' is our distribution of data
mini=math.floor(min(a))
maxi=math.ceil(max(a))
print(mini)
print(maxi)
ar1=np.arange(mini,maxi/2)
ar2=np.arange(math.ceil(maxi/2),maxi+2,2)
ar=np.hstack((ar1,ar2))
print(ar)  # ar is the array of unequal widths, which is used below to generate the bin_edges
# with density=True, `counts` holds the PDF value of each bin, not raw counts
counts, bin_edges = np.histogram(a, bins=ar, density=True)
print(counts)    # the PDF value of each bin
print(bin_edges) # the corresponding bin edges
print(np.sum(counts*np.diff(bin_edges)))    # total probability, equal to 1
print(np.cumsum(counts*np.diff(bin_edges))) # cumulative probabilities; the last value is 1

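Now, I think the reason they mention that the bin width should be 1 is this: if the bin width equals 1, the PDF value and the probability of any bin are equal. The area under a bin is its width times its PDF value, and multiplying by 1 gives back the PDF value itself. So in this case the PDF values equal the respective bin probabilities and are therefore already normalized.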
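
The unit-width argument can be checked numerically. The sketch below uses made-up data and hypothetical names (`edges`, `half_edges`, `prob`); it is only an illustration, not code from the original answer. For equal-width bins the density values sum to 1 divided by the bin width, which would also explain why sums such as 0.2 or 0.006 appear when the bins are wider than 1.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen only for reproducibility
data = rng.normal(loc=5, scale=2, size=1000)

# Unit-width bins covering the whole sample
edges = np.arange(np.floor(data.min()), np.ceil(data.max()) + 1)
density, _ = np.histogram(data, bins=edges, density=True)
counts, _ = np.histogram(data, bins=edges)
prob = counts / counts.sum()

print(np.allclose(density, prob))  # density equals probability when width is 1
print(density.sum())               # so the plain sum is 1

# Half-width bins over the same range: the plain sum becomes 1 / 0.5
half_edges = np.arange(edges[0], edges[-1] + 0.5, 0.5)
half_density, _ = np.histogram(data, bins=half_edges, density=True)
print(half_density.sum())                           # about 2
print((half_density * np.diff(half_edges)).sum())   # area is still 1
```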


8fsztsew #4

I found a simpler solution that makes your bins add up to 1.
Set

density = False

and use weights instead, like this:

weights=np.ones(len(values)) / len(values)

I can't explain it properly, but you can read about it here: https://github.com/matplotlib/matplotlib/issues/10398/
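
A possible reading of why this works: giving every sample a weight of 1/N makes each bin's height the fraction of samples that fall in it, so the heights sum to 1 regardless of bin width. The sketch below is a minimal illustration under that reading. It assumes the weights are built from the raw data array (called `nearest` here, after the question, and filled with stand-in random data), not from the histogram output.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, chosen only for reproducibility
nearest = rng.exponential(scale=50.0, size=500)  # stand-in for the real distances

# One weight per sample, each equal to 1/N; density is left at its default (False)
weights = np.ones(len(nearest)) / len(nearest)
values, base = np.histogram(nearest, bins=20, weights=weights)

cumulative = np.cumsum(values)
print(values.sum())    # bin heights sum to 1
print(cumulative[-1])  # the cumulative curve ends at 1
```

The resulting `cumulative` array can be plotted against `base[1:]`, the right edge of each bin, so that each point shows the fraction of samples up to that edge.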
