How to plot a CDF in matplotlib in Python?

Posted by uqjltbpv on 2022-12-13 in Python


I have an unordered list named d that looks like this:

[0.0000, 123.9877,0.0000,9870.9876, ...]

I just want to plot a CDF from this list using Matplotlib in Python.

d = []
d_sorted = []
for line in fd.readlines():
    (addr, videoid, userag, usertp, timeinterval) = line.split()
    d.append(float(timeinterval))

d_sorted = sorted(d)

class discrete_cdf:
    def __init__(data):
        self._data = data # must be sorted
        self._data_len = float(len(data))

    def __call__(point):
        return (len(self._data[:bisect_left(self._data, point)]) / 
               self._data_len)

cdf = discrete_cdf(d_sorted)
xvalues = range(0, max(d_sorted))
yvalues = [cdf(point) for point in xvalues]
plt.plot(xvalues, yvalues)

This is the code I am using right now, but I get the following error message:

Traceback (most recent call last):
File "hitratioparea_0117.py", line 43, in <module>
cdf = discrete_cdf(d_sorted)
TypeError: __init__() takes exactly 1 argument (2 given)
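
For reference, the TypeError arises because __init__ and __call__ are declared without self. Python passes the instance as the first argument automatically, so d_sorted arrives as an unexpected second argument. A minimal sketch of the class with self added, assuming the four sample values shown above stand in for the file contents:

```python
from bisect import bisect_left


class discrete_cdf:
    def __init__(self, data):
        self._data = data  # must be sorted
        self._data_len = float(len(data))

    def __call__(self, point):
        # Fraction of samples strictly less than `point`
        return bisect_left(self._data, point) / self._data_len


# Assumption: sample values from the question instead of reading a file
d_sorted = sorted([0.0, 123.9877, 0.0, 9870.9876])
cdf = discrete_cdf(d_sorted)

# Evaluate at the data points themselves; range() would reject a float maximum
yvalues = [cdf(point) for point in d_sorted]
```

Evaluating at the sorted data points avoids calling range() on a float, which would raise a separate error in Python 3.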

matplotlib

Source: https://stackoverflow.com/questions/9378420/how-to-plot-cdf-in-matplotlib-in-python

7 Answers


llmtgqce1#

I know I'm late to the party. But if you only want the CDF for your plot and not for further calculations, there is a simpler way:

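# density=True replaces the normed=True argument, which newer matplotlib releases no longer accept
plt.hist(put_data_here, density=True, cumulative=True, label='CDF',
         histtype='step', alpha=0.8, color='k')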

As an example,

plt.hist(dataset, bins=bins, density=True, cumulative=True, label='CDF DATA',
         histtype='step', alpha=0.55, color='purple')
# bins and (lognormal / normal) datasets are pre-defined

Edit: This example in the matplotlib documentation may be more helpful.
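
If the numbers are needed as well as the picture, the values of a cumulative, normalized histogram can also be computed with NumPy. A minimal sketch, assuming randomly generated sample data and 50 bins (both are illustrative choices, not taken from the answer):

```python
import numpy as np

# Illustrative data only; not from the original post
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# Bin the samples, then accumulate the counts and divide by the total
counts, edges = np.histogram(data, bins=50)
cdf = np.cumsum(counts) / counts.sum()

# cdf[i] is the fraction of samples at or below the right bin edge edges[i + 1]
```

The resulting values should match the bar heights that plt.hist draws with density=True and cumulative=True for the same bins.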


kd3sttzy2#

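As mentioned, cumsum from numpy works well. Make sure your data is a proper PDF (i.e. it sums to one); otherwise the CDF will not end at unity as it should. Here is a minimal working example: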

import numpy as np
import matplotlib.pyplot as plt

# Create some test data
dx = 0.01
X = np.arange(-2, 2, dx)
Y = np.exp(-X ** 2)

# Normalize the data to a proper PDF
Y /= (dx * Y).sum()

# Compute the CDF
CY = np.cumsum(Y * dx)

# Plot both
plt.plot(X, Y)
plt.plot(X, CY, 'r--')

plt.show()
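
To see why the normalization step matters, one can compare the final value of the cumulative sum with and without it. A small sketch reusing the same test data as above:

```python
import numpy as np

dx = 0.01
X = np.arange(-2, 2, dx)
Y = np.exp(-X ** 2)

# Without normalization the curve is not a PDF, so the running sum overshoots 1
unnormalized_end = np.cumsum(Y * dx)[-1]

# After dividing by the total area, the running sum ends at 1
Y_pdf = Y / (dx * Y).sum()
normalized_end = np.cumsum(Y_pdf * dx)[-1]

print(unnormalized_end, normalized_end)
```

Only the normalized version can be read as a CDF, since its last value is 1 up to floating-point rounding.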


hkmswyz63#

The numpy function cumsum, which computes a cumulative sum, is useful here:

In [1]: from numpy import cumsum
In [2]: cumsum([.2, .2, .2, .2, .2])
Out[2]: array([ 0.2,  0.4,  0.6,  0.8,  1. ])
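
Applied to raw samples such as the list in the question, the same idea gives an empirical CDF once duplicate values are grouped. A minimal sketch, assuming the four sample values shown in the question; np.unique with return_counts=True is one way to do the grouping:

```python
import numpy as np

# Assumption: the four sample values quoted in the question
d = [0.0, 123.9877, 0.0, 9870.9876]

# Sorted distinct values and how often each one occurs
values, counts = np.unique(d, return_counts=True)

# Running share of samples less than or equal to each distinct value
cdf = np.cumsum(counts) / counts.sum()
```

Plotting values against cdf, for example with plt.step(values, cdf, where='post'), then draws the step curve.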


o75abkj44#

You can now use seaborn's kdeplot function with cumulative set to True to generate a CDF.

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

X1 = np.arange(100)
X2 = (X1 ** 2) / 100
sns.kdeplot(data = X1, cumulative = True, label = "X1")
sns.kdeplot(data = X2, cumulative = True, label = "X2")
plt.legend()
plt.show()


csbfibhn5#

For an arbitrary collection of values, x:

import numpy as np
import matplotlib.pyplot as plt

def cdf(x, plot=True, *args, **kwargs):
    x, y = sorted(x), np.arange(len(x)) / len(x)
    return plt.plot(x, y, *args, **kwargs) if plot else (x, y)

(If you are new to Python: *args and **kwargs let you pass positional and named arguments through without explicitly declaring and managing them.)
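
A short usage sketch for the function above, assuming three made-up sample values. Because plot is the second positional parameter, a format string such as "r--" has to come after an explicit value for plot; the Agg backend line is an assumption so that the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np


def cdf(x, plot=True, *args, **kwargs):
    x, y = sorted(x), np.arange(len(x)) / len(x)
    return plt.plot(x, y, *args, **kwargs) if plot else (x, y)


data = [3.0, 1.0, 2.0]  # made-up values for illustration

# plot=False returns the sorted values and their cumulative fractions
xs, ys = cdf(data, plot=False)

# Extra arguments are forwarded to plt.plot unchanged
lines = cdf(data, True, "r--", label="data")
```

With this definition y runs from 0 to (n-1)/n; using np.arange(1, len(x) + 1) / len(x) instead would make the curve end at 1.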


bfhwhh0e6#

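What works best for me is the quantile function of pandas.
Say I have 71 participants, each with a certain number of interruptions, and I want to plot the CDF of the number of interruptions per participant. The goal is to be able to tell what percentage of participants have at least 30 interventions.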

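import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Quantile levels from 0 to 1 in steps of 0.05
step = 0.05
indices = np.arange(0, 1 + step, step)
num_interruptions_per_participant = [32,70,52,52,39,20,37,31,60,57,31,71,24,23,38,4,77,37,79,43,63,43,75,13
,45,31,57,28,61,29,30,52,65,11,76,37,65,28,33,73,65,43,50,33,45,40,50,44
,33,49,24,69,55,47,22,45,54,11,30,13,32,52,31,50,10,46,10,25,47,51,83]

# Value of the data at each quantile level
CDF = pd.DataFrame({'dummy': num_interruptions_per_participant})['dummy'].quantile(indices)

# Quantile values on the x axis, cumulative fraction on the y axis
plt.plot(CDF, indices, linewidth=9, label='#interventions', color='blue')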

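According to the plot, almost 25% of the participants have fewer than 30 interventions.
You can use this statistic for further analysis. In my case, for instance, I need at least 30 interventions per participant to meet the minimum sample requirement for a leave-one-subject-out evaluation. The CDF tells me that I have a problem with 25% of the participants.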
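
The share read off the plot can also be computed directly from the data. A minimal sketch using the same list; the threshold of 30 comes from the answer, while the variable names counts, n_below and fraction_below_30 are illustrative:

```python
import numpy as np

num_interruptions_per_participant = [
    32, 70, 52, 52, 39, 20, 37, 31, 60, 57, 31, 71, 24, 23, 38, 4, 77, 37, 79, 43, 63, 43, 75, 13,
    45, 31, 57, 28, 61, 29, 30, 52, 65, 11, 76, 37, 65, 28, 33, 73, 65, 43, 50, 33, 45, 40, 50, 44,
    33, 49, 24, 69, 55, 47, 22, 45, 54, 11, 30, 13, 32, 52, 31, 50, 10, 46, 10, 25, 47, 51, 83,
]

counts = np.asarray(num_interruptions_per_participant)

# Empirical CDF evaluated just below the threshold: share of participants under 30
n_below = int(np.sum(counts < 30))
fraction_below_30 = np.mean(counts < 30)

print(f"{n_below} of {counts.size} participants ({fraction_below_30:.1%}) are below 30")
# prints: 16 of 71 participants (22.5%) are below 30
```

This agrees with the "almost 25%" read from the quantile plot, whose resolution is limited by the 0.05 step.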


qc6wkl3g7#

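import matplotlib.pyplot as plt

# data (the samples) and c (any matplotlib color) are assumed to be defined elsewhere
X = sorted(data)
l = len(X)
# Each sorted sample adds 1/l to the cumulative probability, so Y runs from 1/l to 1
Y = [float(i) / l for i in range(1, l + 1)]
plt.plot(X, Y, color=c, marker='o', label='xyz')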

I think this should do it. For the procedure, see http://www.youtube.com/watch?v=vcoCVVs0fRI

