python-3.x: Adding another plot on top of a histogram

bvjxkvbb  posted on 2023-10-21 in Python

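I am using Plotly Express to plot two histograms. The first shows the distribution of car ads by mileage. The second uses the same mileage bins and shows the sum of the car prices in each bin. For the second plot, I want to add a data point at the middle of each bin whose value is the sum of all prices in that bin divided by the number of cars in the same bin.

[Figure: Histogram 1 with the counts of ads per mileage bin]
[Figure: Histogram 2 with the sum of all the car prices per mileage bin]

For example, for the bin shown in the pictures above, I want a data point at 48.81 B / 8186 ≈ 5.963 M. Please find the code below.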

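import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# df is the asker's DataFrame with 'mileage' and 'price' columns (not shown in the post)
# Creating mil_counts and price_counts DataFrames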
mil_counts = df.groupby(['mileage']).size().sort_values(ascending=False).reset_index(name='count')

fig = make_subplots(rows=1, cols=2)

# Create a Plotly Express histogram trace for mileage
mileage_histogram_trace = px.histogram(mil_counts, x="mileage", y="count", title="Mileage", nbins=20)

# Add the mileage histogram trace to the first column
fig.add_trace(go.Histogram(histfunc="sum", x=mileage_histogram_trace.data[0]['x'], y=mileage_histogram_trace.data[0]['y'], 
                            name="Mileage", nbinsx=20), row=1, col=1)

# Create a Plotly Express histogram trace for price
price_histogram_trace = px.histogram(df, x="mileage", y="price", title="Price", nbins=20)

# Add the price histogram trace to the second column
fig.add_trace(go.Histogram(histfunc="sum", x=price_histogram_trace.data[0]['x'], y=price_histogram_trace.data[0]['y'], 
                            name="Price", nbinsx=20), row=1, col=2)

# Calculate the average price of cars in each mileage category
mileage_x = mileage_histogram_trace.data[0]['x']
avg_price = [48813640000/8186 , ]

for x_value in mileage_x:
    indices = np.where(price_histogram_trace.data[0]['x'] == x_value)[0]
    if len(indices) > 0:
        avg_price.append(np.mean(price_histogram_trace.data[0]['y'][indices]))
    else:
        avg_price.append(0)

# Add the line trace (superimposed on the bar) with a secondary y-axis
fig.add_trace(go.Scatter(x=mileage_x, y=avg_price, mode='lines', name="Average Price (Line)", yaxis="y2"), row=1, col=2)

# Update the layout if needed
fig.update_layout(
    title_text="Mileage and Price Histograms",
    xaxis=dict(title="Mileage", domain=[0, 0.4]),
    yaxis=dict(title="Sum of Counts"),
    xaxis2=dict(title="Mileage", domain=[0.6, 0.9]),
    yaxis2=dict(title="Average Price", side="right"),
    xaxis3=dict(title="Mileage", domain=[0.95, 1.0]),
    yaxis3=dict(title="Average Price (Line)", side="right"),
)

fig.show()

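I only want to draw a line (from a scatter trace) with 20 data points, each placed at the middle of a histogram bin and showing the average price in that bin. For example, for the bin shown in the pictures above, I want a data point at 48.81 B / 8186 ≈ 5.963 M. What the current code does instead is add extra data points, because price_histogram_trace.data[0]['x'] already holds 70k data points, while mileage_histogram_trace.data[0]['x'] has 7k data points to match against. It ends up adding an average price for every mileage observation in the DataFrame.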

lh80um4z (answer #1)

If you want to draw a line of per-bin averages on top of a histogram, you have to calculate the average for each bin yourself. Here is an example:

import plotly.graph_objects as go
import numpy as np

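m = 200000
# Sample data: 100 evenly spaced, sorted x values and a smooth y curve
x = np.linspace(1, m, 100, dtype=int)
y = np.sin((x+m)/(m/2))*m + m
n_bins = 20
# x is sorted and evenly spaced, so split the data into n_bins chunks of consecutive values
chunk_size = len(y)//n_bins
# Average of y within each chunk (one value per chunk)
y_avg = [sum(y[i*chunk_size:(i*chunk_size)+chunk_size])/chunk_size for i in range(n_bins)]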

fig = go.Figure(data=[
    go.Histogram(x=x, y=y, nbinsx=n_bins, histfunc='sum', name='histogram'),
    go.Scatter(x=x[::len(x)//n_bins]+x[chunk_size]/2, y=y_avg, name='average of data')
])

fig.show()
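
The example above works because its sample data is sorted and evenly spaced. For unsorted data such as the question's mileage and price columns, one possible approach is to fix the bin edges explicitly and derive counts, sums, and midpoints from the same edges. The following is only a minimal sketch: the synthetic `mileage` and `price` arrays are made-up stand-ins for the asker's DataFrame columns, and the equal-width edges are an assumption that will not necessarily match the bins Plotly picks automatically.

```python
import numpy as np

# Synthetic stand-in for the asker's data (hypothetical values, not from the post)
rng = np.random.default_rng(0)
mileage = rng.uniform(0, 200_000, size=5_000)
price = 6_000_000 - 20 * mileage + rng.normal(0, 100_000, size=mileage.size)

n_bins = 20
# Fix the bin edges once so counts, sums, and midpoints all refer to the same bins
edges = np.linspace(mileage.min(), mileage.max(), n_bins + 1)

counts, _ = np.histogram(mileage, bins=edges)               # number of cars per bin
sums, _ = np.histogram(mileage, bins=edges, weights=price)  # total price per bin

centers = (edges[:-1] + edges[1:]) / 2                      # x position of each point
# Mean price per bin; empty bins become NaN instead of dividing by zero
avg_price = np.divide(sums, counts, out=np.full(n_bins, np.nan), where=counts > 0)
```

This yields exactly `n_bins` points (`centers`, `avg_price`), one per bin, which is the shape of data the question asks for.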

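Keep in mind, however, that depending on the aggregation function you use for the histogram, the average of the data in each bin can be very small compared with the bar heights (for example, when histfunc='sum' adds up every value in the bin).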
