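I'm using Plotly Express to draw two histograms. The first shows the distribution of car ads by how many miles the car has been driven. The second uses the same mileage bins and shows the distribution of those cars' prices. For the second plot, I now want to add a data point in the middle of each bin representing the sum of all prices in that bin divided by the number of cars in the same bin.

[Histogram 1: counts of ads per mileage bin]
[Histogram 2: sum of all car prices per mileage bin]

For example, in the pictures shown above, I would want a data point of 48.81B / 8186 ≈ 5.963M. Please find the code below: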
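import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Count ads per distinct mileage value (df is the ads DataFrame with "mileage" and "price" columns)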
mil_counts = df.groupby(['mileage']).size().sort_values(ascending=False).reset_index(name='count')
fig = make_subplots(rows=1, cols=2)
# Create a Plotly Express histogram trace for mileage
mileage_histogram_trace = px.histogram(mil_counts, x="mileage", y="count", title="Mileage", nbins=20)
# Add the mileage histogram trace to the first column
fig.add_trace(go.Histogram(histfunc="sum", x=mileage_histogram_trace.data[0]['x'], y=mileage_histogram_trace.data[0]['y'],
name="Mileage", nbinsx=20), row=1, col=1)
# Create a Plotly Express histogram trace for price
price_histogram_trace = px.histogram(df, x="mileage", y="price", title="Price", nbins=20)
# Add the price histogram trace to the second column
fig.add_trace(go.Histogram(histfunc="sum", x=price_histogram_trace.data[0]['x'], y=price_histogram_trace.data[0]['y'],
name="Price", nbinsx=20), row=1, col=2)
# Calculate the average price of cars in each mileage category
mileage_x = mileage_histogram_trace.data[0]['x']
avg_price = [48813640000/8186 , ]
for x_value in mileage_x:
    indices = np.where(price_histogram_trace.data[0]['x'] == x_value)[0]
    if len(indices) > 0:
        avg_price.append(np.mean(price_histogram_trace.data[0]['y'][indices]))
    else:
        avg_price.append(0)
# Add the line trace (superimposed on the bar) with a secondary y-axis
fig.add_trace(go.Scatter(x=mileage_x, y=avg_price, mode='lines', name="Average Price (Line)", yaxis="y2"), row=1, col=2)
# Update the layout if needed
fig.update_layout(
title_text="Mileage and Price Histograms",
xaxis=dict(title="Mileage", domain=[0, 0.4]),
yaxis=dict(title="Sum of Counts"),
xaxis2=dict(title="Mileage", domain=[0.6, 0.9]),
yaxis2=dict(title="Average Price", side="right"),
xaxis3=dict(title="Mileage", domain=[0.95, 1.0]),
yaxis3=dict(title="Average Price (Line)", side="right"),
)
fig.show()
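I only want to draw a line (from a scatter trace) with 20 data points, each placed at the middle of a histogram bin and showing the average price in that bin. What the current code does instead is add extra data points: price_histogram_trace.data[0]['x'] has 70k values, while mileage_histogram_trace.data[0]['x'] has 7k values to match against, so the loop adds an average price for every distinct mileage value in the DataFrame rather than one per bin.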
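The difference between "one average per distinct mileage" and "one average per bin" can be seen on a tiny hypothetical DataFrame (the values and `n_bins = 3` are made up for illustration). This is a minimal pandas sketch, not the asker's code: `pd.cut` assigns each row to one of `n_bins` equal-width intervals, and `Interval.mid` gives each bin's midpoint.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df: 12 ads but only 6 distinct mileages
df = pd.DataFrame({
    "mileage": [1_000, 1_000, 2_000, 3_000, 3_000, 4_000,
                5_000, 5_000, 6_000, 6_000, 6_000, 2_000],
    "price":   [9_000, 8_800, 8_500, 8_000, 7_900, 7_500,
                7_000, 6_900, 6_500, 6_400, 6_300, 8_400],
})

# What the loop in the question effectively computes: one mean per distinct mileage
per_value = df.groupby("mileage")["price"].mean()

# What is wanted: one mean per histogram bin (sum of prices / number of cars)
n_bins = 3
df["bin"] = pd.cut(df["mileage"], bins=n_bins)
per_bin = df.groupby("bin", observed=False)["price"].agg(["sum", "count", "mean"])
# Midpoint of each bin, where the average-price marker should go
per_bin["mid"] = [interval.mid for interval in per_bin.index]

print(per_bin)
```

Here `per_value` has 6 rows (one per distinct mileage) while `per_bin` has exactly 3. With an integer `bins`, `pd.cut` slightly extends the range so the minimum is included, so its edges may not match the ones Plotly chooses; passing explicit edges avoids that.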
1 Answer
If you want to plot a line of per-bin averages on top of a histogram, you have to compute the average of each bin yourself. Here is an example:
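However, keep in mind that, depending on the aggregation function used for the histogram, the per-bin average can be very small compared with the bar heights. With `histfunc="sum"`, each bar is the total price in its bin, which is larger than that bin's average by a factor equal to the number of cars in it, so the line is easier to read on a secondary y-axis.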
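The scale gap can be checked with the numbers from the hard-coded first value of `avg_price` in the question's code (48813640000 / 8186). This is plain arithmetic, shown only to make the factor concrete:

```python
# Numbers taken from the hard-coded first point in the question's code
bin_price_sum = 48_813_640_000  # height of the "sum" bar for that bin
bin_count = 8_186               # number of cars (ads) in the same bin

avg = bin_price_sum / bin_count  # about 5.963 million
# The summed bar is taller than the average by exactly a factor of bin_count,
# so on a shared y-axis the average line sits almost on the x-axis
ratio = bin_price_sum / avg

print(f"average: {avg:,.0f}, bar / average: {ratio:,.0f}")
```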