I'm plotting the regression line equation for time series data, but it doesn't look right. What could be the problem?
Code:
library(ggplot2)

# linewidth replaces the deprecated size/lwd for lines (ggplot2 >= 3.4.0)
p1 <- ggplot(df2, aes(x = date_his, y = his)) +
  geom_line(linewidth = 0.8) +
  scale_x_date(date_labels = "%Y", expand = c(0, 0)) +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed", color = "blue", linewidth = 1.2) +
  ggpubr::stat_regline_equation(
    label.x = -Inf, label.y = Inf, vjust = 1.5, hjust = -0.1, size = 5, color = "blue",
    formula = y ~ poly(x, 1),
    show.legend = FALSE
  )
Regression summary:
summary(lm(his ~ date_his, data = df2))

Call:
lm(formula = his ~ date_his, data = df2)

Residuals:
    Min      1Q  Median      3Q     Max
-9.5030 -3.7714  0.0661  4.0350  8.2860

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.682e+02  2.246e+00  74.889  < 2e-16 ***
date_his    -6.247e-04  2.138e-04  -2.922  0.00614 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.867 on 34 degrees of freedom
Multiple R-squared:  0.2007,  Adjusted R-squared:  0.1772
F-statistic: 8.538 on 1 and 34 DF,  p-value: 0.006141
Data:

df2 <- data.frame(
  date_his = as.Date(c(
    "1979-04-28", "1980-04-28", "1981-04-28", "1982-04-28", "1983-04-28",
    "1984-04-28", "1985-04-28", "1986-04-28", "1987-04-28", "1988-04-28",
    "1989-04-28", "1990-04-28", "1991-04-28", "1992-04-28", "1993-04-28",
    "1994-04-28", "1995-04-28", "1996-04-28", "1997-04-28", "1998-04-28",
    "1999-04-28", "2000-04-28", "2001-04-28", "2002-04-28", "2003-04-28",
    "2004-04-28", "2005-04-28", "2006-04-28", "2007-04-28", "2008-04-28",
    "2009-04-28", "2010-04-28", "2011-04-28", "2012-04-28", "2013-04-28",
    "2014-04-28"
  )),
  his = c(
    166.008140637138, 169.867917428802, 157.525649715296, 172.567833065154,
    170.267019131607, 168.725866057929, 166.718135941998, 158.34217326036,
    157.493444169524, 164.212162698115, 161.140482761292, 161.851683272819,
    158.688162091076, 159.249075294438, 153.373329948267, 170.934314928049,
    164.557361076648, 169.910429608586, 163.399094199897, 161.238272986288,
    166.19105244493, 162.451935740926, 157.307524920014, 164.886371717477,
    153.843986514936, 157.656023562196, 164.433730558708, 165.485611515611,
    157.622007235028, 165.006218026182, 161.901572985301, 152.963472898683,
    154.013037508575, 156.507409368617, 157.244735400283, 161.214916194711
  )
)
2 answers
qybjjes11#
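I think there are a couple of problems here. One is in stat_regline_equation: the formula
formula = y ~ poly(x, 1)
produces one equation, whereas the formula
formula = y ~ x
which is the default, produces a different one. The second equation agrees with the lm() output, once you allow for stat_regline_equation rounding the coefficients to 2 significant digits.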
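So poly(x, 1) and y ~ x are not the same: poly() computes what are called orthogonal polynomials. If you want the same result as y ~ x, you need to specify raw = TRUE, i.e. formula = y ~ poly(x, 1, raw = TRUE).

Another issue: which x-axis value corresponds to zero, i.e. where is the intercept located? Here the date origin is 1970-01-01. So you may want to consider whether it makes sense for this model to extrapolate all the way back to 1970, or whether you should convert the dates into numbers representing days since some other start date.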
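A base-R sketch of both points, using the question's data. It fits y ~ x, y ~ poly(x, 1) and y ~ poly(x, 1, raw = TRUE) with the dates converted to plain numbers (days since 1970-01-01), which is how they reach the stat layer in the plot, and then re-fits with a different time origin. Rebuilding the dates with seq(), using the first observation as the new origin, and measuring time in years (days / 365.25) are illustrative assumptions; the names x, y, fit_lm, fit_orth, fit_raw, years and fit_years are hypothetical.

```r
# Base R only. Annual observations from the question (1979-04-28 to 2014-04-28);
# seq() rebuilds the same 36 dates that the question lists explicitly.
df2 <- data.frame(
  date_his = seq(as.Date("1979-04-28"), by = "year", length.out = 36),
  his = c(
    166.008140637138, 169.867917428802, 157.525649715296, 172.567833065154,
    170.267019131607, 168.725866057929, 166.718135941998, 158.34217326036,
    157.493444169524, 164.212162698115, 161.140482761292, 161.851683272819,
    158.688162091076, 159.249075294438, 153.373329948267, 170.934314928049,
    164.557361076648, 169.910429608586, 163.399094199897, 161.238272986288,
    166.19105244493, 162.451935740926, 157.307524920014, 164.886371717477,
    153.843986514936, 157.656023562196, 164.433730558708, 165.485611515611,
    157.622007235028, 165.006218026182, 161.901572985301, 152.963472898683,
    154.013037508575, 156.507409368617, 157.244735400283, 161.214916194711
  )
)

# In the plot, the stat layer receives the dates as plain numbers
# (days since 1970-01-01), so do the same here.
x <- as.numeric(df2$date_his)
y <- df2$his

fit_lm   <- lm(y ~ x)                       # default formula of stat_regline_equation
fit_orth <- lm(y ~ poly(x, 1))              # orthogonal polynomial basis
fit_raw  <- lm(y ~ poly(x, 1, raw = TRUE))  # raw powers of x

# Same fitted line, different coefficients: only fit_lm and fit_raw agree
print(rbind(lm = coef(fit_lm), orthogonal = coef(fit_orth), raw = coef(fit_raw)))

# Rounded to 2 significant digits: about 170 and -0.00062
print(signif(coef(fit_lm), 2))

# Illustrative re-origin: years since the first observation, so the
# intercept is the fitted value for 1979-04-28 rather than 1970-01-01
years <- as.numeric(df2$date_his - df2$date_his[1]) / 365.25
fit_years <- lm(y ~ years)
print(coef(fit_years))
```

All three date-based fits draw the same line; only the coefficients printed in the label differ (with orthogonal polynomials the intercept is simply the mean of his). In the plot this means either dropping the formula argument or using formula = y ~ poly(x, 1, raw = TRUE). The re-origined fit has the same slope expressed per year instead of per day.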
9gm1akwq2#
In my case, the data were monthly values in the format "X%Y.%m"
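If monthly labels in the "X%Y.%m" pattern need to go on a date axis, one option is to turn them into Date values first. This is a minimal sketch, assuming labels such as "X1979.01" (hypothetical examples, e.g. the X-prefixed names read.csv() gives numeric column headers); since as.Date() needs a day, each month is anchored on its first day. The variable names are hypothetical.

```r
# Hypothetical monthly labels following the "X%Y.%m" pattern
month_labels <- c("X1979.01", "X1979.02", "X1979.12", "X1980.01")

# Append a day so the string is a full date; "X" and "." are matched literally
month_dates <- as.Date(paste0(month_labels, ".01"), format = "X%Y.%m.%d")
print(month_dates)
```

The resulting dates can be used with scale_x_date(); as with the annual data, a linear fit on them gives a slope per day.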