R: Why is the regression line equation incorrect?

9o685dep  posted on 2023-05-04 in Other

I am plotting a regression line equation for time series data, but it doesn't look correct. What could be the problem?
Code:

p1<-ggplot(df2,aes(x=date_his,y=his))+geom_line(lwd=0.8)+
  scale_x_date(date_labels = "%Y",expand = c(0, 0))+
  theme(plot.title = element_text(size = 14, hjust = 0.5))+
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed", color = "blue", size = 1.2)+
  ggpubr::stat_regline_equation(label.x = -Inf, label.y = Inf, vjust = 1.5, hjust = -0.1, size = 5,color = "blue",
                                formula = y ~ poly(x, 1),
                                show.legend = FALSE
  )

Regression summary:
summary(lm(his ~ date_his,data = df2))

Call:
lm(formula = his ~ date_his, data = df2)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.5030 -3.7714  0.0661  4.0350  8.2860 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.682e+02  2.246e+00  74.889  < 2e-16 ***
date_his    -6.247e-04  2.138e-04  -2.922  0.00614 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.867 on 34 degrees of freedom
Multiple R-squared:  0.2007,    Adjusted R-squared:  0.1772 
F-statistic: 8.538 on 1 and 34 DF,  p-value: 0.006141

Data:

df2 <- data.frame(
  date_his = as.Date(c(
    "1979-04-28", "1980-04-28", "1981-04-28", "1982-04-28", "1983-04-28",
    "1984-04-28", "1985-04-28", "1986-04-28", "1987-04-28", "1988-04-28",
    "1989-04-28", "1990-04-28", "1991-04-28", "1992-04-28", "1993-04-28",
    "1994-04-28", "1995-04-28", "1996-04-28", "1997-04-28", "1998-04-28",
    "1999-04-28", "2000-04-28", "2001-04-28", "2002-04-28", "2003-04-28",
    "2004-04-28", "2005-04-28", "2006-04-28", "2007-04-28", "2008-04-28",
    "2009-04-28", "2010-04-28", "2011-04-28", "2012-04-28", "2013-04-28",
    "2014-04-28"
  )),
  his = c(
    166.008140637138, 169.867917428802, 157.525649715296, 172.567833065154,
    170.267019131607, 168.725866057929, 166.718135941998, 158.34217326036,
    157.493444169524, 164.212162698115, 161.140482761292, 161.851683272819,
    158.688162091076, 159.249075294438, 153.373329948267, 170.934314928049,
    164.557361076648, 169.910429608586, 163.399094199897, 161.238272986288,
    166.19105244493, 162.451935740926, 157.307524920014, 164.886371717477,
    153.843986514936, 157.656023562196, 164.433730558708, 165.485611515611,
    157.622007235028, 165.006218026182, 161.901572985301, 152.963472898683,
    154.013037508575, 156.507409368617, 157.244735400283, 161.214916194711
  )
)

qybjjes1  #1

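I think there are a couple of issues here. One is that in stat_regline_equation, the formula:

y ~ poly(x, 1)

produces the equation:

y = 160 - 14x

whereas the formula:

y ~ x

which is the default, produces the equation:

y = 170 - 0.00062x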

The second equation agrees with the output of lm(), given that stat_regline_equation rounds to 2 significant digits.
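As a quick check of the rounding, the sketch below applies base R's signif() to the two coefficients printed by summary() in the question. It only illustrates 2-significant-digit rounding and makes no claim about how ggpubr implements it internally.

```r
# Coefficients as printed by summary(lm(his ~ date_his, data = df2)) in the question
coefs <- c(intercept = 1.682e+02, slope = -6.247e-04)

# Round each coefficient to 2 significant digits
rounded <- signif(coefs, 2)
print(rounded)
```

The rounded values are 170 and -0.00062, which matches the equation shown on the plot for y ~ x.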
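So poly(x, 1) and y ~ x are not the same thing. What poly() computes by default are called orthogonal polynomials. If you want the same result as y ~ x, you need to specify raw = TRUE: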

coef(lm(his ~ poly(date_his, 1, raw = TRUE), data = df2))
                 (Intercept) poly(date_his, 1, raw = TRUE) 
                1.681976e+02                 -6.247098e-04
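To see why the two equations differ while the plotted line stays the same, here is a minimal sketch on toy data (the values of x and y are hypothetical, not the data from the question). It assumes only base R. For degree 1, the orthogonal basis is the centered x scaled to unit length, so the intercept becomes the mean of y and the slope is the raw slope multiplied by sqrt(sum((x - mean(x))^2)).

```r
# Toy yearly series (hypothetical values, not the question's data)
x <- as.numeric(seq(as.Date("2000-01-01"), by = "year", length.out = 10))
y <- c(10, 12, 11, 13, 15, 14, 16, 18, 17, 19)

fit_raw  <- lm(y ~ x)           # equivalent to y ~ poly(x, 1, raw = TRUE)
fit_orth <- lm(y ~ poly(x, 1))  # orthogonal polynomial basis (the default)

# Both parameterisations give the same fitted line
same_line <- all.equal(unname(fitted(fit_raw)), unname(fitted(fit_orth)))

# Relationship between the two sets of coefficients
scale_factor <- sqrt(sum((x - mean(x))^2))
orth_intercept_is_mean <- all.equal(unname(coef(fit_orth)[1]), mean(y))
orth_slope_is_scaled   <- all.equal(unname(coef(fit_orth)[2]),
                                    unname(coef(fit_raw)[2]) * scale_factor)

print(c(same_line, orth_intercept_is_mean, orth_slope_is_scaled))
```

This is why the orthogonal fit reports an intercept near the mean of his and a much larger slope: the coefficients refer to a rescaled x, not to the date itself.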

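The other issue is: which x-axis value corresponds to zero, that is, where does the intercept sit? In this case the date origin is "1970-01-01", and the slope is expressed per day. So you may want to consider whether it makes sense for this model to extrapolate all the way back to 1970, or whether the date values should be converted to a number representing days since some other start date.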
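A minimal sketch of that idea, again on toy values (hypothetical, not the question's data): measuring x in days since the first observation leaves the slope unchanged and moves the intercept to the fitted value at the first date. The variable names x_epoch and x_start are illustrative only.

```r
# Dates are stored as days since 1970-01-01, so x = 0 means that day
origin_value <- as.numeric(as.Date("1970-01-01"))

# Toy yearly series (hypothetical values)
dates <- seq(as.Date("1979-04-28"), by = "year", length.out = 6)
y <- c(166, 170, 158, 173, 170, 169)

x_epoch <- as.numeric(dates)                                       # days since 1970-01-01
x_start <- as.numeric(difftime(dates, min(dates), units = "days")) # days since first date

fit_epoch <- lm(y ~ x_epoch)
fit_start <- lm(y ~ x_start)

# Same slope per day; only the intercept changes
print(coef(fit_epoch))
print(coef(fit_start))
```

With the shifted predictor, the intercept can be read directly as the fitted value on the first observation date rather than an extrapolation back to 1970.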


9gm1akwq  #2

In my case, the data are monthly values with dates in the format "X%Y.%m".

library(ggplot2)

# Drop the "X" prefix and keep the four-digit year as a number
# (works for strings such as "X1979.04" and for Date values)
df2$date_his <- as.numeric(substr(sub("X", "", df2$date_his), 1, 4))
df2$number <- seq_len(nrow(df2))  # observation index 1, 2, ..., n

p1 <- ggplot(df2, aes(x = date_his, y = his)) +
  geom_line(lwd = 0.8) +
  geom_smooth(method = "lm", se = FALSE, color = "blue", linetype = "dashed") +
  # date_his is numeric now; date_labels belongs to scale_x_date, so it is dropped
  scale_x_continuous(expand = c(0, 0)) +
  theme(plot.title = element_text(size = 18, hjust = 0.5),
        axis.text = element_text(size = 14),
        axis.title = element_text(size = 14),
        axis.ticks = element_line(size = 1)) +
  ylim(150, 190)

# Fit on the observation index rather than on the date
fit1 <- lm(his ~ number, data = df2)
eqn1 <- sprintf("y = %.2f + %.2f x", coef(fit1)[1], coef(fit1)[2])

# annotate() draws the label once instead of once per data row
p1 <- p1 + annotate("text", x = -Inf, y = Inf, label = eqn1,
                    vjust = 1.5, hjust = -0.1, size = 7, color = "blue")
