How to bootstrap a loess function and estimate confidence intervals in R

lb3vh1jj  posted on 2023-03-20  in Bootstrap

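I've been going around in circles trying to build confidence intervals for my data. I have only a very basic understanding of statistics, and I'm having trouble adapting code such as the example here.
My goal is to be able to predict the mean, confidence interval, and standard deviation at n values (say 300) along the x range of my data (i.e. from 27.05575 to 144.75700, though the data can be truncated if the bootstrap procedure requires it).
Example code to generate the loess fit: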

library(dplyr)    # for %>% and rename()
library(ggplot2)  # for plotting

# create a data frame
df <- data.frame(
  DBH = c(27.05575, 30.10165, 41.36365, 48.31459, 64.64380, 64.88845, 65.55535, 75.12160, 79.40695, 113.27850, 114.68800, 120.68150, 125.24300, 130.27200, 132.17600, 144.75700),
  length = c(0.0000000, 0.0000000, 0.0000000, 0.0000000, 1.5056656, 0.4686661, 1.5143648, 1.2282208, 0.3701741, 19.2412440, 51.3086010, 33.4588765, 254.6009090, 35.0538617, 59.5713370, 195.1270735),
  normalised = c(0.000000000, 0.000000000, 0.000000000, 0.000000000, 0.005913827, 0.001840787, 0.005947995, 0.004824102, 0.001453939, 0.075574137, 0.201525600, 0.131416956, 1.000000000, 0.137681605, 0.233979278, 0.76640368)
)

model <- loess(normalised ~ DBH, data= df, span = .8)
xrange <- range(df$DBH)
xseq <- seq(from=xrange[1], to=xrange[2], length=300)
pred <- predict(model, newdata = data.frame(DBH = xseq), se=TRUE)
yfit = pred$fit

predictionDataFrame <- data.frame(xseq, yfit) %>%
  rename(DBH = xseq, normalised = yfit)

ggplot(data = predictionDataFrame, aes(x = DBH, y = normalised)) +
  geom_line(size = 2) +
  geom_point(data = df, aes(x = DBH, y = normalised)) +
  theme_bw()
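
For reference, because `se = TRUE` is passed to `predict()` above, the result also carries pointwise standard errors (`se.fit`) and the residual degrees of freedom (`df`). A minimal sketch of an approximate 95% band built from those values is below. It assumes a t-based interval is acceptable for this fit, which may not hold for 16 strongly skewed points; the name `band` is only illustrative.

```r
# Sketch: approximate pointwise 95% band from the loess standard errors.
# Assumption: a t-based interval is reasonable for this fit.
df <- data.frame(
  DBH = c(27.05575, 30.10165, 41.36365, 48.31459, 64.64380, 64.88845,
          65.55535, 75.12160, 79.40695, 113.27850, 114.68800, 120.68150,
          125.24300, 130.27200, 132.17600, 144.75700),
  normalised = c(0, 0, 0, 0, 0.005913827, 0.001840787, 0.005947995,
                 0.004824102, 0.001453939, 0.075574137, 0.201525600,
                 0.131416956, 1, 0.137681605, 0.233979278, 0.76640368)
)

model <- loess(normalised ~ DBH, data = df, span = 0.8)
xseq  <- seq(min(df$DBH), max(df$DBH), length.out = 300)
pred  <- predict(model, newdata = data.frame(DBH = xseq), se = TRUE)

# Critical value from the t distribution, using the degrees of
# freedom that predict() reports for the fitted model
crit <- qt(0.975, df = pred$df)

band <- data.frame(
  DBH   = xseq,
  fit   = pred$fit,
  lower = pred$fit - crit * pred$se.fit,
  upper = pred$fit + crit * pred$se.fit
)
head(band)
```

The `lower` and `upper` columns can be drawn with `geom_ribbon()` underneath the fitted line.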

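Side note: I would prefer a less smooth curve, but because there are some gaps in my data, I run into odd behaviour when I use a lower smoothing parameter. For example, this is the curve with span = 0.6: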

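Besides the 'span' argument, are there other ways to control loess? Changing the other parameters doesn't seem to do much. However, with the loess.boot function from the spatialEco package, the fitted curve looks more targeted than the plain loess function with a span of 0.8. The last image compares the loess.boot function from spatialEco (thick lines) with the loess function (dashed lines) for a few different measurements of mine. I'd rather not depend on that package and would prefer to do the process manually, so that I understand what is happening.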


Predictions.

rwqw0loc


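As Gregor Thomas commented, you need to put the code that fits the model and obtains the predictions into functions. After that it is relatively straightforward to use tidymodels to apply bootstrap resampling and estimate the uncertainty (although I can't vouch that these uncertainty estimates are statistically sound for whatever inference you are trying to make with them).
Here is an example in which I fit the model and make predictions using your code, kept as close to verbatim from the question as possible, turn those steps into functions, and then use the tidymodels approach to fit the model and make predictions on 10k bootstrap samples: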

library(dplyr)
library(purrr)
library(tidymodels)

set.seed(2023)

df <- data.frame(
  DBH = c(27.05575, 30.10165, 41.36365, 48.31459, 64.64380, 64.88845, 65.55535, 75.12160, 79.40695, 113.27850, 114.68800, 120.68150, 125.24300, 130.27200, 132.17600, 144.75700),
  length = c(0.0000000, 0.0000000, 0.0000000, 0.0000000, 1.5056656, 0.4686661, 1.5143648, 1.2282208, 0.3701741, 19.2412440, 51.3086010, 33.4588765, 254.6009090, 35.0538617, 59.5713370, 195.1270735),
  normalised = c(0.000000000, 0.000000000, 0.000000000, 0.000000000, 0.005913827, 0.001840787, 0.005947995, 0.004824102, 0.001453939, 0.075574137, 0.201525600, 0.131416956, 1.000000000, 0.137681605, 0.233979278, 0.76640368)
)

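# Fit the loess model to the resampled rows of one bootstrap split
fit_loess_on_bootstrap <- function(split) {
  loess(normalised ~ DBH, data = analysis(split), span = .8)
}

# Predict on a fixed grid of 300 DBH values so that every bootstrap
# model is evaluated at the same points; int_pctl() expects the
# columns `term` and `estimate`
extract_prediction <- function(model, xrange) {
  xseq <- seq(from = xrange[1], to = xrange[2], length.out = 300)
  pred <- predict(model, newdata = data.frame(DBH = xseq), se = TRUE)
  tibble(term = xseq, estimate = pred$fit)
}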

boots <-
  df %>%
  bootstraps(10000) %>%
  mutate(
    model = map(splits, fit_loess_on_bootstrap),
    preds = map(model, extract_prediction, xrange = range(df$DBH)),
    spline = map(model, augment)
  )

## Look at a sample of individual fitted loess curves:
boots %>%
  sample_n(100) %>%
  unnest(cols = c(spline)) %>%
  ggplot(aes(DBH, normalised)) +
  geom_line(aes(DBH, .fitted, group = id), alpha = .2) +
  geom_point(data = df) +
  theme_bw()

## Estimate CI using the percentile method:
results <-
  boots %>%
  int_pctl(preds)

results %>%
  ggplot(aes(term, .estimate, ymin = .lower, ymax = .upper)) +
  geom_ribbon(fill = "grey75") +
  geom_line() +
  labs(x = "DBH", y = "normalised") +
  theme_bw()
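
Since the question also asks for the mean and standard deviation, and for a way to do this by hand, here is a minimal base-R sketch of the same idea without tidymodels. It makes several assumptions: 2000 replicates rather than 10k, the plain percentile method for the interval, and the hypothetical names `boot_fit`, `boot_mat` and `summary_df`. It is not a reimplementation of spatialEco's loess.boot. Because loess does not extrapolate by default, a resample that happens to miss the smallest or largest DBH value returns NA near the ends of the grid, and those values are dropped with `na.rm = TRUE`.

```r
# Sketch: manual bootstrap of a loess fit in base R.
# Assumptions: 2000 replicates, percentile intervals, span = 0.8.
set.seed(2023)

df <- data.frame(
  DBH = c(27.05575, 30.10165, 41.36365, 48.31459, 64.64380, 64.88845,
          65.55535, 75.12160, 79.40695, 113.27850, 114.68800, 120.68150,
          125.24300, 130.27200, 132.17600, 144.75700),
  normalised = c(0, 0, 0, 0, 0.005913827, 0.001840787, 0.005947995,
                 0.004824102, 0.001453939, 0.075574137, 0.201525600,
                 0.131416956, 1, 0.137681605, 0.233979278, 0.76640368)
)

xseq   <- seq(min(df$DBH), max(df$DBH), length.out = 300)
n_boot <- 2000

# Resample rows with replacement, refit loess, predict on the fixed grid.
# If a resample cannot be fitted, return NA for the whole grid.
boot_fit <- function(data, xseq, span = 0.8) {
  idx <- sample(nrow(data), replace = TRUE)
  tryCatch({
    fit <- suppressWarnings(
      loess(normalised ~ DBH, data = data[idx, ], span = span)
    )
    suppressWarnings(predict(fit, newdata = data.frame(DBH = xseq)))
  }, error = function(e) rep(NA_real_, length(xseq)))
}

# Matrix with one row per grid point and one column per replicate
boot_mat <- replicate(n_boot, boot_fit(df, xseq))

# Summarise across replicates at each grid point
summary_df <- data.frame(
  DBH   = xseq,
  mean  = rowMeans(boot_mat, na.rm = TRUE),
  sd    = apply(boot_mat, 1, sd, na.rm = TRUE),
  lower = apply(boot_mat, 1, quantile, probs = 0.025, na.rm = TRUE),
  upper = apply(boot_mat, 1, quantile, probs = 0.975, na.rm = TRUE)
)
head(summary_df)
```

With only 16 observations, duplicated points in a resample can make loess emit warnings, which is why they are suppressed here; it is worth checking how many columns of `boot_mat` are entirely NA before trusting the summary.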
