我有变量为industry_sales
、year
和industry
的数据。
我希望在5年的滚动窗口中对每个行业进行时间序列回归,因变量为industry_sales
,自变量为year
。例如,如果数据范围为2000-2010年,则每个回归和行业组的滚动窗口将为2000-2004、2001-2005,依此类推。
我尝试了一次,但代码失败:
library(tidyverse)
library(zoo)
# Group the data by industry
data_grouped <- data %>%
group_by(industry)
glimpse(data_grouped)
# Define a function to run a regression
regression_func <- function(x) {
lm(industry_sales ~ year, data = x)
}
# Perform rolling linear regression on each group, with a window of 5 years
results_list <- data_grouped %>%
do({
rollapply(data_grouped$year, width = 5, FUN = regression_func, by = 1)
})
dput()输出的数据如下所示,任何帮助将不胜感激:
structure(list(year = c(2010, 2011, 2012, 2013, 2014, 2015, 2016,
2017, 2018, 2019, 2020, 2021, 2022, 2010, 2011, 2012, 2013, 2014,
2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2010, 2011, 2012,
2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2010,
2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021,
2022, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019,
2020, 2021, 2022, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
2018, 2019, 2020, 2021, 2022, 2010, 2011, 2012, 2013, 2014, 2015,
2016, 2017, 2018, 2019, 2020, 2021, 2022, 2010, 2011, 2012, 2013,
2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022), industry_sales = c(853.218,
1001.856, 987.741, 990.679, 1061.804, 985.904, 1073.43, 1305.779,
1314.722, 1188.122, 1327.245, 1711.143, 577.358, 25731.969, 26484.263,
23085.259, 24842.509, 30563.672, 34947.361, 40066.425, 38293.972,
40484.096, 37160.342, 29109.755, 35227.521, 16522.915, 1072310.255,
1314543.982, 1288932.701, 1245467.184, 1169743.543, 882325.562,
804108.99, 966832.252, 1099655.803, 999685.99, 725094.688, 797072.967,
59906.738, 131274.107, 167635.266, 170003.001, 161892.01, 149722.289,
108169.702, 92950.89, 120837.727, 132130.143, 121670.84, 118206.596,
147732.623, 542.17, 201842.321, 209721.793, 215722.823, 223158.977,
221191.213, 223076.061, 229199.539, 242995.929, 256145.995, 257282.085,
234455.443, 214560.191, 13964.787, 88190.421, 90191.422, 80848.451,
93999.434, 92325.688, 99138.466, 98682.173, 95106.591, 97622.097,
94254.621, 81145.338, 87421.329, 12116.901, 247825.263, 262355,
235423.37, 236697.987, 245287.766, 222221.931, 219657.82, 222990.939,
224672.689, 162633.722, 139853.429, 148534.588, 4463.853, 28177.129,
29298.618, 29251.077, 31696.968, 31636.033, 33064.57, 35745.408,
32034.079, 31978.646, 31661.88, 28572.857, 34190.976, 3400.497
), industry = c("Agriculture, Forestry and Fishing", "Agriculture, Forestry and Fishing",
"Agriculture, Forestry and Fishing", "Agriculture, Forestry and Fishing",
"Agriculture, Forestry and Fishing", "Agriculture, Forestry and Fishing",
"Agriculture, Forestry and Fishing", "Agriculture, Forestry and Fishing",
"Agriculture, Forestry and Fishing", "Agriculture, Forestry and Fishing",
"Agriculture, Forestry and Fishing", "Agriculture, Forestry and Fishing",
"Agriculture, Forestry and Fishing", "Construction", "Construction",
"Construction", "Construction", "Construction", "Construction",
"Construction", "Construction", "Construction", "Construction",
"Construction", "Construction", "Construction", "Manufacturing",
"Manufacturing", "Manufacturing", "Manufacturing", "Manufacturing",
"Manufacturing", "Manufacturing", "Manufacturing", "Manufacturing",
"Manufacturing", "Manufacturing", "Manufacturing", "Manufacturing",
"Mining", "Mining", "Mining", "Mining", "Mining", "Mining", "Mining",
"Mining", "Mining", "Mining", "Mining", "Mining", "Mining", "Retail Trade",
"Retail Trade", "Retail Trade", "Retail Trade", "Retail Trade",
"Retail Trade", "Retail Trade", "Retail Trade", "Retail Trade",
"Retail Trade", "Retail Trade", "Retail Trade", "Retail Trade",
"Services", "Services", "Services", "Services", "Services", "Services",
"Services", "Services", "Services", "Services", "Services", "Services",
"Services", "Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Wholesale Trade", "Wholesale Trade", "Wholesale Trade", "Wholesale Trade",
"Wholesale Trade", "Wholesale Trade", "Wholesale Trade", "Wholesale Trade",
"Wholesale Trade", "Wholesale Trade", "Wholesale Trade", "Wholesale Trade",
"Wholesale Trade")), row.names = c(NA, -104L), class = c("tbl_df",
"tbl", "data.frame"))
2条答案
按热度按时间a2mppw5e1#
存在以下几个问题:
**1)**我们假设对于每个回归,我们需要行业、窗口的最后一年、截距、斜率和R^2的一行。修改c(...)以更改所需的统计量。例如,如果需要大量统计量,则为
c(year = tail(yr, 1), unlist(broom::glance(fm)))
。这将按
industry
分组,然后使用group_modify
为每个行业创建行。给出:
**2)**如果你只需要一个统计量--比如斜率--那么用一个二维动物园序列,每个行业一列,每个年份一行,可能会更方便。注意,斜率是平移不变的,所以我们可以只使用1:5作为每种情况下的预测因子,我们将得到与使用年份相同的斜率。
给出:
如果您希望将其作为 Dataframe ,请使用
fortify.zoo(out2)
或长格式fortify.zoo(out2, melt = TRUE)
。pxy2qtax2#
这里有一个使用
slider::slide2()
而不是zoo::rollapply()
的解决方案,它将生成一个命名的嵌套列表,其中包含嵌套在industries中的5年期结果。注意,这里假设每年有一行没有缺失或重复的年份,如果不是这样,
slide2_index()
或slide2_period()
可能更有用;参见introductory vignette。