Extracting covariance from a data frame in a loop

qojgxg4l  posted on 2022-12-05 in Other
library(MASS)  # provides mvrnorm()

sample_size <- 200
sample_meanvector <- c(3, 4)
sample_covariance_matrix <- matrix(c(2, 1, 1, 2),
                                   ncol = 2)

# create bivariate normal distribution
sample_distribution <- mvrnorm(n = sample_size,
                               mu = sample_meanvector,
                               Sigma = sample_covariance_matrix)

# Convert the matrix to a data frame (columns are named V1 and V2)
df_sample_distribution <- as.data.frame(sample_distribution)

# Y = 1 + 2*X1 + X2 + standard normal noise
df_sample_distribution$Y <- (1 + df_sample_distribution$V1*2 + df_sample_distribution$V2 + rnorm(sample_size, 0, 1))
colnames(df_sample_distribution)[1] <- "X1"
colnames(df_sample_distribution)[2] <- "X2"

The code above is what I use to generate vectors from a bivariate normal distribution; the code below runs a regression on the generated data.

Test <- lm(Y ~ X1, data = df_sample_distribution)
# to extract only a specific coefficient (row 2 = X1, column 1 = estimate)
summary(Test)$coefficients[2, 1]

My question is whether there is a way to regenerate the data and run the regression on it 200 times, saving all of the output in a list.

for (){
  #generate data
  
  for ()
  {
    #extract coefficients and insert them in a list
  }
}

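In short:
Step 1: create the data
Step 2: run a regression on it
Step 3: get the coefficients (ideally saving them in a list)
I am looking for code that loops through steps 1 to 3 200 times and saves all the results. Any ideas or inspiration are welcome. Thanks in advance, everyone.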

pdsfdshx  #1

Just wrap your code in a for loop, as in your pseudocode:

library(MASS)

iterations <- 10 # In your example this should be 200

sample_size <- 200                                       
sample_meanvector <- c(3, 4)                                   
sample_covariance_matrix <- matrix(c(2, 1, 1, 2),
                                   ncol = 2)

# create output data.frame
df_output <- data.frame(iteration = integer(0), coeff = double(0))

# loop over data generation and regression
for (i in seq_len(iterations)) {
  sample_distribution <- mvrnorm(n = sample_size,
                                 mu = sample_meanvector, 
                                 Sigma = sample_covariance_matrix)
  
  # Convert the matrix to a data frame (columns are named V1 and V2)
  df_sample_distribution <- as.data.frame(sample_distribution)
  
  # Y = 1 + 2*X1 + X2 + standard normal noise
  df_sample_distribution$Y <- (1 + df_sample_distribution$V1*2 + df_sample_distribution$V2 + rnorm(sample_size, 0, 1))
  colnames(df_sample_distribution)[1] <- "X1"
  colnames(df_sample_distribution)[2] <- "X2"
  
  # Store the iteration number and the estimated slope of X1
  df_output[i, 1] <- i
  df_output[i, 2] <- summary(lm( Y ~ X1, data = df_sample_distribution))$coefficients[2,1]
}

This returns df_output, which contains the coefficient from each iteration:

   iteration    coeff
1          1 2.647886
2          2 2.274654
3          3 2.447453
4          4 2.451471
5          5 2.568877
6          6 2.428295
7          7 2.440396
8          8 2.478357
9          9 2.477211
10        10 2.367012
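
Since the question asks for the results in a list, the same three steps could also be wrapped in a function and collected with lapply. This is only a minimal sketch, not part of the answer above: simulate_once is a hypothetical helper name, the seed is arbitrary, and it keeps every coefficient of the model (intercept and X1 slope) rather than just one.

```r
library(MASS)  # provides mvrnorm()

# Hypothetical helper (not from the original answer): runs steps 1 to 3 once
# and returns all coefficients of the fitted model as a named numeric vector.
simulate_once <- function(sample_size = 200) {
  # Step 1: create the data
  x <- mvrnorm(n = sample_size,
               mu = c(3, 4),
               Sigma = matrix(c(2, 1, 1, 2), ncol = 2))
  df <- data.frame(X1 = x[, 1], X2 = x[, 2])
  df$Y <- 1 + 2 * df$X1 + df$X2 + rnorm(sample_size, 0, 1)
  # Steps 2 and 3: fit the regression and extract its coefficients
  coef(lm(Y ~ X1, data = df))
}

set.seed(1)  # arbitrary seed, only so that a run can be reproduced
iterations <- 200

# One list element per iteration
coef_list <- lapply(seq_len(iterations), function(i) simulate_once())

# Optional: stack the list into a matrix with one row per iteration
coef_matrix <- do.call(rbind, coef_list)
```

Each element of coef_list is a named vector, so a single estimate can be read with, for example, coef_list[[1]]["X1"], and coef_matrix[, "X1"] gives the slopes from all iterations at once.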
