Cross-validation with the ranger library in R

Asked by snz8szmq on 2023-04-27

Hi, I have the following ranger model:

X <- train_df[, -1]
y <- train_df$Price

rf_model <- ranger(Price ~ ., data = train_df, mtry = 11, splitrule = "extratrees", min.node.size = 1, num.trees = 100)

I want to accomplish two things:
1. Get an average performance metric from cross-validation across the resampled datasets, i.e. an accuracy estimate that stays stable despite changes to the seed value.
2. Set up cross-validation to find the optimal combination of mtry and num.trees.
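For the first goal, one option is caret's repeated k-fold cross-validation, which averages the metric over repeats × folds resamples and so damps the dependence on any single seed. A minimal sketch, assuming stand-in data (the question's train_df is not reproducible, so mtcars with its first column renamed to Price is used purely for illustration):

```r
library(caret)
library(ranger)

# Stand-in data: the question's train_df / Price are not available,
# so mtcars is used here with mpg renamed to Price (an assumption).
train_df <- mtcars
names(train_df)[1] <- "Price"

# Repeated k-fold CV: the reported metric is the mean over
# repeats * folds resamples, which is less seed-sensitive than one CV run.
cv_scheme <- trainControl(method = "repeatedcv",
                          number = 5,   # 5 folds
                          repeats = 3)  # repeated 3 times -> 15 resamples

rf_model <- train(Price ~ ., data = train_df,
                  method = "ranger",
                  trControl = cv_scheme,
                  tuneGrid = expand.grid(mtry = 5,
                                         splitrule = "extratrees",
                                         min.node.size = 1),
                  num.trees = 100)

rf_model$results  # mean RMSE, Rsquared, MAE across the 15 resamples
```

Increasing `repeats` stabilizes the estimate further at the cost of proportionally more fitting time.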
What I have tried:

The following tunes mtry, splitrule and min.node.size, but I cannot add the number of trees to the grid, because doing so throws an error.

# define the parameter grid to search
param_grid <- expand.grid(mtry = c(1:ncol(X)),
                          splitrule = c("variance", "extratrees", "maxstat"),
                          min.node.size = c(1, 5, 10))

# set up the cross-validation scheme
cv_scheme <- trainControl(method = "cv",
                          number = 5,
                          verboseIter = TRUE)

# perform the grid search using caret
rf_model <- train(x = X,
                  y = y,
                  method = "ranger",
                  trControl = cv_scheme,
                  tuneGrid = param_grid)

# view the best parameter values
rf_model$bestTune

Answer 1 (jm81lzqq):

A simple way is to pass a num.trees argument to train and iterate over it.
Another way is to create a custom model; see the caret chapter Using Your Own Model.
Pham Dinh Khanh published an article on RPubs demonstrating this here.

library(caret)
library(mlbench)
library(ranger)
data(PimaIndiansDiabetes)
x <- PimaIndiansDiabetes[, -ncol(PimaIndiansDiabetes)]
y <- PimaIndiansDiabetes[, ncol(PimaIndiansDiabetes)]

param_grid <- expand.grid(mtry = c(1:4),
                          splitrule = c("variance", "extratrees"),
                          min.node.size = c(1, 5))
cv_scheme <- trainControl(method = "cv",
                          number = 5,
                          verboseIter = FALSE)
models <- list()
for (ntree in c(4, 100)) {
  set.seed(123)
  rf_model <- train(x = x,
                    y = y,
                    method = "ranger",
                    trControl = cv_scheme,
                    tuneGrid = param_grid,
                    num.trees = ntree)
  name <- paste0(ntree, "_tr_model")
  models[[name]] <- rf_model
}

models[["4_tr_model"]]
#> Random Forest 
#> 
#> 768 samples
#>   8 predictor
#>   2 classes: 'neg', 'pos' 
#> 
#> No pre-processing
#> Resampling: Cross-Validated (5 fold) 
#> Summary of sample sizes: 614, 615, 614, 615, 614 
#> Resampling results across tuning parameters:
#> 
#>   mtry  splitrule   min.node.size  Accuracy   Kappa    
#>   1     variance    1                    NaN        NaN
#>   1     variance    5                    NaN        NaN
#>   1     extratrees  1              0.6808675  0.2662428
#>   1     extratrees  5              0.6783125  0.2618862
...

models[["100_tr_model"]]
#> Random Forest 
...
#> 
#>   mtry  splitrule   min.node.size  Accuracy   Kappa    
#>   1     variance    1                    NaN        NaN
#>   1     variance    5                    NaN        NaN
#>   1     extratrees  1              0.7473559  0.3881530
#>   1     extratrees  5              0.7564808  0.4112127
...

Created on 2023-04-19 with reprex v2.0.2
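For completeness, the custom-model route mentioned above can be sketched by extending caret's built-in ranger spec so that num.trees becomes a real tuning parameter and appears in tuneGrid directly. This follows the pattern in the Using Your Own Model chapter; the simplified fit function below is an assumption and omits caret's handling of case weights and class probabilities:

```r
library(caret)
library(mlbench)
library(ranger)

data(PimaIndiansDiabetes)
x <- PimaIndiansDiabetes[, -ncol(PimaIndiansDiabetes)]
y <- PimaIndiansDiabetes[, ncol(PimaIndiansDiabetes)]

# Start from the stock ranger model spec and extend it
custom_ranger <- getModelInfo("ranger", regex = FALSE)[[1]]

# Register num.trees as a fourth tuning parameter
custom_ranger$parameters <- rbind(
  custom_ranger$parameters,
  data.frame(parameter = "num.trees", class = "numeric", label = "# Trees"))

# Simplified fit that forwards all four parameters to ranger()
# (assumption: no case weights or class-probability handling)
custom_ranger$fit <- function(x, y, wts, param, lev, last, classProbs, ...) {
  ranger(x = x, y = y,
         mtry = param$mtry,
         splitrule = as.character(param$splitrule),
         min.node.size = param$min.node.size,
         num.trees = param$num.trees, ...)
}

# num.trees can now go straight into the grid
param_grid <- expand.grid(mtry = c(2, 4),
                          splitrule = "extratrees",
                          min.node.size = c(1, 5),
                          num.trees = c(100, 500))

set.seed(123)
rf_model <- train(x = x, y = y,
                  method = custom_ranger,
                  trControl = trainControl(method = "cv", number = 5),
                  tuneGrid = param_grid)
rf_model$bestTune
```

With this, a single train call searches all four parameters jointly instead of looping over num.trees by hand.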
