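I want to use the step_impute_knn
function from the recipes package
to impute some missing values in my data. I have tested it with the default parameters (neighbors = 5, nthread = 1 and eps = 1e-08), and the means and standard deviations of, for example, the numeric variables after imputation are very close to those of the original data.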
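The mean/standard-deviation check described above can be sketched as follows. This is a minimal sketch, not the asker's code. It assumes the built-in airquality data as a stand-in, because its Ozone and Solar.R columns contain missing values. It converts every column to double and uses only the complete columns Wind, Month and Day to find neighbors.

```r
library(recipes)

# Stand-in data (assumption): airquality, with every column converted to double
dat <- as.data.frame(lapply(airquality, as.numeric))

# KNN imputation with the default number of neighbors (5); neighbors are
# searched using only columns that have no missing values
rec <- recipe(Temp ~ ., data = dat) |>
  step_impute_knn(Ozone, Solar.R,
                  impute_with = imp_vars(Wind, Month, Day),
                  neighbors = 5)

# new_data = NULL returns the processed training set
imputed <- bake(prep(rec, training = dat), new_data = NULL)

# Mean and SD before imputation (NAs ignored) versus after imputation
summary_stats <- data.frame(
  variable  = c("Ozone", "Solar.R"),
  mean_orig = c(mean(dat$Ozone, na.rm = TRUE), mean(dat$Solar.R, na.rm = TRUE)),
  mean_imp  = c(mean(imputed$Ozone), mean(imputed$Solar.R)),
  sd_orig   = c(sd(dat$Ozone, na.rm = TRUE), sd(dat$Solar.R, na.rm = TRUE)),
  sd_imp    = c(sd(imputed$Ozone), sd(imputed$Solar.R))
)
print(summary_stats)
```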
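However, I would like to tune these parameters to see whether there is an optimal setting, but I don't even know how to start doing that with the recipes package. The answers here and here are too complex or too specific for me to follow.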
As far as I can tell, the step_impute_knn
function does not offer any tuning options, and I don't want to tune it by hand. Is there a simple way to do this?
Sample data:
train <- structure(list(PassengerId = c("0001_01", "0002_01", "0003_01",
"0003_02", "0004_01", "0005_01"), HomePlanet = c("Europa", "Earth",
"Europa", "Europa", "Earth", NA), CryoSleep = c("False",
"False", "False", "False", "False", "False"), Cabin = c("B/0/P",
"F/0/S", "A/0/S", "A/0/S", "F/1/S", "F/0/P"), Destination = c("TRAPPIST-1e",
"TRAPPIST-1e", "TRAPPIST-1e", "TRAPPIST-1e", "TRAPPIST-1e", "PSO J318.5-22"
), Age = c(39, 24, 58, 33, 16, 44), VIP = c("False", "False",
"True", "False", "False", "False"), RoomService = c(0, 109, 43,
0, 303, 0), FoodCourt = c(0, 9, 3576, 1283, 70, 483), ShoppingMall = c(0,
25, 0, 371, 151, 0), Spa = c(0, 549, 6715, 3329, 565, 291), VRDeck = c(0,
44, 49, 193, 2, 0), Name = c("Maham Ofracculy", "Juanna Vines",
"Altark Susent", "Solam Susent", "Willy Santantines", "Sandie Hinetthews"
), Transported = c("False", "True", "False", "False", "True",
"True")), row.names = c(NA, 6L), class = "data.frame")
So far I have:
library(tidymodels)

train_no_na <- train %>%
  na.omit()
# imp_vars(all_predictors()) is the default; imp_vars(.) does not select columns
imp_knn_blueprint <- recipe(Transported ~ ., data = train_no_na) %>%
  step_impute_knn(HomePlanet,
                  impute_with = imp_vars(all_predictors()), neighbors = 5,
                  options = list(nthread = 1, eps = 1e-08))
imp_knn_prep <- prep(imp_knn_blueprint, training = train_no_na)
imp_knn_5 <- bake(imp_knn_prep, new_data = train)
1 answer
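Yes, you can (although we don't consider nthread or eps to be tuning parameters). Give neighbors a value of tune() in the recipe and treat it like any other tuning parameter of the model. You can then use tune_grid() or one of the other tuning functions. tidymodels even understands what this particular parameter is and has a built-in default range for it (although you can choose your own grid). There is an example of tuning recipe parameters in the tidymodels book, and another in the examples section of the tune_grid help page.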
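A minimal end-to-end sketch of this approach follows, under stated assumptions. It uses the built-in airquality data as a stand-in for the data in the question, imputes Ozone and Solar.R, and wraps the recipe in a workflow with an ordinary linear regression predicting Temp. The model, the outcome, the 5-fold resampling and the candidate neighbors values are illustrative choices. Only neighbors = tune() and tune_grid() come from the answer.

```r
library(tidymodels)

# Stand-in data (assumption): airquality, with every column converted to double
dat <- as.data.frame(lapply(airquality, as.numeric))

# Mark neighbors for tuning with tune(); nthread and eps stay at their defaults
rec <- recipe(Temp ~ ., data = dat) |>
  step_impute_knn(Ozone, Solar.R,
                  impute_with = imp_vars(Wind, Month, Day),
                  neighbors = tune())

# Any model works here; a plain linear regression keeps the sketch small
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(linear_reg())

set.seed(123)
folds <- vfold_cv(dat, v = 5)

# Lists neighbors as a tunable parameter together with its default dials range
print(extract_parameter_set_dials(wf))

# Illustrative candidate values for the number of neighbors
grid <- tibble(neighbors = c(1L, 3L, 5L, 10L, 15L))

res <- tune_grid(wf, resamples = folds, grid = grid,
                 metrics = metric_set(rmse))

print(collect_metrics(res))

# Plug the best value back into the workflow and refit on all the data
best <- select_best(res, metric = "rmse")
final_wf <- finalize_workflow(wf, best)
final_fit <- fit(final_wf, data = dat)
```

Passing grid = 5 instead of an explicit tibble would let tune_grid() generate candidates from the default range. Note that this selects the neighbors value that gives the best resampled model performance (RMSE here). It does not select the value that best preserves the means and standard deviations of the imputed columns.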