Extracting estimates with ranger decision trees

Posted by balp4ylt on 2023-01-06 in Other

When I try to extract the estimates from a regression model built with the ranger package in R, I get the error message Error: No tidy method for objects of class ranger.
Here is my code:

# libraries
library(tidymodels)
library(textrecipes)
library(LiblineaR)
library(ranger)
library(tidytext)

# create the recipe
comments.rec <- recipe(year ~ comments, data = oa.comments) %>%
  step_tokenize(comments, token = "ngrams", options = list(n = 2, n_min = 1)) %>%
  step_tokenfilter(comments, max_tokens = 1e3) %>%
  step_stopwords(comments, stopword_source = "stopwords-iso") %>%
  step_tfidf(comments) %>%
  step_normalize(all_predictors())

# workflow with recipe
comments.wf <- workflow() %>%
  add_recipe(comments.rec)

# create the regression model using support vector machine
svm.spec <- svm_linear() %>%
  set_engine("LiblineaR") %>%
  set_mode("regression")

svm.fit <- comments.wf %>%
  add_model(svm.spec) %>%
  fit(data = oa.comments)

# extract the estimates for the support vector machine model
svm.fit %>%
  extract_fit_parsnip() %>%  # replaces the deprecated pull_workflow_fit()
  tidy() %>%
  arrange(-estimate)

Below is the table of estimates for each tokenized term in the dataset (this is a dirty dataset used for demonstration purposes):

   term                     estimate
   <chr>                       <dbl>
 1 Bias                     2015.   
 2 tfidf_comments_2021         0.877
 3 tfidf_comments_2019         0.851
 4 tfidf_comments_2020         0.712
 5 tfidf_comments_2018         0.641
 6 tfidf_comments_https        0.596
 7 tfidf_comments_plan s       0.462
 8 tfidf_comments_plan         0.417
 9 tfidf_comments_2017         0.399
10 tfidf_comments_libraries    0.286

However, when I build the regression model as a random forest with the ranger engine, I have no such luck and get the error message quoted above:

# create the regression model using random forests
rf.spec <- rand_forest(trees = 50) %>%
  set_engine("ranger") %>%
  set_mode("regression")

rf.fit <- comments.wf %>%
  add_model(rf.spec) %>%
  fit(data = oa.comments)

# extract the estimates for the random forests model
rf.fit %>%
  extract_fit_parsnip() %>%  # replaces the deprecated pull_workflow_fit()
  tidy() %>%
  arrange(-estimate)

Source: https://stackoverflow.com/questions/75024195/extracting-estimates-with-ranger-decision-trees

1 Answer

vngu2lb81#

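Let me hand this back to you in a simpler form, because I think it highlights the issue: if you have a decision tree model, how would you produce coefficients for the columns in your dataset? What would such a coefficient mean? A linear SVM has one weight per term, which is what tidy() returns; a random forest is a collection of split rules and has no per-term coefficients, so there is nothing for tidy() to report.
I think what you are looking for here is some kind of attribution for each column. tidymodels has tools that do this, but you should read up on what they actually report.
In your case, the vip package will give you a basic idea of these numbers, although the numbers it produces are definitely not directly comparable to the SVM ones.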

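# install.packages("vip")  # run once if vip is not installed
library(vip)

# plot relative variable importance for the fitted random forest
rf.fit %>%
  extract_fit_parsnip() %>%
  vip(geom = "point") +
  labs(title = "Random forest variable importance")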

This produces a plot of relative importance scores. To get the scores themselves as a table, use vi():

rf.fit %>%
  extract_fit_parsnip() %>%
  vi()
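
For readers who want something shaped like the tidy() table from the question, here is a minimal sketch that calls ranger directly. It makes several assumptions that are not in the original thread: the built-in mtcars data stands in for oa.comments, importance = "impurity" is chosen arbitrarily among ranger's importance modes, and the column names term and importance are illustrative only. The scores are importance values, not coefficients.

```r
library(ranger)

set.seed(42)

# Assumption: mtcars is a stand-in dataset; "impurity" is one of ranger's
# importance modes and is picked here only for illustration.
rf <- ranger(mpg ~ ., data = mtcars, num.trees = 50, importance = "impurity")

# variable.importance is a named numeric vector, one entry per predictor
scores <- sort(rf$variable.importance, decreasing = TRUE)

# Reshape into a two-column table, laid out like the tidy() output above
importance_tbl <- data.frame(
  term       = names(scores),
  importance = unname(scores)
)

head(importance_tbl)
```

The table is sorted from most to least important, so it can be read the same way as the arrange(-estimate) output for the SVM, keeping in mind that the two scales are unrelated.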

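tidymodels has a good walkthrough of this on the tutorial page listed below. As long as your model can estimate importance scores, you should be good to go.
Tidymodels tutorial page - 'a case study'
Edit: if you have not already done so, you may need to re-run your initial model with an extra argument passed in the set_engine step, so that ranger knows what kind of importance scores you want and how they should be computed.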
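
A minimal sketch of that re-fit, under some assumptions that are not in the original thread: mtcars stands in for oa.comments, a formula replaces the text recipe, and importance = "permutation" is just one possible choice (ranger also accepts "impurity" and "impurity_corrected"). The names rf_spec, rf_fit and rf_importance are illustrative.

```r
library(tidymodels)
library(vip)

set.seed(123)

# The importance argument is forwarded by set_engine() to ranger::ranger().
# Without it, ranger stores no importance scores for vip()/vi() to read.
rf_spec <- rand_forest(trees = 50) %>%
  set_engine("ranger", importance = "permutation") %>%
  set_mode("regression")

# Assumption: mtcars and a plain formula stand in for the asker's data and recipe
rf_fit <- workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(rf_spec) %>%
  fit(data = mtcars)

# vi() returns a tibble with Variable and Importance columns
rf_importance <- rf_fit %>%
  extract_fit_parsnip() %>%
  vi()

rf_importance
```

Permutation importance measures how much the prediction error grows when a predictor is shuffled, while impurity importance sums the variance reduction from splits on that predictor, so the two modes can rank variables differently.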

Posted 2023-01-06
