R: "line search fails" when training a model with caret

6tqwzwtp, posted on 2023-02-20 in Other

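I am using the train function from caret to train an SVM with the svmRadial kernel for a binary classification task.
While train runs on my data, messages like these are printed one after another:
line search fails -2.13865 -0.1759025 1.01927e-05 3.812143e-06 -5.240749e-08 -1.810113e-08 -6.03178e-13
line search fails -0.7148131 0.1612894 2.32937e-05 3.518543e-06 -1.821269e-08 -1.37704e-08 -4.726926e-13
Once the code has finished running, I also get the following warnings: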

> warnings()
Warning messages:
1: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
2: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
3: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
4: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
5: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
6: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
7: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
8: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
9: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
10: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
11: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
12: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
13: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
14: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
15: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
16: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
17: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
18: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
19: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
20: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
21: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
22: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
23: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
24: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
25: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
26: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
27: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
28: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  ... :
  There were missing values in resampled performance measures.

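As you can see, the warnings above mention NA values in some of the prediction and probability calculations. Why do these calculations fail?
As requested by @HFBrowning, here is a sample of the data I am using. I am trying to build a classifier that predicts whether a telecom cell is overshooting (the Class column).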

> head(imbal_training,10)
   Total.Tx.Height Antenna.Tilt Antenna.Gain Ant.Vert.Beamwidth       RTWP Voice.Drops Range Max.Distance Rural Suburban Urban
2            31.25            0         15.9               10.0 -103.55396          12  5.14         6.24     1        0     0
5            31.25            0         18.2                4.4 -104.76192           1  3.88         4.98     1        0     0
7            25.14            4         15.9                9.6 -102.93839           1  6.58         9.17     1        0     0
9            25.14            2         18.8                4.3 -104.23198           4  5.08         7.67     1        0     0
11           10.66            4         16.2               10.0  -98.23691          17 23.33        24.69     0        1     0
12           10.66            6         16.2               10.0 -103.78522           5 18.24        19.60     0        1     0
13           10.66            5         16.2               10.0  -94.59940           5 20.20        21.56     0        1     0
14           10.66            3         18.7                4.4 -103.17622           3 23.86        25.22     0        1     0
15           10.66            5         18.7                4.4 -104.97827           0 23.86        25.22     0        1     0
16           10.66            4         18.8                4.4 -105.78948           1 23.86        25.22     0        1     0
              Class HSUPA.Throughput Max.HSDPA.Users HS.DSCH.throughput Max.HSUPA.Users Avg.CQI
2  Not.Overshooting           222.62              16            2345.54              25   17.99
5      Overshooting           263.83               8            3894.07              13   21.82
7      Overshooting           392.66              14            5134.80              15   23.00
9      Overshooting           478.58               8            7203.39               8   24.70
11     Overshooting           173.21              11            2429.06              15   23.51
12     Overshooting           210.61              16            2694.93              20   19.76
13     Overshooting           205.81              11            3278.06              13   22.10
14     Overshooting           394.10              10            3881.88              13   25.01
15     Overshooting           371.71              10            3765.10              13   23.33
16     Overshooting           321.32               6            4422.15               8   24.85

Here is my trainControl code:

# run the algorithms using 10-fold cross-validation, repeated 3 times
set.seed(123)
train_Control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 3,
                              savePredictions = TRUE,
                              classProbs = TRUE, # required for the ROC curve calcs
                              summaryFunction = twoClassSummary) # reports ROC (AUC), used to pick the best model

Here is my train call:

#uses the rose_training dataset with a kernel model
set.seed(123)
fit.rose.Kernel <- train(Class ~ Total.Tx.Height +
                         Antenna.Tilt +
                         Antenna.Gain +
                         Ant.Vert.Beamwidth +
                         RTWP +
                         Voice.Drops +
                         Range +
                         Max.Distance +
                         Rural +
                         Suburban +
                         Urban +
                         HSUPA.Throughput +
                         Max.HSDPA.Users +
                         HS.DSCH.throughput + 
                         Max.HSUPA.Users +
                         Avg.CQI, 
                       data = rose_train,
                       method = 'svmRadial',
                       preProcess = c('center','scale'),
                       trControl=train_Control,
                       tuneLength=15,
                       metric = "ROC")

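To better understand which part of the code causes the problem, I cleared all existing warnings and ran each model section by section to see where the warnings are raised.
Initially I had identified lines 444 to 469 as the problematic section, but today that section ran without any warnings. Now the next few lines produce the same warnings as the day before, even though nothing has changed apart from clearing the warnings.
In summary, I have two types of model that I am trying to compare: a linear SVM using svmLinear and a kernel model using svmRadial.
For both models I use several configurations of the training data, because my original data set is heavily imbalanced towards "Overshooting" (~80/20). I use the original imbalanced data, then I down-sample, up-sample, and generate synthetic data with SMOTE and ROSE, and I train both the linear and the kernel model on each type of training set.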
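The down-sampling and up-sampling configurations described above can be sketched with caret's downSample() and upSample(). This is only an illustration on a made-up data frame (`toy` and its values are assumptions, not the data from the question); SMOTE and ROSE are left out because they need additional packages.

```r
library(caret)

set.seed(123)
# Hypothetical toy data with the same 80/20 imbalance as described above
toy <- data.frame(
  Range   = runif(100, 1, 25),
  Avg.CQI = runif(100, 15, 26),
  Class   = factor(rep(c("Overshooting", "Not.Overshooting"), times = c(80, 20)))
)
predictors <- toy[, c("Range", "Avg.CQI")]

# Down-sampling: randomly drop majority-class rows until the classes are equal
down_train <- downSample(x = predictors, y = toy$Class, yname = "Class")

# Up-sampling: resample minority-class rows with replacement until the classes are equal
up_train <- upSample(x = predictors, y = toy$Class, yname = "Class")

table(down_train$Class)
table(up_train$Class)
```

caret can also apply the resampling inside each cross-validation fold through the `sampling` argument of trainControl(), for example `sampling = "down"`.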
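Does anyone know what these line search failures and warnings refer to?
To provide a reproducible example, here is a link to a copy of my code, and here is a dput version of the data set I am using. The section of code that produces these messages and warnings starts at line 444.
If anyone could offer some help, I would be very grateful.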

gtlvzcf8


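I can't access your data, but here are some suggestions:
1. Check whether your data contains any NAs. If it does, you can remove the rows containing NAs with na.omit().
2. Use createDataPartition() to split the original imbalanced data into training and test sets. It samples within each class, so both sets keep roughly the original class proportions.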
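The two suggestions above can be sketched as follows. The data frame `toy`, its values, and the 80/20 split proportion are assumptions made for illustration; only na.omit() and createDataPartition() come from the suggestions themselves.

```r
library(caret)

set.seed(123)
# Hypothetical toy data: imbalanced classes and one missing predictor value
toy <- data.frame(
  Range   = c(runif(99, 1, 25), NA),
  Avg.CQI = runif(100, 15, 26),
  Class   = factor(rep(c("Overshooting", "Not.Overshooting"), times = c(80, 20)))
)

# 1. Count missing values per column, then drop rows that contain any NA
colSums(is.na(toy))
toy_complete <- na.omit(toy)

# 2. Stratified split: rows are sampled within each level of Class
in_train <- createDataPartition(toy_complete$Class, p = 0.8, list = FALSE)
training <- toy_complete[in_train, ]
testing  <- toy_complete[-in_train, ]

# Class proportions in the training set
prop.table(table(training$Class))
```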
Note: to avoid human error, your train call can be tidied up as follows:

fit.rose.Kernel <- train(Class ~ ., 
                       data = rose_train,
                       method = 'svmRadial',
                       preProcess = c('center','scale'),
                       trControl=train_Control,
                       tuneLength=15,
                       metric = "ROC")

This may also help resolve the issue.
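To find out which resamples and tuning parameters are affected by the failed predictions, one option is to keep every resample's metrics and held-out predictions and look for NAs in them. A minimal sketch, assuming a made-up data set (`toy`, `x1`, `x2` are hypothetical) and a smaller resampling setup than the one in the question; whether any NAs show up depends on the data, so the point is only where to look:

```r
library(caret)

set.seed(123)
# Hypothetical two-class toy data
toy <- data.frame(x1 = rnorm(120), x2 = rnorm(120))
toy$Class <- factor(ifelse(toy$x1 + rnorm(120, sd = 0.5) > 0,
                           "Overshooting", "Not.Overshooting"))

ctrl <- trainControl(method = "repeatedcv",
                     number = 5,
                     repeats = 1,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary,
                     savePredictions = TRUE,  # keep held-out predictions in fit$pred
                     returnResamp = "all")    # keep metrics for every resample and parameter set

fit <- train(Class ~ .,
             data = toy,
             method = "svmRadial",
             preProcess = c("center", "scale"),
             trControl = ctrl,
             tuneLength = 3,
             metric = "ROC")

# Resamples whose performance measures are missing (zero rows if none are)
fit$resample[!complete.cases(fit$resample), ]

# Number of held-out class predictions that came back as NA
sum(is.na(fit$pred$pred))
```

If the NAs cluster at particular values of `C` or `sigma`, narrowing the tuning grid with the `tuneGrid` argument of train() is one thing to try.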
