R: Computing a multi-class confusion matrix with within-class errors

fkaflof6  posted on 2023-04-09 in Other

I tried k-NN classification on toy data and got the following predictions:

actual <- c(rep('A1',12), rep('A2',12), rep('A3',7), rep('A4',12), rep('B1',11), rep('B2',17), rep('C1',15))
prediction <- c('A1','A1','A1','A1','A1','A3','A4','A4','B1','B2','C1','C1',
                'A2','A2','A2','A2','A2','A3','A4','A4','A4','B1','B1','C1',
                'A1','A2','A3','A3','A3','A3','B2',
                'A1','A1','A2','A2','A2','A4','A4','A4','A4','A4','A4','B1',
                'A1','A2','A2','A4','B1','B1','B1','B2','B2','B2','B2',
                'A1','A3','B1','B1','B1','B1','B2','B2','B2','B2','B2','B2','B2','B2','B2','C1','C1',
                'A1','A1','A2','B2','B2','C1','C1','C1','C1','C1','C1','C1','C1','C1','C1')

A basic overview of the predictions can be obtained with table():

table(actual, prediction)
#       prediction
# actual A1 A2 A3 A4 B1 B2 C1
#     A1  5  0  1  2  1  1  2
#     A2  0  5  1  3  2  0  1
#     A3  1  1  4  0  0  1  0
#     A4  2  3  0  6  1  0  0
#     B1  1  2  0  1  3  4  0
#     B2  1  0  1  0  4  9  2
#     C1  2  1  0  0  0  2 10

There is also a very useful function, caret::confusionMatrix():

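# Recent caret versions require factors with identical levels, not character vectors
caret::confusionMatrix(factor(prediction), factor(actual))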
# Confusion Matrix and Statistics
# 
#           Reference
# Prediction A1 A2 A3 A4 B1 B2 C1
#         A1  5  0  1  2  1  1  2
#         A2  0  5  1  3  2  0  1
#         A3  1  1  4  0  0  1  0
#         A4  2  3  0  6  1  0  0
#         B1  1  2  0  1  3  4  0
#         B2  1  0  1  0  4  9  2
#         C1  2  1  0  0  0  2 10
#
# Overall Statistics
#
#                Accuracy : 0.4884
#                  95% CI : (0.379, 0.5986)
#     No Information Rate : 0.1977
#     P-Value [Acc > NIR] : 1.437e-09
#
#                   Kappa : 0.3975
#  Mcnemar's Test P-Value : NA
# 
# Statistics by Class:
# 
#                      Class: A1 Class: A2 Class: A3 Class: A4 Class: B1 Class: B2 Class: C1
# Sensitivity            0.41667   0.41667   0.57143   0.50000   0.27273    0.5294    0.6667
# Specificity            0.90541   0.90541   0.96203   0.91892   0.89333    0.8841    0.9296
# Pos Pred Value         0.41667   0.41667   0.57143   0.50000   0.27273    0.5294    0.6667
# Neg Pred Value         0.90541   0.90541   0.96203   0.91892   0.89333    0.8841    0.9296
# Prevalence             0.13953   0.13953   0.08140   0.13953   0.12791    0.1977    0.1744
# Detection Rate         0.05814   0.05814   0.04651   0.06977   0.03488    0.1047    0.1163
# Detection Prevalence   0.13953   0.13953   0.08140   0.13953   0.12791    0.1977    0.1744
# Balanced Accuracy      0.66104   0.66104   0.76673   0.70946   0.58303    0.7067    0.7981
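
As a sanity check, the headline numbers in this output can be reproduced from the raw counts in base R. This is only a minimal sketch: the `counts` matrix is copied from the table(actual, prediction) output above, and the names `labs`, `counts`, `accuracy` and `sensitivity` are introduced here for illustration.

```r
# Subclass-level counts copied from the table(actual, prediction) output
# (rows = actual, columns = prediction).
labs <- c("A1", "A2", "A3", "A4", "B1", "B2", "C1")
counts <- matrix(c(5, 0, 1, 2, 1, 1, 2,
                   0, 5, 1, 3, 2, 0, 1,
                   1, 1, 4, 0, 0, 1, 0,
                   2, 3, 0, 6, 1, 0, 0,
                   1, 2, 0, 1, 3, 4, 0,
                   1, 0, 1, 0, 4, 9, 2,
                   2, 1, 0, 0, 0, 2, 10),
                 nrow = 7, byrow = TRUE,
                 dimnames = list(actual = labs, prediction = labs))

# Overall accuracy: correctly classified cases (the diagonal) over all cases.
accuracy <- sum(diag(counts)) / sum(counts)

# Per-subclass sensitivity (recall): correct cases over the actual row totals.
sensitivity <- diag(counts) / rowSums(counts)

print(round(accuracy, 4))
print(round(sensitivity, 5))
```

Every off-diagonal cell counts as an error here, whether or not the predicted subclass shares a parent class with the actual one, which is why these figures say nothing about within-class versus out-of-class mistakes.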

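I noticed that many subclasses belong to a larger class. For example, A1, A2, A3, and A4 belong to class A; likewise, B1 and B2 belong to class B. I would like to compute the statistics while treating all subclasses of a class as equivalent. Is there a function that produces this kind of statistics for within-class and out-of-class errors?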

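**Note:** Please do not suggest solutions that rely on stripping the digits from the subclass names, because the real application does not follow that naming pattern. I used this example only for simplicity.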

If the class-subclass relationship is given, can a solution be obtained?

5gfr0r5j  #1

How about defining the classes manually by stripping the subclass suffix?

actual <- c(rep('A1',12), rep('A2',12), rep('A3',7), rep('A4',12), rep('B1',11), rep('B2',17), rep('C1',15))
prediction <- c('A1','A1','A1','A1','A1','A3','A4','A4','B1','B2','C1','C1',
                'A2','A2','A2','A2','A2','A3','A4','A4','A4','B1','B1','C1',
                'A1','A2','A3','A3','A3','A3','B2',
                'A1','A1','A2','A2','A2','A4','A4','A4','A4','A4','A4','B1',
                'A1','A2','A2','A4','B1','B1','B1','B2','B2','B2','B2',
                'A1','A3','B1','B1','B1','B1','B2','B2','B2','B2','B2','B2','B2','B2','B2','C1','C1',
                'A1','A1','A2','B2','B2','C1','C1','C1','C1','C1','C1','C1','C1','C1','C1')
# Drop the digits so that each subclass label collapses to its parent class
actual <- gsub("\\d", "", actual)
prediction <- gsub("\\d", "", prediction)
# Recent caret versions require factors with identical levels
caret::confusionMatrix(factor(prediction), factor(actual))

#output
Confusion Matrix and Statistics

          Reference
Prediction  A  B  C
         A 34  6  3
         B  6 20  2
         C  3  2 10

Overall Statistics

               Accuracy : 0.7442          
                 95% CI : (0.6387, 0.8322)
    No Information Rate : 0.5             
    P-Value [Acc > NIR] : 3.272e-06       

                  Kappa : 0.5831          
 Mcnemar's Test P-Value : 1               

Statistics by Class:

                     Class: A Class: B Class: C
Sensitivity            0.7907   0.7143   0.6667
Specificity            0.7907   0.8621   0.9296
Pos Pred Value         0.7907   0.7143   0.6667
Neg Pred Value         0.7907   0.8621   0.9296
Prevalence             0.5000   0.3256   0.1744
Detection Rate         0.3953   0.2326   0.1163
Detection Prevalence   0.5000   0.3256   0.1744
Balanced Accuracy      0.7907   0.7882   0.7981
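
Since the question asks for a solution that does not depend on the label text, the same aggregation can also be driven by an explicit lookup table. The following is a minimal sketch, not a built-in caret feature: `subclass_to_class` is a hypothetical named vector standing in for whatever class-subclass relationship is available in the real application, and the data are rebuilt from the counts shown in the question so that the block runs on its own. It also splits the errors into within-class (right class, wrong subclass) and out-of-class.

```r
# Hypothetical lookup table: subclass -> parent class.
# In a real application this would come from the known hierarchy,
# not from the label text.
subclass_to_class <- c(A1 = "A", A2 = "A", A3 = "A", A4 = "A",
                       B1 = "B", B2 = "B", C1 = "C")

# Subclass-level counts copied from the table(actual, prediction) output
# in the question (rows = actual, columns = prediction).
labs <- names(subclass_to_class)
counts <- matrix(c(5, 0, 1, 2, 1, 1, 2,
                   0, 5, 1, 3, 2, 0, 1,
                   1, 1, 4, 0, 0, 1, 0,
                   2, 3, 0, 6, 1, 0, 0,
                   1, 2, 0, 1, 3, 4, 0,
                   1, 0, 1, 0, 4, 9, 2,
                   2, 1, 0, 0, 0, 2, 10),
                 nrow = 7, byrow = TRUE,
                 dimnames = list(actual = labs, prediction = labs))

# Expand the counts back into one (actual, prediction) pair per observation.
pairs <- as.data.frame(as.table(counts), stringsAsFactors = FALSE)
actual     <- rep(pairs$actual, pairs$Freq)
prediction <- rep(pairs$prediction, pairs$Freq)

# Map every subclass label to its parent class through the lookup table.
class_levels     <- unique(unname(subclass_to_class))
actual_class     <- factor(unname(subclass_to_class[actual]), levels = class_levels)
prediction_class <- factor(unname(subclass_to_class[prediction]), levels = class_levels)

# Class-level confusion table.
class_tab <- table(actual = actual_class, prediction = prediction_class)
print(class_tab)

# Split all cases into exact hits, within-class errors, and out-of-class errors.
exact      <- actual == prediction
same_class <- actual_class == prediction_class
error_breakdown <- c(correct            = sum(exact),
                     within_class_error = sum(same_class & !exact),
                     out_of_class_error = sum(!same_class))
print(error_breakdown)

# Optional: full class-level statistics, if caret is installed.
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::confusionMatrix(prediction_class, actual_class))
}
```

On the toy data this gives the same 3 x 3 table as above, and the 86 cases break down into 42 exact hits, 22 within-class errors and 22 out-of-class errors. Because the mapping is an ordinary named vector, it keeps working when subclass names carry no hint of their parent class.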
