R: Errors and warnings when applying the SMOTE function to balance classes

Posted by lokaqttq on 2023-04-03 in Other

I am trying to apply the SMOTE function to balance my classes.
This is my code:

smote_train <- SMOTE(tested_covid ~., data = dataTrain, k = 5, perc.over = 100, perc.under = 200)
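For reference, here is a minimal sketch of how `perc.over` and `perc.under` set the size of the resampled data, assuming DMwR's documented semantics: for `perc.over >= 100`, each minority case yields `perc.over / 100` synthetic cases, and `perc.under / 100` majority cases are sampled per synthetic case. `smote_counts()` is a hypothetical helper, not part of DMwR, and the 100 minority cases are an illustrative number.

```r
# Hypothetical helper: expected class counts after DMwR-style SMOTE.
# Assumes perc_over >= 100; smaller values first subsample the minority
# class, which this sketch does not model.
smote_counts <- function(n_minority, perc_over, perc_under) {
  stopifnot(perc_over >= 100)
  # Each minority case produces as.integer(perc_over / 100) synthetic cases
  n_synthetic <- n_minority * as.integer(perc_over / 100)
  # perc_under / 100 majority cases are sampled per synthetic case
  n_majority <- as.integer((perc_under / 100) * n_synthetic)
  # Final minority = original minority cases + synthetic cases
  c(minority = n_minority + n_synthetic, majority = n_majority)
}

# The settings from the question, with 100 minority cases as an example
smote_counts(100, perc_over = 100, perc_under = 200)
# minority = 200, majority = 200
```

Under these assumptions, `perc.over = 100, perc.under = 200` produces a 1:1 class ratio.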

This is the error and the warnings I get:

Error in T[, col] <- data[, col] : 
  incorrect number of subscripts on matrix
In addition: Warning messages:
1: In if (class(data[, col]) %in% c("factor", "character")) { :
  the condition has length > 1 and only the first element will be used
2: In if (class(data[, col]) %in% c("factor", "character")) { :
  the condition has length > 1 and only the first element will be used
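The warnings show that `SMOTE()` evaluates `class(data[, col]) %in% c("factor", "character")` inside an `if`. The following is a minimal sketch of why that check breaks when `data` is a tibble (the `dput()` output shows class `tbl_df`). On a plain data frame, `data[, col]` drops to a vector, so `class()` returns a single string. On a tibble, `data[, col]` stays a one-column tibble, so `class()` returns three strings. The tibble class vector is hard-coded here so the sketch runs without the tibble package.

```r
# Plain data.frame: single-column extraction drops to a vector
plain <- data.frame(gender = c("Male", "Female"), stringsAsFactors = TRUE)
cond_df <- class(plain[, "gender"]) %in% c("factor", "character")
print(cond_df)   # TRUE: a single logical, as `if` expects

# Tibble: data[, col] is itself a tibble, whose class() is this
# length-3 vector (hard-coded, no packages needed)
tbl_class <- c("tbl_df", "tbl", "data.frame")
cond_tbl <- tbl_class %in% c("factor", "character")
print(cond_tbl)  # FALSE FALSE FALSE: length 3
```

In R versions before 4.2, a length-3 condition triggers exactly this warning, and only the first element (`FALSE`) is used. `SMOTE()` then takes the `else` branch and tries to copy a whole tibble into a matrix column, which is the `T[, col] <- data[, col]` line named in the error. From R 4.2.0 onward, a condition of length greater than one is an error rather than a warning.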

This is the structure of my data, including the column types:

structure(list(id = c("ff0113a9-79d4-4042-992f-c5092e30b6af", 
"7b104740-c0c2-44bb-82d8-442ea06a3a96", "8533b6e2-bffe-46da-8056-8b77b89a5819", 
"21d33ae7-8ad8-4744-8370-d376a7e5d251", "c9225467-8ff1-4305-85ad-6c9386e38347", 
"e2e445c4-dffd-4543-b311-efdf2af23744"), age = c(63, 19, 23, 
28, 40, 31), gender = c("Male", "Female", "Male", "Female", "Female", 
"Male"), country = c("India", "Phillipines", "India", "Phillipines", 
"South Africa", "Pakistan"), chills = c("No", "Mild", "No", "Mild", 
"No", "No"), Cough = c("No", "Severe", "No", "Mild", "Mild", 
"No"), diarrhoea = c("No", "Mild", "No", "No", "No", "No"), fatigue = c("No", 
"Moderate", "Mild", "Mild", "Mild", "Mild"), healthcare_worker = c("No", 
"No", "No", "No", "No", "Yes"), how_unwell = c(1, 7, 1, 6, 4, 
2), comorbidity_one = c("Asthma (managed with an inhaler)", "None", 
"Obesity", "High Blood Pressure (hypertension)", "None", "None"
), loss_smell_taste = c("No", "No", "No", "No", "No", "No"), 
    muscle_ache = c("No", "Moderate", "No", "Moderate", "Mild", 
    "Mild"), nasal_congestion = c("No", "No", "No", "No", "Mild", 
    "No"), nausea_vomiting = c("No", "No", "No", "No", "No", 
    "No"), no_days_symptoms_show = c("None", "4", "None", "More than 21", 
    "None", "2"), self_diagnosis = c("None", "Mild", "None", 
    "Mild", "None", "Mild"), shortness_breath = c("No", "Mild", 
    "No", "No", "No", "Mild"), sore_throat = c("No", "No", "No", 
    "No", "Mild", "No"), sputum = c("No", "Mild", "No", "Mild", 
    "Mild", "No"), temperature = c("No", "No", "No", "No", "No", 
    "37.5-38"), tested_covid = structure(c(1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("Negative", "Positive"), class = "factor")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
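Two details in this output are relevant to the answers below: the object has class `tbl_df` (a tibble), and most predictors are character rather than factor. Here is a minimal sketch of a check for both. `smote_input_problems()` is a hypothetical helper, not part of any package, and `example` is made-up data that only copies `dataTrain`'s class attribute.

```r
# Hypothetical helper: report the input issues discussed in this thread
smote_input_problems <- function(data) {
  problems <- character(0)
  if (inherits(data, "tbl_df")) {
    problems <- c(problems, "data is a tibble; convert with as.data.frame()")
  }
  # Names of all character columns
  chr_cols <- names(data)[vapply(data, is.character, logical(1))]
  if (length(chr_cols) > 0) {
    problems <- c(problems,
                  paste("character columns (convert to factor):",
                        paste(chr_cols, collapse = ", ")))
  }
  problems
}

# Made-up example with the same class attribute as dataTrain
example <- structure(
  list(age = c(63, 19), gender = c("Male", "Female"),
       tested_covid = factor(c("Negative", "Positive"))),
  row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame")
)
print(smote_input_problems(example))
```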
Answer 1 (xcitsw88):

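I read the data with read.csv instead of read_csv. I also converted the character variables to factors and the integer variables to numeric, and that solved the problem.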
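Here is a minimal base-R sketch of the conversions this answer describes. `convert_for_smote()` is a hypothetical helper. Because `read.csv()` returns a plain data frame, `as.data.frame()` stands in for switching away from `read_csv()`. Calling `read.csv(..., stringsAsFactors = TRUE)` would also do the character-to-factor step at import time.

```r
# Hypothetical helper: plain data.frame, character -> factor,
# integer -> numeric (double)
convert_for_smote <- function(data) {
  data <- as.data.frame(data)  # same effect as reading with read.csv()
  for (col in names(data)) {
    x <- data[[col]]
    if (is.character(x)) {
      data[[col]] <- factor(x)
    } else if (is.integer(x) && !is.factor(x)) {
      data[[col]] <- as.numeric(x)
    }
  }
  data
}

# Made-up input with an integer and a character column
raw <- data.frame(age = c(63L, 19L), gender = c("Male", "Female"),
                  stringsAsFactors = FALSE)
str(convert_for_smote(raw))
```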

Answer 2 (jogvjijk):

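I got the same error. I have a mix of factor and numeric variables, but that did not seem to be the problem. My problem was solved by converting my tibble to a data frame with as.data.frame.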
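Here is a minimal sketch of this fix: convert the tibble to a plain data frame before calling `SMOTE()`. The object `tbl_like` is made-up data that only mimics `dataTrain`'s class attribute. The `SMOTE()` call is guarded, so the sketch still runs where the DMwR package is not installed or `dataTrain` does not exist.

```r
# Made-up tibble-classed object standing in for dataTrain
tbl_like <- structure(
  list(age = c(63, 19, 23),
       tested_covid = factor(c("Negative", "Negative", "Positive"))),
  row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame")
)

# as.data.frame() drops the tbl_df/tbl classes, so data[, col]
# returns a vector again
plain <- as.data.frame(tbl_like)
print(class(plain))           # "data.frame"
print(class(plain[, "age"]))  # "numeric"

# The call from the question, on the converted data, when possible
if (requireNamespace("DMwR", quietly = TRUE) && exists("dataTrain")) {
  smote_train <- DMwR::SMOTE(tested_covid ~ ., data = as.data.frame(dataTrain),
                             k = 5, perc.over = 100, perc.under = 200)
}
```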
