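I have already reviewed "Error: Problem with mutate() column (...) must be size 15 or 1, not 17192", "How to drop columns with column names that contain specific string?", and "Remove columns that contain a specific word", along with related error diagnostics.
I have a large dataset of viral data for different species across different regions. Sample data is shown below.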
Country ...2 Area Site Animal ID Species Sample Type Original Sample/Specimen #
<chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr>
Tanzania NA UMNP UMNPhq AATPH PG Feces AATPHF2
Tanzania NA UMNP UMNPhq AATPI PG Feces AATPIF2
Tanzania NA UMNP UMNPhq AATPJ PG Feces AATPJF2
Tanzania NA UMNP UMNPhq ATTPK PG Feces ATTPKF2
Tanzania NA UMNP UMNPhq AATPL PG Feces AATPLF2
Filovirus (MOD) PCR Date (Filo MOD)
<chr> <date>
Indeterminant 2015-03-16
Indeterminant 2015-03-16
Indeterminant 2015-03-16
Indeterminant 2015-03-16
Negative 2015-03-16
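I am trying to recode the viral status for each sample ID as positive or negative (only Filovirus is shown here, but there are many such virus columns, so a more general approach would help).
Code I have tried: first, I subset the data to include only one specific area.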
viral <- subset(data, Area %in% "UMNP")
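Here I drop the columns I don't need and am then able to get the infection status, but it converts all the other information about each sample to NA, which causes further errors when I try to keep those values.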
viralres <- viral %>%
dplyr::select(-matches(c('Performed by ()', 'performed by', 'Date of', '1Performed by', 'Performed by', "Date ()", "...2"),)) %>%
mutate_if(is.character, ~case_when(. == "Indeterminant" ~ "0",
. == "Negative" ~ "0",
. == "Positive" ~ "1"))
Data (dput output)
structure(list(Country = c("Tanzania", "Tanzania", "Tanzania",
"Tanzania", "Tanzania"), ...2 = c(NA, NA, NA, NA, NA), Area = c("UMNP",
"UMNP", "UMNP", "UMNP", "UMNP"), Site = c("UMNPhq", "UMNPhq",
"UMNPhq", "UMNPhq", "UMNPhq"), `Animal ID` = c("AATPH", "AATPI",
"AATPJ", "ATTPK", "AATPL"), Species = c("Procolobus gordonorum",
"Procolobus gordonorum", "Procolobus gordonorum", "Procolobus gordonorum",
"Procolobus gordonorum"), `Sample Type` = c("Feces", "Feces",
"Feces", "Feces", "Feces"), `Original Sample/Specimen #` = c("AATPHF2",
"AATPIF2", "AATPJF2", "ATTPKF2", "AATPLF2"), `Filovirus (MOD) PCR` = c("Indeterminant",
"Indeterminant", "Indeterminant", "Indeterminant", "Negative"
), `Date (Filo MOD)` = structure(c(16510, 16510, 16510, 16510,
16510), class = "Date")), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
2 Answers
piah890a #1
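Using mutate_if(is.character, ...) changes every character column. Any value that matches none of the case_when() conditions becomes NA, which is why the rest of your sample information is lost. It looks like the only column you are trying to change is "Filovirus (MOD) PCR", so the smallest change to your code is to restrict the recoding to that column. That way, only that column is modified.
Alternatively, you can use case_match to change that single column more directly. Note that case_match was introduced in dplyr 1.1.0.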
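The code that accompanied this answer is not present in this copy, so what follows is only a minimal sketch of what restricting the recode to a single column could look like. It assumes dplyr 1.1.0 or later, and viral_demo is a hypothetical stand-in for the asker's viral data, built from two columns of the posted sample.

```r
library(dplyr)

# Hypothetical stand-in for `viral`, using values from the question's sample data
viral_demo <- tibble(
  `Animal ID` = c("AATPH", "AATPI", "AATPL"),
  `Filovirus (MOD) PCR` = c("Indeterminant", "Indeterminant", "Negative")
)

# Option 1: keep the original case_when(), but apply it to one column only
res_case_when <- viral_demo %>%
  mutate(across("Filovirus (MOD) PCR",
                ~ case_when(. == "Indeterminant" ~ "0",
                            . == "Negative" ~ "0",
                            . == "Positive" ~ "1")))

# Option 2: case_match() (dplyr >= 1.1.0) maps old values to new values directly
res_case_match <- viral_demo %>%
  mutate(`Filovirus (MOD) PCR` = case_match(
    `Filovirus (MOD) PCR`,
    c("Indeterminant", "Negative") ~ "0",
    "Positive" ~ "1"
  ))
```

In both variants the Animal ID column is left untouched, because the recode no longer runs over every character column.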
t1rydlwq #2
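Use mutate_at instead of mutate_if. As the first argument to mutate_at (after the piped data), pass a vector containing the names of all the columns to recode (Filovirus, etc.).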
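As a rough sketch of this suggestion, assuming the virus result columns can be listed by name: the example below uses a hypothetical tibble viral_demo, and the "Coronavirus PCR" column is invented purely to show more than one virus column. The helper recode_status is also a name made up for this illustration. Since mutate_at() is superseded in current dplyr, the equivalent across() form is shown as well.

```r
library(dplyr)

# Hypothetical data: "Coronavirus PCR" is an invented second virus column
viral_demo <- tibble(
  `Animal ID` = c("AATPH", "AATPI"),
  `Filovirus (MOD) PCR` = c("Indeterminant", "Negative"),
  `Coronavirus PCR` = c("Positive", "Negative")
)

# All virus result columns to recode, collected in one vector
virus_cols <- c("Filovirus (MOD) PCR", "Coronavirus PCR")

# Hypothetical helper: Indeterminant/Negative -> "0", Positive -> "1",
# anything else -> NA
recode_status <- function(x) {
  case_when(x %in% c("Indeterminant", "Negative") ~ "0",
            x == "Positive" ~ "1")
}

# As suggested in the answer: recode only the listed columns
res_at <- viral_demo %>%
  mutate_at(virus_cols, recode_status)

# The same operation written with across()
res_across <- viral_demo %>%
  mutate(across(all_of(virus_cols), recode_status))
```

Keeping the column names in a single vector means new virus columns can be added in one place without touching the recoding logic.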