1. Data
I have survey data:
dat <- structure(list(ID = c(4, 5), Start_time = structure(c(1676454186,
1676454173), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
End_time = structure(c(1676454352, 1676454642), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), `want_to_change Mult answ` = c("Yes (for the environment), because it provided a starting point to collectively do something about energy consumption.;",
"Yes (because of the gas crisis), because it provided a starting point to collectively do something. ;"
), actually_changed = c("Yes, I tried to use less energy in the office.",
"No, not at all."), `control Mult answ` = c("We / I can control the lights.;Closing/opening doors and windows.;",
"We / I can control the lights.;Closing/opening doors and windows.;"), `measures_taken Mult answ` = c("Yes, I checked for lights that were not turned off.; Yes, went home early",
"Yes, I checked for lights that were not turned off.;")), row.names = c(NA,
-2L), class = c("data.table",
"data.frame"))
2. Data structure
Some columns can hold multiple answers. These columns have "Mult answ" in their column names. For example, see row 1, column 6 (dat[1,6]).
> dat[1,6]
control Mult answ
1: We / I can control the lights.;Closing/opening doors and windows.;
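3. Question
I would like to write code that:
1. Changes all answers that occur only once to Other (this is because there are many custom answers).
2. Creates a separate column for each answer option, with a generic suffix.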
4. What I have tried
I thought I would first select the columns that have multiple answers:
# Get columns with more than one answer
library(dplyr)  # provides select() and contains()
temp <- select(dat, contains("Mult answ"))
cols_with_more_answers <- names(temp)
Then I wanted to split the columns on the semicolon (before counting the answers and changing the ones that occur only once to Other).
# Separate columns
tidyr::separate(data.frame(text = dat), text, into = c("A", "B", "C"), sep = ";", fill = "right", extra = "drop")
How should I continue?
5. Desired output
dat <- structure(list(ID = c(4, 5),
Start_time = structure(c(1676454186, 1676454173), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
End_time = structure(c(1676454352, 1676454642), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
`want_to_change Mult answ` = c("Other", "Other"),
actually_changed = c("No, not at all.", "Yes, I tried to use less energy in the office."),
`control Mult answ A` = c("We / I can control the lights.", "We / I can control the lights."),
`control Mult answ B` = c("Closing/opening doors and windows", "Closing/opening doors and windows"),
`measures_taken Mult answ A` = c("Yes, I checked for lights that were not turned off.", "Yes, I checked for lights that were not turned off."),
`measures_taken Mult answ B` = c(NA, "Yes, went home early")),
row.names = c(NA, -2L),
class = c("data.table", "data.frame"))
1 Answer
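You can do something like this. (Converting the answers into letters, and keeping that stable in case you have more than 26 answers, is a bit tricky, but I found a way around it.)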
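I left some comments in the code; in short:
1. separate_rows separates the answers.
2. forcats::fct_lump_min lumps the answers that occur too rarely into Other.
3. values2letters, which calls expand_letters, turns the answers into letter suffixes. The first function simply recodes the answers into letters. The second function creates the letters: if you have more than 26 answers there are not enough single letters, so this function combines letters.
Created on 2023-03-20 with reprex v2.0.2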
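The three steps summarised in this answer can be sketched roughly as follows. This is a minimal sketch and not the answerer's original code: it assumes dplyr, tidyr, forcats and stringr are installed and that respondents are identified by an ID column. The names split_mult_answers and letter_suffix are hypothetical, introduced only for illustration; letter_suffix merely stands in for the role the answer gives to values2letters and expand_letters. Letters are assigned in order of first appearance, which is also an assumption.

```r
library(dplyr)
library(tidyr)
library(forcats)
library(stringr)

# Hypothetical helper: n-th suffix in the sequence A, ..., Z, AA, AB, ...
# so that more than 26 answer options still get a unique suffix.
letter_suffix <- function(n) {
  vapply(n, function(i) {
    out <- character(0)
    while (i > 0) {
      i <- i - 1
      out <- c(LETTERS[i %% 26 + 1], out)
      i <- i %/% 26
    }
    paste(out, collapse = "")
  }, character(1))
}

# Hypothetical wrapper around the three steps, applied to every column
# whose name contains `pattern`.
split_mult_answers <- function(data, id_col = "ID",
                               pattern = "Mult answ", min_count = 2) {
  out <- as_tibble(data)
  mult_cols <- grep(pattern, names(out), fixed = TRUE, value = TRUE)
  for (col in mult_cols) {
    long <- out %>%
      select(all_of(id_col), answer = all_of(col)) %>%
      separate_rows(answer, sep = ";") %>%            # step 1: one row per answer
      mutate(answer = str_squish(answer)) %>%
      filter(!is.na(answer), answer != "") %>%
      mutate(answer = as.character(                   # step 2: rare answers -> Other
        fct_lump_min(factor(answer), min = min_count, other_level = "Other")
      )) %>%
      distinct()
    wide <- long %>%                                  # step 3: one column per answer
      mutate(suffix = letter_suffix(match(answer, unique(answer)))) %>%
      pivot_wider(id_cols = all_of(id_col), names_from = suffix,
                  values_from = answer, names_prefix = paste0(col, " "))
    out <- out %>%
      select(-all_of(col)) %>%
      left_join(wide, by = id_col)
  }
  out
}

# Two of the multi-answer columns from the question, reduced for illustration.
dat_small <- tibble(
  ID = c(4, 5),
  `control Mult answ` = c(
    "We / I can control the lights.;Closing/opening doors and windows.;",
    "We / I can control the lights.;Closing/opening doors and windows.;"),
  `measures_taken Mult answ` = c(
    "Yes, I checked for lights that were not turned off.; Yes, went home early",
    "Yes, I checked for lights that were not turned off.;")
)
res <- split_mult_answers(dat_small)
```

With min_count = 2, an answer given by only one respondent is replaced by Other before the columns are created. The new columns are appended at the end of the table rather than in the position of the original column; reordering them is left out of the sketch.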