Create a new column based on character values in R

qc6wkl3g posted on 2022-12-06 in Other

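I have a data frame with a column called "full_name" that describes two teams, for example: "Man U to win Liverpool to win", "Liverpool to win Man U to win", "Chelsea to win Arsenal to win", and so on.
I want to classify the teams as North or South, so that if "Man U to win Liverpool to win" or "Liverpool to win Man U to win" appears, the value is "North", while "Chelsea to win Arsenal to win" gives "South", and so on.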

levels(raw_data$full_name)[levels(raw_data$full_name)== "Man U to win Liverpool to win"] <- 'North'
levels(raw_data$full_name)[levels(raw_data$full_name)== "Liverpool to win Man U to win"] <- 'North'
levels(raw_data$full_name)[levels(raw_data$full_name)== "Chelsea to win Arsenal to win"] <- 'South'

The code above produces no errors, but the data frame stays unchanged and I do not get the desired output. Is there a way to do this?

rjzwgtxy #1

Here is an example of a tidyverse approach that may help you:

library(dplyr)

north <- c("Man U to win Liverpool to win","Liverpool to win Man U to win")
south <- c("Chelsea to win Arsenal to win")

df <- 
  data.frame(full_name = sample(c(north,south),size = 5,replace = TRUE))
             
df %>% 
  mutate(region = case_when(
    full_name %in% north ~ "North",
    full_name %in% south ~ "South"
  ))

                      full_name region
1 Chelsea to win Arsenal to win  South
2 Man U to win Liverpool to win  North
3 Chelsea to win Arsenal to win  South
4 Man U to win Liverpool to win  North
5 Man U to win Liverpool to win  North
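
One thing worth knowing about the approach above: case_when() returns NA for rows that match none of the conditions. The following is a minimal sketch of adding a fallback branch, assuming you want a visible label instead of NA. The "Leeds to win Everton to win" fixture and the "Other" label are hypothetical and do not come from the question.

```r
library(dplyr)

north <- c("Man U to win Liverpool to win", "Liverpool to win Man U to win")
south <- c("Chelsea to win Arsenal to win")

# Fixed example data; the last fixture is hypothetical and in neither group
df <- data.frame(full_name = c(north, south, "Leeds to win Everton to win"))

result <- df %>%
  mutate(region = case_when(
    full_name %in% north ~ "North",
    full_name %in% south ~ "South",
    TRUE ~ "Other"  # fallback; without it, unmatched rows become NA
  ))

result
```

Whether "Other" or NA is the better default depends on how you want unexpected fixtures to show up downstream.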

vuv7lop3 #2

Here is an alternative approach:
First

ulydmbyx #3

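In base R, your code works as expected if you drop the levels() calls and assign to the column directly (this assumes full_name is a character column, as in the example data below). If you want the column to be a factor, you can call factor() after replacing the values.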

# example data
raw_data <- data.frame(full_name = c(
  "Man U to win Liverpool to win", 
  "Liverpool to win Man U to win",
  "Chelsea to win Arsenal to win"
))

raw_data$full_name[raw_data$full_name == "Man U to win Liverpool to win"] <- "North"
raw_data$full_name[raw_data$full_name == "Liverpool to win Man U to win"] <- "North"
raw_data$full_name[raw_data$full_name == "Chelsea to win Arsenal to win"] <- "South"

raw_data$full_name <- factor(raw_data$full_name)

Alternatively, you can use a named vector as a lookup table:

lookup <- c(
  "Man U to win Liverpool to win" = "North",
  "Liverpool to win Man U to win" = "North",
  "Chelsea to win Arsenal to win" = "South"
)

raw_data$full_name <- factor(lookup[raw_data$full_name])

Result of both approaches:

#> raw_data
  full_name
1     North
2     North
3     South

#> levels(raw_data$full_name)
[1] "North" "South"
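
Since the question title asks for a new column, here is a hedged sketch of the same lookup idea writing to a separate column instead of overwriting full_name. The column name region is an assumption, and stringsAsFactors = TRUE is used only to show the factor case. Indexing a named vector with a factor uses the integer codes rather than the labels, so the sketch converts to character first.

```r
# Example data; full_name is deliberately a factor here
raw_data <- data.frame(
  full_name = c(
    "Man U to win Liverpool to win",
    "Liverpool to win Man U to win",
    "Chelsea to win Arsenal to win"
  ),
  stringsAsFactors = TRUE
)

lookup <- c(
  "Man U to win Liverpool to win" = "North",
  "Liverpool to win Man U to win" = "North",
  "Chelsea to win Arsenal to win" = "South"
)

# as.character() makes the lookup go by name rather than by factor code;
# unname() drops the long fixture names from the result
raw_data$region <- unname(lookup[as.character(raw_data$full_name)])

# full_name is left untouched; region is a plain character column
raw_data
```

The as.character() call is harmless when the column is already character, so the same line should work in either case.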

avwztpqn #4

Here is an option with fct_recode:

library(forcats)
raw_data$full_name <- with(raw_data, fct_recode(full_name, 
   North =  "Man U to win Liverpool to win",
   North = "Liverpool to win Man U to win",
   South  =  "Chelsea to win Arsenal to win"))

Or with base R:

factor(raw_data$full_name, levels = c("Chelsea to win Arsenal to win", 
"Liverpool to win Man U to win", "Man U to win Liverpool to win"
), labels = c("South", "North", "North"))

Or, if we want to use levels:

# old levels and their replacements, in matching order
lvls_to_change <- c("Man U to win Liverpool to win",
   "Liverpool to win Man U to win", "Chelsea to win Arsenal to win")
lvls_new <- c("North", "North", "South")
# which existing levels should be renamed
i1 <- levels(raw_data$full_name) %in% lvls_to_change
levels(raw_data$full_name)[i1] <- lvls_new[match(levels(raw_data$full_name)[i1], lvls_to_change)]
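
The levels-based variants only do something when full_name is a factor. The question does not show the column type, so the following check is a sketch under the assumption that the original column may have been character, which would be consistent with the attempt running without error and without effect. The vectors x and f are illustrative names.

```r
x <- c("Man U to win Liverpool to win", "Chelsea to win Arsenal to win")

is.factor(x)  # a character vector is not a factor
levels(x)     # and it has no levels to rename

# After converting to a factor, renaming levels recodes the values
f <- factor(x)
levels(f)[levels(f) == "Man U to win Liverpool to win"] <- "North"
levels(f)[levels(f) == "Chelsea to win Arsenal to win"] <- "South"

as.character(f)
```

Running str(raw_data) before recoding is a quick way to see which case applies.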

Data

raw_data <- structure(list(full_name = structure(c(2L, 2L, 3L, 2L,
 1L), levels = c("Chelsea to win Arsenal to win", 
"Liverpool to win Man U to win", "Man U to win Liverpool to win"
), class = "factor")), row.names = c(NA, -5L), class = "data.frame")
