How to collapse the levels of a categorical variable in R

aelbi1ox  posted on 2022-12-20 in Other

I have several categorical variables, each with more than five levels, and I need a function that collapses each of them into two levels.

column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
             "frank", "boss", "kate", "sall")

df <- data.frame(column1, column2)

So for the data frame above, I want a function that keeps every "bad" in column1 as "bad" and converts all the other levels to "others". I don't know how to do this. Thanks.


csga3l58 #1

Use ifelse or case_when:

library(dplyr)
df <- df %>% 
   mutate(column1 = case_when(column1 != "bad" ~ "others", TRUE ~ column1))

Alternatively, since only one change is needed, we can simply do:

df$column1[df$column1 != "bad"] <- "others"
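Since the question asks for a function that can be reused across several variables, the replacement above could be wrapped in a small helper. This is only a sketch: the name collapse_levels and its keep and other arguments are made up for illustration, and it assumes the columns are character vectors or factors.

```r
# Hypothetical helper: keep the values listed in `keep`,
# replace every other non-missing value with `other`.
collapse_levels <- function(x, keep = "bad", other = "others") {
  x <- as.character(x)                      # also accepts factor input
  x[!is.na(x) & !(x %in% keep)] <- other    # missing values stay NA
  x
}

column1 <- c("bad", "good", "nice", "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph")
df <- data.frame(column1, column2)

# Apply the helper to one or more columns by name.
cols <- c("column1")
df[cols] <- lapply(df[cols], collapse_levels)
df$column1
#> [1] "bad"    "others" "others" "others" "others" "bad"
```

Listing more column names in cols applies the same collapsing to each of them in one step.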

oxosxuxt #2

In base R, a simple approach is to use indexing:

c('others', 'bad')[(df$column1 == 'bad') + 1]
#> [1] "bad"    "others" "others" "others" "others" "bad"    "bad"   
#> [8] "others" "others" "others" "others" "bad"
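To see why the indexing works: the comparison yields a logical vector, adding 1 turns FALSE into 1 and TRUE into 2, and those numbers then pick elements from the two-element lookup vector. A step-by-step sketch on a shortened vector (x, is_bad, idx and lookup are illustrative names):

```r
x <- c("bad", "good", "nice", "bad")

is_bad <- x == "bad"   # TRUE FALSE FALSE TRUE
idx <- is_bad + 1      # 2 1 1 2  (FALSE -> 1, TRUE -> 2)

# Position 1 is used for non-matches, position 2 for matches.
lookup <- c("others", "bad")
result <- lookup[idx]
result
#> [1] "bad"    "others" "others" "bad"
```

Note that the expression returns a new vector; it has to be assigned back to df$column1 to change the data frame.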

hs1ihplo #3

# Factor levels sort alphabetically: bad, fair, good, great, nice
df <- data.frame(factor = as.factor(column1), column2)
levels(df$factor) <- c("bad", rep("others", 4))
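The assignment above depends on "bad" sorting first and on there being exactly four other levels. A variant that hard-codes neither, relying on base R's support for assigning a named list to levels(), might look like this (f is an illustrative name):

```r
column1 <- c("bad", "good", "nice", "fair", "great", "bad")
f <- factor(column1)

# Each list name is a new level; its value lists the old levels it absorbs.
levels(f) <- list(bad = "bad", others = setdiff(levels(f), "bad"))

levels(f)
#> [1] "bad"    "others"
as.character(f)
#> [1] "bad"    "others" "others" "others" "others" "bad"
```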

62o28rlo #4

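Here is a dplyr solution with grouping. Each "bad" row starts a new group, so the first row of every group is labeled "bad" and the rest "others"; this relies on the first row of the data being "bad", as it is here: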

library(dplyr)
df %>% 
  group_by(group = cumsum(column1=="bad")) %>% 
  mutate(column1 = ifelse(row_number()==1, "bad", "others")) %>% 
  ungroup() %>% 
  select(-group)
   column1 column2
   <chr>   <chr>  
 1 bad     john   
 2 others  ben    
 3 others  cook   
 4 others  seth   
 5 others  brian  
 6 bad     deph   
 7 bad     omar   
 8 others  mary   
 9 others  frank  
10 others  boss   
11 others  kate   
12 bad     sall
