R: Collapse categories across rows and split the counts

zvms9eto  posted on 2023-03-05  in  Other

In R, I have a data frame:

data.frame(value1=c("apple", "orange","banana","apple,orange"), count=c(2,4,6,2))

I want the data frame to become

data.frame(value1=c("apple", "orange","banana"), count=c(3,5,6))

by removing the "apple,orange" row and adding its count to "apple" and "orange" (here 1 each).
I tried using

df$value1 <- unlist(strsplit(as.character(df$value1), ","))

but I think this is the wrong approach...
Thanks, everyone!
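
The attempt above fails because `strsplit` turns the four rows into five pieces, and a column cannot be replaced by a vector of a different length. A minimal base-R sketch of the mismatch, assuming the question's data frame is stored in `df` (the names `pieces` and `result` are illustrative only):

```r
df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"),
                 count = c(2, 4, 6, 2))

# Splitting on commas yields one element per fruit, not one per row
pieces <- unlist(strsplit(as.character(df$value1), ","))
length(pieces)  # 5
nrow(df)        # 4

# Assigning 5 values to a 4-row column raises an error,
# which is caught here so the script keeps running
result <- tryCatch({
  df$value1 <- pieces
  "assigned"
}, error = function(e) "error")
result
```

Even if the lengths matched, the counts would still have to be divided among the pieces, which is the step the answers below add.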

dpiehjr4 #1

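We can rescale the count value by dividing it by the number of items in the row. Assuming the items are comma-separated, count the commas and add 1, but only for entries that contain a comma. Then split the value1 column into one row per item and sum count by group with reframe: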

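library(dplyr)   # version >= 1.1.0 (for reframe() and .by)
library(tidyr)   # version >= 1.3.0 (for separate_longer_delim())
library(stringr)
# df1 is the data frame from the question
df1 %>%
  # divide count by the number of comma-separated items in the row
  mutate(count = case_when(str_detect(value1, ",") ~
      count / (str_count(value1, ",") + 1), TRUE ~ count)) %>%
  # one row per item; the delimiter is a comma plus optional spaces
  separate_longer_delim(value1, delim = regex(",\\s*")) %>%
  # sum the shares for each item
  reframe(count = sum(count), .by = value1)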
Output:
value1 count
1  apple     3
2 orange     5
3 banana     6

qojgxg4l #2

Similar to akrun's idea, but using slightly different functions:

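df %>%
  # divide each count by the number of items in the row (commas + 1)
  mutate(count = count / (1 + str_count(value1, ","))) %>%
  # one row per item
  separate_rows(value1) %>%
  # weighted count: sums the shares per item into column n
  count(value1, wt = count)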

# A tibble: 3 × 2
  value1     n
  <chr>  <dbl>
1 apple      3
2 banana     6
3 orange     5

In base R:

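# split each value1 entry on commas
a <- strsplit(df$value1, ",")
# each row's count divided by its number of items (commas + 1)
b <- df$count / (nchar(gsub("[^,]", "", df$value1)) + 1)
# repeat each share once per item, then sum by item
stack(tapply(rep(b, lengths(a)), unlist(a), sum))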
  values    ind
1      3  apple
2      6 banana
3      5 orange
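
To make the intermediate steps of the base R approach visible, here is a hedged step-by-step sketch. It is a variation, not the answer's exact code: it uses `lengths(parts)` instead of counting commas and `aggregate` instead of `tapply`/`stack`, so the result keeps the column names `value1` and `count` from the question. The names `parts`, `share`, `long`, and `out` are illustrative.

```r
df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"),
                 count = c(2, 4, 6, 2))

# One character vector of items per row
parts <- strsplit(as.character(df$value1), ",")

# Each row's count split evenly among its items
share <- df$count / lengths(parts)

# Long format: one row per item, carrying that item's share
long <- data.frame(value1 = unlist(parts),
                   count = rep(share, lengths(parts)))
long

# Sum the shares per item
out <- aggregate(count ~ value1, data = long, FUN = sum)
out
```

Printing `long` before aggregating is a quick way to check that a combined row such as "apple,orange" contributes half of its count to each fruit.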

cclgggtu #3

I actually came up with a clumsy way of doing it myself...

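library(dplyr)
library(tidyr)
library(stringr)

# reframing the problem
df0 <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange", "orange,banana"), count = c(2, 4, 6, 2, 4))
# reframing the ideal solution (stored under its own name so it does not overwrite df0;
# counts are the totals after splitting each combined row evenly)
ideal <- data.frame(value1 = c("apple", "orange", "banana"), count = c(3, 7, 8))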
### my solution: ###

df1 <-
  # select the rows to be mutated
  df0[str_detect(df0$value1, ","), ] %>%
  # separate the values into two columns
  separate(col = value1, into = paste0("fruit", 1:2), sep = ",") %>%
  # divide the count by two
  mutate(count = count / 2) %>%
  # turn columns into rows
  pivot_longer(fruit1:fruit2, names_to = "longername", values_to = "value1") %>%
  # remove the unneeded column
  select(-"longername") %>%
  # add back the original rows that hold a single value
  rbind(df0[!str_detect(df0$value1, ","), ]) %>%
  # recalculate the count
  group_by(value1) %>%
  summarise(newcount = sum(count))
