R: Condense row categories and split the count

Posted by zvms9eto on 2023-03-05 in Other


In R, I have a data frame:

df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"), count = c(2, 4, 6, 2))

I would like the data frame to become

data.frame(value1=c("apple", "orange","banana"), count=c(3,5,6))

by eliminating the row "apple,orange" and splitting its count evenly between "apple" and "orange".
I tried using

df$value1 <- unlist(strsplit(as.character(df$value1), ","))

but I think this is the wrong approach...
Thanks, everyone!

Source: https://stackoverflow.com/questions/75575575/r-condense-rows-category-and-split-the-count

3 Answers


#1 dpiehjr4

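We can rescale the count values by dividing each one by the number of items in its row. Assuming the items are comma-separated, that number is the count of commas plus 1, and the division is applied only to rows that contain a comma. Then split the value1 column into one row per item and sum count by group (using reframe).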

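library(dplyr) # version >= 1.1.0
library(tidyr)
library(stringr)
df %>%
   # split the count evenly across the comma-separated items in a row
   mutate(count = case_when(str_detect(value1, ",") ~
      count/(str_count(value1, ",") + 1), TRUE ~ count)) %>%
   # one row per item; the regex also drops spaces after a comma
   separate_longer_delim(value1, delim = regex(",\\s*")) %>%
   # sum the shares per item
   reframe(count = sum(count), .by = value1)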

Output

value1 count
1  apple     3
2 orange     5
3 banana     6
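
As a rough sanity check on the steps above, the sketch below stops before the final summation and inspects the intermediate long table. It assumes the question's data is stored in `df`; the name `long` is introduced here only for illustration. Splitting each count evenly should leave the grand total unchanged, which is what the assertions test.

```r
library(dplyr)
library(tidyr)   # separate_longer_delim() needs tidyr >= 1.3.0
library(stringr)

# The data frame from the question
df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"),
                 count = c(2, 4, 6, 2))

# Hypothetical intermediate: divide each count by its number of items,
# then put every item on its own row
long <- df %>%
  mutate(count = count / (str_count(value1, ",") + 1)) %>%
  separate_longer_delim(value1, delim = ",")

# The combined row becomes two rows, so 4 rows turn into 5
stopifnot(nrow(long) == 5)

# Splitting must not create or lose any count
stopifnot(isTRUE(all.equal(sum(long$count), sum(df$count))))
```

If the totals ever differ, the usual suspect is a delimiter that does not match the data (for example ", " with a space), since that changes how many rows each entry is split into.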


#2 qojgxg4l

Similar to akrun's idea, but using slightly different functions:

df %>%
   mutate(count = count / (1 + str_count(value1, ","))) %>%
   separate_rows(value1) %>%
   count(value1, wt = count)

# A tibble: 3 × 2
  value1     n
  <chr>  <dbl>
1 apple      3
2 banana     6
3 orange     5

In base R:

a <- strsplit(df$value1, ",")
b <- df$count/(nchar(gsub("[^,]", "", df$value1)) + 1)
stack(tapply(rep(b, lengths(a)), unlist(a), sum))
  values    ind
1      3  apple
2      6 banana
3      5 orange
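
The base R version can be read step by step: split each entry, work out each item's share, repeat the shares to line up with the flattened items, and sum per item. Below is a minimal sketch of the same idea wrapped in a function. The name `split_counts` and the returned column names are assumptions made here for illustration, not part of the answer, and the sketch assumes every entry in `value1` is non-empty.

```r
# Hypothetical helper: split comma-separated categories and share counts evenly
split_counts <- function(df) {
  # list of character vectors, one per row, e.g. c("apple", "orange")
  parts <- strsplit(as.character(df$value1), ",", fixed = TRUE)
  parts <- lapply(parts, trimws)  # tolerate "apple, orange"

  # each row's count divided by the number of items in that row
  share <- df$count / lengths(parts)

  # repeat each share once per item, then sum the shares by item name
  totals <- tapply(rep(share, lengths(parts)), unlist(parts), sum)

  data.frame(value1 = names(totals), count = as.vector(totals))
}

df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"),
                 count = c(2, 4, 6, 2))
res <- split_counts(df)
```

As with `tapply` in the answer, the rows come back sorted by item name rather than in order of first appearance.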


#3 cclgggtu

I actually came up with a silly way...

# Reframing the problem
df0 <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange", "orange,banana"), count = c(2, 4, 6, 2, 4))

# Reframing the ideal solution (kept under a separate name so df0 is not overwritten)
df_expected <- data.frame(value1 = c("apple", "orange", "banana"), count = c(3, 7, 8))

### My solution ###
library(dplyr)
library(tidyr)
library(stringr)

df1 <-
  # select the rows to be split (those containing a comma)
  df0[str_detect(df0$value1, ","), ] %>%
  # separate the values into two columns (assumes exactly two items per row)
  separate(col = value1, into = paste0("fruit", 1:2), sep = ",") %>%
  # divide the count by two
  mutate(count = count / 2) %>%
  # turn columns into rows
  pivot_longer(fruit1:fruit2, names_to = "longername", values_to = "value1") %>%
  # remove the unneeded column
  select(-"longername") %>%
  # add back the original rows that hold a single value
  rbind(df0[!str_detect(df0$value1, ","), ]) %>%
  # recalculate the count per fruit
  group_by(value1) %>%
  summarise(newcount = sum(count))

