Counting events by group with group_by (R)

axkjgtzd, posted on 2023-05-04 in Other

Here is my code:

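library(dplyr)  # dplyr re-exports tibble(); the answers below need dplyr >= 1.1.0

set.seed(23)
data_toy <- tibble(
  family_code = sample(factor(400:410), 1000, replace = TRUE),
  event_type = factor(sample(c("sad", "happy"), 1000,
                             replace = TRUE, prob = c(.2, .8))),
  score = sample(1:100, 1000, replace = TRUE)
) %>%
  # happy events carry no score
  mutate(score = if_else(event_type == "happy", NA, score)) %>%
  arrange(family_code)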

Output:

family_code event_type score
   <fct>       <fct>      <int>
 1 400         happy         NA
 2 400         happy         NA
 3 400         happy         NA
 4 400         happy         NA
 5 400         sad           57
 6 400         happy         NA
 7 400         happy         NA
 8 400         happy         NA
 9 400         happy         NA
10 400         sad           65

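I want to create a column that counts, for each family, the number of happy events that occur before each sad event.
For the example I shared, my desired output is: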

family_code event_type score happy_counter
   <fct>       <fct>      <int>         <dbl>
 1 400         happy         NA            NA
 2 400         happy         NA            NA
 3 400         happy         NA            NA
 4 400         happy         NA            NA
 5 400         sad           57             4
 6 400         happy         NA            NA
 7 400         happy         NA            NA
 8 400         happy         NA            NA
 9 400         happy         NA            NA
10 400         sad           65             4
11 400         happy         NA            NA
12 400         happy         NA            NA
13 400         happy         NA            NA
14 400         happy         NA            NA
15 400         happy         NA            NA
16 400         happy         NA            NA
17 400         happy         NA            NA
18 400         happy         NA            NA
19 400         sad           79             8
20 400         sad           78             0

My real data has close to 10k observations. I tried group_by and nest_by, but I struggled to reset the count to zero after each sad event.

wkyowqbh1#

Something like the code below, which uses lag to access the count value:

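library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

data_toy %>% 
  # grp is a run id over the whole table: it increments whenever event_type changes
  group_by(grp = consecutive_id(event_type), family_code) %>% 
  # number of happy rows in each run (0 for a run of sad events)
  mutate(is = sum(event_type == "happy")) %>% 
  ungroup() %>% 
  # a sad row takes the count from the row just above it;
  # lag() runs on the ungrouped table, so it also looks across family boundaries
  mutate(happy_counter = if_else(event_type == "sad", lag(is), NA)) %>% 
  select(-c(grp, is)) %>% 
  print(n = 21)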
# A tibble: 1,000 × 4
   family_code event_type score happy_counter
   <fct>       <fct>      <int>         <int>
 1 400         happy         NA            NA
 2 400         happy         NA            NA
 3 400         happy         NA            NA
 4 400         happy         NA            NA
 5 400         sad           57             4
 6 400         happy         NA            NA
 7 400         happy         NA            NA
 8 400         happy         NA            NA
 9 400         happy         NA            NA
10 400         sad           65             4
11 400         happy         NA            NA
12 400         happy         NA            NA
13 400         happy         NA            NA
14 400         happy         NA            NA
15 400         happy         NA            NA
16 400         happy         NA            NA
17 400         happy         NA            NA
18 400         happy         NA            NA
19 400         sad           79             8
20 400         sad           78             0
21 400         happy         NA            NA
# … with 979 more rows
# ℹ Use `print(n = ...)` to see more rows
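
Because `lag()` in the answer above runs after `ungroup()`, a family whose first event is sad would pick up the count from the end of the previous family. The following is a minimal sketch of a variant that stays inside each family, assuming such a leading sad event should get a count of 0. The data set `toy`, the helper name `count_happy_before_sad`, and the intermediate column `block` are hypothetical and only serve the illustration.

```r
library(dplyr)

# Hypothetical mini data set: family "B" starts with a sad event.
toy <- tibble(
  family_code = c("A", "A", "A", "A", "A", "B", "B", "B"),
  event_type  = c("happy", "happy", "sad", "sad", "happy", "sad", "happy", "sad")
)

count_happy_before_sad <- function(df) {
  df %>%
    group_by(family_code) %>%
    # block = number of sad events seen before the current row in this family,
    # so every block ends with (at most) one sad event
    mutate(block = cumsum(lag(event_type == "sad", default = FALSE))) %>%
    group_by(family_code, block) %>%
    # the sad row that closes a block receives the number of happy rows in it
    mutate(happy_counter = if_else(event_type == "sad",
                                   sum(event_type == "happy"),
                                   NA_integer_)) %>%
    ungroup() %>%
    select(-block)
}

res <- count_happy_before_sad(toy)
print(res)
```

Since the counter is restarted by `cumsum()` within each family, no join or second pass is needed. Whether a leading sad event should show 0 or `NA` depends on the intended meaning and is an assumption here.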

8ftvxx2r2#

Try this:

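library(dplyr)
out <- data_toy %>%
   # ind is a run id over the whole table; n is the length of each run
   group_by(family_code, ind = consecutive_id(event_type)) %>% 
   mutate(n = n()) %>% 
   slice_head(n = 1) %>%
   group_by(family_code) %>%
   # NA^TRUE is NA and NA^FALSE is 1, so sad runs take the length of the
   # previous run in the same family and happy runs become NA
   mutate(n = lag(n) * NA^(event_type == "happy")) %>%
   ungroup %>%
   select(ind, family_code, event_type, happy_counter = n) %>%
   # join the per-run counts back onto every row
   left_join(data_toy %>% 
                mutate(ind = consecutive_id(event_type)), .) %>% 
   group_by(family_code, ind) %>% 
   # within a sad run only the first row keeps the count; later rows get 0
   mutate(happy_counter = happy_counter * (all(event_type == "sad") & 
     !duplicated(happy_counter))) %>%
   ungroup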
Output:
head(out, 20)
# A tibble: 20 × 5
   family_code event_type score   ind happy_counter
   <fct>       <fct>      <int> <int>         <dbl>
 1 400         happy         NA     1            NA
 2 400         happy         NA     1            NA
 3 400         happy         NA     1            NA
 4 400         happy         NA     1            NA
 5 400         sad           57     2             4
 6 400         happy         NA     3            NA
 7 400         happy         NA     3            NA
 8 400         happy         NA     3            NA
 9 400         happy         NA     3            NA
10 400         sad           65     4             4
11 400         happy         NA     5            NA
12 400         happy         NA     5            NA
13 400         happy         NA     5            NA
14 400         happy         NA     5            NA
15 400         happy         NA     5            NA
16 400         happy         NA     5            NA
17 400         happy         NA     5            NA
18 400         happy         NA     5            NA
19 400         sad           79     6             8
20 400         sad           78     6             0
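
The two arithmetic tricks in the answer above, `NA^(condition)` and multiplying by `!duplicated(...)`, can be seen in isolation in the short base R sketch below. The vectors `is_happy` and `run_count` are made-up values for illustration, not taken from `data_toy`.

```r
# NA^TRUE is NA^1, which is NA; NA^FALSE is NA^0, which R defines as 1.
is_happy  <- c(TRUE, FALSE, TRUE, FALSE)
run_count <- c(4, 4, 8, 8)

# Multiplying by NA^is_happy blanks out the happy positions and keeps the others.
masked <- run_count * NA^is_happy
print(masked)

# !duplicated() is TRUE only for the first occurrence of a value, so the
# product keeps the first copy and turns the repeats into 0.
first_only <- c(8, 8) * (!duplicated(c(8, 8)))
print(first_only)
```

This is why two consecutive sad rows in the output show 8 and then 0: both rows inherit the same count from the join, and the `!duplicated()` factor zeroes every copy after the first.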
