R: append a counter to an ID variable whenever an explanatory variable changes

Posted by xdyibdwo on 2023-05-04 in Other

I have a data frame like this in R:

df <- data.frame(policy_ID = c("P1","P1","P1","P2","P2","P3","P4","P4","P4","P4"), 
                 City = c("Copenhagen", "Copenhagen", "London", "LA", "LA",
                          "Tokyo", "Madrid", "Madrid", "Rome", "Milan"),
                 Floor = c("1","1","1","3","4","2","1","1","4","4"))

Whenever the value of City or Floor changes within a policy, I want policy_ID to get a counter suffix. The expected output is:

df <- data.frame(policy_ID = c("P1","P1","P1_2","P2","P2_2","P3","P4","P4","P4_2","P4_3"), 
                 City = c("Copenhagen", "Copenhagen", "London", "LA", "LA",
                          "Tokyo", "Madrid", "Madrid", "Rome", "Milan"),
                 Floor = c("1","1","1","3","4","2","1","1","4","4"))

Does anyone know how to do this?
I have managed to write the following code, which outputs 1 whenever a change occurs:

library(dplyr)

df %>%
  group_by(policy_ID) %>%
  mutate(City_or_Floor_changed = ifelse(City != lag(City, default = first(City)) | 
                                        Floor != lag(Floor, default = first(Floor)), 
                                      1, 0)) %>%
  ungroup()

However, I am struggling to work out how to change policy_ID the way I want.

yhxst69z #1

Another option is to use dplyr::consecutive_id(), which was introduced in dplyr 1.1.0 and was inspired by data.table::rleid():

library(dplyr, warn.conflicts = FALSE)

df |>
  mutate(run = consecutive_id(City, Floor), .by = policy_ID) |>
  mutate(
    policy_ID = if_else(run > 1,
      paste(policy_ID, run, sep = "_"),
      policy_ID
    )
  )
#>    policy_ID       City Floor run
#> 1         P1 Copenhagen     1   1
#> 2         P1 Copenhagen     1   1
#> 3       P1_2     London     1   2
#> 4         P2         LA     3   1
#> 5       P2_2         LA     4   2
#> 6         P3      Tokyo     2   1
#> 7         P4     Madrid     1   1
#> 8         P4     Madrid     1   1
#> 9       P4_2       Rome     4   2
#> 10      P4_3      Milan     4   3
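
To make the behaviour of consecutive_id() concrete, here is a minimal sketch on small vectors. The vectors and the names single and pair are made up for illustration, and the example assumes dplyr 1.1.0 or later is installed.

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

# A new id starts whenever the value differs from the previous one;
# a value that reappears later gets a fresh id rather than its old one.
single <- consecutive_id(c("a", "a", "b", "a"))
single
#> [1] 1 1 2 3

# With several vectors, a change in any of them starts a new run.
pair <- consecutive_id(c("LA", "LA", "LA"), c("3", "4", "4"))
pair
#> [1] 1 2 2
```

Computing this per policy (the .by = policy_ID argument above) is what makes the counter restart at 1 for each policy.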

nukf8bse #2

We can use rleid from data.table to create the run index:

library(dplyr)
library(data.table)
library(stringr)
df %>%
   mutate(policy_ID2 = policy_ID) %>%
   mutate(tmp = rleid(City, Floor), 
   policy_ID = case_when(tmp > 1 ~ str_c(policy_ID, '_', tmp), 
      TRUE ~ policy_ID), .by = policy_ID2) %>% 
   select(-tmp, -policy_ID2)
Output:
   policy_ID       City Floor
1         P1 Copenhagen     1
2         P1 Copenhagen     1
3       P1_2     London     1
4         P2         LA     3
5       P2_2         LA     4
6         P3      Tokyo     2
7         P4     Madrid     1
8         P4     Madrid     1
9       P4_2       Rome     4
10      P4_3      Milan     4
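
As a rough illustration of why the index has to be computed per policy, the sketch below calls rleid() on the whole data frame and then rebases it within each policy using base R's ave(). The names global_run and per_policy are hypothetical, and the rebasing step is just one possible way to restart the count, not part of the answer above.

```r
library(data.table)

df <- data.frame(policy_ID = c("P1","P1","P1","P2","P2","P3","P4","P4","P4","P4"),
                 City = c("Copenhagen", "Copenhagen", "London", "LA", "LA",
                          "Tokyo", "Madrid", "Madrid", "Rome", "Milan"),
                 Floor = c("1","1","1","3","4","2","1","1","4","4"))

# Over the whole table, the run index keeps counting across policies.
global_run <- rleid(df$City, df$Floor)
global_run
#>  [1] 1 1 2 3 4 5 6 6 7 8

# Rebase within each policy so that every policy starts again at 1.
per_policy <- ave(global_run, df$policy_ID, FUN = function(r) r - r[1] + 1L)
per_policy
#>  [1] 1 1 2 1 2 1 1 1 2 3
```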

z31licg0 #3

Just for laughs, here is an answer using lag and cumsum:

library(dplyr)

df %>% 
  mutate(tmp = cumsum(City != lag(City, default = "null") | 
                      Floor != lag(Floor, default = "null")), 
         policy_ID = case_when(tmp == 1 ~ policy_ID,
                               TRUE ~ paste(policy_ID, tmp, sep = "_")),
         .by = policy_ID) %>% 
  select(-tmp)
#>    policy_ID       City Floor
#> 1         P1 Copenhagen     1
#> 2         P1 Copenhagen     1
#> 3       P1_2     London     1
#> 4         P2         LA     3
#> 5       P2_2         LA     4
#> 6         P3      Tokyo     2
#> 7         P4     Madrid     1
#> 8         P4     Madrid     1
#> 9       P4_2       Rome     4
#> 10      P4_3      Milan     4
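
To see why the cumulative sum of a change flag yields the counter, the steps can be traced by hand for the four P4 rows. This is a base R sketch with hypothetical variable names (changed, counter, new_id); it does not use dplyr.

```r
# Values of the P4 rows from the question's data frame.
city  <- c("Madrid", "Madrid", "Rome", "Milan")
floor <- c("1", "1", "4", "4")
n <- length(city)

# Compare each row with the previous one; the first row always opens a run.
changed <- c(TRUE, city[-1] != city[-n] | floor[-1] != floor[-n])
changed
#> [1]  TRUE FALSE  TRUE  TRUE

# The running total of the change flags is the run counter.
counter <- cumsum(changed)
counter
#> [1] 1 1 2 3

# Append the counter only from the second run onwards.
new_id <- ifelse(counter == 1, "P4", paste("P4", counter, sep = "_"))
new_id
#> [1] "P4"   "P4"   "P4_2" "P4_3"
```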
