I have a data frame like this in R:
df <- data.frame(policy_ID = c("P1","P1","P1","P2","P2","P3","P4","P4","P4","P4"),
                 City = c("Copenhagen", "Copenhagen", "London", "LA", "LA",
                          "Tokyo", "Madrid", "Madrid", "Rome", "Milan"),
                 Floor = c("1","1","1","3","4","2","1","1","4","4"))
If the value of City or Floor changes within a policy, I want policy_ID to change as well. The expected output is:
df <- data.frame(policy_ID = c("P1","P1","P1_2","P2","P2_2","P3","P4","P4","P4_2","P4_3"),
                 City = c("Copenhagen", "Copenhagen", "London", "LA", "LA",
                          "Tokyo", "Madrid", "Madrid", "Rome", "Milan"),
                 Floor = c("1","1","1","3","4","2","1","1","4","4"))
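Does anyone know how to do this?
I have managed to write this code, which outputs 1 whenever a change has occurred:
library(dplyr)

df %>%
  group_by(policy_ID) %>%
  # Flag rows where City or Floor differs from the previous row in the same policy
  mutate(City_or_Floor_changed = ifelse(City != lag(City, default = first(City)) |
                                          Floor != lag(Floor, default = first(Floor)),
                                        1, 0)) %>%
  ungroup()
However, I am struggling to figure out how to change policy_ID the way I want.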
3 Answers

#1
A second option is to use dplyr::consecutive_id, which was introduced in dplyr 1.1.0 and was inspired by data.table::rleid:
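A minimal sketch of how dplyr::consecutive_id might be applied to the question's data, assuming dplyr 1.1.0 or later is installed. The helper column `run` and the object name `result` are introduced here for illustration and are not from the original answer.

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

df <- data.frame(policy_ID = c("P1","P1","P1","P2","P2","P3","P4","P4","P4","P4"),
                 City = c("Copenhagen", "Copenhagen", "London", "LA", "LA",
                          "Tokyo", "Madrid", "Madrid", "Rome", "Milan"),
                 Floor = c("1","1","1","3","4","2","1","1","4","4"),
                 stringsAsFactors = FALSE)

result <- df %>%
  group_by(policy_ID) %>%
  # `run` (hypothetical name) counts consecutive City/Floor combinations,
  # restarting at 1 for every policy
  mutate(run = consecutive_id(City, Floor)) %>%
  # A grouping column cannot be modified while grouped, so ungroup first
  ungroup() %>%
  # Leave the first run untouched; append "_<run>" to later runs
  mutate(policy_ID = if_else(run > 1, paste0(policy_ID, "_", run), policy_ID)) %>%
  select(-run)

result$policy_ID
# "P1" "P1" "P1_2" "P2" "P2_2" "P3" "P4" "P4" "P4_2" "P4_3"
```

Grouping by policy_ID before calling consecutive_id is what makes the counter restart for each policy, so the suffix reflects changes within a policy only.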
#2
We can use rleid to create an index
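One way the rleid index could be built and turned into the desired suffix, sketched with data.table. This assumes the data.table package is installed; the column name `run` and the object name `dt` are illustrative choices, not taken from the original answer.

```r
library(data.table)

df <- data.frame(policy_ID = c("P1","P1","P1","P2","P2","P3","P4","P4","P4","P4"),
                 City = c("Copenhagen", "Copenhagen", "London", "LA", "LA",
                          "Tokyo", "Madrid", "Madrid", "Rome", "Milan"),
                 Floor = c("1","1","1","3","4","2","1","1","4","4"),
                 stringsAsFactors = FALSE)

dt <- as.data.table(df)

# Run-length id over the City/Floor combination, restarted within each policy
dt[, run := rleid(City, Floor), by = policy_ID]

# Keep the ID as-is for the first run; append "_<run>" for later runs
dt[, policy_ID := ifelse(run > 1, paste0(policy_ID, "_", run), policy_ID)]

# Drop the helper column
dt[, run := NULL]

dt$policy_ID
# "P1" "P1" "P1_2" "P2" "P2_2" "P3" "P4" "P4" "P4_2" "P4_3"
```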
#3
Just for laughs, here is an answer using lag and cumsum:
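A sketch of the lag and cumsum idea, building on the change flag from the question. The helper columns `changed` and `run` and the object name `result` are names assumed here for illustration.

```r
library(dplyr)

df <- data.frame(policy_ID = c("P1","P1","P1","P2","P2","P3","P4","P4","P4","P4"),
                 City = c("Copenhagen", "Copenhagen", "London", "LA", "LA",
                          "Tokyo", "Madrid", "Madrid", "Rome", "Milan"),
                 Floor = c("1","1","1","3","4","2","1","1","4","4"),
                 stringsAsFactors = FALSE)

result <- df %>%
  group_by(policy_ID) %>%
  mutate(
    # TRUE where City or Floor differs from the previous row of the same policy;
    # the default makes the first row of each policy compare equal to itself
    changed = City != lag(City, default = first(City)) |
      Floor != lag(Floor, default = first(Floor)),
    # The running count of changes gives a run number starting at 1
    run = cumsum(changed) + 1L
  ) %>%
  ungroup() %>%
  # Append "_<run>" only after the first change within a policy
  mutate(policy_ID = if_else(run > 1, paste0(policy_ID, "_", run), policy_ID)) %>%
  select(-changed, -run)

result$policy_ID
# "P1" "P1" "P1_2" "P2" "P2_2" "P3" "P4" "P4" "P4_2" "P4_3"
```

This version needs no dplyr 1.1.0 features, since the cumulative sum of the change flag plays the role of the run index.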