I'm struggling with filtering in R. I have this dataset:
patient_id  period      TREAT_CAT       Outcome
1           -3228 days  pre-treatment   Pink
1           -3170 days  pre-treatment   Pink
1           100 days    post-treatment  Blue
1           200 days    post-treatment  Pink
2           -2900 days  pre-treatment   Blue
2           0 days      post-treatment  Pink
2           100 days    post-treatment  Pink
df <- structure(list(patient_id = c(1, 1, 1, 1, 2, 2, 2),
    period = structure(c(-3228, -3170, 100, 200, -2900, 0, 100),
        class = "difftime", units = "days"),
    TREAT_CAT = structure(c(1L, 1L, 2L, 2L, 1L, 2L, 2L),
        levels = c("pre-treatment", "post-treatment"), class = "factor"),
    Outcome = c("Pink", "Pink", "Blue", "Pink", "Blue", "Pink", "Pink")),
    row.names = c(19L, 26L, 24L, 3L, 7L, 29L, 20L), class = "data.frame")
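I would like to filter the "period" closest to 0 for the pre-treatment group and the "period" closest to 0 for the post-treatment group.
I tried something like this: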
df2 <- df %>%
  group_by(patient_id) %>%
  filter((TREAT_CAT == "pre-treatment" & period == min(period)) |
         (TREAT_CAT == "post-treatment" & period == max(period))) %>%
  filter(n() == 2)
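But obviously this gives me the values furthest from 0 in both periods. I also tried max(period) for both groups, but that doesn't work, because max(period) only occurs in the post-treatment group, so the result is empty.
I am hoping for something like this: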
patient_id  period      TREAT_CAT       Outcome
1           -3170 days  pre-treatment   Pink
1           200 days    post-treatment  Pink
2           -2900 days  pre-treatment   Blue
2           100 days    post-treatment  Pink
Could you help me with this?
Thanks in advance.
1 answer
I assume "pre-treatment" is always negative; that way, for each group you can …
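A minimal sketch of one way to get the rows shown in the expected output. It assumes the dput output from the question is assigned to df, that pre-treatment periods are always negative, and that the wanted row is the one with the largest period within each patient_id and TREAT_CAT (which is what the expected output shows). The n_distinct() step is my own assumption, mirroring the filter(n() == 2) in the question so that only patients with both categories are kept.

```r
library(dplyr)

# Sample data from the question (dput output, assigned to df).
df <- structure(list(
  patient_id = c(1, 1, 1, 1, 2, 2, 2),
  period = structure(c(-3228, -3170, 100, 200, -2900, 0, 100),
                     class = "difftime", units = "days"),
  TREAT_CAT = structure(c(1L, 1L, 2L, 2L, 1L, 2L, 2L),
                        levels = c("pre-treatment", "post-treatment"),
                        class = "factor"),
  Outcome = c("Pink", "Pink", "Blue", "Pink", "Blue", "Pink", "Pink")),
  row.names = c(19L, 26L, 24L, 3L, 7L, 29L, 20L), class = "data.frame")

df2 <- df %>%
  # Evaluate max() separately per patient and treatment category,
  # so the pre-treatment maximum is no longer hidden by post-treatment values.
  group_by(patient_id, TREAT_CAT) %>%
  filter(period == max(period)) %>%
  # Assumption: keep only patients that have both a pre- and a post-treatment row.
  group_by(patient_id) %>%
  filter(n_distinct(TREAT_CAT) == 2) %>%
  ungroup()

print(df2)
```

Grouping by TREAT_CAT as well as patient_id is what makes the difference from the attempt in the question: there, min() and max() were computed over all of a patient's rows, so the per-category condition could never pick the pre-treatment row closest to 0.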