Calculating an end time in R

Posted by rpppsulh on 2023-09-27 in Other

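I have a data frame with multiple records per patient. The first two columns (V1, V2) are the visit times, the next two (R1, R2) are the disease stage at each visit, and the last two (T1, T2) are the treatment the patient was receiving. I want to calculate a start and end time for each patient based on the first treatment given at visit 1. I was able to calculate the start time using a solution from another post ("Modifying function"), but I am struggling to find the end time.
I was thinking of reusing the `ifelse` approach I used for the start time, but there are many conditions to account for. If a patient has a start time recorded, then an end of treatment should be recorded as well: go to the patient's second record and check whether R2 is "Response" or "Death"; if so, check whether T1 and T2 are equal; if all of these requirements are met, record the end time, which would be V2. This may need to repeat for as long as the conditions keep being met.
Here is a reproducible example: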

df <- data.frame(
  Patient = c('Dave', 'Dave', 'Dave', "Angel", "Angel", "Angel", "Joe", "Joe", "Joe", "Cara", "Cara"),
  V1 = c(1, 150, 375, 1, 150, 375, 1, 150, 375, 1, 150),
  V2 = c(150, 375,568,150, 375, 568, 150, 375, 568, 150,375),
  R1 = c("Disease","Response","Response", "Disease","Disease", "Response","Disease", "Response", "Response", "Disease", "Response"),
  R2 = c("Response", "Response", "Response", "Disease", "Response", "Death", "Response", "Disease", "Response", "Response", "Death"),
  T1 = c("A","A", "A", "A","B","B", "A","A","C", "A", "A"),
  T2 = c("A", "A","B",  "B","B","B", "A","C","C" , "A", "A"))

df$start <- NULL
df$start <- ifelse(df$V1 == 1 & df$T1 == df$T2 & df$R2 == "Response", df$V2, NA)

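Dave's end time would be 568, because technically he stayed on treatment A until visit 568 and only then changed. Angel will have no start time, because they never saw a response on the first treatment. Joe's end time would be 150, because he stopped seeing a response at visit 375, so the end time is the same as the start time. Finally, Cara's end time is 375, because we assume she kept responding until death.
I realize this is hard to follow, so I am happy to answer questions in the comments. Thanks in advance!
Edit: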

df <- data.frame(
  Patient = c('Dave', 'Dave', 'Dave', "Angel", "Angel", "Angel", "Joe", "Joe", "Joe", "Cara", "Cara", "Tanya", "Tanya", "Tanya", "Tanya"),
  V1 = c(1, 150, 375, 1, 150, 375, 1, 150, 375, 1, 150, 1,150, 375,568),
  V2 = c(150, 375,568,150, 375, 568, 150, 375, 568, 150,375, 150, 375, 568, 600),
  R1 = c("Disease","Response","Response", "Disease","Disease", "Response","Disease", "Response", "Response", "Disease", "Response", "Disease", "Response", "Response", "Disease"),
  R2 = c("Response", "Response", "Response", "Disease", "Response", "Death", "Response", "Disease", "Response", "Response", "Death", "Response", "Response", "Disease", "Response"),
  T1 = c("A","A", "A", "A","B","B", "A","A","C", "A", "A", "B", "B", "A","B" ),
  T2 = c("A", "A","B",  "B","B","B", "A","C","C" , "A", "A", "B", "A", "B", "B"))

With your code Tanya gets 600, when it should be 375.

Answer #1, by n6lpvg4x

Here is an approach using dplyr:

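library(dplyr)  # .by needs dplyr >= 1.1.0; the |> pipe needs R >= 4.1
df |>
  # keep only the rows where the patient is on their first treatment
  filter(T1 == first(T1), .by = Patient) |>
  # per row: V2 if R2 is a response or death, else V1 if R1 is a response;
  # the latest such time is the end (NA if any kept row has no response)
  summarize(end = max(case_when(R2 %in% c("Response", "Death") ~ V2,
                                R1 == "Response" ~ V1,
                                TRUE ~ NA)), .by = Patient)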

  Patient end
1    Dave 568
2   Angel  NA
3     Joe 150
4    Cara 375
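
To see why this first version returns 600 for Tanya in the edited data from the question, here is a minimal base-R sketch of the same two steps applied to her rows only. The names `tanya`, `kept` and `candidate` are made up for this illustration, and the nested `ifelse` is only a stand-in for the `case_when` above.

```r
# Tanya's four rows, copied from the edited example data in the question.
tanya <- data.frame(
  V1 = c(1, 150, 375, 568),
  V2 = c(150, 375, 568, 600),
  R1 = c("Disease", "Response", "Response", "Disease"),
  R2 = c("Response", "Response", "Disease", "Response"),
  T1 = c("B", "B", "A", "B"),
  T2 = c("B", "A", "B", "B")
)

# Base-R equivalent of filter(T1 == first(T1)):
# keep every row whose T1 matches the first treatment, wherever it occurs.
kept <- tanya[tanya$T1 == tanya$T1[1], ]

# Same rule as the case_when(): V2 when R2 is Response/Death,
# otherwise V1 when R1 is Response, otherwise NA.
candidate <- ifelse(kept$R2 %in% c("Response", "Death"), kept$V2,
                    ifelse(kept$R1 == "Response", kept$V1, NA))

kept$V1         # visits 1, 150 and 568 survive the filter
max(candidate)  # 600: the later return to treatment B is counted too
```

The row starting at visit 568 is back on treatment B, so it passes the filter even though treatment A was given in between. That row is what pushes the maximum from 375 to 600.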

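Edit: based on the scenario the OP added, where someone switches away from the original treatment and back again, we might modify it as follows: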

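df |>
  # era goes up by one each time T1 differs from the previous row's T1,
  # so every uninterrupted run of one treatment gets its own number
  mutate(era = cumsum(T1 != lag(T1, default = "")), .by = Patient) |>
  # keep only the rows where the patient is on their first treatment
  filter(T1 == first(T1), .by = c(Patient)) |>
  # summarize each run of the first treatment separately
  summarize(end = max(case_when(R2 %in% c("Response", "Death") ~ V2,
                                R1 == "Response" ~ V1,
                                TRUE ~ NA)), .by = c(Patient, era))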

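This outputs one summary row per patient for each "era" in which they were on their first treatment. In this example Tanya therefore gets two rows, one for each of her two separate runs on treatment B. If you only want the result for the first era, we could add `|> filter(era == 1)`, or even add that condition to the first filter so that the later eras are dropped from the analysis.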

  Patient era end
1    Dave   1 568
2   Angel   1  NA
3     Joe   1 150
4    Cara   1 375
5   Tanya   1 375
6   Tanya   3 600
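
As a sketch of the "first era only" variant mentioned above, the `era == 1` condition can go into the existing filter. This assumes dplyr 1.1.0 or later (for `.by`) and R 4.1 or later (for `|>`). The name `first_era` is made up for this illustration, and `df` is the edited data frame from the question.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0 is installed

# Edited example data from the question, including Tanya.
df <- data.frame(
  Patient = c("Dave", "Dave", "Dave", "Angel", "Angel", "Angel", "Joe", "Joe", "Joe",
              "Cara", "Cara", "Tanya", "Tanya", "Tanya", "Tanya"),
  V1 = c(1, 150, 375, 1, 150, 375, 1, 150, 375, 1, 150, 1, 150, 375, 568),
  V2 = c(150, 375, 568, 150, 375, 568, 150, 375, 568, 150, 375, 150, 375, 568, 600),
  R1 = c("Disease", "Response", "Response", "Disease", "Disease", "Response", "Disease",
         "Response", "Response", "Disease", "Response", "Disease", "Response", "Response", "Disease"),
  R2 = c("Response", "Response", "Response", "Disease", "Response", "Death", "Response",
         "Disease", "Response", "Response", "Death", "Response", "Response", "Disease", "Response"),
  T1 = c("A", "A", "A", "A", "B", "B", "A", "A", "C", "A", "A", "B", "B", "A", "B"),
  T2 = c("A", "A", "B", "B", "B", "B", "A", "C", "C", "A", "A", "B", "A", "B", "B")
)

first_era <- df |>
  # number each uninterrupted run of a treatment within a patient
  mutate(era = cumsum(T1 != lag(T1, default = "")), .by = Patient) |>
  # keep only the first run of the first treatment
  filter(T1 == first(T1), era == 1, .by = Patient) |>
  # latest response (or death) time within that first run
  summarize(end = max(case_when(R2 %in% c("Response", "Death") ~ V2,
                                R1 == "Response" ~ V1,
                                TRUE ~ NA)), .by = Patient)

first_era
```

With this data the result should be one row per patient, with Tanya at 375 and the other four patients unchanged (Dave 568, Angel NA, Joe 150, Cara 375).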
