R中的累积发生率图和表

dgiusagp  于 2023-02-14  发布在  其他
关注(0)|答案(1)|浏览(183)
    • 背景**:我有一个肝移植患者的数据集(其中一些人在治疗组A,另一些人在治疗组B,这取决于他们使用的免疫抑制药物)。由于移植,这些患者发生供体源性HBV感染的风险很高。
    • 需要什么**:研究者关注感染开始前的时间(首次HBV感染)和随时间推移发生感染的比例。他们还希望获得基线和每个移植后随访时间点HBV感染的累积发生率(6个月、12个月、18个月和24个月)。例如,6个月的数据是随访6个月的患者中曾患HBV的比例,12个月的数据是随访12个月的患者中曾患HBV的比例,以此类推。

在这个特定的病例中,累积发生率仅仅是1减去生存函数,没有考虑任何竞争风险,分析人群没有死亡或失访。

    • 我的问题**是:

1.有没有办法将事件数量添加到图中风险数量的正下方?
1.是否有任何方法也可以获得每个时间点每组的累积发生率以及标准误差和95%置信区间,类似于我们使用以下总结(km)时获得的生命表?这些生命表为我提供了生存概率,所以我想如果我想要累积发生率,我可以仅手动计算1-生存概率,但不确定如何获得标准误差和置信区间?
下面是一个与实际数据集类似的测试数据集,以及我目前所做的工作:

time<-c(1.5989,6.9433, 0.8890, 3.2691, 1.0514, 2.7625, 1.4319, 0.9681, 7.4416, 0.0268, 1.5168, 1.9647, 0.0657, 4.3571, 6.4490, 0.2198, 1.2028, 0.9555, 0.2601, 2.0096, 7.5156, 0.4463, 0.2355, 0.9391, 2.6996)

censor<-c(1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)

group<-c(1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1)

df<-data.frame(ID, time, censor, group)
View(df)

km<-survfit(formula = Surv(time, censor) ~ group, data = df)
summary(km)

#cumulative incidence plot
plot(km, fun = function(x) 1-x)

#log rank test;
survdiff(Surv(time, censor) ~ group, data=df)
#plot survival curves for each treatment group
#cumulative incidence plot 
ggsurvplot(km,
           data = df,
           censor = T,
           risk.table = TRUE,
           legend.labs = c("group 1", "group 2"),
           xlim = c(0,10),
           ylim = c(0,1),
           pval = T,
           pval.method = T,
           pval.method.coord = c(2.5,0.5),
           pval.coord = c(4.2,0.5),
           xlab = "Months",
           ylab = "SURVIVAL PROBABILITY",
           linetype = c(1,2),
           legend.title = "",
           palette = c('red', 'blue'),
           fun="event"
)
fhity93d

fhity93d1#

您可以创建自定义表格,然后删除ggsurvplot默认表格并将其替换为您的自定义表格。
英国癌症杂志A note on competing risks in survival data analysis文章中关于累积死亡率的描述:
这与生存率正好相反,换句话说,某一事件在某一给定时间的累积发生率等于1减去该时间的总生存概率。
对于您所说的事件累积发生率,我们可以使用1-survival_probalility,因此其标准误差与置信区间相同(albiet lower ci作为上限ci,上限ci作为下限ci)。
在这里的自定义图中,我添加了风险数(第一行)、累积事件数(第二行)、生存概率(第三行)及其置信区间(第四行)

libs <- c("survminer", "survival", "tidyverse")
invisible(suppressMessages(sapply(libs,library,character.only=T)))

time <- c(1.5989,6.9433, 0.8890, 3.2691, 1.0514, 2.7625, 1.4319, 0.9681, 7.4416, 
          0.0268, 1.5168, 1.9647, 0.0657, 4.3571, 6.4490, 0.2198, 1.2028, 0.9555, 
          0.2601, 2.0096, 7.5156, 0.4463, 0.2355, 0.9391, 2.6996)

censor <- c(1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)

group <- c(1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1)

df <- data.frame(time, censor, group)

km <- survfit(formula = Surv(time, censor) ~ group, data = df)

#Draw cumulative incidence plot as usual
p1 <- ggsurvplot(km,
           data = df,
           censor = T,
           risk.table = T,
           legend.labs = c("group 1", "group 2"),
           xlim = c(0,7.5),
           ylim = c(0,1),
           pval = T,
           pval.method = T,
           pval.method.coord = c(2.5,0.5),
           pval.coord = c(4.2,0.5),
           xlab = "Months",
           ylab = "SURVIVAL PROBABILITY",
           linetype = c(1,2),
           legend.title = "",
           palette = c('red', 'blue'),
           fun="event",
           cumevents = F, 
           tables.height = 0.4
) 

# extract data table of survival plot

p1_df <- p1$table$data
p1_df
#>     strata time n.risk pct.risk n.event cum.n.event n.censor cum.n.censor
#> 1  group 1    0     14      100       0           0        0            0
#> 2  group 1    2      4       29      10          10        0            0
#> 3  group 1    4      2       14       0          10        2            2
#> 4  group 1    6      1        7       0          10        1            3
#> 5  group 1    8      0        0       0          10        1            4
#> 6  group 2    0     11      100       0           0        0            0
#> 7  group 2    2      5       45       6           6        0            0
#> 8  group 2    4      3       27       0           6        2            2
#> 9  group 2    6      3       27       0           6        0            2
#> 10 group 2    8      0        0       0           6        3            5
#>    strata_size group llabels
#> 1           14     1      14
#> 2           14     1       4
#> 3           14     1       2
#> 4           14     1       1
#> 5           14     1       0
#> 6           11     2      11
#> 7           11     2       5
#> 8           11     2       3
#> 9           11     2       3
#> 10          11     2       0

# use summary of km survfit in time points of survival plot

sum_df <-summary(km, times = p1_df$time) 
sum_df
#> Call: survfit(formula = Surv(time, censor) ~ group, data = df)
#> 
#>                 group=1 
#>  time n.risk n.event survival std.err lower 95% CI upper 95% CI
#>     0     14       0    1.000   0.000        1.000        1.000
#>     0     14       0    1.000   0.000        1.000        1.000
#>     2      4      10    0.286   0.121        0.125        0.654
#>     2      4       0    0.286   0.121        0.125        0.654
#>     4      2       0    0.286   0.121        0.125        0.654
#>     4      2       0    0.286   0.121        0.125        0.654
#>     6      1       0    0.286   0.121        0.125        0.654
#>     6      1       0    0.286   0.121        0.125        0.654
#> 
#>                 group=2 
#>  time n.risk n.event survival std.err lower 95% CI upper 95% CI
#>     0     11       0    1.000    0.00        1.000        1.000
#>     0     11       0    1.000    0.00        1.000        1.000
#>     2      5       6    0.455    0.15        0.238        0.868
#>     2      5       0    0.455    0.15        0.238        0.868
#>     4      3       0    0.455    0.15        0.238        0.868
#>     4      3       0    0.455    0.15        0.238        0.868
#>     6      3       0    0.455    0.15        0.238        0.868
#>     6      3       0    0.455    0.15        0.238        0.868
str(sum_df)
#> List of 19
#>  $ n            : int [1:2] 14 11
#>  $ time         : num [1:16] 0 0 2 2 4 4 6 6 0 0 ...
#>  $ n.risk       : num [1:16] 14 14 4 4 2 2 1 1 11 11 ...
#>  $ n.event      : num [1:16] 0 0 10 0 0 0 0 0 0 0 ...
#>  $ n.censor     : num [1:16] 0 0 0 0 2 0 1 0 0 0 ...
#>  $ surv         : num [1:16] 1 1 0.286 0.286 0.286 ...
#>  $ std.err      : num [1:16] 0 0 0.121 0.121 0.121 ...
#>  $ cumhaz       : num [1:16] 0 0 1.17 1.17 1.17 ...
#>  $ std.chaz     : num [1:16] 0 0 0.39 0.39 0.39 ...
#>  $ strata       : Factor w/ 2 levels "group=1","group=2": 1 1 1 1 1 1 1 1 2 2 ...
#>  $ type         : chr "right"
#>  $ logse        : logi TRUE
#>  $ conf.int     : num 0.95
#>  $ conf.type    : chr "log"
#>  $ lower        : num [1:16] 1 1 0.125 0.125 0.125 ...
#>  $ upper        : num [1:16] 1 1 0.654 0.654 0.654 ...
#>  $ call         : language survfit(formula = Surv(time, censor) ~ group, data = df)
#>  $ table        : num [1:2, 1:9] 14 11 14 11 14 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:2] "group=1" "group=2"
#>   .. ..$ : chr [1:9] "records" "n.max" "n.start" "events" ...
#>  $ rmean.endtime: num [1:2] 7.52 7.52
#>  - attr(*, "class")= chr "summary.survfit"

# extract our columns of interest from `sum_df` dataframe

columns <- lapply(c(2:10,15:16), function(x) sum_df[x])
tbl <- do.call(data.frame, columns)
tbl
#>    time n.risk n.event n.censor      surv   std.err   cumhaz  std.chaz  strata
#> 1     0     14       0        0 1.0000000 0.0000000 0.000000 0.0000000 group=1
#> 2     0     14       0        0 1.0000000 0.0000000 0.000000 0.0000000 group=1
#> 3     2      4      10        0 0.2857143 0.1207363 1.168229 0.3903649 group=1
#> 4     2      4       0        0 0.2857143 0.1207363 1.168229 0.3903649 group=1
#> 5     4      2       0        2 0.2857143 0.1207363 1.168229 0.3903649 group=1
#> 6     4      2       0        0 0.2857143 0.1207363 1.168229 0.3903649 group=1
#> 7     6      1       0        1 0.2857143 0.1207363 1.168229 0.3903649 group=1
#> 8     6      1       0        0 0.2857143 0.1207363 1.168229 0.3903649 group=1
#> 9     0     11       0        0 1.0000000 0.0000000 0.000000 0.0000000 group=2
#> 10    0     11       0        0 1.0000000 0.0000000 0.000000 0.0000000 group=2
#> 11    2      5       6        0 0.4545455 0.1501314 0.736544 0.3072801 group=2
#> 12    2      5       0        0 0.4545455 0.1501314 0.736544 0.3072801 group=2
#> 13    4      3       0        2 0.4545455 0.1501314 0.736544 0.3072801 group=2
#> 14    4      3       0        0 0.4545455 0.1501314 0.736544 0.3072801 group=2
#> 15    6      3       0        0 0.4545455 0.1501314 0.736544 0.3072801 group=2
#> 16    6      3       0        0 0.4545455 0.1501314 0.736544 0.3072801 group=2
#>        lower     upper
#> 1  1.0000000 1.0000000
#> 2  1.0000000 1.0000000
#> 3  0.1248055 0.6540791
#> 4  0.1248055 0.6540791
#> 5  0.1248055 0.6540791
#> 6  0.1248055 0.6540791
#> 7  0.1248055 0.6540791
#> 8  0.1248055 0.6540791
#> 9  1.0000000 1.0000000
#> 10 1.0000000 1.0000000
#> 11 0.2379221 0.8684002
#> 12 0.2379221 0.8684002
#> 13 0.2379221 0.8684002
#> 14 0.2379221 0.8684002
#> 15 0.2379221 0.8684002
#> 16 0.2379221 0.8684002

# create custom table of our metrics and parameters of interests
custom_tbl <- data.frame(time=tbl$time, 
                         n.risk=tbl$n.risk, 
                         n.event=tbl$n.event,
                         cum.inc=1-tbl$surv, 
                         std.err=1-tbl$std.err,
                         ci=paste0(round((1-tbl$upper),2),"-",round((1-tbl$lower),2)),
                         strata=tbl$strata) %>% 
  mutate_if(is.numeric, round, digits=2)

# calculating cumulative incidence from n.event column, then remove duplicated rows and 
# replace them by last similar column

custom_tbl %<>% group_by(strata) %>% mutate(cum.n.event = cumsum(n.event)) %>% 
  group_by(time, strata) %>% 
  slice_tail(n=1)

custom_tbl
#> # A tibble: 8 × 8
#> # Groups:   time, strata [8]
#>    time n.risk n.event cum.inc std.err ci        strata  cum.n.event
#>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl> <chr>     <fct>         <dbl>
#> 1     0     14       0    0       1    0-0       group=1           0
#> 2     0     11       0    0       1    0-0       group=2           0
#> 3     2      4       0    0.71    0.88 0.35-0.88 group=1          10
#> 4     2      5       0    0.55    0.85 0.13-0.76 group=2           6
#> 5     4      2       0    0.71    0.88 0.35-0.88 group=1          10
#> 6     4      3       0    0.55    0.85 0.13-0.76 group=2           6
#> 7     6      1       0    0.71    0.88 0.35-0.88 group=1          10
#> 8     6      3       0    0.55    0.85 0.13-0.76 group=2           6

# removing default risk table of our survival plot
p1$table <- NULL

# creating a custom ggplot by our custom table
p1$table <- ggplot(custom_tbl, aes(time, strata)) +
  geom_text(aes(label = n.risk), size = 3, vjust=-0.8) +
  geom_text(aes(label = cum.n.event), size = 3, vjust=1) + 
  geom_text(aes(label = cum.inc), size = 3, vjust=2.8) + 
  geom_text(aes(label = ci), size = 3, vjust=4) + 
  labs(x = NULL, y = NULL) + theme_classic()+
  coord_cartesian(xlim = c(0, 7.5))+
  scale_x_continuous(breaks=seq(0, 9, 2))+
  scale_y_discrete(
    labels=c("group 1", "group 2"))+
  theme(axis.text.y = element_text(colour=c("red", "blue"), size = 12),
        axis.text.x = element_text(size = 12))
#> Warning: Vectorized input to `element_text()` is not officially supported.
#> ℹ Results may be unexpected or may change in future versions of ggplot2.

p1

创建于2023年2月13日,使用reprex v2.0.2

相关问题