A function that returns survival fits (survfit) and Kaplan-Meier plots (ggsurvplot) for a given list or vector of dependent variables

fdx2calv  posted on 2023-02-17  in  Other

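I have a data frame in R in which several different columns can serve as the dependent (grouping) variable. I am trying to write a function that takes the data frame "df", a list or vector of dependent variables "vars", the time variable "time" and the status variable "status", and returns the survival results from "survfit" together with the Kaplan-Meier curves from ggsurvplot.
The goal is to avoid copying and pasting the same code over and over.
Take the following data as an example: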

library(ggplot2)
library(survival)
library(survminer)  # provides ggsurvplot()
library(dplyr)

df <- lung %>%
  transmute(time,
            status,  # censoring status 1=censored, 2=dead
            Age = age,
            Sex = factor(sex, labels = c("Male", "Female")),
            ECOG = factor(ph.ecog),
            `Meal Cal` = as.numeric(meal.cal))

# help(lung)

# Turn status into (0=censored, 1=dead)
df$status <- ifelse(df$status == 2, 1, 0)

I can of course run the survival analysis for a single variable like this:

fit <- survfit(Surv(time, status) ~ ECOG, data = df)

ggsurvplot(fit,
           pval = TRUE, pval.coord = c(750, 0.3), 
           conf.int = FALSE, 
           surv.median.line = "hv", 
           legend = c(0.8, 0.6), 
           legend.title = "",
           risk.table = "absolute", 
           risk.table.y.text = FALSE,  
           xlab = "Time (days)", ylab = "Survival", 
           palette="jco",
           title="Overall Survival", font.title = c(16, "bold", "black")
)

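However, if I want to do the same for Sex, I have to copy and paste everything again. So I would like a function in R that takes the data frame "df", the vector of dependent variables "vars", the time variable "time" and the status variable "status" as input, and returns the "survfit" results and the "ggsurvplot" Kaplan-Meier curves, along these lines: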

vars <- c("ECOG", "Sex")

surv_plot_func <- function(df, vars, time, status) {
  results_list <- lapply(vars, function(var, time, status) {
    
    # Fit a survival model
    fit <- survfit(Surv(as.numeric(df[[time]]), as.logical(df[[status]])) ~ as.factor(df[[var]]), data = df)
    
    # Plot the Kaplan-Meier curve using ggsurvplot
    ggsurv <- ggsurvplot(fit, pval = TRUE, conf.int = TRUE,
                         risk.table = TRUE, legend.title = "",
                         surv.median.line = "hv", xlab = "Time", ylab = "Survival Probability")
    
    # Return the fit and ggsurv as a list
    list(fit = fit, ggsurv = ggsurv)
  })
  
  # Return the list of results
  results_list
}

res_list <- surv_plot_func(df, vars, "time", "status")

However, this did not work. Is there a way to do it?
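One thing that stands out in the attempt above is that the anonymous function passed to lapply() declares its own time and status parameters, while lapply() only supplies the first argument (var). The toy sketch below isolates that scoping pitfall. The names outer_fn and fixed_fn are hypothetical and not from the post, the sketch does not reproduce the survival call itself, and the error actually raised by the original code may well be a different one.

```r
# Hypothetical toy functions, used only to illustrate the scoping issue.
outer_fn <- function(vars, time) {
  lapply(vars, function(var, time) {
    # The inner `time` shadows the outer one, and lapply() never supplies it,
    # so evaluating it here raises a "missing argument" error.
    paste(var, time)
  })
}

res <- tryCatch(outer_fn(c("ECOG", "Sex"), "time"),
                error = function(e) conditionMessage(e))
print(res)

# Dropping the extra parameter lets the closure see the outer `time`.
fixed_fn <- function(vars, time) {
  lapply(vars, function(var) paste(var, time))
}
print(fixed_fn(c("ECOG", "Sex"), "time"))
```

Declaring only var in the inner function is usually enough, because the closure can already see the arguments of the enclosing function.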

qoefvg9y (answer #1)

The following code works for me.

surv_plot_func <- function(df, vars, time, status) {
  results_list <- lapply(vars, function(var) {
    
    # Build the formula as a string from the time, status and grouping column names
    form <- paste0("Surv(", time, ", ", status, ") ~ ", var)
    
    # Fit a survival model
    fit <- survfit(as.formula(form), data=df)
    
    # Plot the Kaplan-Meier curve using ggsurvplot
    ggsurv <- ggsurvplot(fit, pval = TRUE, conf.int = TRUE,
                         risk.table = TRUE, legend.title = "",
                         surv.median.line = "hv", xlab = "Time", ylab = "Survival Probability")
    
    # Return the fit and ggsurv as a list
    list(fit = fit, ggsurv = ggsurv)
    
  })
  
  # Return the list of results
  return(results_list)
}
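A possible refinement, sketched under some assumptions: when a fit is created inside a function, the call stored in the fit refers to the local variable holding the formula, and code that later re-evaluates that call outside the function may not find it. The sketch below stores the evaluated formula in the fit to sidestep this. The helper name fit_by_var is hypothetical, and the example uses only the survival package and base R so that it runs without survminer.

```r
library(survival)

# Hypothetical helper (not from the original answer): one survfit per variable.
fit_by_var <- function(df, vars, time, status) {
  fits <- lapply(vars, function(var) {
    # Build e.g. Surv(time, status) ~ ECOG from the column names
    form <- as.formula(paste0("Surv(", time, ", ", status, ") ~ ", var))
    fit <- survfit(form, data = df)
    # Replace the symbol `form` in the stored call with the formula itself,
    # so the fit no longer depends on this function's local variables
    fit$call$formula <- form
    fit
  })
  # Name each fit after its grouping variable
  setNames(fits, vars)
}

# Example data: same recoding as in the question, written in base R
df <- transform(lung,
                status = ifelse(status == 2, 1, 0),  # 0 = censored, 1 = dead
                ECOG   = factor(ph.ecog),
                Sex    = factor(sex, labels = c("Male", "Female")))

fits <- fit_by_var(df, c("ECOG", "Sex"), "time", "status")
print(names(fits))
print(fits$Sex)
```

Each element of fits can then be handed to ggsurvplot() with the plotting options from the answer. ggsurvplot() also has a data argument, so passing data = df explicitly avoids relying on the data being recovered from the fit.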
