I am trying to use spark_apply in Databricks on Azure to run a function containing a for loop in parallel.
My function is:
distribution <- function(sims){
  for (p in 1:100){
    increment_value <- list()
    profiles <- list()
    samples <- list()
    sample_num <- list()
    for (i in 1:length(samp_seq)){
      w <- sample(sims, size=batch)
      z <- sum(w)
      name3 <- as.character(z)
      samples[[name3]] <- data.frame(value = z)
    }
  }
}
When I pass the function to spark_apply like this:
sdf_len(sc,1) %>%
  spark_apply(distribution)
I get the following error:
Error : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 305.0 failed 4 times, most recent failure: Lost task 0.3 in stage 305.0 (TID 297, 10.139.64.6, executor 0): java.lang.Exception: sparklyr worker rscript failure with status 255, check worker logs for details.
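For reference, here is a minimal sketch of how the function might be reshaped so that it can be checked locally before being handed to spark_apply. It rests on a few assumptions that are not confirmed by the error above: that the function passed to spark_apply receives each partition as a data frame and is expected to return a data frame, and that globals such as samp_seq and batch may not exist on the workers, so they are passed through the context argument instead. The names n_reps, n_samples, ctx and local_result are hypothetical. The worker logs would still be needed to see the actual R error.

```r
# Hypothetical rewrite: takes one partition as a data frame and returns a data frame.
# `context` carries the values that were globals in the original function;
# n_reps, n_samples and batch are assumed names, not part of the original post.
distribution <- function(df, context) {
  sims <- df[[1]]  # assume the first column holds the values to sample from
  rows <- vector("list", context$n_reps * context$n_samples)
  k <- 1
  for (p in seq_len(context$n_reps)) {
    for (i in seq_len(context$n_samples)) {
      # sampling without replacement, as in the original;
      # the partition needs at least `batch` rows for this to succeed
      w <- sample(sims, size = context$batch)
      rows[[k]] <- data.frame(rep = p, value = sum(w))
      k <- k + 1
    }
  }
  do.call(rbind, rows)  # return a single data frame instead of NULL
}

# Local check on an ordinary data frame, no Spark connection needed
ctx <- list(batch = 5, n_samples = 3, n_reps = 2)
local_result <- distribution(data.frame(id = 1:10), ctx)

# With a live connection `sc` (not run here), the call might look like:
# sdf_len(sc, 10, repartition = 1) %>%
#   spark_apply(distribution, context = ctx)
```

Running the function on a plain data frame first makes ordinary R errors (an undefined variable, a sample larger than the population) visible directly, rather than as a generic "rscript failure with status 255" from the worker.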