R: loop function on a time series works for a small df but not a large df - Error: C stack usage ... is too close to the limit

Posted by amrnrhlw on 2023-05-26 in Other

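I have a dataframe with date/time (a time series), site (the grouping variable) and value. I have identified the start times of different "surges", defined as a change in value of >= 2 within 15 minutes. For each surge, I am trying to find the date/time at which the value falls back to (or below) its level at the surge start, i.e. the time the surge ends.
I can do this with a recursive function ('find.next.smaller', from this question: In a dataframe, find the index of the next smaller value for each element of a column). It works fine on a small dataframe but not on a large one, where I get the error message "Error: C stack usage 15925584 is too close to the limit". Having looked at similar questions (e.g., Error: C stack usage is too close to the limit), I don't think the problem is an infinitely recursive function; I think it is a memory problem. However, I don't know how to use the shell (or PowerShell) to deal with that. Is there another way, either by adjusting my memory or by changing the function below?
Some example code: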

###df formatting    
library(dplyr)
df <- data.frame("Date_time" =seq(from=as.POSIXct("2022-01-01 00:00") , by= 15*60, to=as.POSIXct("2022-01-01 07:00")), 
             "Site" = rep(c("Site A", "Site B"), each = 29),
             "Value" = c(10,10.1,10.2,10.3,12.5,14.8,12.4,11.3,10.3,10.1,10.2,10.5,10.4,10.3,14.7,10.1,
                         16.7,16.3,16.4,14.2,10.2,10.1,10.3,10.2,11.7,13.2,13.2,11.1,11.4,
                         rep(10.3,times=29)))
df <- df %>% group_by(Site) %>% mutate(Lead_Value = lead(Value))
df$Surge_start <- NA
df[which(df$Lead_Value - df$Value >=2),"Surge_start"] <- 
 paste("Surge",seq(1,length(which(df$Lead_Value - df$Value >=2)),1),sep="")

###Applying the 'find.next.smaller' function

find.next.smaller <- function(ini = 1, vec) {
  # The recursive function goes element by element through the vector and
  # finds the index of the next smaller (or equal) value.
  if (length(vec) == 1) NA
  else c(ini + min(which(vec[1] >= vec[-1])),
         find.next.smaller(ini + 1, vec[-1]))
}
df$Date_time <- as.character(df$Date_time)
Output <- df %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
###This works fine

df2 <- do.call("rbind", replicate(1000, df, simplify = FALSE))
Output2 <- df2 %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
####This does not work

Answer 1 (2uluyalo)

I suggest that you don't need recursion.

find_nearest_value <- function(surge, time1, val1, times, vals) {
  if (!grepl("Surge", surge)) NA else times[times > time1 & vals <= val1][1]
}

Output %>%
  group_by(Site) %>%
  mutate(end2 = if_else(grepl("Surge", Surge_start), mapply(find_nearest_value, Surge_start, Date_time, Value, list(Date_time), list(Value)), NA)) %>%
  print(n=99)
# # A tibble: 58 × 7
# # Groups:   Site [2]
#    Date_time           Site   Value Lead_Value Surge_start Surge_end           end2               
#    <chr>               <chr>  <dbl>      <dbl> <chr>       <chr>               <chr>              
#  1 2022-01-01 00:00:00 Site A  10         10.1 NA          NA                  NA                 
#  2 2022-01-01 00:15:00 Site A  10.1       10.2 NA          NA                  NA                 
#  3 2022-01-01 00:30:00 Site A  10.2       10.3 NA          NA                  NA                 
#  4 2022-01-01 00:45:00 Site A  10.3       12.5 Surge1      2022-01-01 02:00:00 2022-01-01 02:00:00
#  5 2022-01-01 01:00:00 Site A  12.5       14.8 Surge2      2022-01-01 01:30:00 2022-01-01 01:30:00
#  6 2022-01-01 01:15:00 Site A  14.8       12.4 NA          NA                  NA                 
#  7 2022-01-01 01:30:00 Site A  12.4       11.3 NA          NA                  NA                 
#  8 2022-01-01 01:45:00 Site A  11.3       10.3 NA          NA                  NA                 
#  9 2022-01-01 02:00:00 Site A  10.3       10.1 NA          NA                  NA                 
# 10 2022-01-01 02:15:00 Site A  10.1       10.2 NA          NA                  NA                 
# 11 2022-01-01 02:30:00 Site A  10.2       10.5 NA          NA                  NA                 
# 12 2022-01-01 02:45:00 Site A  10.5       10.4 NA          NA                  NA                 
# 13 2022-01-01 03:00:00 Site A  10.4       10.3 NA          NA                  NA                 
# 14 2022-01-01 03:15:00 Site A  10.3       14.7 Surge3      2022-01-01 03:45:00 2022-01-01 03:45:00
# 15 2022-01-01 03:30:00 Site A  14.7       10.1 NA          NA                  NA                 
# 16 2022-01-01 03:45:00 Site A  10.1       16.7 Surge4      2022-01-01 05:15:00 2022-01-01 05:15:00
# 17 2022-01-01 04:00:00 Site A  16.7       16.3 NA          NA                  NA                 
# 18 2022-01-01 04:15:00 Site A  16.3       16.4 NA          NA                  NA                 
# 19 2022-01-01 04:30:00 Site A  16.4       14.2 NA          NA                  NA                 
# 20 2022-01-01 04:45:00 Site A  14.2       10.2 NA          NA                  NA                 
# 21 2022-01-01 05:00:00 Site A  10.2       10.1 NA          NA                  NA                 
# 22 2022-01-01 05:15:00 Site A  10.1       10.3 NA          NA                  NA                 
# 23 2022-01-01 05:30:00 Site A  10.3       10.2 NA          NA                  NA                 
# 24 2022-01-01 05:45:00 Site A  10.2       11.7 NA          NA                  NA                 
# 25 2022-01-01 06:00:00 Site A  11.7       13.2 NA          NA                  NA                 
# 26 2022-01-01 06:15:00 Site A  13.2       13.2 NA          NA                  NA                 
# 27 2022-01-01 06:30:00 Site A  13.2       11.1 NA          NA                  NA                 
# 28 2022-01-01 06:45:00 Site A  11.1       11.4 NA          NA                  NA                 
# 29 2022-01-01 07:00:00 Site A  11.4       NA   NA          NA                  NA                 
# 30 2022-01-01 00:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 31 2022-01-01 00:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 32 2022-01-01 00:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 33 2022-01-01 00:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 34 2022-01-01 01:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 35 2022-01-01 01:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 36 2022-01-01 01:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 37 2022-01-01 01:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 38 2022-01-01 02:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 39 2022-01-01 02:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 40 2022-01-01 02:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 41 2022-01-01 02:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 42 2022-01-01 03:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 43 2022-01-01 03:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 44 2022-01-01 03:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 45 2022-01-01 03:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 46 2022-01-01 04:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 47 2022-01-01 04:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 48 2022-01-01 04:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 49 2022-01-01 04:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 50 2022-01-01 05:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 51 2022-01-01 05:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 52 2022-01-01 05:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 53 2022-01-01 05:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 54 2022-01-01 06:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 55 2022-01-01 06:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 56 2022-01-01 06:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 57 2022-01-01 06:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 58 2022-01-01 07:00:00 Site B  10.3       NA   NA          NA                  NA
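To see why this avoids deep call nesting, it may help to call the helper directly: each surge row scans the group's vectors once and returns, so nothing calls itself. The sketch below reuses find_nearest_value from above on a small hypothetical data set (the times and vals vectors are made up for illustration). It assumes, as in the question, that the timestamps are character strings in "YYYY-MM-DD HH:MM:SS" form, so that the `>` comparison orders them chronologically.

```r
find_nearest_value <- function(surge, time1, val1, times, vals) {
  if (!grepl("Surge", surge)) NA else times[times > time1 & vals <= val1][1]
}

# Hypothetical toy data, not taken from the question
times <- c("2022-01-01 00:45:00", "2022-01-01 01:00:00",
           "2022-01-01 01:15:00", "2022-01-01 01:30:00")
vals  <- c(10.3, 12.5, 14.8, 10.2)

# A surge row: first later time whose value is back at or below 10.3
end_surge <- find_nearest_value("Surge1", times[1], vals[1], times, vals)
end_surge

# A non-surge row (Surge_start is NA): grepl() returns FALSE, so the result is NA
end_none <- find_nearest_value(NA, times[2], vals[2], times, vals)
end_none
```

Keeping Date_time as POSIXct instead of character would likely make the comparison more robust, at the cost of having to handle the type inside if_else.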

Answer 2 (mdfafbf1)

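The recursion is probably using too much memory; you may prefer a vectorized or loop-based approach, even if it takes longer. Below I have modified your function slightly and created a few options.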
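As a rough illustration of where the stack pressure comes from: the recursive function calls itself once per element, so the nesting depth equals the length of the vector, and df2 in the question has 29,000 rows per site. In the sketch below, depth_of is a hypothetical helper written only to count that nesting; it is not part of the original function. Cstack_info() is the base R function that reports the C stack size and current usage.

```r
# Hypothetical helper: mirrors the recursion pattern (drop one element per call)
# and returns how many nested calls were needed.
depth_of <- function(vec, depth = 1L) {
  if (length(vec) == 1) depth else depth_of(vec[-1], depth + 1L)
}

d <- depth_of(1:29)  # one site in the small df has 29 rows
d

# Inspect the C stack limit and current usage of this R session
Cstack_info()
```

A loop or a vectorized call keeps the nesting depth constant regardless of the vector length, which is why the options below are not affected in the same way.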

Some options

Original:

find.next.smaller_rec <- function(ini = 1, vec) {
  if(length(vec) == 1) NA 
  else c(ini + min(which(vec[1] >= vec[-1])), 
         find.next.smaller_rec(ini + 1, vec[-1]))
}

Building block for the vectorized versions:

find.next.smaller <- function(val, vec) {
  if(val == length(vec)) NA  else val + min(which(vec[val] >= vec[-(1:val)]))
}
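A quick worked example may make the building block easier to follow. It takes a position val and returns the index of the first later element that is less than or equal to vec[val]. The vector x below is a hypothetical toy input chosen for illustration.

```r
# Building block copied from above
find.next.smaller <- function(val, vec) {
  if(val == length(vec)) NA  else val + min(which(vec[val] >= vec[-(1:val)]))
}

# Hypothetical toy vector
x <- c(10.3, 12.5, 14.8, 12.4, 11.3, 10.3, 10.1)

r1 <- find.next.smaller(1, x)  # 10.3 is next matched (<=) at position 6
r3 <- find.next.smaller(3, x)  # 14.8 is followed directly by 12.4 at position 4
r7 <- find.next.smaller(7, x)  # last element: nothing follows, so NA
c(r1, r3, r7)
```

Note that when no later element is small enough, which() returns an empty vector and min() yields Inf with a warning, so the function returns Inf rather than NA in that case.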

Using a for loop:

find.next.smaller_for <- function(x, vec){
  result <- numeric(x)
  for(val in 1:x){
    result[val] <- find.next.smaller(val, vec)
  }
  result
}

Using Vectorize():

find.next.smaller_vec <- Vectorize(find.next.smaller, "val")

Using purrr::map_dbl():

find.next.smaller_map <- function(x, vec){
  purrr::map_dbl(1:x, ~ find.next.smaller(val = .x, vec = vec))
}

Comparison:

bench <- bench::mark(find.next.smaller_rec(1, df$Value),
                     find.next.smaller_for(nrow(df), df$Value),
                     find.next.smaller_vec(1:nrow(df), df$Value),
                     find.next.smaller_map(nrow(df), df$Value),
                     min_time = 2)

bench %>% select(c(median, mem_alloc, n_gc, `gc/sec`))

    median mem_alloc  n_gc `gc/sec`
  <bch:tm> <bch:byt> <dbl>    <dbl>
1    496µs    92.4KB    13     7.30
2    582µs    77.1KB    10     5.46
3    612µs    78.7KB    10     5.97
4    681µs    77.1KB    10     5.40

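We can see that even though the recursive version (row 1) is the fastest, it also allocates more memory, which may be the reason for your error.
There may well be better options; I only wanted to suggest some that stay close to your original.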
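One such option, offered only as a sketch and not something either answer proposes, is a single pass with an explicit stack of indices that are still waiting for their next smaller-or-equal value. Each index is pushed and popped at most once, so the work grows linearly with the vector length and there is no recursion. The name next_smaller_or_equal is hypothetical, and the sketch assumes vec contains no NA values. It also returns NA where the versions above would return Inf with a warning.

```r
# Hypothetical alternative: iterative "next smaller or equal" search
next_smaller_or_equal <- function(vec) {
  n <- length(vec)
  result <- rep(NA_integer_, n)
  stack <- integer(n)  # preallocated stack of indices without a match yet
  top <- 0L            # number of indices currently on the stack
  for (j in seq_len(n)) {
    # vec[j] closes every waiting index whose value is >= vec[j]
    while (top > 0L && vec[stack[top]] >= vec[j]) {
      result[stack[top]] <- j
      top <- top - 1L
    }
    top <- top + 1L
    stack[top] <- j
  }
  result
}

# Hypothetical toy vector
x <- c(10.3, 12.5, 14.8, 12.4, 11.3, 10.3, 10.1)
res <- next_smaller_or_equal(x)
res

# A longer input is handled without any nested calls
big <- rep(x, 5000)
res_big <- next_smaller_or_equal(big)
length(res_big)
```

Inside the grouped mutate() it could stand in for the index expression, e.g. Date_time[next_smaller_or_equal(Value)], since indexing with NA gives NA.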

Applying it to the problem

Output <- df %>%
  group_by(Site) %>%
  mutate(Surge_end = ifelse(grepl("Surge",Surge_start),
                            Date_time[find.next.smaller_for(n(), Value)],
                            NA_character_))

You can also use Date_time[find.next.smaller_map(n(), Value)] or Date_time[find.next.smaller_vec(1:n(), Value)].
