In R, how do I filter n consecutive days, starting from a given start date, out of a date vector that spans multiple years?

jqjz2hbq  posted on 2023-04-09  in Other

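I'm stuck trying to write code that, starting from a given start date, filters 5 consecutive days out of a date vector that spans multiple years.
Example: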

v = seq.Date(as.Date("2000-01-01"), as.Date("2020-12-31"), 1)

initial_date = "01-01"  # month-day (mm-dd), kept as a string because as.Date() needs a year

Expected output:

2000-01-01 #(YYYY:mm:dd)
2000-01-02
2000-01-03
2000-01-04
2000-01-05
2001-01-01
2001-01-02
2001-01-03
2001-01-04
2001-01-05
...
2020-01-01
2020-01-02
2020-01-03
2020-01-04
2020-01-05

For example, if the initial date is 12-31 (mm-dd), the result should be:

2000-12-31
2001-01-01
2001-01-02
2001-01-03
2001-01-04
2001-12-31
2002-01-01
2002-01-02
2002-01-03
2002-01-04
...
2019-12-31
2020-01-01
2020-01-02
2020-01-03
2020-01-04
2020-12-31

Any suggestions?

rjee0c15  1#

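You can use str_detect() to find the rows that match the start date, then use index arithmetic to pick up the 4 rows that follow each match. This relies on v being sorted with no missing days.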

library(dplyr)

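initial_date <- "12-31"

tibble(dates = v) |>
  # flag every row whose month-day part matches the start date
  mutate(index = stringr::str_detect(format(dates, "%m-%d"), initial_date)) |>
  # keep each flagged row plus the 4 rows after it
  slice(sort(which(index) + rep(0:4, each = sum(index)))) |>
  select(-index)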

Output:

# A tibble: 97 × 1
   dates     
   <date>    
 1 2000-12-31
 2 2001-01-01
 3 2001-01-02
 4 2001-01-03
 5 2001-01-04
 6 2001-12-31
 7 2002-01-01
 8 2002-01-02
 9 2002-01-03
10 2002-01-04
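
The index arithmetic in this answer can also be written in base R with the window length as a parameter. The following is only a sketch under the same assumption as above (v is sorted and has no missing days); the names start, keep and res are introduced here for illustration and are not part of the original answer.

```r
v <- seq.Date(as.Date("2000-01-01"), as.Date("2020-12-31"), 1)
initial_date <- "12-31"
n <- 5

# Positions of every occurrence of the start date.
start <- which(format(v, "%m-%d") == initial_date)

# Each start position plus the offsets 0, 1, ..., n - 1.
keep <- sort(unique(as.vector(outer(start, seq_len(n) - 1, "+"))))

# Drop positions that would run past the end of v.
keep <- keep[keep <= length(v)]

res <- v[keep]
```

Because the selection is positional, a gap in v would shift the window onto the wrong dates; the date-based answers below avoid that.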
jjjwad0x  2#

n <- 5
v <- seq.Date(as.Date("2000-01-01"), as.Date("2020-12-31"), 1)
initial_date <- "12-31"
tmp <- rowSums(
  sapply(seq_len(n) - 1, function(z)
    (v-z) %in% v &
      format(v-z, format = "%m-%d") == initial_date)
  ) > 0
v[tmp]
#   [1] "2000-12-31" "2001-01-01" "2001-01-02" "2001-01-03" "2001-01-04" "2001-12-31" "2002-01-01" "2002-01-02" "2002-01-03"
#  [10] "2002-01-04" "2002-12-31" "2003-01-01" "2003-01-02" "2003-01-03" "2003-01-04" "2003-12-31" "2004-01-01" "2004-01-02"
#  [19] "2004-01-03" "2004-01-04" "2004-12-31" "2005-01-01" "2005-01-02" "2005-01-03" "2005-01-04" "2005-12-31" "2006-01-01"
#  [28] "2006-01-02" "2006-01-03" "2006-01-04" "2006-12-31" "2007-01-01" "2007-01-02" "2007-01-03" "2007-01-04" "2007-12-31"
#  [37] "2008-01-01" "2008-01-02" "2008-01-03" "2008-01-04" "2008-12-31" "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04"
#  [46] "2009-12-31" "2010-01-01" "2010-01-02" "2010-01-03" "2010-01-04" "2010-12-31" "2011-01-01" "2011-01-02" "2011-01-03"
#  [55] "2011-01-04" "2011-12-31" "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" "2012-12-31" "2013-01-01" "2013-01-02"
#  [64] "2013-01-03" "2013-01-04" "2013-12-31" "2014-01-01" "2014-01-02" "2014-01-03" "2014-01-04" "2014-12-31" "2015-01-01"
#  [73] "2015-01-02" "2015-01-03" "2015-01-04" "2015-12-31" "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04" "2016-12-31"
#  [82] "2017-01-01" "2017-01-02" "2017-01-03" "2017-01-04" "2017-12-31" "2018-01-01" "2018-01-02" "2018-01-03" "2018-01-04"
#  [91] "2018-12-31" "2019-01-01" "2019-01-02" "2019-01-03" "2019-01-04" "2019-12-31" "2020-01-01" "2020-01-02" "2020-01-03"
# [100] "2020-01-04" "2020-12-31"
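  • (v-z) %in% v is required because otherwise it would "match" 1999-12-31, which is not in the original v.
  • format(v-z, format = "%m-%d") checks, for every element of v, whether the date z days earlier matches initial_date.
  • The return value of sapply is a logical matrix with length(v) rows and 5 columns (one column per offset z). Each row tells whether the corresponding element of v should be kept, that is, whether it falls within 5 days of initial_date. To find out whether any value in a row is true we use rowSums(.) > 0, which returns a logical vector of the same length as v.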

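Note: I tried to do this in one step with outer(v, 0:4, `-`), but matrices in R tend not to keep the "Date" class, so working around that (while possible) is a bit of a hassle. The way out is to build the matrix from something that is not of class Date, which is why the comparison happens inside sapply. I tried to keep the number of "loop" iterations as small as possible, so the code iterates over 0:4 rather than over v.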
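
For illustration, here is one way the outer() variant mentioned in the note could look once the Date class is set aside. This is a sketch rather than the answerer's code: it assumes the dates are converted to plain day counts first, and the names days, shifted, mmdd, hit and res are made up for the example.

```r
n <- 5
v <- seq.Date(as.Date("2000-01-01"), as.Date("2020-12-31"), 1)
initial_date <- "12-31"

# Work on plain day counts so that outer() has no class to drop.
days <- as.numeric(v)

# shifted[i, j] is the day count of v[i] minus (j - 1) days.
shifted <- outer(days, seq_len(n) - 1, "-")

# Convert back to Date only to extract "mm-dd". format() returns a plain
# character vector, so the matrix shape is restored by hand.
mmdd <- matrix(
  format(as.Date(as.vector(shifted), origin = "1970-01-01"), "%m-%d"),
  nrow = length(v)
)

# A cell counts only if the shifted date matches and is itself present in v.
hit <- mmdd == initial_date & matrix(shifted %in% days, nrow = length(v))

res <- v[rowSums(hit) > 0]
```

The extra matrix() calls are the hassle the note refers to: both format() and %in% return flat vectors, so the dimensions have to be put back before rowSums() can be used.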

t9aqgxwy  3#

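This function uses strftime(format = "%m-%d") to compare dates while ignoring the year, then adds 0 to n - 1 days to each matching date, and finally keeps only those results that are present in the original date vector.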

get_days <- function(dates, mmdd, n = 5) {
  stopifnot(n >= 1)
  out <- dates[strftime(dates, "%m-%d") == mmdd]
  out <- out + rep(0:(n - 1), each = length(out))
  sort(out[out %in% dates])
}

get_days(v, "12-31")
#  [1] "2000-12-31" "2001-01-01" "2001-01-02" "2001-01-03" "2001-01-04" "2001-12-31"
#  [7] "2002-01-01" "2002-01-02" "2002-01-03" "2002-01-04" "2002-12-31" "2003-01-01"
#  ...
# [91] "2018-12-31" "2019-01-01" "2019-01-02" "2019-01-03" "2019-01-04" "2019-12-31"
# [97] "2020-01-01"
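
Since get_days() matches on the month-day string, a start date that does not exist in every year simply produces fewer runs. The sketch below repeats the function so that it runs on its own and calls it with "02-29"; the v used here is the one defined in answer 2, and the variable name leap is made up for the example.

```r
v <- seq.Date(as.Date("2000-01-01"), as.Date("2020-12-31"), 1)

get_days <- function(dates, mmdd, n = 5) {
  stopifnot(n >= 1)
  # dates whose month-day part equals the requested start
  out <- dates[strftime(dates, "%m-%d") == mmdd]
  # add 0, 1, ..., n - 1 days to every start date
  out <- out + rep(0:(n - 1), each = length(out))
  # keep only the dates that exist in the input vector
  sort(out[out %in% dates])
}

# Only leap years contain a 29 February, so only those years contribute.
leap <- get_days(v, "02-29", n = 2)
format(leap[1:4])
# "2000-02-29" "2000-03-01" "2004-02-29" "2004-03-01"
```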
kzmpq1sx  4#

Try this:

library(dplyr)
initial_date <- "12-31"
n <- 5
tibble(v) %>% 
  # number the runs: the counter goes up by 1 at every occurrence of the start date,
  # so a run that crosses a year boundary stays in one group
  mutate(grp = cumsum(format(v, '%m-%d') == initial_date)) %>% 
  # drop the rows before the first occurrence
  filter(grp > 0) %>% 
  # keep the first n rows of each run
  group_by(grp) %>% 
  slice_head(n = n) %>%
  ungroup() %>%
  select(-grp)
Output:
# A tibble: 101 × 1
   v         
   <date>    
 1 2000-12-31
 2 2001-01-01
 3 2001-01-02
 4 2001-01-03
 5 2001-01-04
 6 2001-12-31
 7 2002-01-01
 8 2002-01-02
 9 2002-01-03
10 2002-01-04
# … with 91 more rows
