R语言 根据日期Maptibble,具体取决于日期范围

rmbxnbpk  于 2023-02-01  发布在  其他
关注(0)|答案(2)|浏览(136)

我有两个tibbles。一个是股票和相关日期的列表。另一个是股票和他们的部门的列表,其中部门是定义在2个日期之间。

# stocks IDs and an associated date
stocks<-
  tibble(
    StockId = c(1, 1, 1, 2, 2),
    Date = c(
      as.Date("1998-01-05"),
      as.Date("2001-07-01"),
      as.Date("2015-01-01"),
      as.Date("1999-05-01"),
      as.Date("2003-02-02")
    )
  )

# stock IDs and their sector.
# For each stock, the sector has a start and end date as stocks change sector over time
sectors <- tibble(StockId = c(1, 1, 2, 2),
  SectorId = c(5050, 2540, 5510, 1010),
  StartDate = c(
    as.Date("1980-01-05"),
    as.Date("2000-07-01"),
    as.Date("1999-01-01"),
    as.Date("2010-01-02")
  ),
  EndDate = c(
    as.Date("2000-06-30"),
    as.Date("9999-12-31"),
    as.Date("2010-01-01"),
    as.Date("9999-01-01")
  )
)

我想得到一个最后的tibble,我有股票,日期和该特定日期的部门。在这种情况下,结果应该是:

result <-
  tibble(
    StockId = c(1, 1, 1, 2, 2),
    Date = c(
      as.Date("1998-01-05"),
      as.Date("2001-07-01"),
      as.Date("2015-01-01"),
      as.Date("1999-05-01"),
      as.Date("2003-02-02")
    ),
    SectorId = c(5050, 2540, 2540, 5510, 5510)
  )

注意,for循环在这里效率不高,因为我的tibble比这个例子有更多的数据。

bvpmtnay

bvpmtnay1#

library(fuzzyjoin)

 fuzzy_left_join(stocks, sectors,
                   by = c("StockId" = "StockId",
                          "Date" =  "StartDate",
                          "Date" = "EndDate"), 
                   match_fun = list(`==`, `>=`, `<=`)) %>% 
    dplyr::select(StockID = StockId.x, Date, SectorId)

输出

# A tibble: 5 x 3
  StockID Date       SectorId
    <dbl> <date>        <dbl>
1       1 1998-01-05     5050
2       1 2001-07-01     2540
3       1 2015-01-01     2540
4       2 1999-05-01     5510
5       2 2003-02-02     5510
w46czmvw

w46czmvw2#

您还可以使用dplyr,它包含一个用于重叠连接的join_by函数(从1.1.0开始):

#devtools::install_github("tidyverse/dplyr")
library(dplyr)

inner_join(stocks, sectors, by = join_by(StockId, between(Date, StartDate, EndDate)))
# A tibble: 5 × 5
  StockId Date       SectorId StartDate  EndDate   
    <dbl> <date>        <dbl> <date>     <date>    
1       1 1998-01-05     5050 1980-01-05 2000-06-30
2       1 2001-07-01     2540 2000-07-01 9999-12-31
3       1 2015-01-01     2540 2000-07-01 9999-12-31
4       2 1999-05-01     5510 1999-01-01 2010-01-01
5       2 2003-02-02     5510 1999-01-01 2010-01-01

以及data.table选项:
一个二个一个一个

相关问题