R语言 日期的条件cbind()

xfb7svmp  于 2023-04-03  发布在  其他
关注(0)|答案(3)|浏览(125)

假设我们有2个 Dataframe ,每个 Dataframe 有2个列和6行,我们只想在左边的日期(lhs)比右边的日期(rhs)旧的时候才绑定这两个 Dataframe ,同时确保每行都没有重复的日期(在lhs和rhs中):例如..

x = cbind(data.frame(lhs_date = seq(Sys.Date()-5, Sys.Date(),2)), letter=c("A","B","C","D","E","F") )
Y = cbind(data.frame(rhs_date = seq(Sys.Date()-5, Sys.Date(),1)), letter=c("X","Y","Y","X","J","J") )

我们怎样才能只在lhs date〈rhs date的情况下cbind或left join x到y,并保持每一行的唯一性?

t3irkdon

t3irkdon1#

我不确定我是否理解了你的问题,但你在寻找这样的东西吗?

# packages
library(fuzzyjoin)

# data
x <- data.frame(lhs_date = seq(Sys.Date() - 5, Sys.Date(), 2), letter = c("A","B","C","D","E","F"))
y <- data.frame(rhs_date = seq(Sys.Date() - 5, Sys.Date(), 1), letter = c("X","Y","Y","X","J","J"))

# left join
fuzzy_left_join(
  x = x, 
  y = y, 
  by = c("lhs_date" = "rhs_date"),
  match_fun = list(`<`)
)
#>      lhs_date letter.x   rhs_date letter.y
#> 1  2019-11-21        A 2019-11-22        Y
#> 2  2019-11-21        A 2019-11-23        Y
#> 3  2019-11-21        A 2019-11-24        X
#> 4  2019-11-21        A 2019-11-25        J
#> 5  2019-11-21        A 2019-11-26        J
#> 6  2019-11-23        B 2019-11-24        X
#> 7  2019-11-23        B 2019-11-25        J
#> 8  2019-11-23        B 2019-11-26        J
#> 9  2019-11-25        C 2019-11-26        J
#> 10 2019-11-21        D 2019-11-22        Y
#> 11 2019-11-21        D 2019-11-23        Y
#> 12 2019-11-21        D 2019-11-24        X
#> 13 2019-11-21        D 2019-11-25        J
#> 14 2019-11-21        D 2019-11-26        J
#> 15 2019-11-23        E 2019-11-24        X
#> 16 2019-11-23        E 2019-11-25        J
#> 17 2019-11-23        E 2019-11-26        J
#> 18 2019-11-25        F 2019-11-26        J

创建于2019-11-26由reprex package(v0.3.0)

2nc8po8w

2nc8po8w2#

我找到的解决方案是基于最初的agila输入:在模糊连接dplyr之后,管道操作符可以完成其余的操作:

x <- data.frame(lhs_date = seq(Sys.Date() - 5, Sys.Date(), 2), letter = c("A","B","C","D","E","F"))
y <- data.frame(rhs_date = seq(Sys.Date() - 5, Sys.Date(), 1), letter = c("X","Y","Y","X","J","J"))

z= fuzzy_left_join(
  x = x, 
  y = y, 
  by = c("lhs_date" = "rhs_date"),
  match_fun = list(`<`)
)

z %>%
  group_by(lhs_date) %>%
  mutate(flag = row_number()) %>%
  filter(flag ==1)

我可以很容易地在SQL中复制,但与R.谢谢@Agila.虽然不完整,你的答案指向正确的方向,并在很大程度上

bvn4nwqk

bvn4nwqk3#

由于'dplyr'现在在变异连接中获得了joined_by参数,因此我们现在可以在dplyr中执行此操作

x = cbind(data.frame(lhs_date = seq(Sys.Date()-5, Sys.Date(),2)), letter=c("A","B","C","D","E","F") )
Y = cbind(data.frame(rhs_date = seq(Sys.Date()-5, Sys.Date(),1)), letter=c("X","Y","Y","X","J","J") )

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

x |>
  left_join(Y, by = join_by(closest(lhs_date < rhs_date)))
#>     lhs_date letter.x   rhs_date letter.y
#> 1 2023-03-24        A 2023-03-25        Y
#> 2 2023-03-26        B 2023-03-27        X
#> 3 2023-03-28        C 2023-03-29        J
#> 4 2023-03-24        D 2023-03-25        Y
#> 5 2023-03-26        E 2023-03-27        X
#> 6 2023-03-28        F 2023-03-29        J

创建于2023-03-29带有reprex v2.0.2

相关问题