Left join dataframes in R based on the nearest value in the right dataframe

Posted by xpszyzbs on 2023-04-09 in Other
t1 <- data.frame(
    team = c('a', 'b', 'c', 'd', 'e'),
    value1 = c(0.285, 0.37, 0.45, 0.42, 0.385),
    value2 = c(41, 51, 55, 61, 64)
)
  
pctiles = data.frame(
    pctile = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    value1 = c(0.253, 0.291, 0.325, 0.336, 0.345, 0.35, 0.367, 0.39, 0.4, 0.41, 0.435),
    value2 = c(35, 50, 54, 57, 59, 61, 62, 65, 71, 81, 95)
)

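We need to join the pctile value from the pctiles dataframe onto the t1 dataframe, matching each value in t1 to the closest value in pctiles. For example, in t1, team a has a value1 of 0.285, which is closest to 0.291 in pctiles$value1, so a value1pctile of 1 should be joined onto t1. Applying this logic and joining the closest pctile for each value, the output we are after is: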

output_df <- data.frame(
    team = c('a', 'b', 'c', 'd', 'e'),
    value1 = c(0.285, 0.37, 0.45, 0.42, 0.385),
    value2 = c(41, 51, 55, 61, 64),
    value1pctile = c(1, 6, 10, 9, 7),
    value2pctile = c(0, 1, 2, 5, 7)
)

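We are not concerned with how ties are handled when a value in t1 falls exactly halfway between two values in pctiles. Either the higher or the lower pctile value is fine. How can we achieve this in R?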
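The matching rule described above, including the tie case, can be illustrated for single values with a minimal base R sketch. The helper name `nearest_pctile` is hypothetical and introduced only for illustration, and the vectors are copied from the value2 column of pctiles. Because `which.min()` returns the first minimum, an exact tie resolves to the lower pctile in this sketch.

```r
# Reference values (pctiles$value2) and their pctile labels
ref <- c(35, 50, 54, 57, 59, 61, 62, 65, 71, 81, 95)
pctile <- 0:10

# Hypothetical helper: label of the reference value closest to v
nearest_pctile <- function(v) pctile[which.min(abs(ref - v))]

nearest_pctile(41)  # 0: 35 is 6 away, 50 is 9 away
nearest_pctile(52)  # 1: 50 and 54 are both 2 away, the first minimum wins
```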

**Edit:** We are trying to use fuzzy_left_join, but we get an error:

t1 <- fuzzyjoin::fuzzy_left_join(
  t1, pctiles, 
  by = c("value1" = "value1"), 
  match_fun = "min_diff",
  distance_col = "dist"
)

> Error in which(m) : argument to 'which' is not logical

It seems we are using match_fun and distance_col incorrectly.

Answer 1 (0pizxfdo)

data.table

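library(data.table)

# Convert both data frames to data.tables by reference
setDT(t1)
setDT(pctiles)

# Rolling join: for each row of t1 (.SD), roll = "nearest" matches the pctiles
# row whose join column is closest, and the j expression returns its pctile
t1[, value1pctile := pctiles[.SD, on = "value1", roll = "nearest", pctile]]
t1[, value2pctile := pctiles[.SD, on = "value2", roll = "nearest", pctile]]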

#      team value1 value2 value1pctile value2pctile
#    <char>  <num>  <num>        <num>        <num>
# 1:      a  0.285     41            1            0
# 2:      b  0.370     51            6            1
# 3:      c  0.450     55           10            2
# 4:      d  0.420     61            9            5
# 5:      e  0.385     64            7            7
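
If data.table is not available, the same nearest-value lookup can be sketched in base R. This is an alternative to the rolling join above, not part of the original answer. The helper `nearest_label` is a hypothetical name, and the sketch assumes there are no missing values in either dataframe. It compares every value in t1 against every reference value, which is fine for small tables like these.

```r
t1 <- data.frame(
    team = c('a', 'b', 'c', 'd', 'e'),
    value1 = c(0.285, 0.37, 0.45, 0.42, 0.385),
    value2 = c(41, 51, 55, 61, 64)
)

pctiles <- data.frame(
    pctile = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    value1 = c(0.253, 0.291, 0.325, 0.336, 0.345, 0.35, 0.367, 0.39, 0.4, 0.41, 0.435),
    value2 = c(35, 50, 54, 57, 59, 61, 62, 65, 71, 81, 95)
)

# Hypothetical helper: for each element of x, return the label whose
# reference value is closest (the first match wins on an exact tie).
# Assumes x and ref contain no NA values.
nearest_label <- function(x, ref, labels) {
    idx <- vapply(x, function(v) which.min(abs(ref - v)), integer(1))
    labels[idx]
}

t1$value1pctile <- nearest_label(t1$value1, pctiles$value1, pctiles$pctile)
t1$value2pctile <- nearest_label(t1$value2, pctiles$value2, pctiles$pctile)

t1
```

For these inputs the two new columns should agree with the data.table result shown above.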
