R数据.表更新函数内引用连接

j2qf4p5b  于 2023-06-27  发布在  其他
关注(0)|答案(2)|浏览(92)

我想在一个函数中更新连接两个表。下面是一个不使用函数的例子:

library(data.table)
Xtest <- data.table(a = rnorm(20), b = rnorm(20), c = 1:20)
Ytest <- data.table(c = 1:10, d = rnorm(10))

Xtest[Ytest, on = .(c), newcol := i.d]

# > Xtest[Ytest, on = .(c), newcol := i.d]
# > Xtest
# a           b  c      newcol
# 1: -1.68473343 -0.74498296  1  0.35096663
# 2: -0.98461614  2.15317525  2 -1.33890396
# 3: -1.65427602  1.21183896  3  1.49641480
# 4: -0.65045253 -0.74609860  4 -0.03227097
# 5:  1.49058508  1.20315276  5  1.41580186
# 6: -0.31631871  0.68716871  6 -0.03671959
# 7:  1.35923085 -0.20082238  7 -2.27959124
# 8: -0.75649545  0.24058212  8  0.93770862
# 9:  0.22452260 -0.28212892  9 -0.02500419
# 10:  0.30209786  1.33697797 10  0.67729741
# 11:  0.88748221 -0.54421418 11          NA
# 12:  0.47207422 -0.28159382 12          NA
# 13: -1.17270475  0.83940750 13          NA
# 14: -2.02787820 -0.03672582 14          NA
# 15: -0.22187761  0.59137210 15          NA
# 16:  0.97750330 -0.27030756 16          NA
# 17:  0.22725940  0.54617488 17          NA
# 18:  0.94065525 -0.23482152 18          NA
# 19:  2.12049977  0.69920776 19          NA
# 20:  0.06192823  0.12262739 20          NA

# Xtest[, newcol := NULL]

我试图将上面的代码转换为一个函数,但Ycol参数似乎被隐藏了:

myjoinfunction <- function(X, Y, joinlist, newcol, Ycol) {
  eval(substitute(X[Y, on = joinlist, newcol:=i.Ycol])) 
}

# > myjoinfunction <- function(X, Y, joinlist, newcol, Ycol) {
#   +   eval(substitute(X[Y, on = joinlist, newcol:=i.Ycol])) 
#   + }
# > myjoinfunction(Xtest, Ytest, list(c), D, d)
# Error in eval(jsub, SDenv, parent.frame()) : object 'i.Ycol' not found
# 8.
# eval(jsub, SDenv, parent.frame())
# 7.
# eval(jsub, SDenv, parent.frame())
# 6.
# `[.data.table`(Xtest, Ytest, on = list(c), `:=`(D, i.Ycol))
# 5.
# Xtest[Ytest, on = list(c), `:=`(D, i.Ycol)]
# 4.
# eval(substitute(X[Y, on = joinlist, `:=`(newcol, i.Ycol)]))
# 3.
# eval(substitute(X[Y, on = joinlist, `:=`(newcol, i.Ycol)]))
# 2.
# eval(substitute(X[Y, on = joinlist, `:=`(newcol, i.Ycol)]))
# 1.
# myjoinfunction(Xtest, Ytest, list(c), D, d)

如何使Ycol参数在函数中可见?

更新示例

为了更一般化,这里有一个更新连接的新例子,我想为它写一个函数:

Xtest <- data.table(
a = rnorm(20), 
b = rnorm(20), 
c = rep(letters[1:4], rep(5, 4)), 
d = rep(1:5, 4)
)
Ytest <- data.table(
c = rep(letters[1:2], rep(5, 2)), 
d = rep(1:5, 2), 
e = rnorm(10), 
f = rnorm(10)
)

# > Xtest[Ytest, on = .(c, d), `:=`(newcol1 = i.e, newcol2 = i.f) ]
# > Xtest
# a            b c d    newcol1    newcol2
# 1: -2.4939743 -0.200370619 a 1 -1.4934893 -1.0288955
# 2:  1.0188321 -1.182286508 a 2  1.3811712  0.9747131
# 3:  0.5217161 -0.152117649 a 3 -0.4168069  0.1218213
# 4: -0.1584167  0.583640353 a 4  0.4644738  1.7888567
# 5: -0.4271398  0.020067301 a 5  2.5279998  2.0919953
# 6: -1.7692909  0.250129040 b 1 -1.5964246 -1.0884861
# 7: -0.8899915  0.971742055 b 2  0.3011304  1.2629524
# 8: -0.4490363 -1.540005621 b 3 -0.7992208 -0.5155775
# 9: -0.5706488 -1.037077614 b 4  1.0058213  1.9787692
# 10: -0.0922679  1.444487848 b 5 -0.2893311 -0.6095043
# 11:  0.9924810 -1.144513228 c 1         NA         NA
# 12:  1.2232591  1.503649791 c 2         NA         NA
# 13:  0.8751961  0.892765910 c 3         NA         NA
# 14:  0.9960554  0.499310073 c 4         NA         NA
# 15: -0.6184695  1.867985589 c 5         NA         NA
# 16:  0.6503936  0.422683211 d 1         NA         NA
# 17: -0.6160834 -1.585713893 d 2         NA         NA
# 18:  1.5949931 -0.544704857 d 3         NA         NA
# 19:  0.7232079 -0.006460518 d 4         NA         NA
# 20: -0.2824961  0.119585859 d 5         NA         NA

当joinlist是字符串列表时,使用大卫在评论中的建议会出错

myjoinfunction1 <- function(X, Y, joinlist, newcol, Ycol) X[Y, on = joinlist, newcol:= get(paste0("i.", Ycol))]

# > myjoinfunction1(Xtest, Ytest, list("c", "d"), "newcol", "e")
# Error in .parse_on(substitute(on), isnull_inames) : 
#   'on' argument should be a named atomic vector of column names indicating which columns in 'i' should be joined with which columns in 'x'.
# 5.
# stop("'on' argument should be a named atomic vector of column names indicating which columns in 'i' should be joined with which columns in 'x'.")
# 4.
# .parse_on(substitute(on), isnull_inames)
# 3.
# `[.data.table`(X, Y, on = joinlist, `:=`(newcol, get(paste0("i.", 
#                                                             Ycol))))
# 2.
# X[Y, on = joinlist, `:=`(newcol, get(paste0("i.", Ycol)))]
# 1.
# myjoinfunction1(Xtest, Ytest, list("c", "d"), "newcol", "e")
6jjcrrmo

6jjcrrmo1#

对于更一般的第二示例:

f <- function(X, Y, joinlist, cols) {
    X[Y, on = joinlist, names(cols) := mget(sprintf("i.%s", cols))]
}

用途:

set.seed(1)
Xtest2 <- data.table(
a = rnorm(20), 
b = rnorm(20), 
c = rep(letters[1:4], rep(5, 4)), 
d = rep(1:5, 4)
)
Ytest2 <- data.table(
c = rep(letters[1:2], rep(5, 2)), 
d = rep(1:5, 2), 
e = rnorm(10), 
f = rnorm(10)
)

f(Xtest2, Ytest2, c("c", "d"), c(newcol1 = "e", newcol2 = "f"))

边注:在data.table(running out of "column slots"; adding to a table loaded from disk

ohfgkhjo

ohfgkhjo2#

以下是我基于评论的回答。evalsubstitute可以应用于data.table中的不同位置,而不是将整个data.table Package 在eval(substitute(DT))中。此外,当使用substitute和字符串而不是表达式时,不需要eval

myjoinfunction <- function(X, Y, joinlist, newcol, Ycol) {
  Ycol <- paste0("i.", as.character(substitute(Ycol))[-1])
  X[
     Y, 
     on = eval(substitute(joinlist)), 
     as.character(substitute(newcol))[-1] := mget(Ycol)
   d] 
  }

用途:

set.seed(42)
Xtest <- data.table(
a = rnorm(20), 
b = rnorm(20), 
c = rep(letters[1:4], rep(5, 4)), 
d = rep(1:5, 4), 
id = 1:20
)
Ytest <- data.table(
c = rep(letters[1:2], rep(5, 2)), 
d = rep(1:5, 2), 
e = rnorm(10), 
f = rnorm(10), 
id = 1:10
)

# multiple columns supported both for joining and updating
myjoinfunction(Xtest, Ytest, list(c, d), list(newcol1, newcol2), list(e, f));

# Single columns have to be expressed in a list
myjoinfunction(Xtest, Ytest, list(id), list(newcol1), list(e));

由于参数不是字符串,因此这比编程使用更适合于交互式使用。
欢迎提出改进建议!特别是如果有一种更优雅的方式来表达函数参数中的单个列或处理函数中的列表(我不喜欢在处理列表时必须使用[-1]作为函数中的解决方案,我认为这最终会导致错误。

相关问题