How to replace the values of one data frame with those of another in R

t30tvxxf  posted on 2023-04-09 in Other

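I want to replace the values in df1 with the values from df2, where df2 holds the clean form of the names that appear in df1. For example, df1 is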

df1 <- data.frame(
  name = c(
    "A. MAHJUM-61365",
    "A. MAHJUM-61365. MAHJUM-61365",
    "A. RIZAL. AD-11002795",
    "A. RIZAL. AD-11002795. RIZAL. AD-11002795",
    "ABD. KADIR-60447",
    "ABD. KADIR-60447ABD. KADIR-60447",
    "ABD. KAHAR-62551",
    "ABD. RASYID DS-11002082",
    "ABDREAS APUNG @SANY",
    "ABDUL AZIS @HYUNDAY",
    "ABDUL AZIZ @HYUNDAI",
    "ABDUL AZIZ@HYUNDAI"
  ))

and df2 is

df2 <- data.frame(
  name = c(
    "A. MAHJUM-61365",
    "A. RIZAL. AD-11002795",
    "ABD. KADIR-60447",
    "ABD. KAHAR-62551",
    "ABD. RASYID DS-11002082",
    "ABDREAS APUNG @SANY",
    "ABDUL AZIS @HYUNDAY"
  ))

Whenever a value in df1 looks like a value in df2, the df1 value should be replaced with the df2 value.


icnyk63a  #1

Since this is a substring match, we can use fuzzyjoin:

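library(dplyr)
library(fuzzyjoin)
# Each df2$name is used as a regex and joined to every df1$name that matches it
regex_left_join(df1, df2, by = 'name') %>%
  # Take the df2 name where a match was found, otherwise keep the df1 name
  transmute(name = coalesce(name.y, name.x))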
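Because the names in df2 contain characters such as `.` that have a special meaning in a regex, a literal substring test may sometimes be preferable. The following is a minimal base R sketch of that idea, not part of the original answer; the vectors `clean` and `messy` and the helper `first_literal_match` are hypothetical names used only for illustration.

```r
# Toy data modelled on the question (hypothetical subset)
clean <- c("A. MAHJUM-61365", "ABD. KADIR-60447")
messy <- c("A. MAHJUM-61365. MAHJUM-61365",
           "ABD. KADIR-60447ABD. KADIR-60447",
           "ABDUL AZIZ@HYUNDAI")

# Return the first candidate contained literally in x, or x itself if none is
first_literal_match <- function(x, candidates) {
  # fixed = TRUE switches off regex interpretation of the candidate strings
  is_hit <- vapply(candidates, grepl, logical(1), x = x, fixed = TRUE)
  hits <- candidates[is_hit]
  if (length(hits) > 0) hits[1] else x
}

result <- vapply(messy, first_literal_match, character(1),
                 candidates = clean, USE.NAMES = FALSE)
result
```

Values with no literal match, such as the last one here, are left unchanged, which mirrors the `coalesce()` fallback in the join above.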

Or use a distance-based approach:

# Join rows whose names are within a small string distance of each other
stringdist_left_join(df1, df2, by = 'name') %>%
  transmute(name = coalesce(name.y, name.x))
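To see how a distance threshold decides which values are replaced, here is a small base R sketch using `adist` (Levenshtein distance). It is an illustration under stated assumptions rather than what fuzzyjoin does internally: the vectors `clean` and `messy` are a hypothetical subset of the question's data, and the threshold `max_dist <- 2` is an assumed value chosen for the example.

```r
# Toy data modelled on the question (hypothetical subset)
clean <- c("ABDUL AZIS @HYUNDAY", "ABD. KAHAR-62551")
messy <- c("ABDUL AZIZ @HYUNDAI", "ABDUL AZIZ@HYUNDAI", "ABD. KAHAR-62551")

# Edit-distance matrix: one row per messy value, one column per clean value
d <- adist(messy, clean)

# Index and distance of the closest clean value for each messy value
best <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_along(messy), best)]

# Replace only when the closest candidate is within the threshold (assumed value)
max_dist <- 2
result <- ifelse(best_dist <= max_dist, clean[best], messy)
result
```

With this threshold the second value, which differs from its nearest candidate by three edits, is left as it is. A larger threshold would catch it but would also raise the risk of merging names that are genuinely different.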

xn1cxnb4  #2

You can use adist to find the best match for each value in df1 and replace it with that match.

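# For each df1 name, index of the df2 name with the smallest edit distance.
# ties.method = "first" keeps the result reproducible (the default breaks ties at random).
i <- max.col(-adist(df1$name, df2$name, partial=TRUE), ties.method = "first")
df1$name <- df2$name[i]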

df1
#                      name
#1          A. MAHJUM-61365
#2          A. MAHJUM-61365
#3    A. RIZAL. AD-11002795
#4    A. RIZAL. AD-11002795
#5         ABD. KADIR-60447
#6         ABD. KADIR-60447
#7         ABD. KAHAR-62551
#8  ABD. RASYID DS-11002082
#9      ABDREAS APUNG @SANY
#10     ABDUL AZIS @HYUNDAY
#11     ABDUL AZIS @HYUNDAY
#12     ABDUL AZIS @HYUNDAY
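The `max.col(-d)` idiom used above can look cryptic: negating the distance matrix turns "smallest distance" into "largest value", so `max.col` returns the column of the nearest candidate for each row. The sketch below, on a hypothetical two-row subset of the data (`messy` and `clean` are illustrative names) and without `partial = TRUE`, checks that this agrees with a row-wise `which.min`.

```r
# Toy data modelled on the question (hypothetical subset)
messy <- c("ABD. KADIR-60447ABD. KADIR-60447", "ABDUL AZIZ@HYUNDAI")
clean <- c("ABD. KADIR-60447", "ABDUL AZIS @HYUNDAY")

# Rows correspond to messy values, columns to clean values
d <- adist(messy, clean)

# Column of the minimum distance per row, computed two ways
i <- max.col(-d, ties.method = "first")
j <- apply(d, 1, which.min)

# Both agree on the nearest clean value for every messy value
all(i == j)
clean[i]
```

Note that this approach always assigns the nearest candidate, even when the nearest one is still far away, so it is best suited to data where every value is expected to have a counterpart in df2.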
