R: Create a column from another column based on keywords

Posted by fumotvh3 on 2023-02-27 in Other

Given the data below, how can I add a third column, Type? The hospital type should be determined by certain words in the hospital name.

    Word         Type
    Government   Government
    Govt         Government
    St Jude      Religious
    Catholic     Religious
    District     District
    Community    Community
    Divine Mercy Religious
    St. Luke     Religious
    St. Theresa  Religious
    Islamic      Religious
    Babtist      Religious

Data:

df = structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), 
    Hospital = c("A Governtment Hospital", "Goverment B Hospital", 
    "C Govt Hospital", "D St Jude Hospital", "D Catholic Hospital", 
    "Catholic E Hospital", "F District Hospital", "G Community Hospital", 
    "H Divine Mercy Hospital", "I St. Luke Hospital", "J St. Theresa Hospital", 
    "Babtist Hospital")), class = "data.frame", row.names = c(NA, 
-12L))

# Desired df
df_desired = structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), 
    Hospital = c("A Governtment Hospital", "Goverment B Hospital", 
    "C Govt Hospital", "D St Jude Hospital", "D Catholic Hospital", 
    "Catholic E Hospital", "F District Hospital", "G Community Hospital", 
    "H Divine Mercy Hospital", "I St. Luke Hospital", "J St. Theresa Hospital", 
    "Babtist Hospital"), Type = c("Government", "Government", 
    "Government", "Religious", "Religious", "Religious", "District", 
    "Community", "Religious", "Religious", "Religious", "Religious"
    )), class = "data.frame", row.names = c(NA, -12L))
hwamh0ep #1

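If we have a key/value dataset, we can use regex_left_join from fuzzyjoin, which treats each Word as a regular expression and matches it against Hospital: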

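library(fuzzyjoin)
library(dplyr)
# keydat (the key/value table) is defined at the end of this answer
regex_left_join(df, keydat, by = c("Hospital" = "Word")) %>%
  select(-Word)  # drop the matched pattern, keep only Type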
Output:
   id                Hospital       Type
1   1  A Governtment Hospital Government
2   2    Goverment B Hospital Government
3   3         C Govt Hospital Government
4   4      D St Jude Hospital  Religious
5   5     D Catholic Hospital  Religious
6   6     Catholic E Hospital  Religious
7   7     F District Hospital   District
8   8    G Community Hospital  Community
9   9 H Divine Mercy Hospital  Religious
10 10     I St. Luke Hospital  Religious
11 11  J St. Theresa Hospital  Religious
12 12        Babtist Hospital  Religious

Data

keydat <- structure(list(Word = c("Gover(nt)?ment", "Govt", "St Jude", 
"Catholic", "District", "Community", "Divine Mercy", "St. Luke", 
"St. Theresa", "Islamic", "Babtist"), Type = c("Government", 
"Government", "Religious", "Religious", "District", "Community", 
"Religious", "Religious", "Religious", "Religious", "Religious"
)), row.names = c(NA, -11L), class = "data.frame")
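
If installing fuzzyjoin is not an option, the same keyword lookup can be sketched in base R with grepl. This is only an illustration, not part of the original answer: the helper name classify_hospital is hypothetical, the small keys table is a shortened stand-in for keydat, and it assumes that when several patterns match a name, the first one in the table should win.

```r
# Hypothetical helper (not from the original answer): for each hospital name,
# return the Type of the first pattern in keys$Word that matches, or NA.
classify_hospital <- function(hospital, keys) {
  vapply(hospital, function(h) {
    # Test every pattern in keys$Word against this one name
    hit <- which(vapply(keys$Word, grepl, logical(1), x = h, USE.NAMES = FALSE))
    if (length(hit) > 0) keys$Type[hit[1]] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

# Shortened stand-in for keydat, for illustration only
keys <- data.frame(
  Word = c("Gover(nt)?ment", "Govt", "Catholic", "District"),
  Type = c("Government", "Government", "Religious", "District"),
  stringsAsFactors = FALSE
)

hospitals <- c("A Governtment Hospital", "C Govt Hospital",
               "D Catholic Hospital", "F District Hospital", "Unknown Clinic")

res <- classify_hospital(hospitals, keys)
print(res)
```

With the full tables, the equivalent call would be df$Type <- classify_hospital(df$Hospital, keydat). Unlike regex_left_join, this returns exactly one value per row, so a name that matches several patterns is not duplicated.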
