Create a dummy variable in R for all rows containing "County"

Posted by hwamh0ep on 2023-01-18 in Other

Given the following data in R:

County_or_City <- c("Butte County", "Oroville", "Solano Cnty", "Redding", "Maripossa county")
data.frame(County_or_City)

    County_or_City
1     Butte County
2         Oroville
3      Solano Cnty
4          Redding
5 Maripossa county

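I want to create a new column containing a dummy variable that flags rows containing "County", "Cnty", or "county". Sorry, I know this is very basic, but I'm still learning. How can I do this?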

rjzwgtxy1#

Using base R:

transform(data.frame(County_or_City), 
 dummy = grepl('C(ou)?nty', County_or_City, ignore.case = TRUE))
Output:

    County_or_City dummy
1     Butte County  TRUE
2         Oroville FALSE
3      Solano Cnty  TRUE
4          Redding FALSE
5 Maripossa county  TRUE
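
To see why this one pattern covers all three spellings: `(ou)?` makes the letters "ou" optional, so the regex matches both "county" and "cnty", and `ignore.case = TRUE` takes care of capitalisation. The sketch below runs the same pattern against a few extra strings; the vector `x` is hypothetical test input, not data from the question.

```r
# Hypothetical test strings chosen to probe the pattern (not from the question)
x <- c("Butte County", "Solano Cnty", "Maripossa county",
       "COUNTY LINE", "Oroville", "Country Club")

# (ou)? makes "ou" optional; ignore.case = TRUE matches any capitalisation
hit <- grepl("C(ou)?nty", x, ignore.case = TRUE)

# "Country Club" is not flagged: after "Count" the pattern needs "y", not "r"
data.frame(x, hit)
```

Because the match is a substring match, any place name that happens to contain "county" or "cnty" would be flagged as well.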

8qgya5xd2#

Using dplyr and stringr:

library(dplyr)
library(stringr)

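# Alternative spellings to look for
county_words <- c("County", "county", "Cnty")

# Collapse the terms into one regex ("County|county|Cnty"); str_detect() is
# vectorised over pattern, so a pattern vector would be matched element-wise
data.frame(County_or_City) %>%
  mutate(dummy = str_detect(County_or_City, paste(county_words, collapse = "|")))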

Output:

    County_or_City dummy
1     Butte County  TRUE
2         Oroville FALSE
3      Solano Cnty  TRUE
4          Redding FALSE
5 Maripossa county  TRUE
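
Since `str_detect()` is vectorised over both `string` and `pattern`, the alternative spellings have to live inside a single pattern rather than a vector of patterns. A minimal sketch of a case-insensitive variant follows, assuming the stringr and dplyr packages are installed; the names `county_pattern` and `result` are illustrative only, and the 0/1 conversion with `as.integer()` is an optional extra.

```r
library(dplyr)
library(stringr)

County_or_City <- c("Butte County", "Oroville", "Solano Cnty", "Redding", "Maripossa county")

# One regex with two alternatives; ignore_case = TRUE means we do not need
# separate "County" and "county" entries
county_pattern <- regex("county|cnty", ignore_case = TRUE)

result <- data.frame(County_or_City) %>%
  mutate(dummy = as.integer(str_detect(County_or_City, county_pattern)))
result
```

The `dummy` column then holds 1 for rows that match and 0 otherwise.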

c8ib6hqw3#

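In base R, you can use grepl (which searches strings for a pattern and returns a logical TRUE/FALSE) together with paste and collapse = "|" (meaning search for this term "or" that term) to look for the search terms and return a logical value for each row. Multiplying the result by 1 then converts it into a binary dummy variable (0 = FALSE, 1 = TRUE):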

County_or_City <- c("Butte County", "Oroville", "Solano Cnty", "Redding", "Maripossa county")
df <- data.frame(County_or_City)

srchtrms <- c("County","county","Cnty")

df$new <- grepl(paste(srchtrms, collapse = "|"), df$County_or_City) * 1
df

Output:

    County_or_City new
1     Butte County   1
2         Oroville   0
3      Solano Cnty   1
4          Redding   0
5 Maripossa county   1
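
The three steps described in this answer can be inspected one at a time. The sketch below uses a shortened, hypothetical input vector `x` and the illustrative names `pattern` and `matched` to show what each intermediate value looks like.

```r
srchtrms <- c("County", "county", "Cnty")

# Step 1: collapse the terms into a single regex; "|" means "or"
pattern <- paste(srchtrms, collapse = "|")
pattern

# Step 2: grepl() returns one logical value per element
x <- c("Butte County", "Oroville", "Solano Cnty")  # shortened example input
matched <- grepl(pattern, x)
matched

# Step 3: arithmetic on a logical vector coerces TRUE/FALSE to 1/0
dummy <- matched * 1
dummy
typeof(dummy)  # multiplying by 1 gives a double, not an integer
```

Note that this pattern is case-sensitive, so a spelling such as "COUNTY" would need its own entry in `srchtrms` or `ignore.case = TRUE` in the `grepl()` call.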
