Parsing data out of one variable using complex rules in R

Posted by plupiseo on 2023-11-14 in Other

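I'm importing data into R from another source (i.e., I can't easily change the format or values of the input).
One of the variables contains one or more of these possible values: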

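  • Mother (biological mother, foster mother, step mother, etc.)
  • Father (biological father, foster father, step father, etc.)
  • Grandparent(s) (biological, foster, step, etc.)
  • Brother(s) older than 18
  • Sister(s) older than 18
  • Other adults (aunts, uncles, etc.)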

All of these are stored in the same "cell", so the data can look like this:

Sample input data frame (df)

df <- read.table(text =
"row lives.with.whom
  1  'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)'
  2  ''
  3  'Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18'
  4  'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)'", header = TRUE)

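In R, how can I efficiently create rules that parse these responses into separate columns, one column per type of family member, so that the output looks like this: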

Desired output (built by hand here)

mother <- c(1,0,1,1)
father <- c(1,0,0,1)
adult.brother <- c(1,0,0,0)
adult.sister <- c(1,0,1,0)
grandparent <- c(1,0,0,0)
other.adult <- c(1,0,0,0)
output.df <- cbind(mother, father, adult.brother, adult.sister, grandparent, other.adult)
colnames(output.df) <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")
output.df

     Mother Father Brother Sister Grandparent Other adult
[1,]      1      1       1      1           1           1
[2,]      0      0       0      0           0           0
[3,]      1      0       0      1           0           0
[4,]      1      1       0      0           0           0

5anewei6 (answer #1)

Here is a tidyverse option to get you started:

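library(tidyverse)
# Labels to look for; naming the list makes map() return named columns
rel <- list("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")
names(rel) <- unlist(rel)
# Case-insensitive substring match per label; unary + turns TRUE/FALSE into 1/0
bind_cols(df[, 1, drop = FALSE], map(rel, ~+str_detect(tolower(df[, 2]), tolower(.x))))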
#  row Mother Father Brother Sister Grandparent Other adult
#1   1      1      1       1      1           1           1
#2   2      0      0       0      0           0           0
#3   3      1      0       0      1           0           0
#4   4      1      1       0      0           0           0

Sample data

df <- read.table(text =
    "row lives.with.whom
  1  'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)'
  2  ''
  3  'Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18'
  4  'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)'", header = TRUE)
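A base-R variant of the same idea, sketched under the assumption that each answer option always appears with exactly the wording shown in the sample data. Matching the full option text, rather than a single word, makes it less likely that one label accidentally matches inside another. Because those labels contain parentheses, `fixed = TRUE` makes `grepl()` treat them literally instead of as regex groups. The column names (`mother`, `father`, ...) and the `patterns` vector are my own choices, taken from the question's expected output.

```r
# Sample data from the question, built directly instead of via read.table()
df <- data.frame(
  row = 1:4,
  lives.with.whom = c(
    "Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)",
    "",
    "Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18",
    "Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)"
  ),
  stringsAsFactors = FALSE
)

# Output column name -> exact option text (assumed not to change between exports)
patterns <- c(
  mother        = "Mother (biological mother, foster mother, step mother, etc.)",
  father        = "Father (biological father, foster father, step father, etc.)",
  adult.brother = "Brother(s) older than 18",
  adult.sister  = "Sister(s) older than 18",
  grandparent   = "Grandparent(s) (biological, foster, step, etc.)",
  other.adult   = "Other adults (aunts, uncles, etc.)"
)

# 4 x 6 integer matrix of 0/1 flags; fixed = TRUE matches the parentheses literally
indicators <- sapply(patterns, function(p) as.integer(grepl(p, df$lives.with.whom, fixed = TRUE)))

# Keep the id column and append one flag column per option
result <- cbind(df["row"], indicators)
result
```

With this layout, supporting another option only means adding one entry to `patterns`.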

1qczuiv0 (answer #2)

Try this:

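library(dplyr)  # for if_else()

rel <- list("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")

# For each label, add a 0/1 column named after it (1 = label appears in the response)
for (i in seq_along(rel)) {
  df[[rel[[i]]]] <- if_else(grepl(rel[[i]], df$lives.with.whom), 1, 0)
}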
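The same loop can be written without an index. This is a minimal base-R sketch; swapping `as.integer()` in for `dplyr::if_else()` is my substitution, not part of the answer. `lapply()` builds one 0/1 vector per label, and assigning that list to `df[rel]` creates all six columns at once. Like the loop, `grepl()` here is case-sensitive and matches the label anywhere in the response, so "Other adult" also matches "Other adults".

```r
# Sample data from the question
df <- data.frame(
  row = 1:4,
  lives.with.whom = c(
    "Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)",
    "",
    "Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18",
    "Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)"
  ),
  stringsAsFactors = FALSE
)

rel <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")

# One 0/1 vector per label; assigning the list to df[rel] adds all columns in one step
df[rel] <- lapply(rel, function(r) as.integer(grepl(r, df$lives.with.whom)))

df[, -2]  # print without the long text column
```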

vs91vp4v (answer #3)

I made some assumptions and tried to solve it as follows.

library(tidyr)
library(dplyr)
# Create nested lists with the names of the mothers and fathers of two people
mother <- list(list("bio_1", "step_1", "foster_1"), list("bio_2", "stp_2", "foster_2"))
father <- list(list("bio_1", "foster_1", "other_1"), list("bio_2", "stp_2", "foster_2"))

# Convert to a data frame (tibble() replaces the deprecated data_frame())
test_object <- tibble(person = c(1, 2), mother, father)

# print 
test_object

# A tibble: 2 x 3
  person mother     father    
   <dbl> <list>     <list>    
1      1 <list [3]> <list [3]>
2      2 <list [3]> <list [3]>

# First unnest the lists to get to the inner list,
# then convert from wide to long format,
# then unnest again to get the actual values in long format
test_object %>%
  unnest(.) %>%
  gather(data = ., key = relationship, value = name, -person) %>%
  unnest() -> test_object

test_object
# A tibble: 12 x 3
   person relationship name    
    <dbl> <chr>        <chr>   
 1      1 mother       bio_1   
 2      1 mother       step_1  
 3      1 mother       foster_1
 4      2 mother       bio_2   
 5      2 mother       stp_2   
 6      2 mother       foster_2
 7      1 father       bio_1   
 8      1 father       foster_1
 9      1 father       other_1 
10      2 father       bio_2   
11      2 father       stp_2   
12      2 father       foster_2

Have a look at tidyverse and data.table: between them they provide a lot of packages and functions that can handle most data carpentry/wrangling problems.
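A long format like the one above (one row per person per relationship) is also a convenient intermediate for the original question. Here is a base-R sketch, assuming that commas inside parentheses are the only thing preventing a plain split on ", " and that no option is listed twice in one response: strip every parenthesised part, split what remains, stack it long, then cross-tabulate back to 0/1 columns. The resulting column names come from the cleaned option text (e.g. "Brother older than 18"), so they would still need renaming to match the requested output.

```r
# Sample data from the question
df <- data.frame(
  row = 1:4,
  lives.with.whom = c(
    "Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)",
    "",
    "Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18",
    "Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)"
  ),
  stringsAsFactors = FALSE
)

# 1. Drop every parenthesised part, e.g. "(s)" and "(biological mother, ...)",
#    so the only commas left are the ones separating options
cleaned <- gsub("\\s*\\([^)]*\\)", "", df$lives.with.whom)

# 2. Split into one option per element (an empty response gives no options)
parts <- strsplit(cleaned, ", ", fixed = TRUE)

# 3. Long format: one row per person per option
long <- data.frame(row = rep(df$row, lengths(parts)),
                   option = unlist(parts),
                   stringsAsFactors = FALSE)

# 4. Back to wide 0/1 columns; the factor levels keep people with no options (row 2)
wide <- as.data.frame.matrix(table(factor(long$row, levels = df$row), long$option))
wide
```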
