R: Split names with different options based on a character

Posted by 3duebb1j on 2022-12-06 in Other

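I have a dataset in which names were entered in different ways. Some names were entered as first name, space, last name, while others were entered as last name, comma, first name. I need all of them to read last name, comma, first name. I'd like to keep the data inside the data frame, but if there is no other way to do this, I can append it back. Here is an example of the data frame:
| Names | Other_Column |
| --- | --- |
| Smith, John | ... |
| Sam Miller | ... |
| Anderson, Sam | ... |
| Williams, Jacob | ... |
| Susan Styles | ... |
| Burke, David | ... |
I tried a case_when statement after piping the data frame, but it didn't work. I also tried grepl and str_split.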

Answer 1 (nlejzf6q)

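```r
library(dplyr)

# quux is defined under "Data" below
quux %>%
  mutate(
    # Keep names that already contain a comma; otherwise move the last
    # space-separated word to the front, followed by ", "
    Names = if_else(grepl(",", Names),
                    Names,
                    sub("^(.+)\\s+(\\S+)$", "\\2, \\1", Names))
  )
#             Names Other_Column
# 1     Smith, John          ...
# 2     Miller, Sam          ...
# 3   Anderson, Sam          ...
# 4 Williams, Jacob          ...
# 5   Styles, Susan          ...
# 6    Burke, David          ...
```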

The regex:

```
^(.+)\\s+(\\S+)$
^                 beginning-of-string
 (^^)             group of anything (1-or-more)
     ^^^^         blank-space (1-or-more)
         (^^^^)   group of non-blank-space characters (1-or-more)
               ^  end-of-string
```
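To see what the two capture groups actually hold, the pattern can be run through base R's `regexec()` and `regmatches()`, which return the full match followed by each group. The first two names come from the question's data; `"Mary Ann Smith"` is a hypothetical extra example, added only to show that the greedy `(.+)` keeps everything before the last space.

```r
# Names without a comma; "Mary Ann Smith" is a hypothetical extra example
x <- c("Sam Miller", "Susan Styles", "Mary Ann Smith")
pattern <- "^(.+)\\s+(\\S+)$"

# regexec() records where the whole pattern and each group matched;
# regmatches() turns that into strings: c(full match, group 1, group 2)
groups <- regmatches(x, regexec(pattern, x))

# Group 1 = everything before the last whitespace run, group 2 = last word
first_part <- vapply(groups, `[`, character(1), 2)
last_word  <- vapply(groups, `[`, character(1), 3)

# The same rearrangement that sub() performs with "\\2, \\1"
swapped <- sub(pattern, "\\2, \\1", x)
print(data.frame(x, first_part, last_word, swapped))
```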

If the name already contains a comma, nothing changes. If it has no comma, the last "word" (space-delimited) is moved to the front and followed by a comma.
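The same "keep if it has a comma, otherwise swap" logic can also be written without dplyr, which keeps the result inside the original data frame as the question asked. This is a base R sketch that assumes the column is called `Names`, as in the data below.

```r
# Example data from the question ("..." is a placeholder column)
quux <- data.frame(
  Names = c("Smith, John", "Sam Miller", "Anderson, Sam",
            "Williams, Jacob", "Susan Styles", "Burke, David"),
  Other_Column = "...",
  stringsAsFactors = FALSE
)

# grepl() is vectorised, so ifelse() decides per row: names with a comma
# are kept, the rest become "last word, everything before it"
quux$Names <- ifelse(
  grepl(",", quux$Names),
  quux$Names,
  sub("^(.+)\\s+(\\S+)$", "\\2, \\1", quux$Names)
)

print(quux)
```

`stringsAsFactors = FALSE` matters on R versions before 4.0: with a factor column, `ifelse()` would return the factor's integer codes instead of the names.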
Data:

```r
quux <- structure(list(
  Names = c("Smith, John", "Sam Miller", "Anderson, Sam",
            "Williams, Jacob", "Susan Styles", "Burke, David"),
  Other_Column = c("...", "...", "...", "...", "...", "...")
), class = "data.frame", row.names = c(NA, -6L))
```
Answer 2 (20jt8wwn)

You can also do the following:

```r
library(tidyverse)

df %>%
  # Split on non-alphanumeric characters into two helper columns,
  # keeping the original Names column (remove = FALSE)
  separate(Names, into = c("first", "second"), remove = FALSE) %>%
  transmute(Names = Names,
            # Already "Last, First": keep it; otherwise swap the two pieces
            new_names = case_when(str_detect(Names, ",") ~ Names,
                                  TRUE ~ str_c(second, first, sep = ", ")))

# A tibble: 6 × 2
#   Names           new_names      
#   <chr>           <chr>          
# 1 Smith, John     Smith, John    
# 2 Sam Miller      Miller, Sam    
# 3 Anderson, Sam   Anderson, Sam  
# 4 Williams, Jacob Williams, Jacob
# 5 Susan Styles    Styles, Susan  
# 6 Burke, David    Burke, David
```
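One thing to check with this approach: by default `separate()` splits on any run of non-alphanumeric characters and, when there are more pieces than `into` columns, drops the extras with a warning. The base R sketch below imitates that splitting with `strsplit()` on a hypothetical three-part name (not from the question's data) and compares it with the regex from the first answer.

```r
# Hypothetical name with a middle name; not part of the question's data
x <- "Mary Ann Smith"

# Mimic separate(sep = "[^[:alnum:]]+", into = c("first", "second")):
# split on non-alphanumeric runs and keep only the first two pieces
pieces <- strsplit(x, "[^[:alnum:]]+")[[1]]
first  <- pieces[1]
second <- pieces[2]
via_split <- paste(second, first, sep = ", ")   # the surname is lost

# The first answer's regex moves only the last word to the front
via_regex <- sub("^(.+)\\s+(\\S+)$", "\\2, \\1", x)

print(c(via_split = via_split, via_regex = via_regex))
```

For the two-word names in the question, both approaches give the same result; they only diverge once a name has more than two parts.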

Data:

```r
df <- tibble(Names = c("Smith, John", "Sam Miller", "Anderson, Sam",
                       "Williams, Jacob", "Susan Styles", "Burke, David"))
```
Answer 3 (5cg8jx4n)

This might help:

```r
df <- tibble::tribble(
             ~Names, ~Other_Column,
      "Smith, John",         "...",
       "Sam Miller",         "...",
    "Anderson, Sam",         "...",
  "Williams, Jacob",         "...",
     "Susan Styles",         "...",
     "Burke, David",         "..."
)

library(stringr)
library(dplyr)

# Convert a single "First Last" string to "Last, First".
# Assumes exactly two space-separated words; strings that already
# contain a comma are returned unchanged.
change_name <- function(x) {
  if (!str_detect(x, ",")) {
    aux <- str_split(x, pattern = " ")[[1]]
    output <- str_c(aux[2], ", ", aux[1])
  } else {
    output <- x
  }
  return(output)
}

# change_name() handles one string at a time, so apply it row by row
df %>%
  rowwise() %>%
  mutate(new_name = change_name(Names))

# A tibble: 6 x 3
# Rowwise: 
#   Names           Other_Column new_name       
#   <chr>           <chr>        <chr>          
# 1 Smith, John     ...          Smith, John    
# 2 Sam Miller      ...          Miller, Sam    
# 3 Anderson, Sam   ...          Anderson, Sam  
# 4 Williams, Jacob ...          Williams, Jacob
# 5 Susan Styles    ...          Styles, Susan  
# 6 Burke, David    ...          Burke, David
```
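Because `change_name()` works on one string at a time, the answer needs `rowwise()`. A base R alternative, sketched here as an assumption rather than part of the original answer, applies a scalar function to the whole column with `vapply()`. `change_name_base()` is a hypothetical port that treats the last word as the surname, so unlike the original it also handles names with more than two words.

```r
# Hypothetical base R port of change_name(): "First [Middle] Last" becomes
# "Last, First [Middle]"; names with a comma or a single word are unchanged
change_name_base <- function(x) {
  if (grepl(",", x, fixed = TRUE)) {
    return(x)
  }
  parts <- strsplit(x, " ", fixed = TRUE)[[1]]
  if (length(parts) < 2) {
    return(x)
  }
  last <- parts[length(parts)]                     # surname = last word
  rest <- paste(parts[-length(parts)], collapse = " ")
  paste0(last, ", ", rest)
}

df <- data.frame(
  Names = c("Smith, John", "Sam Miller", "Anderson, Sam",
            "Williams, Jacob", "Susan Styles", "Burke, David"),
  Other_Column = "...",
  stringsAsFactors = FALSE
)

# vapply() calls the function once per element and checks that each call
# returns one string, so no rowwise() grouping is needed
df$new_name <- vapply(df$Names, change_name_base, character(1),
                      USE.NAMES = FALSE)
print(df)
```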
