R: Extracting the first and last elements of character vectors in a list column

osh3o9ms  posted on 2023-01-06 in Other

I am having trouble applying the solutions other users gave to a similar question. See the example df below.

structure(list(FirstName = c("Albus Percival Wulfric Brian Dumbledore", 
"Harry James Potter", "Tom Marvollo Riddle", "Lord Voldemort"
), Email = c("albusD@hogwarts.com", "harryP@hogwarts.com", "tomR@hogwarts.com", 
"LV@Wiz.com"), ClassSection = c("HeadMaster", "Student", "Dark Lord in training", 
"Dark Lord")), row.names = c(NA, -4L), spec = structure(list(
    cols = list(FirstName = structure(list(), class = c("collector_character", 
    "collector")), Email = structure(list(), class = c("collector_character", 
    "collector")), ClassSection = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

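I want to create a new column that contains both the first name and the last name. To do that, I first tried separate(FirstName, sep = " ", into = c("First", "Middle", "Last")). However, for names with more than three words the extra words are dropped, so I cannot reliably combine the pieces.
Next, I tried df %>% mutate(First = str_split(FirstName, pattern = " ")). This gives a list column. I would like to find a way to extract the first and last elements of each entry in that column.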

# A tibble: 4 x 4
  FirstName                               Email               ClassSection          First    
  <chr>                                   <chr>               <chr>                 <list>   
1 Albus Percival Wulfric Brian Dumbledore albusD@hogwarts.com HeadMaster            <chr [5]>
2 Harry James Potter                      harryP@hogwarts.com Student               <chr [3]>
3 Tom Marvollo Riddle                     tomR@hogwarts.com   Dark Lord in training <chr [3]>
4 Lord Voldemort                          LV@Wiz.com          Dark Lord             <chr [2]>

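I looked at various answers that suggested tail(First, n = 1) and dplyr's last(First). However, neither gave me the right result. I also tried unnest_wider(First), but it has the same problem as separate(FirstName): I end up with multiple columns, which does not work for names that have only two words or more than three.
I would like to stay within a dplyr (tidyverse) workflow. Is there a way to combine the first and last elements into a new column?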

mrwjdhj3  #1

Do you mean something like this?

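df %>%
  mutate(
    # split each name on spaces, keep the first and last piece
    # (unique() avoids repeating a one-word name), then paste them together
    FirstLast = sapply(str_split(FirstName, pattern = " "),
                       \(z) paste(z[unique(c(1, length(z)))], collapse = ""))
  )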
# # A tibble: 4 × 4
#   FirstName                               Email               ClassSection          FirstLast      
#   <chr>                                   <chr>               <chr>                 <chr>          
# 1 Albus Percival Wulfric Brian Dumbledore albusD@hogwarts.com HeadMaster            AlbusDumbledore
# 2 Harry James Potter                      harryP@hogwarts.com Student               HarryPotter    
# 3 Tom Marvollo Riddle                     tomR@hogwarts.com   Dark Lord in training TomRiddle      
# 4 Lord Voldemort                          LV@Wiz.com          Dark Lord             LordVoldemort

Or, more simply:

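# remove everything from the first space through the last space;
# a two-word name has only one space, so the pattern does not match
# and the name is returned unchanged (see row 4)
df %>%
  mutate(FirstLast = sub(" .* ", "", FirstName))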
# # A tibble: 4 × 4
#   FirstName                               Email               ClassSection          FirstLast      
#   <chr>                                   <chr>               <chr>                 <chr>          
# 1 Albus Percival Wulfric Brian Dumbledore albusD@hogwarts.com HeadMaster            AlbusDumbledore
# 2 Harry James Potter                      harryP@hogwarts.com Student               HarryPotter    
# 3 Tom Marvollo Riddle                     tomR@hogwarts.com   Dark Lord in training TomRiddle      
# 4 Lord Voldemort                          LV@Wiz.com          Dark Lord             Lord Voldemort
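
The two snippets above treat two-word names differently (the first drops the space, the second keeps it). If a single space between the first and last name is wanted in every case, one possible sketch in base R is shown below. The helper name first_last and the sample vector names_in are assumptions made for illustration; they do not appear in the answer.

```r
# Hypothetical helper (not part of the original answer): keep the first and
# last word of each string, joined by `sep`.
first_last <- function(x, sep = " ") {
  parts <- strsplit(x, " ", fixed = TRUE)
  vapply(
    parts,
    # unique() keeps a one-word name from being repeated
    function(z) paste(z[unique(c(1, length(z)))], collapse = sep),
    character(1)
  )
}

# Illustrative input, including a one-word name
names_in <- c("Albus Percival Wulfric Brian Dumbledore",
              "Harry James Potter", "Lord Voldemort", "Dobby")
first_last(names_in)
# [1] "Albus Dumbledore" "Harry Potter"     "Lord Voldemort"   "Dobby"
```

Inside the pipeline this would be used as df %>% mutate(FirstLast = first_last(FirstName)).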

ybzsozfc  #2

We can use extract:

library(tidyr)
extract(df, FirstName, into = c("First", "Last"),
    "^(\\S+)\\s*.*\\s+(\\S+)$", remove = FALSE)
  • Output
# A tibble: 4 × 5
  FirstName                               First Last       Email               ClassSection         
  <chr>                                   <chr> <chr>      <chr>               <chr>                
1 Albus Percival Wulfric Brian Dumbledore Albus Dumbledore albusD@hogwarts.com HeadMaster           
2 Harry James Potter                      Harry Potter     harryP@hogwarts.com Student              
3 Tom Marvollo Riddle                     Tom   Riddle     tomR@hogwarts.com   Dark Lord in training
4 Lord Voldemort                          Lord  Voldemort  LV@Wiz.com          Dark Lord
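
If a regular expression feels heavy for this, stringr's word() may be a lighter alternative: it picks a word by position, and negative positions count from the end. The sketch below is not from the answer; df_small is a cut-down stand-in for df, used only to keep the example self-contained.

```r
library(dplyr)
library(stringr)

# Assumed stand-in for the question's df (only the column that matters here)
df_small <- tibble(
  FirstName = c("Albus Percival Wulfric Brian Dumbledore",
                "Harry James Potter", "Lord Voldemort")
)

res <- df_small %>%
  mutate(
    First = word(FirstName, 1),   # first space-separated word
    Last  = word(FirstName, -1)   # -1 counts from the end: the last word
  )
res
```

Because word() is vectorised over the column, no list column or unnesting step is needed.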

Or extract from the list column:

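library(purrr)
library(dplyr)
library(stringr)  # str_split()
library(tidyr)    # unnest_wider()
df %>%
   mutate(First = str_split(FirstName, pattern = " "), .after = FirstName) %>% 
   # build a one-row tibble per name holding its first and last word
   mutate(First = map(First, ~ tibble(First = first(.x), 
       Last = last(.x)))) %>% 
   unnest_wider(First)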
  • Output
# A tibble: 4 × 5
  FirstName                               First Last       Email               ClassSection         
  <chr>                                   <chr> <chr>      <chr>               <chr>                
1 Albus Percival Wulfric Brian Dumbledore Albus Dumbledore albusD@hogwarts.com HeadMaster           
2 Harry James Potter                      Harry Potter     harryP@hogwarts.com Student              
3 Tom Marvollo Riddle                     Tom   Riddle     tomR@hogwarts.com   Dark Lord in training
4 Lord Voldemort                          Lord  Voldemort  LV@Wiz.com          Dark Lord
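
As for why last(First) alone does not help: called directly inside mutate(), it operates on the list column as a whole, so it returns the split name of the last row rather than one value per row. Applying first() and last() to each element with purrr::map_chr() avoids that and skips the intermediate tibble. This is a sketch under the same assumptions as the code above; df_small and the temporary column parts are illustrative names, not from the answer.

```r
library(dplyr)
library(purrr)
library(stringr)

# Assumed stand-in for the question's df
df_small <- tibble(
  FirstName = c("Albus Percival Wulfric Brian Dumbledore",
                "Harry James Potter", "Lord Voldemort")
)

res <- df_small %>%
  mutate(
    parts = str_split(FirstName, pattern = " "),  # list column of words
    First = map_chr(parts, first),                # first word of each element
    Last  = map_chr(parts, last)                  # last word of each element
  ) %>%
  select(-parts)                                  # drop the helper column
res
```

map_chr() returns a plain character vector, so First and Last are ordinary columns that can be pasted together in a later mutate().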
