Creating multiple columns from a complex column in R

jpfvwuh4  posted on 2023-01-18  in  Other

Example dataset:

df1 <- tibble::tribble(~City,   ~Population,
"United Kingdom > Leeds",   1500000,
"Spain > Las Palmas de Gran Canaria",   200000,
"Canada > Nanaimo, BC", 150000,
"Canada > Montreal",    250000,
"United States > Minneapolis, MN",  700000,
"United States > Milwaukee, WI",    NA,
"United States > Milwaukee",    400000)

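I would like to:
1. Split the City column into three columns: City, Country, and State (where applicable, otherwise NA).
2. Make sure Milwaukee ends up with both its state and its population (the NA population for Milwaukee should become 400000), and then split it into City, State, and Country like the other rows.
Could you please suggest the simplest way to do this?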

iyzzxitl1#

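Here is another solution using extract, which pulls out Country, City, and State in one go. State is captured by an optional capture group (the rest of the task is done with @Allen Cameron's code):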

library(tidyr)
library(dplyr)
df1 %>%
  # One regex pass: Country before " > ", City up to an optional comma,
  # and State as an optional run of capital letters
  extract(City,
          into = c("Country", "City", "State"),
          regex = "([^>]+) > ([^,]+),? ?([A-Z]+)?"
  ) %>%
  # as by @Allen Cameron:
  group_by(Country, City) %>%
  summarize(State = ifelse(all(is.na(State)), NA, State[!is.na(State)]),
            Population = Population[!is.na(Population)])
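
To see what the three capture groups pick up, the pattern can be tried on a couple of the input strings outside of tidyr. This is only a sketch in base R: the pattern is copied from the answer, while `x` and `m` are illustrative names, and `regexec()` is used here for inspection only, so it is not necessarily how `extract()` matches internally.

```r
# Sketch: inspect the capture groups of the regex with base R only.
# `pattern` is copied from the answer; `x` and `m` are illustrative names.
pattern <- "([^>]+) > ([^,]+),? ?([A-Z]+)?"
x <- c("Canada > Nanaimo, BC", "Canada > Montreal")

# regexec() locates the full match and each capture group;
# regmatches() turns those positions into strings.
m <- regmatches(x, regexec(pattern, x))

m[[1]][2:4]  # "Canada" "Nanaimo" "BC"
m[[2]][2:3]  # "Canada" "Montreal"
```

Element 1 of each result is the full match, so the groups start at index 2. The second string has no state suffix, which is why only the first two groups are shown for it.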

x7rlezfr2#

You can use separate twice to get the country and the state, then group_by Country and City to collapse the NA values where applicable:

library(tidyverse)

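df1 %>%
  # First split: "Country > City[, State]" into Country and the rest
  separate(City, sep = " > ", into = c("Country", "City")) %>%
  # Second split: "City, State"; rows without a state get State = NA
  # (fill = "right" only silences the missing-pieces warning)
  separate(City, sep = ', ', into = c('City', 'State'), fill = "right") %>%
  # Collapse duplicate Country/City rows, keeping the non-missing values
  group_by(Country, City) %>%
  summarize(State = ifelse(all(is.na(State)), NA, State[!is.na(State)]), 
            Population = Population[!is.na(Population)])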
#> # A tibble: 6 x 4
#> # Groups:   Country [4]
#>   Country        City                       State Population
#>   <chr>          <chr>                      <chr>      <dbl>
#> 1 Canada         Montreal                   <NA>      250000
#> 2 Canada         Nanaimo                    BC        150000
#> 3 Spain          Las Palmas de Gran Canaria <NA>      200000
#> 4 United Kingdom Leeds                      <NA>     1500000
#> 5 United States  Milwaukee                  WI        400000
#> 6 United States  Minneapolis                MN        700000
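
The two Milwaukee rows end up as a single row because of the two expressions inside summarize. A minimal sketch in base R makes this visible by running them by hand. The vectors below are typed in manually, on the assumption that they match what the Milwaukee group holds after the two separate calls, and the names `state_out` and `pop_out` are illustrative.

```r
# Sketch: the Milwaukee group as it would look after splitting
# (values entered by hand as an assumption, not taken from df1 directly).
State      <- c("WI", NA)
Population <- c(NA, 400000)

# Not all states are NA, so the non-missing state is kept.
state_out <- ifelse(all(is.na(State)), NA, State[!is.na(State)])
# Drop the missing population, leaving the single known value.
pop_out <- Population[!is.na(Population)]

state_out  # "WI"
pop_out    # 400000
```

This relies on each group having exactly one non-missing population. A group with several would yield several values from the Population expression, and a group with none would yield an empty vector.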
