R: Replace all the numeric values in a column with their corresponding state names

bogh5gae posted on 2023-03-20 in Other

I have a column, STATE CODE, that looks like this:

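Only 01 is visible there, but the codes run all the way up to ... 53, 54, 55, 56 and 72. See the data dictionary below for the full list of codes, starting from 01.
The dataset is sorted by state code (01, 02, 04 ... 56, 72). In the data dictionary, each of these numbers corresponds to a state name.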

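The data dictionary shows that the numbers stand for states, but I want the actual dataset to contain the state names rather than the numbers.
I want to replace each number with the state name it represents, according to the data dictionary.
I could use mutate to replace each number with its state name one at a time, but I am wondering whether there is a simpler, slicker way to do it.
I am also wondering whether the fact that the states are in alphabetical order can be exploited.
To be precise, this is what I have now: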

ST
01
01
01
02
02
04

And this is what I want:

ST
Alabama     
Alabama 
Alabama                         
Alaska         
Alaska                      
Arizona

Here is a text copy of the data dictionary:

ST               
State Code
           01 .Alabama/AL                              
           02 .Alaska/AK                               
           04 .Arizona/AZ                              
           05 .Arkansas/AR                             
           06 .California/CA                           
           08 .Colorado/CO                             
           09 .Connecticut/CT                          
           10 .Delaware/DE                             
           11 .District of Columbia/DC                 
           12 .Florida/FL                              
           13 .Georgia/GA                              
           15 .Hawaii/HI                               
           16 .Idaho/ID                                
           17 .Illinois/IL                             
           18 .Indiana/IN                              
           19 .Iowa/IA                                 
           20 .Kansas/KS                               
           21 .Kentucky/KY                             
           22 .Louisiana/LA                            
           23 .Maine/ME                                
           24 .Maryland/MD                             
           25 .Massachusetts/MA                        
           26 .Michigan/MI                             
           27 .Minnesota/MN                            
           28 .Mississippi/MS                          
           29 .Missouri/MO                             
           30 .Montana/MT                              
           31 .Nebraska/NE                             
           32 .Nevada/NV                               
           33 .New Hampshire/NH                        
           34 .New Jersey/NJ                           
           35 .New Mexico/NM                           
           36 .New York/NY                             
           37 .North Carolina/NC                       
           38 .North Dakota/ND
           39 .Ohio/OH                                 
           40 .Oklahoma/OK                             
           41 .Oregon/OR                               
           42 .Pennsylvania/PA                         
           44 .Rhode Island/RI                         
           45 .South Carolina/SC                       
           46 .South Dakota/SD                         
           47 .Tennessee/TN                            
           48 .Texas/TX                                
           49 .Utah/UT                                 
           50 .Vermont/VT                              
           51 .Virginia/VA                             
           53 .Washington/WA                           
           54 .West Virginia/WV                        
           55 .Wisconsin/WI                            
           56 .Wyoming/WY                              
           72 .Puerto Rico/PR

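I tried mutate and it works, but I don't want to write a mutate for every single code.
Edit:
I imported the text file containing the list of states into R and used @akrun's code to convert it into a dataset.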

As you can see, it has taken Alabama as a header.
Here is the text file I used:

Please advise how I can:
1. Fix the header so that Alabama moves down one cell, like this:

State Code   State Name
1            Alabama
2            Alaska

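I would also like to know how to remove everything after the "/" in the column values.
For example, in the screenshot below:

I would rather it just said Alaska, not Alaska/AK.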
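
Both requests in this edit can be sketched in base R. This is only an illustration: the two-line `txt` excerpt and the column names `state_code` and `state_name` are assumptions, not the actual file or @akrun's code. Reading with `header = FALSE` keeps the first data row from being consumed as column names, and `sub()` drops the slash and everything after it.

```r
# Hypothetical two-line excerpt of the dictionary, already in "code,name" form
txt <- c("01,Alabama/AL", "02,Alaska/AK")

# header = FALSE keeps "01,Alabama/AL" as data instead of column names;
# col.names supplies the names, colClasses = "character" preserves the leading zero
dict <- read.csv(text = txt, header = FALSE,
                 col.names = c("state_code", "state_name"),
                 colClasses = "character")

# Remove the first "/" and everything after it
dict$state_name <- sub("/.*$", "", dict$state_name)
dict
```

Keeping the codes as character matters here: read as numbers, "01" would become 1 and no longer match the codes in the main dataset.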

Edit 2:

I combined several pieces of code and, after trying all of them, this is the version that fits my problem best:

df <- structure(list(`state code` = c("01", "02", "04", "05", "06", 
                                  "08", "09", "10", "11", "12", "13", "15", "16", "17", "18", "19", 
                                  "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", 
                                  "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", 
                                  "42", "44", "45", "46", "47", "48", "49", "50", "51", "53", "54", 
                                  "55", "56", "72"), state = c("Alabama/AL", "Alaska/AK", "Arizona/AZ", 
                                                               "Arkansas/AR", "California/CA", "Colorado/CO", "Connecticut/CT", 
                                                               "Delaware/DE", "District of Columbia/DC", "Florida/FL", "Georgia/GA", 
                                                               "Hawaii/HI", "Idaho/ID", "Illinois/IL", "Indiana/IN", "Iowa/IA", 
                                                               "Kansas/KS", "Kentucky/KY", "Louisiana/LA", "Maine/ME", "Maryland/MD", 
                                                               "Massachusetts/MA", "Michigan/MI", "Minnesota/MN", "Mississippi/MS", 
                                                               "Missouri/MO", "Montana/MT", "Nebraska/NE", "Nevada/NV", 
                                                               "New Hampshire/NH", "New Jersey/NJ", "New Mexico/NM", "New York/NY", 
                                                               "North Carolina/NC", "North Dakota/ND", "Ohio/OH", "Oklahoma/OK", 
                                                               "Oregon/OR", "Pennsylvania/PA", "Rhode Island/RI", "South Carolina/SC", 
                                                               "South Dakota/SD", "Tennessee/TN", "Texas/TX", "Utah/UT", 
                                                               "Vermont/VT", "Virginia/VA", "Washington/WA", "West Virginia/WV", 
                                                               "Wisconsin/WI", "Wyoming/WY", "Puerto Rico/PR")), row.names = c(NA, 
                                                                                                                                  -52L), class = "data.frame")

st <- structure(list(ST = c("01", "02", "04", "05", "06","08", "09", "10", "11", "12", "13", "15", "16", "17", "18", "19", 
                            "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", 
                            "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", 
                            "42", "44", "45", "46", "47", "48", "49", "50", "51", "53", "54", 
                            "55", "56", "72")), 
                row.names = c(NA,-52L), class = "data.frame")

df2 <- data.frame(ST = gsub("/.*|\\W", "", lapply(st$ST, function(x) 
  df[x == df$'state code',2])))

st$df2 <- df2

st

names(st) <- c("State Code", "State Names")

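Running that code gives the following result (screenshot: "Result of above code").
However, I cannot change the column name to get rid of the $ST after State Names.
My line names(st) <- c("State Code", "State Names") should rename the column to plain State Names, but for some reason it does not work.
Any advice on this would be helpful.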
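
A likely cause, judging from the code rather than from the missing screenshot: `st$df2 <- df2` stores a whole data frame inside one column, so the inner column name `ST` stays attached to whatever the outer column is called. A minimal sketch of one way around it follows, using small stand-in data and assumed column names (`code`, `state`, `name`): `match()` returns a plain character vector, which becomes an ordinary column that `names<-` can rename.

```r
# Small hypothetical stand-ins for the question's `df` (dictionary) and `st` (codes)
df <- data.frame(code  = c("01", "02", "04"),
                 state = c("Alabama/AL", "Alaska/AK", "Arizona/AZ"),
                 stringsAsFactors = FALSE)
st <- data.frame(ST = c("01", "01", "02", "04"), stringsAsFactors = FALSE)

# match() gives, for each code in st, the row of the dictionary holding it;
# indexing df$state with that yields a character vector, not a data frame
st$name <- sub("/.*$", "", df$state[match(st$ST, df$code)])

# With two ordinary columns, renaming behaves as expected
names(st) <- c("State Code", "State Names")
st
```

The same effect could be had in the original code by assigning `df2$ST` instead of `df2`, since that is also a vector rather than a data frame.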


Answer #1 (ki1q1bka)

You can use recode to get the desired output; the data dictionary just needs a little cleaning first.

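library(tidyverse)

df1 <- tibble(ST = c("01", "01", "02", "04", "04", "04", "05"))

# Toy dictionary. sep = "" splits on any whitespace, so the header
# "State Code" yields two columns: State (the code) and Code (the name).
# Multi-word names such as "New Hampshire" would need a different separator.
df2 <- read.csv(text = "State Code
01. Alabama/AL
02. Alaska/AK
04. Arizona/AZ
05. Arkansas/AR", sep = "", colClasses = c("character", "character"))

df2 <- df2 |>
  # Remove the trailing period from the code
  mutate(State = str_remove(State, "\\.")) |>
  # Keep only the state name, i.e. everything before the slash
  mutate(Code = str_extract(Code, "^[^/]+"))

# deframe() turns the two columns into a named vector (code -> name),
# which !!! splices into recode() as old = new pairs
df1 |>
  mutate(ST_new = recode(ST, !!!deframe(df2)))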

# A tibble: 7 × 2
#  ST    ST_new  
#  <chr> <chr>   
#1 01    Alabama 
#2 01    Alabama 
#3 02    Alaska  
#4 04    Arizona 
#5 04    Arizona 
#6 04    Arizona 
#7 05    Arkansas
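
The same lookup can also be sketched without any packages by indexing a named character vector. The four-entry `lookup` below is a hypothetical excerpt of the dictionary, and "99" is an invented code added only to show what happens to values the dictionary does not cover.

```r
# Hypothetical excerpt of the dictionary as a named vector: names are the codes
lookup <- c("01" = "Alabama", "02" = "Alaska", "04" = "Arizona", "05" = "Arkansas")

ST <- c("01", "01", "02", "04", "04", "04", "05", "99")

# Index by name; unname() drops the codes that would otherwise ride along as names.
# Codes missing from the lookup (here "99") come back as NA.
ST_new <- unname(lookup[ST])
ST_new
```

This only works while the codes are character strings; numeric codes would index by position instead of by name.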

Answer #2 (r3i60tvu)

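We can read the dictionary with readLines, replace the first space on each line with a comma, read the result with read.csv to create a data.frame, and then join it with the first dataset.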

library(stringr)
library(dplyr)
str_replace(trimws(lines[-1]), " ", ",") %>% 
  read.csv(text = ., header = TRUE, colClasses = "character") %>% 
  left_join(df1, ., by = c("ST" = "State")) %>%
  transmute(ST = str_replace(Code, "^\\.?([^/]+)/.*", "\\1"))
Output
ST
1 Alabama
2 Alabama
3 Alabama
4  Alaska
5  Alaska
6 Arizona

Data

df1 <- structure(list(ST = c("01", "01", "01", "02", "02", "04")), row.names = c(NA, 
-6L), class = "data.frame")

lines <- readLines("file.txt")
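
As a variation on the approach above, the dictionary lines can be split with a single regular expression, which also copes with multi-word names such as "District of Columbia". This is a sketch under assumptions: the `lines` vector below is a hand-made excerpt standing in for `readLines("file.txt")`, and the names `pattern`, `rows`, `dict`, `code`, `name` and `abbr` are illustrative.

```r
# Hypothetical excerpt standing in for readLines("file.txt")
lines <- c("ST               ",
           "State Code",
           "           01 .Alabama/AL   ",
           "           11 .District of Columbia/DC  ",
           "           33 .New Hampshire/NH")

# Lines shaped like "<2 digits> .<name>/<2 capital letters>"
pattern <- "^\\s*([0-9]{2})\\s*\\.\\s*([^/]+)/([A-Z]{2})\\s*$"

# Header lines do not match the pattern and are dropped here
rows <- grep(pattern, lines, value = TRUE)

dict <- data.frame(code = sub(pattern, "\\1", rows),
                   name = sub(pattern, "\\2", rows),
                   abbr = sub(pattern, "\\3", rows),
                   stringsAsFactors = FALSE)

# Replace each code with its name; unmatched codes would become NA
df1 <- data.frame(ST = c("01", "11", "33", "01"), stringsAsFactors = FALSE)
df1$ST <- dict$name[match(df1$ST, dict$code)]
df1
```

Keeping the abbreviation in its own column costs nothing and leaves the option of mapping to "AL", "AK" and so on later.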
