R: Converting mixed datetime formats, including character strings and Excel serial numbers, to POSIX

Posted by k10s72fa on 2023-03-27 in Other

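I have hundreds of Excel files that contain multiple columns with mixed datetime formats, including decimals. Some variable names are shared across files and some are not. I need to bind all of these datasets together and make sure that every datetime column (currently read in as character) is POSIX.
Sample data and goal: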

df1 <- tibble::tribble(
               ~v1, ~v2,              ~v3, ~v4,
  "3/8/2022 22:59", "A",    "44629.85139", "E",
    "3/9/22 19:24", "B",    "44629.88056", "F",
     "44629.81389", "C", "3/8/2022 20:25", "G",
     "44629.81389", "D", "3/8/2022 20:31", "H"
  )

df2 <- tibble::tribble(
               ~v1, ~v2,              ~v5, ~v6,
  "3/8/2022 22:59", "I",    "44629.85139", "M",
     "44629.81389", "J", "3/7/2022 21:38", "N",
     "44629.81389", "K", "3/7/2022 21:44", "O",
     "44629.81389", "L", "3/7/2022 21:48", "P"
  )

goal <- tibble::tribble(
                    ~v1, ~v2,                   ~v3, ~v4,                   ~v5, ~v6,
  "2022-03-08 22:59:00", "A", "2022-03-09 20:26:00", "E",                    NA,  NA,
  "2022-03-09 19:24:00", "B", "2022-03-09 21:08:00", "F",                    NA,  NA,
  "2022-03-09 19:32:00", "C", "2022-03-08 20:25:00", "G",                    NA,  NA,
  "2022-03-09 19:32:00", "D", "2022-03-08 20:31:00", "H",                    NA,  NA,
  "2022-03-08 22:59:00", "I",                    NA,  NA, "2022-03-09 20:26:00", "M",
  "2022-03-09 19:32:00", "J",                    NA,  NA, "2022-03-07 21:38:00", "N",
  "2022-03-09 19:32:00", "K",                    NA,  NA, "2022-03-07 21:44:00", "O",
  "2022-03-09 19:32:00", "L",                    NA,  NA, "2022-03-07 21:48:00", "P"
  )

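The solution here converts the decimal times to the correct format for individual columns, but of course it does not work for the other formats: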

> as.POSIXct("1900-01-01") + as.difftime(as.numeric(df1$v1), units = "days")
[1] NA                       
[2] NA                       
[3] "2022-03-11 19:32:00 EST"
[4] "2022-03-11 19:32:00 EST"
Warning message:
In as.difftime(as.numeric(df1$v1), units = "days") :  NAs introduced by coercion
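
Note that the serial values in the output above land on 2022-03-11, two days later than the 2022-03-09 expected in `goal`. Excel's 1900 date system counts 1900-01-01 as day 1 and also includes a nonexistent 1900-02-29, so for modern dates the effective origin is 1899-12-30. A minimal base R sketch of the difference (the variable names are illustrative, and `tz = "UTC"` is an assumption made to keep the result independent of the local time zone):

```r
# Two serial values taken from the sample data, stored as text
serial <- c("44629.81389", "44629.85139")

# Origin 1900-01-01 overshoots by two days
off_by_two <- as.POSIXct("1900-01-01", tz = "UTC") +
  as.difftime(as.numeric(serial), units = "days")

# Origin 1899-12-30 matches Excel's 1900 date system for modern dates;
# serials are in days, so multiply by 86400 to get seconds
fixed <- as.POSIXct(as.numeric(serial) * 86400,
                    origin = "1899-12-30", tz = "UTC")

format(off_by_two, "%Y-%m-%d %H:%M")
#> [1] "2022-03-11 19:32" "2022-03-11 20:26"
format(fixed, "%Y-%m-%d %H:%M")
#> [1] "2022-03-09 19:32" "2022-03-09 20:26"
```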

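How can I convert multiple mixed-format datetime columns in each dataset when the variables differ between files? I tried lubridate::parse_date_time, but I could not work out how to fit the decimal conversion into the orders argument. Any help is greatly appreciated!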

bkhjykvo1#

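library(dplyr)
library(lubridate)

# Try every parser on each value; parsers that do not match return NA
# (with warnings), and pmax(..., na.rm = TRUE) keeps the one that succeeded.
bind_rows(df1, df2) %>% 
  rowwise() %>% 
  mutate(across(c("v1", "v3", "v5"), 
                ~pmax(mdy_hm(.x),    # text such as "3/8/2022 22:59"
                      ymd_hms(.x),   # text such as "2022-03-08 22:59:00"
                      # Excel serial: days since 1899-12-30, converted to seconds
                      as.POSIXct(as.numeric(.x) * (60*60*24), 
                                 origin="1899-12-30", tz="GMT"),
                      na.rm = TRUE)))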

#> # A tibble: 8 x 6
#> # Rowwise: 
#>   v1                  v2    v3                  v4    v5                  v6   
#>   <dttm>              <chr> <dttm>              <chr> <dttm>              <chr>
#> 1 2022-03-08 22:59:00 A     2022-03-09 20:26:00 E     NA                  <NA> 
#> 2 2022-03-09 19:24:00 B     2022-03-09 21:08:00 F     NA                  <NA> 
#> 3 2022-03-09 19:32:00 C     2022-03-08 20:25:00 G     NA                  <NA> 
#> 4 2022-03-09 19:32:00 D     2022-03-08 20:31:00 H     NA                  <NA> 
#> 5 2022-03-08 22:59:00 I     NA                  <NA>  2022-03-09 20:26:00 M    
#> 6 2022-03-09 19:32:00 J     NA                  <NA>  2022-03-07 21:38:00 N    
#> 7 2022-03-09 19:32:00 K     NA                  <NA>  2022-03-07 21:44:00 O    
#> 8 2022-03-09 19:32:00 L     NA                  <NA>  2022-03-07 21:48:00 P
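
Because the question mentions hundreds of files whose columns differ, a vectorized variant may be easier to reuse than the row-wise call above. The following is only a sketch under stated assumptions: `parse_mixed_datetime()` and `datetime_cols` are hypothetical names that do not come from the answer, the text dates are assumed to be month/day/year with hours and minutes, and UTC is assumed as the time zone. It uses base R parsing for the text dates and `any_of()` so that columns missing from a given file are skipped instead of raising an error.

```r
library(dplyr)

# Hypothetical helper: parse a character vector that mixes "m/d/y H:M"
# strings with Excel serial numbers stored as text.
parse_mixed_datetime <- function(x, tz = "UTC") {
  x <- trimws(as.character(x))

  # Excel serials are plain numbers: days since 1899-12-30
  is_serial <- !is.na(x) & grepl("^[0-9]+(\\.[0-9]+)?$", x)
  serial <- rep(NA_real_, length(x))
  serial[is_serial] <- as.numeric(x[is_serial])
  out <- as.POSIXct(serial * 86400, origin = "1899-12-30", tz = tz)

  # Everything else is tried as text, two-digit year first, because
  # "%Y" would also accept "22" and read it as the year 22
  txt <- x
  txt[is_serial] <- NA_character_
  short_year <- as.POSIXct(txt, format = "%m/%d/%y %H:%M", tz = tz)
  long_year  <- as.POSIXct(txt, format = "%m/%d/%Y %H:%M", tz = tz)
  from_text <- short_year
  from_text[is.na(short_year)] <- long_year[is.na(short_year)]

  out[!is_serial] <- from_text[!is_serial]
  out
}

# Cut-down versions of the sample data, with different columns per file
df1 <- data.frame(v1 = c("3/8/2022 22:59", "3/9/22 19:24", "44629.81389"),
                  v2 = c("A", "B", "C"),
                  v3 = c("44629.85139", "44629.88056", "3/8/2022 20:25"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(v1 = c("3/8/2022 22:59", "44629.81389"),
                  v2 = c("I", "J"),
                  v5 = c("44629.85139", "3/7/2022 21:38"),
                  stringsAsFactors = FALSE)

# Assumed list of datetime columns across all files
datetime_cols <- c("v1", "v3", "v5")

result <- bind_rows(df1, df2) %>%
  mutate(across(any_of(datetime_cols), parse_mixed_datetime))

result
```

Checking for a numeric pattern first means a serial number is never handed to the text parser, so no coercion warnings are produced. Values that match neither pattern come back as `NA`, which makes unexpected formats easy to find afterwards.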

Created on 2023-03-23 by the reprex package (v2.0.1)
