R: How do I create a new column that extracts only the year or month from a mm/dd/yy hh:mm string?

Posted by yftpprvb on 2022-12-06 in Other

I have a date/time string variable that looks like this:

> dput(df$starttime)
c("12/16/20 7:24", "6/21/21 13:20", "1/22/20 9:03", "1/07/20 17:19", 
"11/8/21 10:14", NA, NA, "10/26/21 7:19", "3/14/22 9:48", "5/12/22 13:29")

I basically want to create one column with only the year (2020, 2021, 2022) and another with the year + month (e.g., "Jan 2022").


ckocjqey #1

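1) Base R. Assuming you want separate numeric month and year columns, define a function that converts strings in the format shown in the question to a year or month number, then call it twice.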

toNum <- function(x, fmt) format(as.Date(x, "%m/%d/%y"), fmt) |>
  type.convert(as.is = TRUE)
transform(df, year = toNum(starttime, "%Y"), month = toNum(starttime, "%m"))

giving

       starttime year month
1  12/16/20 7:24 2020    12
2  6/21/21 13:20 2021     6
3   1/22/20 9:03 2020     1
4  1/07/20 17:19 2020     1
5  11/8/21 10:14 2021    11
6           <NA>   NA    NA
7           <NA>   NA    NA
8  10/26/21 7:19 2021    10
9   3/14/22 9:48 2022     3
10 5/12/22 13:29 2022     5
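
The question also asks for a label such as "Jan 2022". A minimal base R sketch of that, assuming a character label is acceptable (it will not sort chronologically, and the month abbreviation produced by %b depends on the session locale); the variable names `d`, `year` and `year_month` are illustrative only:

```r
# Small sample in the same format as the question's data
starttime <- c("12/16/20 7:24", "1/07/20 17:19", NA)

# as.Date() ignores the trailing time once the date part has matched
d <- as.Date(starttime, format = "%m/%d/%y")

year       <- as.integer(format(d, "%Y"))  # numeric year; NA stays NA
year_month <- format(d, "%b %Y")           # character label; NA stays NA

data.frame(starttime, year, year_month)
```

Because `year_month` is plain text, sort on `d` (or on a yearmon column as in the next approach) rather than on the label itself.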

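2) yearmon. Assuming you want a yearmon class column. This class represents year and month internally as year + fraction, where the fraction is 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec, so that it sorts appropriately and so that adding 1/12, for example, gives the next month. Note that if ym is a yearmon, then as.integer(ym) is the year and cycle(ym) is the month number (1, 2, ..., 12).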

library(zoo)
transform(df, yearmon = as.yearmon(starttime, "%m/%d/%y"))

giving:

       starttime  yearmon
1  12/16/20 7:24 Dec 2020
2  6/21/21 13:20 Jun 2021
3   1/22/20 9:03 Jan 2020
4  1/07/20 17:19 Jan 2020
5  11/8/21 10:14 Nov 2021
6           <NA>     <NA>
7           <NA>     <NA>
8  10/26/21 7:19 Oct 2021
9   3/14/22 9:48 Mar 2022
10 5/12/22 13:29 May 2022
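
The yearmon arithmetic described above can be checked on a single value. This is only a sketch on one string from the question; the names `ym`, `yr`, `mon` and `next_ym` are illustrative:

```r
library(zoo)

# Parse one of the question's strings into a yearmon (Dec 2020)
ym <- as.yearmon("12/16/20 7:24", "%m/%d/%y")

yr  <- as.integer(ym)   # year part of the internal year + fraction value
mon <- cycle(ym)        # month number, 1 through 12

# Adding 1/12 moves forward one month, rolling over into the next year
next_ym <- ym + 1/12

yr
mon
next_ym
```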

Note
If you want to sort by starttime, use

ct <- as.POSIXct(df$starttime, format = "%m/%d/%y %H:%M")
df[order(ct), , drop = FALSE]
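
A small sketch of the sorting step on a subset of the data, assuming the two-digit years are parsed with lowercase %y (uppercase %Y expects a four-digit year). The tz = "UTC" argument is an addition here, used only to keep the parse independent of local daylight-saving rules:

```r
starttime <- c("12/16/20 7:24", "6/21/21 13:20", "1/22/20 9:03", NA)

# %y reads "20" as 2020; the time part is parsed as well
ct <- as.POSIXct(starttime, format = "%m/%d/%y %H:%M", tz = "UTC")

# order() places NA values last by default
ord <- order(ct)
starttime[ord]
```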

2g32fytz #2

If you want the output to sort chronologically, you can use the tsibble::yearmonth type:

tsibble::yearmonth(lubridate::mdy_hm(c("12/16/20 7:24", "6/21/21 13:20", "1/22/20 9:03", "1/07/20 17:19", 
  "11/8/21 10:14", NA, NA, "10/26/21 7:19", "3/14/22 9:48", "5/12/22 13:29")))

Result

<yearmonth[10]>
 [1] "2020 Dec" "2021 Jun" "2020 Jan" "2020 Jan" "2021 Nov" NA         NA        
 [8] "2021 Oct" "2022 Mar" "2022 May"

whlutmcx #3

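One option is to convert to the datetime class POSIXct with mdy_hm (from lubridate), then use format to extract the month (%b) and 4-digit year (%Y), use filter to drop the NA elements, and arrange the rows by the converted datetime column.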

library(dplyr)
library(lubridate)
df %>% 
   mutate(starttime = mdy_hm(starttime),
         yearmonth = format(starttime, "%b %Y")) %>%
   filter(complete.cases(yearmonth)) %>%
   arrange(starttime)
Output
# A tibble: 8 × 2
  starttime           yearmonth
  <dttm>              <chr>    
1 2020-01-07 17:19:00 Jan 2020 
2 2020-01-22 09:03:00 Jan 2020 
3 2020-12-16 07:24:00 Dec 2020 
4 2021-06-21 13:20:00 Jun 2021 
5 2021-10-26 07:19:00 Oct 2021 
6 2021-11-08 10:14:00 Nov 2021 
7 2022-03-14 09:48:00 Mar 2022 
8 2022-05-12 13:29:00 May 2022
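
If numeric parts are wanted instead of a formatted string, lubridate's accessor functions can be applied to the parsed datetime. A minimal sketch, assuming lubridate is installed; the column names in the result are illustrative, and the text of the month label depends on the locale:

```r
library(lubridate)

starttime <- c("12/16/20 7:24", "1/07/20 17:19", NA)

# mdy_hm() returns POSIXct (UTC by default); NA input stays NA
dt <- mdy_hm(starttime)

data.frame(
  starttime,
  year        = year(dt),                 # numeric year
  month       = month(dt),                # numeric month, 1 through 12
  month_label = month(dt, label = TRUE)   # ordered factor of month abbreviations
)
```

Because `month_label` is an ordered factor, it sorts in calendar order rather than alphabetically.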

dldeef67 #4

Try this using lubridate

library(lubridate)

data.frame(df, 
  Year = format(mdy_hm(df$starttime), "%Y"), 
  MonthYear = format(mdy_hm(df$starttime), "%b %Y"))
       starttime Year MonthYear
1  12/16/20 7:24 2020  Dec 2020
2  6/21/21 13:20 2021  Jun 2021
3   1/22/20 9:03 2020  Jan 2020
4  1/07/20 17:19 2020  Jan 2020
5  11/8/21 10:14 2021  Nov 2021
6           <NA> <NA>      <NA>
7           <NA> <NA>      <NA>
8  10/26/21 7:19 2021  Oct 2021
9   3/14/22 9:48 2022  Mar 2022
10 5/12/22 13:29 2022  May 2022

This uses mdy_hm together with format to get the desired year (%Y) and the abbreviated month plus year (%b %Y) portions of the date.
Sorted data frame:

df_new <- data.frame(df, 
  Year = format(mdy_hm(df$starttime), "%Y"), 
  MonthYear = format(mdy_hm(df$starttime), "%b %Y"))

df_new[order(my(df_new$MonthYear)),]
       starttime Year MonthYear
3   1/22/20 9:03 2020  Jan 2020
4  1/07/20 17:19 2020  Jan 2020
1  12/16/20 7:24 2020  Dec 2020
2  6/21/21 13:20 2021  Jun 2021
8  10/26/21 7:19 2021  Oct 2021
5  11/8/21 10:14 2021  Nov 2021
9   3/14/22 9:48 2022  Mar 2022
10 5/12/22 13:29 2022  May 2022
6           <NA> <NA>      <NA>
7           <NA> <NA>      <NA>

Without the NAs:

na.omit(df_new[order(my(df_new$MonthYear)),])
       starttime Year MonthYear
3   1/22/20 9:03 2020  Jan 2020
4  1/07/20 17:19 2020  Jan 2020
1  12/16/20 7:24 2020  Dec 2020
2  6/21/21 13:20 2021  Jun 2021
8  10/26/21 7:19 2021  Oct 2021
5  11/8/21 10:14 2021  Nov 2021
9   3/14/22 9:48 2022  Mar 2022
10 5/12/22 13:29 2022  May 2022

Data

df <- structure(list(starttime = c("12/16/20 7:24", "6/21/21 13:20",
"1/22/20 9:03", "1/07/20 17:19", "11/8/21 10:14", NA, NA, "10/26/21 7:19",
"3/14/22 9:48", "5/12/22 13:29")), class = "data.frame", row.names = c(NA,
-10L))
