In R, how can I convert a chr column that contains many different date formats (e.g. German, English, hours)?

Posted by beq87vna on 2023-05-20 in Other


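I have a dataframe df with multiple columns. One column, Dates, contains string values in many different date formats. It contains German date formats (e.g. 6. Juni 2022 or 5. Aug. 2019) and English date formats (e.g. Apr 5, 2019 or Mar 4, 2019). It also contains dates in both formats without a year, which should get the current year added (e.g. Jan 2 or 15. März). In addition, it contains hour values in both languages (e.g. 2 Std. or 3h).
I first want to convert the dates into one format (preferably the English standard yyyy-mm-dd) and then add a new column Time_since that holds the number of hours from Dates until now. For example, if I have the date 16. Mai 2023 (so yesterday, in German format), I want 24 in the new column (since I don't have a specific time, I assume the whole day 2023-05-17). If I have 10h or 10 Std., I just want 10.
Here is the dput() of my dataframe: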

structure(list(Dates = c("1h", "10 Std.", "Apr 5", "Dec 8, 2022", 
"May 30, 2019", "6. Juni 2021", NA, "15. März", "13. Aug. 2019"
)), class = "data.frame", row.names = c(NA, -9L))

The output should look like this:

Dates        Time_since
1h           1
10 Std.      10
2023-04-05   1008
2022-12-08   3840
2019-05-30   25800
2021-06-06   17040
NA           NA 
2023-03-15   1512  
2019-08-13   24192

Do you know how I can combine this many transformations in one go?

Source: https://stackoverflow.com/questions/76273690/in-r-how-can-i-convert-a-chr-column-that-contains-many-different-dates-in-diffe

2 Answers


yqhsw0fo1#

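You should translate the words (months, hours) from German into English wherever possible. To convert the dates to a standardized format and calculate the number of hours from each date until now, you can use the lubridate package in R. If the dates have a specific time zone, you may need to adjust the code accordingly.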

library(lubridate)

df <- structure(list(Dates = c("1h", "10 Std.", "Apr 5", "Dec 8, 2022",
                              "May 30, 2019", "6. Juni 2021", NA, "15. März", "13. Aug. 2019")),
                class = "data.frame", row.names = c(NA, -9L))

parse_date <- function(x) {
  x <- gsub("Juni", "June", x, fixed = TRUE)
  x <- gsub("März", "March", x, fixed = TRUE)
  x <- parse_date_time(x, orders = c("d. b", "d. B", "b d, y", "B d, y"))
  return(x)
}

df$Dates <- parse_date(df$Dates)
df$Time_since <- ifelse(!is.na(df$Dates), round(as.numeric(difftime(Sys.time(), df$Dates, units = "hours"), origin = "1970-01-01")), NA)

print(df)

       Dates Time_since
1       <NA>         NA
2       <NA>         NA
3       <NA>         NA
4 2022-12-08       3856
5 2019-05-30      34768
6       <NA>         NA
7       <NA>         NA
8 0000-03-15   17734768
9       <NA>         NA

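I wasn't able to convert all of the dates perfectly, but I think this should at least point you in the right direction. Remember to change the 'origin' argument accordingly. Good luck!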
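One possible way to cover the rows that stay NA above is to avoid locale-dependent month parsing and map the month names by hand. The following is only a sketch in base R: `month_lookup`, `parse_mixed_date` and `default_year` are hypothetical names, the year 2023 for dates without a year is an assumption taken from the expected output in the question, and the two regular expressions only cover the shapes that appear in the sample data.

```r
# Hypothetical lookup: first three letters of a month name -> month number.
# German spellings that differ from English (mai, okt, dez) are listed too.
month_lookup <- c(jan = 1L, feb = 2L, mar = 3L, apr = 4L, may = 5L, mai = 5L,
                  jun = 6L, jul = 7L, aug = 8L, sep = 9L, oct = 10L, okt = 10L,
                  nov = 11L, dec = 12L, dez = 12L)

parse_mixed_date <- function(x, default_year = 2023L) {
  # "März" -> "Marz", so that only ASCII letters have to be matched
  x <- gsub("\u00e4", "a", x, fixed = TRUE)

  en <- "^([A-Za-z]+)\\.? +(\\d{1,2}),? *(\\d{4})?$"     # "Dec 8, 2022", "Apr 5"
  de <- "^(\\d{1,2})\\. *([A-Za-z]+)\\.? *(\\d{4})?$"    # "6. Juni 2021", "15. Marz"
  common <- "^(\\d{1,2}) ([A-Za-z]+) (\\d{4})?$"          # "day month [year]"

  # Rewrite both styles into the common "day month [year]" shape
  s <- sub(en, "\\2 \\1 \\3", x)
  s <- sub(de, "\\1 \\2 \\3", s)

  ok    <- grepl(common, s)   # FALSE for NA and for values such as "1h"
  day   <- as.integer(sub(common, "\\1", s[ok]))
  month <- month_lookup[substr(tolower(sub(common, "\\2", s[ok])), 1, 3)]
  year  <- sub(common, "\\3", s[ok])
  year  <- ifelse(year == "", default_year, as.integer(year))

  out <- rep(as.Date(NA), length(x))
  # An explicit format makes unknown months or impossible dates come back as NA
  out[ok] <- as.Date(sprintf("%04d-%02d-%02d", year, month, day),
                     format = "%Y-%m-%d")
  out
}

dates <- c("1h", "10 Std.", "Apr 5", "Dec 8, 2022", "May 30, 2019",
           "6. Juni 2021", NA, "15. M\u00e4rz", "13. Aug. 2019")
parse_mixed_date(dates)
```

With the sample data, the hour values and the NA stay NA and the other rows become Date values, with "Apr 5" and "15. März" falling in the assumed year. Other input shapes (for example numeric dates such as 05.04.2019) would need additional patterns.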

Answered 2023-05-20

vuktfyat2#

I worked out an answer myself, but maybe someone can shorten this code somehow. I just added many different steps and columns to get to the result.

# Convert "h" and "Std." values to numbers; every other value becomes NA
df$Tw_hours <- suppressWarnings(as.numeric(gsub(" Std\\.|h", "", df$Tw_Date_Tweet_1)))

# Load the lubridate package for date manipulation functions
library(lubridate)
# Convert rows with the valid date format to dates (English and German)
df$Tw_Date_Tweet_1_New_En <- parse_date_time(df$Tw_Date_Tweet_1, orders = c("dby", "mdy"))

# Specify the pre-specified date
specified_date <- as.POSIXct("2023-04-11")
# Calculate the hours between specified_date and Tw_Date_Tweet_1
df$Tw_Hours_Difference <- as.numeric(difftime(specified_date, df$Tw_Date_Tweet_1_New_En, units = "hours"))
#Add the pre-calculated values from h and Std.
df$Tw_Hours_Difference <- ifelse(is.na(df$Tw_Hours_Difference),df$Tw_hours, df$Tw_Hours_Difference)
#Get all remaining values without a year in a separate column
df$Tw_Remaining <- ifelse(is.na(df$Tw_Hours_Difference),df$Tw_Date_Tweet_1, NA)

# Add the year 2023 to non-NA values in Tw_Remaining column
df$Tw_Remaining <- ifelse(!is.na(df$Tw_Remaining),
                          paste(df$Tw_Remaining, "2023", sep = " "),
                          df$Tw_Remaining)
# Parse the remaining values, which now have the year appended
df$Tw_Remaining_New <- parse_date_time(df$Tw_Remaining, orders = c("dby", "mdy"))
# Calculate the hours between specified_date and Tw_Remaining_New
df$Tw_Remaining_New_Hours <- as.numeric(difftime(specified_date, df$Tw_Remaining_New, units = "hours"))
# Fill the remaining gaps in Tw_Hours_Difference with these hours
df$Tw_Hours_Difference <- ifelse(is.na(df$Tw_Hours_Difference), df$Tw_Remaining_New_Hours, df$Tw_Hours_Difference)

# Remove the helper columns (select() comes from dplyr, not lubridate)
df <- dplyr::select(df, -c(Tw_hours, Tw_Date_Tweet_1_New_En, Tw_Remaining, Tw_Remaining_New, Tw_Remaining_New_Hours))

print(df)
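
As for shortening this, the hour arithmetic could perhaps be collapsed into one small function. The sketch below is not the code above rewritten line by line. It assumes the dates have already been parsed into a Date vector, `hours_since` and its arguments are hypothetical names, and the reference date 2023-05-17 is taken from the example in the question rather than from `specified_date`.

```r
# Hypothetical helper: hours between `ref` and each parsed date, with values
# such as "1h" or "10 Std." taken over as the hour count they already state.
hours_since <- function(raw, parsed, ref = as.Date("2023-05-17")) {
  # Matches only when the whole string is "<n>h" or "<n> Std."
  hour_pat <- "^(\\d+) ?(h|Std\\.)$"
  is_hr <- grepl(hour_pat, raw)            # FALSE for NA
  hrs <- rep(NA_real_, length(raw))
  hrs[is_hr] <- as.numeric(sub(hour_pat, "\\1", raw[is_hr]))

  # Date minus Date in hours; NA wherever `parsed` is NA
  from_date <- as.numeric(difftime(ref, parsed, units = "hours"))
  ifelse(is.na(hrs), from_date, hrs)
}

raw    <- c("1h", "10 Std.", "Apr 5", "Dec 8, 2022", NA)
parsed <- as.Date(c(NA, NA, "2023-04-05", "2022-12-08", NA))
hours_since(raw, parsed)
# 1 10 1008 3840 NA
```

Because both arguments to `difftime()` are dates without a time of day, the result is always a whole number of days times 24, which matches the "assume the whole day" rule from the question.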

Answered 2023-05-20
