Unexpected warnings with case_when and regex conditions suggest too many cases are being matched

Posted by dsekswqp on 2023-06-19 in Other


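I have a dataset with messy date/time formats: some values are in hh:mm format and some are Excel serial numbers. So I coerce everything to strings and use stringr and readr inside one big case_when block to identify the different formats and handle each one correctly. I think I am misunderstanding either my stringr functions or case_when, because I get the output I expect, but it throws warnings about parsing failures and NA coercion, none of which show up in the final product.
Here is some dummy data that includes an example of each format in my dataset: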

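library(tidyverse)  # tibble, dplyr, stringr, readr; lubridate is also attached from tidyverse 2.0.0 onward
dummy <- tibble(x = c("13:15:21", "02:03:17+01:00", "12:03", "0.1234"))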

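I have written a function to identify and parse these formats. It calls another function to convert Excel serial numbers to times. It uses regexes that I believe are correct; to summarise: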

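^0\\. should identify Excel serial numbers, since they are all decimals < 1
\\+ identifies times that carry a DST/time-zone offset by searching for the plus sign, and then
.+(?=\\+) extracts everything before the plus sign so it can be parsed
: tests for a colon to make sure the value is some kind of time. This is the broadest test, so it comes last, after the more specific patterns have already had their chance to match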
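As a quick check of what each pattern selects, here is a minimal sketch that runs the four regexes against the dummy values on their own, outside case_when. The variable names (`is_excel`, `has_offset`, `has_colon`, `before_plus`) are made up for illustration and do not appear in the original code.

```r
library(stringr)

x <- c("13:15:21", "02:03:17+01:00", "12:03", "0.1234")

# Each str_detect() call returns one logical per element of x
is_excel   <- str_detect(x, "^0\\.")  # starts with "0."
has_offset <- str_detect(x, "\\+")    # contains a literal plus sign
has_colon  <- str_detect(x, ":")      # contains a colon anywhere

# The lookahead keeps everything before the plus sign; no plus sign gives NA
before_plus <- str_extract(x, ".+(?=\\+)")

print(data.frame(x, is_excel, has_offset, has_colon, before_plus))
```

Note that the colon test is also TRUE for the offset string, which is why the order of the conditions matters.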

convert_time <- function(x){
  case_when(str_detect(x, "^0\\.")        ~ convert_excel_time(x), 
            str_detect(x, "\\+")          ~ parse_time(str_extract(x, ".+(?=\\+)")), 
            str_detect(x, ":")            ~ parse_time(x),
            .default = NA)
}

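convert_excel_time <- function(x){
  # Fraction of a day -> seconds since midnight -> time of day.
  # The product is parenthesised because %>% binds more tightly than *.
  (as.numeric(x) * 24 * 60 * 60) %>%
    as_datetime() %>% 
    hms::as_hms()
}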
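A side note on mixing arithmetic with pipes in a helper like this: in R, `%>%` has higher precedence than `*`, so an unparenthesised `a * b %>% f()` is parsed as `a * f(b)`. The sketch below illustrates that with `sqrt()` as a stand-in for the conversion step; the variable names are hypothetical.

```r
library(magrittr)

excel_fraction <- 0.1234  # fraction of a day, as in the dummy data

# Only the final 60 is piped into sqrt(); the rest is multiplied afterwards
unparenthesised <- excel_fraction * 24 * 60 * 60 %>% sqrt()

# The whole product (seconds since midnight) is piped into sqrt()
parenthesised <- (excel_fraction * 24 * 60 * 60) %>% sqrt()

print(c(unparenthesised = unparenthesised, parenthesised = parenthesised))
```

The two results differ, so wrapping the product in parentheses makes the intent explicit.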

When I run it I get the expected output, but the warnings that come with it tell me I don't understand what is happening under the hood.

> dummy %>% 
+   mutate(new = convert_time(x))
# A tibble: 4 × 2
  x              new        
  <chr>          <time>     
1 13:15:21       13:15:21.00
2 02:03:17+01:00 02:03:17.00
3 12:03          12:03:00.00
4 0.1234         02:57:41.76

These are the warnings I get:

[[1]]
<warning/rlang_warning>
Warning in `mutate()`:
ℹ In argument: `new = convert_time(x)`.
Caused by warning in `convert_excel_time()`:
! NAs introduced by coercion
---
Backtrace:
    ▆
 1. ├─dummy %>% mutate(new = convert_time(x))
 2. ├─dplyr::mutate(., new = convert_time(x))
 3. └─dplyr:::mutate.data.frame(., new = convert_time(x))

[[2]]
<warning/rlang_warning>
Warning in `mutate()`:
ℹ In argument: `new = convert_time(x)`.
Caused by warning:
! 2 parsing failures.
row col   expected         actual
  2  -- time like  02:03:17+01:00
  4  -- time like  0.1234        
---
Backtrace:
    ▆
 1. ├─dummy %>% mutate(new = convert_time(x))
 2. ├─dplyr::mutate(., new = convert_time(x))
 3. └─dplyr:::mutate.data.frame(., new = convert_time(x))

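It seems to me that convert_time should never try to parse those two observations at all, because they are excluded by the left-hand sides of the case_when block. Similarly, I would not expect any NA coercion, because the left-hand side of case_when should prevent convert_excel_time() from ever seeing the hh:mm strings. Many thanks.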

r

Source: https://stackoverflow.com/questions/76452811/unexpected-warnings-with-case-when-and-regex-conditions-suggest-too-many-cases-a

1 Answer

ruarlubt1#

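Ah! I hadn't read all of the documentation. The dplyr reference (https://dplyr.tidyverse.org/reference/case_when.html) states clearly that case_when always evaluates every RHS expression on the full input, which is why the warnings are thrown, and then uses only the results for the elements that meet the corresponding LHS condition.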

# `case_when()` evaluates all RHS expressions, and then constructs its
# result by extracting the selected (via the LHS expressions) parts.
# In particular `NaN`s are produced in this case:
y <- seq(-2, 2, by = .5)
case_when(
  y >= 0 ~ sqrt(y),
  .default = y
)
#> Warning: NaNs produced
#> [1] -2.0000000 -1.5000000 -1.0000000 -0.5000000  0.0000000  0.7071068
#> [7]  1.0000000  1.2247449  1.4142136
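One way to avoid the warnings, given that behaviour, is to subset first so that each parser is only handed the elements whose pattern matched. The following is a minimal sketch under that assumption, not the method from the original post: `convert_time_subset` is a hypothetical name, and like the original code it simply discards the offset after the plus sign.

```r
library(stringr)
library(readr)

# Hypothetical alternative to the case_when() version: each parser only
# ever receives the elements whose pattern matched.
convert_time_subset <- function(x) {
  secs <- rep(NA_real_, length(x))  # seconds since midnight; NA if nothing matches

  # which() drops NA matches, so missing inputs simply stay NA.
  # setdiff() reproduces the first-match-wins order of case_when().
  excel  <- which(str_detect(x, "^0\\."))
  offset <- setdiff(which(str_detect(x, "\\+")), excel)
  clock  <- setdiff(which(str_detect(x, ":")), c(excel, offset))

  secs[excel]  <- as.numeric(x[excel]) * 24 * 60 * 60
  secs[offset] <- as.numeric(parse_time(str_extract(x[offset], ".+(?=\\+)")))
  secs[clock]  <- as.numeric(parse_time(x[clock]))

  hms::hms(seconds = secs)
}

dummy_x <- c("13:15:21", "02:03:17+01:00", "12:03", "0.1234")
result <- convert_time_subset(dummy_x)
print(result)
```

It can be dropped into the pipeline the same way, for example `mutate(new = convert_time_subset(x))`. Wrapping the original call in `suppressWarnings()` would also silence the messages, but it would hide genuine parsing failures as well.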

Answered 2023-06-19
