在R rame中选择第一个非缺失日期的问题

cczfrluj 于 12个月前发布在其他

关注(0)|答案(2)|浏览(135)

我有一个有两个列的框架如下。

nct_id   recrstdt check start_date   sdt
 NCT02277743 2014-12-21     2 2014-10-31 16425
 NCT02277769 2014-12-21     2 2014-11-30 16425
 NCT03131648 2017-06-08     2 2017-05-30 17325
 NCT03160885 2017-07-14     2 2017-06-12 17361
 NCT03349060 2018-01-17     2 2017-12-07 17548
 NCT03569293 2018-08-21     2 2018-08-13 17764
 NCT03575871 2018-07-12     2 2018-06-29 17724
 NCT03607422 2018-08-09     2 2018-07-27 17752
 NCT04146363       <NA>     2 2019-09-24   Inf
 NCT04162769       <NA>     2 2019-10-04   Inf
 NCT04178967       <NA>     2 2019-10-29   Inf

structure(list(nct_id = c("NCT02277743", "NCT02277769", "NCT03131648", 
"NCT03160885", "NCT03349060", "NCT03569293", "NCT03575871", "NCT03607422", 
"NCT04146363", "NCT04162769", "NCT04178967"), recrstdt = structure(c(16425, 
16425, 17325, 17361, 17548, 17764, 17724, 17752, Inf, Inf, Inf
), class = "Date"), check = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
), start_date = structure(c(16374, 16404, 17316, 17329, 17507, 
17756, 17711, 17739, 18163, 18173, 18198), class = "Date"), sdt = c(16425, 
16425, 17325, 17361, 17548, 17764, 17724, 17752, Inf, Inf, Inf
)), row.names = c(NA, -11L), class = "data.frame")

字符串
我使用以下代码创建列sdt，该列按顺序从recstdt或start_date中选择第一个非missing。但是，当recstdt为NA时，start_date不会被选择。
Clostdt的类别为“日期”

mutate(check=ifelse(is.na(recrstdt), 1, 2),
        sdt=ifelse(!is.na(recrstdt), recrstdt, start_date))

型
任何帮助弄清楚这一点是非常感谢。

来源：https://stackoverflow.com/questions/77688611/issue-with-picking-first-of-non-missing-dates-in-r-dataframe

2条答案

按热度按时间

svdrlsy41#

对于字符串`<NA>`使用`na_if`或`if_else`：

library(dplyr)

df %>%
  mutate(recrstdt = na_if(recrstdt, "<NA>"),
         sdt = coalesce(recrstdt, start_date)) 

# OR

library(dplyr)
df %>% 
  mutate(sdt = if_else(recrstdt == "<NA>", start_date, recrstdt))

个字符
数据类型：

structure(list(nct_id = c("NCT02277743", "NCT02277769", "NCT03131648", 
"NCT03160885", "NCT03349060", "NCT03569293", "NCT03575871", "NCT03607422", 
"NCT04146363", "NCT04162769", "NCT04178967"), recrstdt = structure(c(16425, 
16425, 17325, 17361, 17548, 17764, 17724, 17752, Inf, Inf, Inf
), class = "Date"), check = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
), start_date = structure(c(16374, 16404, 17316, 17329, 17507, 
17756, 17711, 17739, 18163, 18173, 18198), class = "Date"), sdt = c(16425, 
16425, 17325, 17361, 17548, 17764, 17724, 17752, Inf, Inf, Inf
)), row.names = c(NA, -11L), class = "data.frame")

型

对于`Inf`使用`if_else`语句：

library(dplyr)
df %>% 
  mutate(sdt = if_else(recrstdt == Inf, start_date, recrstdt))

        nct_id   recrstdt check start_date        sdt
1  NCT02277743 2014-12-21     2 2014-10-31 2014-12-21
2  NCT02277769 2014-12-21     2 2014-11-30 2014-12-21
3  NCT03131648 2017-06-08     2 2017-05-30 2017-06-08
4  NCT03160885 2017-07-14     2 2017-06-12 2017-07-14
5  NCT03349060 2018-01-17     2 2017-12-07 2018-01-17
6  NCT03569293 2018-08-21     2 2018-08-13 2018-08-21
7  NCT03575871 2018-07-12     2 2018-06-29 2018-07-12
8  NCT03607422 2018-08-09     2 2018-07-27 2018-08-09
9  NCT04146363        Inf     2 2019-09-24 2019-09-24
10 NCT04162769        Inf     2 2019-10-04 2019-10-04
11 NCT04178967        Inf     2 2019-10-29 2019-10-29

型
数据类型：

structure(list(nct_id = c("NCT02277743", "NCT02277769", "NCT03131648", 
"NCT03160885", "NCT03349060", "NCT03569293", "NCT03575871", "NCT03607422", 
"NCT04146363", "NCT04162769", "NCT04178967"), recrstdt = structure(c(16425, 
16425, 17325, 17361, 17548, 17764, 17724, 17752, Inf, Inf, Inf
), class = "Date"), check = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
), start_date = structure(c(16374, 16404, 17316, 17329, 17507, 
17756, 17711, 17739, 18163, 18173, 18198), class = "Date"), sdt = c(16425, 
16425, 17325, 17361, 17548, 17764, 17724, 17752, Inf, Inf, Inf
)), row.names = c(NA, -11L), class = "data.frame")

型

不适用

我们可以用coalesce()来实现：

library(dplyr)

df %>%
  mutate(sdt = coalesce(recrstdt, start_date))

recrstdt check start_date        sdt
1  2014-12-21     2 2014-10-31 2014-12-21
2  2014-12-21     2 2014-11-30 2014-12-21
3  2017-06-08     2 2017-05-30 2017-06-08
4  2017-07-14     2 2017-06-12 2017-07-14
5  2018-01-17     2 2017-12-07 2018-01-17
6  2018-08-21     2 2018-08-13 2018-08-21
7  2018-07-12     2 2018-06-29 2018-07-12
8  2018-08-09     2 2018-07-27 2018-08-09
9        <NA>     2 2019-09-24 2019-09-24
10       <NA>     2 2019-10-04 2019-10-04
11       <NA>     2 2019-10-29 2019-10-29

数据类型：

structure(list(recrstdt = c("2014-12-21", "2014-12-21", "2017-06-08", 
"2017-07-14", "2018-01-17", "2018-08-21", "2018-07-12", "2018-08-09", 
NA, NA, NA), check = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L), start_date = c("2014-10-31", "2014-11-30", "2017-05-30", 
"2017-06-12", "2017-12-07", "2018-08-13", "2018-06-29", "2018-07-27", 
"2019-09-24", "2019-10-04", "2019-10-29")), class = "data.frame", row.names = c(NA, 
-11L))

赞(0）回复(0）举报 12个月前

wydwbb8l2#

由于is.finite(NA) == FALSE，我们可以考虑Inf或NA。如果两者都是Inf或NA，则给出Inf。

> dat |>
+   transform(check=replace(check, !is.finite(recrstdt), 1),
+             sdt=as.Date(mapply(min, recrstdt, start_date, na.rm=TRUE)))
        nct_id   recrstdt check start_date        sdt
1  NCT02277743 2014-12-21     2 2014-10-31 2014-10-31
2  NCT02277769 2014-12-21     2 2014-11-30 2014-11-30
3  NCT03131648 2017-06-08     2 2017-05-30 2017-05-30
4  NCT03160885 2017-07-14     2 2017-06-12 2017-06-12
5  NCT03349060 2018-01-17     2 2017-12-07 2017-12-07
6  NCT03569293 2018-08-21     2 2018-08-13 2018-08-13
7  NCT03575871 2018-07-12     2 2018-06-29 2018-06-29
8  NCT03607422 2018-08-09     2 2018-07-27 2018-07-27
9  NCT04146363        Inf     1 2019-09-24 2019-09-24
10 NCT04162769        Inf     1        Inf        Inf
11 NCT04178967       <NA>     1 2019-10-29 2019-10-29

字符串

数据类型：*

> dput(dat)
structure(list(nct_id = c("NCT02277743", "NCT02277769", "NCT03131648", 
"NCT03160885", "NCT03349060", "NCT03569293", "NCT03575871", "NCT03607422", 
"NCT04146363", "NCT04162769", "NCT04178967"), recrstdt = structure(c(16425, 
16425, 17325, 17361, 17548, 17764, 17724, 17752, Inf, Inf, NA
), class = "Date"), check = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
), start_date = structure(c(16374, 16404, 17316, 17329, 17507, 
17756, 17711, 17739, 18163, Inf, NA), class = "Date"), sdt = c(16425, 
16425, 17325, 17361, 17548, 17764, 17724, 17752, Inf, Inf, Inf
)), row.names = c(NA, -11L), class = "data.frame")

型

赞(0）回复(0）举报 12个月前

我来回答

在R rame中选择第一个非缺失日期的问题

2条答案

对于字符串`<NA>`使用`na_if`或`if_else`：

对于`Inf`使用`if_else`语句：

不适用

相关问题

热门标签

最新问答

在R rame中选择第一个非缺失日期的问题

2条答案

对于字符串<NA>使用na_if或if_else：

对于Inf使用if_else语句：

不适用

相关问题

热门标签

最新问答

对于字符串`<NA>`使用`na_if`或`if_else`：

对于`Inf`使用`if_else`语句：