Merging specific rows that have NA values in R

s4chpxco  posted on 2023-01-15  in Other

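I have data that was merged by stacking rows, but at every hour mark there is one row with an elevation value and a second row with NA. I would like to merge these rows together, or in other words, drop the row that has the NA value. It is fine if all the other values in the table are NA.
Here is my data: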

> dput(stackoverflow[1:100,])
structure(list(Time = structure(c(1432425600, 1432426500, 1432427400, 
1432428300, 1432429200, 1432430100, 1432431000, 1432431900, 1432432800, 
1432433700, 1432434600, 1432435500, 1432436400, 1432437300, 1432438200, 
1432439100, 1432440000, 1432440900, 1432441800, 1432442700, 1432443600, 
1432444500, 1432445400, 1432446300, 1432447200, 1432448100, 1432449000, 
1432449900, 1432450800, 1432450800, 1432451700, 1432452600, 1432453500, 
1432454400, 1432454400, 1432455300, 1432456200, 1432457100, 1432458000, 
1432458000, 1432458900, 1432459800, 1432460700, 1432461600, 1432461600, 
1432462500, 1432463400, 1432464300, 1432465200, 1432465200, 1432466100, 
1432467000, 1432467900, 1432468800, 1432468800, 1432469700, 1432470600, 
1432471500, 1432472400, 1432472400, 1432473300, 1432474200, 1432475100, 
1432476000, 1432476000, 1432476900, 1432477800, 1432478700, 1432479600, 
1432479600, 1432480500, 1432481400, 1432482300, 1432483200, 1432483200, 
1432484100, 1432485000, 1432485900, 1432486800, 1432486800, 1432487700, 
1432488600, 1432489500, 1432490400, 1432490400, 1432491300, 1432492200, 
1432493100, 1432494000, 1432494000, 1432494900, 1432495800, 1432496700, 
1432497600, 1432498500, 1432499400, 1432500300, 1432501200, 1432502100, 
1432503000), tzone = "UTC", class = c("POSIXct", "POSIXt")), 
    Turtle = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), levels = c("R3L1", "R3L11", "R3L12", "R3L2", "R3L4", 
    "R3L8", "R3L9", "R4L8", "R8L1", "R8L4", "R8NAT123"), class = "factor"), 
    elevation = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, 282.27, NA, NA, NA, NA, 283.21, NA, NA, NA, NA, 282.14, 
    NA, NA, NA, NA, 281.63, NA, NA, NA, NA, 281.63, NA, NA, NA, 
    NA, 281.63, NA, NA, NA, NA, 282.63, NA, NA, NA, NA, 281.63, 
    NA, NA, NA, NA, 282.14, NA, NA, NA, NA, 281.63, NA, NA, NA, 
    NA, 282.14, NA, NA, NA, NA, 281.36, NA, NA, NA, NA, 282.14, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))


Here is also an example of the rows I would like to merge.
Let me know if you have any questions.

fnx2tebb1#

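Since the data frame (here, df) has identical Time and Turtle values in these rows, the NA rows are the only duplicates, and the NA row always comes after the row with the elevation value, a dplyr approach would be: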

library(dplyr)

df %>% 
  group_by(Time, Turtle) %>% 
  slice(1)

# or

df %>% # thanks @neilfws
  group_by(Time, Turtle) %>% 
  slice_head()

This groups the rows by Time and Turtle and keeps the first row of each group.
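Taking the first row only works while the measured row really does precede its NA duplicate. As a hedged sketch of an order-independent variant, the toy data below (hypothetical values, not the asker's full data set) deliberately puts the NA row first, then sorts non-NA elevations to the front of each group before slicing:

```r
library(dplyr)

# Toy data (hypothetical values): the 01:00 timestamp appears twice,
# and here the NA row comes BEFORE the measured row.
toy <- tibble(
  Time = as.POSIXct(c("2015-05-24 00:45:00", "2015-05-24 01:00:00",
                      "2015-05-24 01:00:00", "2015-05-24 01:15:00"),
                    tz = "UTC"),
  Turtle = factor("R3L1"),
  elevation = c(NA, NA, 282.27, NA)
)

# is.na() sorts FALSE before TRUE, so measured rows move ahead of NA rows
# within each Time/Turtle pair; then keep the first row of every group.
result <- toy %>%
  arrange(Time, Turtle, is.na(elevation)) %>%
  group_by(Time, Turtle) %>%
  slice_head(n = 1) %>%
  ungroup()
```

Timestamps that occur only once keep their single row, even when its elevation is NA.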

nsc4cvqm2#

With dplyr:

library(dplyr)
df1 %>% 
  group_by(Time, Turtle) %>%
  summarise(elevation = elevation[!is.na(elevation)][1], .groups = 'drop')
Output:
# A tibble: 87 × 3
   Time                Turtle elevation
   <dttm>              <fct>      <dbl>
 1 2015-05-24 00:00:00 R3L1          NA
 2 2015-05-24 00:15:00 R3L1          NA
 3 2015-05-24 00:30:00 R3L1          NA
 4 2015-05-24 00:45:00 R3L1          NA
 5 2015-05-24 01:00:00 R3L1          NA
 6 2015-05-24 01:15:00 R3L1          NA
 7 2015-05-24 01:30:00 R3L1          NA
 8 2015-05-24 01:45:00 R3L1          NA
 9 2015-05-24 02:00:00 R3L1          NA
10 2015-05-24 02:15:00 R3L1          NA
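The expression `elevation[!is.na(elevation)][1]` works for groups that contain no measurement at all because indexing an empty vector with `[1]` yields NA in R. A minimal sketch of that behaviour, using a helper name (`first_non_na`) that is hypothetical and introduced only for illustration:

```r
# Hypothetical helper: return the first non-NA element, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

with_value <- first_non_na(c(NA, 282.27))           # 282.27
all_missing <- first_non_na(c(NA_real_, NA_real_))  # NA
```

Note that `summarise()` returns only the grouping columns and the summarised column, so any other columns in the data frame would need to be summarised explicitly as well.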

Or use distinct:

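df1 %>% 
   # sort non-NA elevations ahead of NA within each Time/Turtle pair
   arrange(Time, Turtle, is.na(elevation)) %>%
   # keep the first row per Time/Turtle pair, retaining all columns
   distinct(Time, Turtle, .keep_all = TRUE)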
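For readers without dplyr, the same idea of sorting and then keeping one row per Time/Turtle pair can be sketched in base R. The toy data frame below uses hypothetical values and is an assumption, not the asker's data:

```r
# Toy data (hypothetical values): the 01:00 timestamp appears twice
toy <- data.frame(
  Time = as.POSIXct(c("2015-05-24 00:45:00", "2015-05-24 01:00:00",
                      "2015-05-24 01:00:00", "2015-05-24 01:15:00"),
                    tz = "UTC"),
  Turtle = factor("R3L1"),
  elevation = c(NA, NA, 282.27, NA)
)

# Put measured elevations ahead of NA rows within each Time/Turtle pair
ord <- order(toy$Time, toy$Turtle, is.na(toy$elevation))
sorted <- toy[ord, ]

# Keep only the first row for each Time/Turtle combination
deduped <- sorted[!duplicated(sorted[c("Time", "Turtle")]), ]
rownames(deduped) <- NULL
```

`duplicated()` marks the second and later occurrences of a key, so negating it keeps exactly one row per pair.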
