R: How to count how many values per level in a given factor?

uqjltbpv  posted on 2023-03-05 in Other

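For each year, I want to create two new columns, temp_count and rh_count, that count the number of occurrences of each category in temp_catog and humidity_catog, respectively. "How to count how many values per level in a given factor?" answers this when grouping by a single variable, but I want to use group_by(year, humidity_catog, temp_catog).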

I can use the following code to create a column humidity_count that counts the occurrences of each category in the humidity_catog column.

df <- df %>%
  group_by(year, humidity_catog) %>%
  summarize(humidity_count = n())

Here is the output:

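But I want to create another column, temp_count, in the same data frame that counts the occurrences of each category in the temp_catog column. How can I do that? Below is a reproducible example of my data, created with dput().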

df <- structure(
  list(
    year = structure(
      c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
        1L, 1L, 1L),
      .Label = c(
        "2006",
        "2007",
        "2012",
        "2013",
        "2014",
        "2014_c",
        "2015_a",
        "2015_b",
        "2016",
        "2017",
        "2020"
      ),
      class = "factor"
    ),
    min_rh = c(47.9, 49, 44.7, 40.2, 50, 52.3, 51.5, 82.8, 73.8,
               47.1),
    min_temp = c(12.4, 14.3, 15.1, 16.1, 12.7, 16.1, 14.4,
                 15.1, 11.8, 9.5),
    temp_catog = structure(
      c(2L, 2L, 3L, 3L,
        2L, 3L, 2L, 3L, 2L, 2L),
      .Label = c("T1(<=8)", "T2(>8, <=15)",
                 "T3(>15)"),
      class = "factor"
    ),
    humidity_catog = structure(
      c(1L,
        1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L),
      .Label = c("RH1(<=65)",
                 "RH2(>65)"),
      class = "factor"
    )
  ),
  class = c("grouped_df",
            "tbl_df", "tbl", "data.frame"),
  row.names = c(NA,-10L),
  groups = structure(
    list(
      year = structure(
        1L,
        .Label = c(
          "2006",
          "2007",
          "2012",
          "2013",
          "2014",
          "2014_c",
          "2015_a",
          "2015_b",
          "2016",
          "2017",
          "2020"
        ),
        class = "factor"
      ),
      .rows = structure(
        list(1:10),
        ptype = integer(0),
        class = c("vctrs_list_of",
                  "vctrs_vctr", "list")
      )
    ),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA,-1L),
    .drop = TRUE
  )
)

Note: I do not need unique matches. I only need to count how many times each category was recorded.

pkwftd7m  #1

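I'm not sure how the OP merged the two summaries, but we can call mutate twice in sequence instead of summarise, passing the grouping variables to the .by argument.
The toy data frame is grouped by year, so I ungrouped it first.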

library(dplyr) #requires dplyr 1.1.0 for the .by solution

df %>%
    ungroup() %>%
    mutate(rh_count = n(), .by = c(year, humidity_catog)) %>%
    mutate(temp_count = n(), .by = c(year, temp_catog))

# A tibble: 10 × 7
   year  min_rh min_temp temp_catog   humidity_catog rh_count temp_count
   <fct>  <dbl>    <dbl> <fct>        <fct>             <int>      <int>
 1 2006    47.9     12.4 T2(>8, <=15) RH1(<=65)             8          6
 2 2006    49       14.3 T2(>8, <=15) RH1(<=65)             8          6
 3 2006    44.7     15.1 T3(>15)      RH1(<=65)             8          4
 4 2006    40.2     16.1 T3(>15)      RH1(<=65)             8          4
 5 2006    50       12.7 T2(>8, <=15) RH1(<=65)             8          6
 6 2006    52.3     16.1 T3(>15)      RH1(<=65)             8          4
 7 2006    51.5     14.4 T2(>8, <=15) RH1(<=65)             8          6
 8 2006    82.8     15.1 T3(>15)      RH2(>65)              2          4
 9 2006    73.8     11.8 T2(>8, <=15) RH2(>65)              2          6
10 2006    47.1      9.5 T2(>8, <=15) RH1(<=65)             8          6
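
If dplyr 1.1.0 is not available, the same per-row counts can probably be obtained without .by. The sketch below is not part of the original answer. It assumes a simplified, hypothetical stand-in data frame called toy (same year and category pattern as the question's data, with shortened labels stored as plain character vectors) and shows two options: add_count() with its name argument, and an explicit group_by() + mutate() + ungroup() chain. With the question's own grouped data, calling ungroup() first, as in the answer above, is the safer starting point.

```r
library(dplyr)

# Hypothetical stand-in for the question's data: same category pattern,
# shortened labels, plain character columns instead of factors.
toy <- data.frame(
  year = rep("2006", 10),
  temp_catog = c("T2", "T2", "T3", "T3", "T2", "T3", "T2", "T3", "T2", "T2"),
  humidity_catog = c("RH1", "RH1", "RH1", "RH1", "RH1",
                     "RH1", "RH1", "RH2", "RH2", "RH1")
)

# Option 1: add_count() appends a per-group count column and keeps every row.
with_add_count <- toy %>%
  add_count(year, humidity_catog, name = "rh_count") %>%
  add_count(year, temp_catog, name = "temp_count")

# Option 2: group, count with n() inside mutate(), then regroup for the
# second count and drop the grouping at the end.
with_group_by <- toy %>%
  group_by(year, humidity_catog) %>%
  mutate(rh_count = n()) %>%
  group_by(year, temp_catog) %>%
  mutate(temp_count = n()) %>%
  ungroup()
```

Both variants keep all ten rows, unlike summarise, which collapses each group to a single row.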
