Create new column based on mean of values that are found in specific rows in R [duplicate]

Posted by aiqt4smr on 2023-01-22 in Other


**This question already has answers here:**

Filter R dataframe to n most frequent cases and order by frequency (2 answers)
(9 answers)
Closed yesterday.
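I have a table like this and want to count the genes that appear most often (let's say the top 10 genes), then find the mean of tail_len for those top 10 genes.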
| | gene | tail_len |
|---|---|---|
| 1 | SPAC20G4.06c | 3 |
| 2 | SPCC613.06 | 5 |
| 3 | SPAC6F6.03c | 2 |
| 4 | SPAC20G4.06c | 3 |
| 5 | SPBC23G7.15c | 5 |
| 6 | SPAC589.10c | 2 |
| 7 | SPBC23G7.15c | 3 |
| 8 | SPAC22H12.04c | 1 |
| 9 | SPAC22H12.04c | 12 |
| 10 | SPAC6G10.11c | 8 |
| 11 | SPAC589.10c | 31 |
| 12 | SPBC18E5.06 | 16 |

Source: https://stackoverflow.com/questions/75188834/create-new-column-based-on-mean-of-values-that-are-found-in-specific-rows-in-r

2 Answers


3htmauhk1#

It is tough to test without a larger dataset, but here is one approach using dplyr:

library(tidyverse)

foo <- tibble(
  gene = c("SPAC20G4.06c", "SPCC613.06", "SPAC20G4.06c", "SPAC6F6.03c", "SPBC23G7.15c"), 
  tail_len = c(3, 5, 1, 6, 7)
)

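# Count rows per gene and keep the 10 most frequent genes.
# slice_max() keeps ties by default, so more than 10 rows can come back.
foo_top10 <- foo %>%
  group_by(gene) %>%
  summarize(count = n()) %>%
  slice_max(count, n = 10)

# Mean tail_len for each of those genes
foo %>% 
  filter(gene %in% foo_top10$gene) %>%
  group_by(gene) %>%
  summarize(tail_len_mean = mean(tail_len))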
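
For comparison, the same two steps (find the most frequent genes, then average tail_len within them) can be sketched in base R without any packages. This is only an illustration, not part of the original answer: the variable names `top_n`, `gene_counts`, `top_genes`, and `top_rows` are made up here, and the cutoff of 10 is taken from the question. Unlike `slice_max()`, `head()` simply truncates, so genes tied at the cutoff are not all kept.

```r
# Example data copied from the answer above
foo <- data.frame(
  gene = c("SPAC20G4.06c", "SPCC613.06", "SPAC20G4.06c", "SPAC6F6.03c", "SPBC23G7.15c"),
  tail_len = c(3, 5, 1, 6, 7)
)

top_n <- 10  # assumed cutoff, taken from the question

# Count rows per gene and sort from most to least frequent
gene_counts <- sort(table(foo$gene), decreasing = TRUE)

# Names of the top_n genes (head() truncates, so ties at the cutoff are dropped)
top_genes <- names(head(gene_counts, top_n))

# Mean tail_len per gene, restricted to rows of the top genes
top_rows <- foo[foo$gene %in% top_genes, ]
tail_len_mean <- tapply(top_rows$tail_len, top_rows$gene, mean)
tail_len_mean
```

With only four distinct genes in this small example, all of them fall inside the top 10, so every gene gets a mean.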


2ul0zpep2#

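Here is an approach with slice_max. I have defined two variables, ties_ok and max_n. The latter is set to 3 to test the code; you need max_n <- 10. The former can be set to FALSE if you want to discard ties and keep only the first rows found.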

df1 <- "    gene    tail_len
1   SPAC20G4.06c    3
2   SPCC613.06  5
3   SPAC6F6.03c     2
4   SPAC20G4.06c    3
5   SPBC23G7.15c    5
6   SPAC589.10c     2
7   SPBC23G7.15c    3
8   SPAC22H12.04c   1
9   SPAC22H12.04c   12
10  SPAC6G10.11c    8
11  SPAC589.10c     31
12  SPBC18E5.06     16"
df1 <- read.table(text = df1, header = TRUE)

suppressPackageStartupMessages(
  library(dplyr)
)

ties_ok <- TRUE
#ties_ok <- FALSE
max_n <- 3L
df1 %>%
  group_by(gene) %>%
  summarise(count = n(), mean_tail_len = mean(tail_len)) %>%
  slice_max(count, n = max_n, with_ties = ties_ok) %>%
  select(-count)
#> # A tibble: 4 × 2
#>   gene          mean_tail_len
#>   <chr>                 <dbl>
#> 1 SPAC20G4.06c            3  
#> 2 SPAC22H12.04c           6.5
#> 3 SPAC589.10c            16.5
#> 4 SPBC23G7.15c            4

Created on 2023-01-20 with reprex v2.0.2
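
Since the question title asks for a new column rather than a summary table, the following is a minimal sketch of one way to attach the per-gene mean to the original rows. It is not from either answer: the names `top_genes` and `df1_with_mean` are hypothetical, and it assumes the same test setting of `max_n <- 3L` with ties kept.

```r
library(dplyr)

# Same data as in the answer above
df1 <- data.frame(
  gene = c("SPAC20G4.06c", "SPCC613.06", "SPAC6F6.03c", "SPAC20G4.06c",
           "SPBC23G7.15c", "SPAC589.10c", "SPBC23G7.15c", "SPAC22H12.04c",
           "SPAC22H12.04c", "SPAC6G10.11c", "SPAC589.10c", "SPBC18E5.06"),
  tail_len = c(3, 5, 2, 3, 5, 2, 3, 1, 12, 8, 31, 16)
)

max_n <- 3L  # test value; use 10 for the top 10 genes

# Genes with the highest row counts (ties at the cutoff are kept)
top_genes <- df1 %>%
  count(gene, name = "count") %>%
  slice_max(count, n = max_n, with_ties = TRUE) %>%
  pull(gene)

# Keep only rows for those genes and add the per-gene mean as a new column
df1_with_mean <- df1 %>%
  filter(gene %in% top_genes) %>%
  group_by(gene) %>%
  mutate(mean_tail_len = mean(tail_len)) %>%
  ungroup()

df1_with_mean
```

Using `mutate()` after `group_by()` keeps one row per original observation, so each row carries the mean of its own gene; `summarise()` would instead collapse to one row per gene as in the answer above.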

