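I have been trying to speed up my code by restructuring it around filter and join operations. On closer inspection, however, the two approaches return slightly different results for the same calculation (in my case, the mean of a vector).
I am working with time-series data and want to calculate a baseline, i.e. the mean of a response value (Mean_Intensity) over a specific time window. My simplified data look like this: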
   time Neuron_ID                             Mean_Intensity DrugApp1
  <dbl> <fct>                                          <dbl>    <dbl>
1     0 GLP_1_200nM_16_02_2023_p2_f1_Neuron_1           9.88      300
2     0 GLP_1_200nM_16_02_2023_p2_f1_Neuron_2          11.8       300
3     0 GLP_1_200nM_16_02_2023_p2_f1_Neuron_3           8.45      300
4     0 GLP_1_200nM_16_02_2023_p2_f1_Neuron_4           9.99      300
5     0 GLP_1_200nM_16_02_2023_p2_f1_Neuron_5           4.48      300
6     0 GLP_1_20nM_23_02_2023_p3_f1_Neuron_1            9.89      300
The old approach was:
df_baseline <- df %>%
  filter(time >= (DrugApp1 - 260), time <= (DrugApp1 + 750)) %>%
  group_by(Neuron_ID) %>%
  mutate(F0 = mean(Mean_Intensity[time <= DrugApp1], na.rm = TRUE))
My new code moves the baseline window into the filter:
df_baseline_new <- df %>%
  filter(time >= (DrugApp1 - 260), time <= DrugApp1) %>%
  group_by(Neuron_ID) %>%
  mutate(F0 = mean(Mean_Intensity, na.rm = TRUE))
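However, the two approaches return slightly different F0 values for the same variable.
For example, for a given Neuron_ID they return an F0 of 2.906231 and 2.911889 respectively.
This small difference makes me think that the length of time used inside mutate(F0 = mean(Mean_Intensity[time <= DrugApp1], na.rm = TRUE)) differs from the one produced by filter(time >= (DrugApp1 - 260), time <= (DrugApp1 + 750)). I thought this might be down to the <= / < operators including or excluding one extra time point, but I have tried many combinations of these operators and cannot make the results match. I am also not sure how to check which time range mutate(F0 = mean(Mean_Intensity[time <= DrugApp1], na.rm = TRUE)) actually operates on.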
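One way to inspect the time range that the subset expression operates on is to summarise, per Neuron_ID, how many rows satisfy `time <= DrugApp1` and what their minimum and maximum times are. The following is only a minimal sketch: it assumes dplyr is installed, and `toy` and `window_check` are made-up names on made-up data, not the real dataset.

```r
library(dplyr)

# Hypothetical toy data: 2 neurons sampled every 10 s from 0 to 1000 s
toy <- expand.grid(
  time      = seq(0, 1000, by = 10),
  Neuron_ID = factor(c("Neuron_1", "Neuron_2"))
)
toy$Mean_Intensity <- seq_len(nrow(toy)) / 10
toy$DrugApp1 <- 300

# Same filter and grouping as the old approach, but reporting which
# rows the subset `time <= DrugApp1` selects within each group
window_check <- toy %>%
  filter(time >= (DrugApp1 - 260), time <= (DrugApp1 + 750)) %>%
  group_by(Neuron_ID) %>%
  summarise(
    n_rows = sum(time <= DrugApp1),          # rows entering the mean
    t_min  = min(time[time <= DrugApp1]),    # earliest time point used
    t_max  = max(time[time <= DrugApp1]),    # latest time point used
    F0     = mean(Mean_Intensity[time <= DrugApp1], na.rm = TRUE)
  )

print(window_check)
```

If `n_rows`, `t_min` or `t_max` differ between neurons that share the same sampling grid, the comparison inside the grouped step is probably not using the values you expect.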
I'm aware that filter removes NA values, but I have cleaned my data and df_baseline_new is not missing any values that could explain this.
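As it stands, I think the new code is more likely to be correct: I can see that the time points filtered in df_baseline_new are exactly what I expect. So this is not a big problem, especially as the new code is substantially faster (2.01 s vs 340 s), but I would like to know what causes the difference. Any help would be greatly appreciated.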
Edit
Some example data to recreate the problem:
structure(list(time = c(0, 0, 0, 10, 10, 10, 20, 20, 20, 30,
30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 70, 70, 70, 80, 80,
80, 90, 90, 90, 100, 100, 100, 110, 110, 110, 120, 120, 120,
130, 130, 130, 140, 140, 140, 150, 150, 150, 160, 160, 160, 170,
170, 170, 180, 180, 180, 190, 190, 190, 200, 200, 200, 210, 210,
210, 220, 220, 220, 230, 230, 230, 240, 240, 240, 250, 250, 250,
260, 260, 260, 270, 270, 270, 280, 280, 280, 290, 290, 290, 300,
300, 300, 310, 310, 310, 320, 320, 320, 330, 330, 330, 340, 340,
340, 350, 350, 350, 360, 360, 360, 370, 370, 370, 380, 380, 380,
390, 390, 390, 400, 400, 400, 410, 410, 410, 420, 420, 420, 430,
430, 430, 440, 440, 440, 450, 450, 450, 460, 460, 460, 470, 470,
470, 480, 480, 480, 490, 490, 490, 500, 500, 500, 510, 510, 510,
520, 520, 520, 530, 530, 530, 540, 540, 540, 550, 550, 550, 560,
560, 560, 570, 570, 570, 580, 580, 580, 590, 590, 590, 600, 600,
600, 610, 610, 610, 620, 620, 620, 630, 630, 630, 640, 640, 640,
650, 650, 650, 660, 660, 660, 670, 670, 670, 680, 680, 680, 690,
690, 690, 700, 700, 700, 710, 710, 710, 720, 720, 720, 730, 730,
730, 740, 740, 740, 750, 750, 750, 760, 760, 760, 770, 770, 770,
780, 780, 780, 790, 790, 790, 800, 800, 800, 810, 810, 810, 820,
820, 820, 830, 830, 830, 840, 840, 840, 850, 850, 850, 860, 860,
860, 870, 870, 870, 880, 880, 880, 890, 890, 890, 900, 900, 900,
910, 910, 910, 920, 920, 920, 930, 930, 930, 940, 940, 940, 950,
950, 950, 960, 960, 960, 970, 970, 970, 980, 980, 980, 990, 990,
990, 1000, 1000, 1000, 1010, 1010, 1010), Neuron_ID = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L), levels = c("POMC_GFP_GLP_1_50nM_23_02_23_p1_f1_Neuron_3",
"POMC_GFP_GLP_1_50nM_23_02_23_p1_f1_Neuron_4", "POMC_GFP_GLP_1_50nM_23_02_23_p1_f1_Neuron_5"
), class = "factor"), Mean_Intensity = c(17.592, 10.148, 2.753,
16.496, 9.785, 2.684, 18.887, 9.681, 2.78700000000001, 19.068,
10.181, 2.71300000000001, 19.463, 10.072, 2.86999999999999, 18.075,
9.795, 2.70699999999999, 17.515, 9.7, 2.74300000000001, 17.35,
9.55800000000001, 2.61200000000001, 18.055, 9.35899999999999,
2.67999999999999, 18.119, 9.11799999999999, 2.68599999999999,
18.136, 9.298, 2.715, 18.648, 9.54899999999999, 2.744, 18.41,
9.45399999999999, 2.866, 16.803, 9.75700000000001, 2.941, 18.081,
9.76000000000001, 2.64, 17.018, 9.574, 3.283, 19.086, 10.122,
2.98299999999999, 18.874, 9.97699999999999, 2.94799999999999,
18.416, 9.556, 3.178, 19.367, 9.62, 2.852, 19.236, 9.68599999999999,
2.875, 19.282, 9.76000000000001, 3.479, 20.024, 9.64100000000001,
3.15600000000001, 20.177, 9.85499999999999, 3.077, 20.53, 9.29900000000001,
3.096, 19.449, 9.595, 3.352, 17.926, 9.52499999999999, 3.20099999999999,
18.146, 9.398, 3.101, 17.706, 9.355, 2.952, 18.222, 9.41800000000001,
3, 19.932, 9.46600000000001, 2.941, 20.391, 9.47500000000001,
2.943, 19.975, 9.449, 2.86199999999999, 19.704, 9.63, 3.029,
20.318, 9.247, 2.711, 21.773, 9.613, 2.756, 21.757, 9.753, 2.78,
21.396, 9.396, 2.75, 21.837, 9.721, 2.89700000000001, 20.221,
9.387, 2.718, 20.905, 9.26400000000001, 2.80900000000001, 19.763,
9.50399999999999, 2.759, 20.885, 9.821, 2.949, 21.624, 9.411,
2.65400000000001, 21.694, 9.316, 3.167, 21.947, 9.56100000000001,
3.52000000000001, 23.746, 9.245, 3.14500000000001, 24.241, 9.366,
3.23099999999999, 23.875, 9.491, 3.28, 24.328, 9.361, 3.03700000000001,
23.937, 8.99600000000001, 2.78100000000001, 23.383, 9.083, 2.94200000000001,
23.297, 9.295, 3.29900000000001, 23.183, 9.20100000000001, 3.21300000000001,
22.493, 9.111, 3.24300000000001, 23.056, 9.09400000000001, 2.967,
23.406, 9.16199999999999, 3.226, 25.179, 9.277, 3.295, 25.679,
9.024, 3.134, 25.215, 8.986, 3.19199999999999, 24.382, 9.048,
3.28, 25.559, 9.33, 3.352, 25.122, 9.575, 3.27200000000001, 25.637,
9.303, 3.33, 23.172, 9.176, 3.31, 24.396, 9.349, 3.32300000000001,
23.88, 9.313, 3.321, 23.421, 9.304, 3.19, 24.431, 9.16399999999999,
3.22999999999999, 25.018, 9.01900000000001, 3.363, 25.232, 9.128,
3.53999999999999, 25.348, 9.206, 3.43900000000001, 25.399, 9.497,
3.298, 24.746, 9.069, 2.967, 25.702, 9.107, 3.26299999999999,
25.793, 9.36500000000001, 3.292, 25.703, 9.111, 3.148, 25.637,
9.38800000000001, 3.328, 25.759, 9.384, 3.42399999999999, 23.973,
9.53200000000001, 3.39400000000001, 23.623, 9.649, 3.497, 25.759,
9.78500000000001, 3.65700000000001, 25.619, 9.64400000000001,
3.28, 24.842, 9.813, 3.48400000000001, 24.342, 9.759, 3.55, 23.872,
9.711, 3.474, 21.907, 9.794, 3.26299999999999, 21.298, 9.56,
2.869, 21.291, 9.752, 3.227, 21.236, 9.76799999999999, 3.15299999999999,
21.887, 9.93000000000001, 3.169, 22.742, 9.848, 3.22, 22.095,
10.115, 3.622, 22.043, 10.031, 3.73, 21.19, 10.041, 3.548, 21.572,
9.992, 3.476, 23.337, 10.255, 3.67099999999999, 23.556, 10.142,
3.459, 22.683, 10.26, 3.56, 23.389, 10.289, 3.42999999999999,
23.686, 10.025, 3.15900000000001, 23.687, 10.147, 3.12700000000001
), DrugApp1 = c(300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300,
300, 300, 300, 300, 300, 300, 300, 300, 300, 300)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -306L), groups = structure(list(
Neuron_ID = structure(1:3, levels = c("POMC_GFP_GLP_1_50nM_23_02_23_p1_f1_Neuron_3",
"POMC_GFP_GLP_1_50nM_23_02_23_p1_f1_Neuron_4", "POMC_GFP_GLP_1_50nM_23_02_23_p1_f1_Neuron_5"
), class = "factor"), .rows = structure(list(c(1L, 4L, 7L,
10L, 13L, 16L, 19L, 22L, 25L, 28L, 31L, 34L, 37L, 40L, 43L,
46L, 49L, 52L, 55L, 58L, 61L, 64L, 67L, 70L, 73L, 76L, 79L,
82L, 85L, 88L, 91L, 94L, 97L, 100L, 103L, 106L, 109L, 112L,
115L, 118L, 121L, 124L, 127L, 130L, 133L, 136L, 139L, 142L,
145L, 148L, 151L, 154L, 157L, 160L, 163L, 166L, 169L, 172L,
175L, 178L, 181L, 184L, 187L, 190L, 193L, 196L, 199L, 202L,
205L, 208L, 211L, 214L, 217L, 220L, 223L, 226L, 229L, 232L,
235L, 238L, 241L, 244L, 247L, 250L, 253L, 256L, 259L, 262L,
265L, 268L, 271L, 274L, 277L, 280L, 283L, 286L, 289L, 292L,
295L, 298L, 301L, 304L), c(2L, 5L, 8L, 11L, 14L, 17L, 20L,
23L, 26L, 29L, 32L, 35L, 38L, 41L, 44L, 47L, 50L, 53L, 56L,
59L, 62L, 65L, 68L, 71L, 74L, 77L, 80L, 83L, 86L, 89L, 92L,
95L, 98L, 101L, 104L, 107L, 110L, 113L, 116L, 119L, 122L,
125L, 128L, 131L, 134L, 137L, 140L, 143L, 146L, 149L, 152L,
155L, 158L, 161L, 164L, 167L, 170L, 173L, 176L, 179L, 182L,
185L, 188L, 191L, 194L, 197L, 200L, 203L, 206L, 209L, 212L,
215L, 218L, 221L, 224L, 227L, 230L, 233L, 236L, 239L, 242L,
245L, 248L, 251L, 254L, 257L, 260L, 263L, 266L, 269L, 272L,
275L, 278L, 281L, 284L, 287L, 290L, 293L, 296L, 299L, 302L,
305L), c(3L, 6L, 9L, 12L, 15L, 18L, 21L, 24L, 27L, 30L, 33L,
36L, 39L, 42L, 45L, 48L, 51L, 54L, 57L, 60L, 63L, 66L, 69L,
72L, 75L, 78L, 81L, 84L, 87L, 90L, 93L, 96L, 99L, 102L, 105L,
108L, 111L, 114L, 117L, 120L, 123L, 126L, 129L, 132L, 135L,
138L, 141L, 144L, 147L, 150L, 153L, 156L, 159L, 162L, 165L,
168L, 171L, 174L, 177L, 180L, 183L, 186L, 189L, 192L, 195L,
198L, 201L, 204L, 207L, 210L, 213L, 216L, 219L, 222L, 225L,
228L, 231L, 234L, 237L, 240L, 243L, 246L, 249L, 252L, 255L,
258L, 261L, 264L, 267L, 270L, 273L, 276L, 279L, 282L, 285L,
288L, 291L, 294L, 297L, 300L, 303L, 306L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))
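OK, my problem was that DrugApp1 was actually a free-standing numeric vector with the same length as my data frame, rather than a column of it. Because this is a script I run on many different datasets with different variable names (which I extract as DrugApp1_string), I have an earlier piece of code that converts the different variable names into a common DrugApp1. I intended to do this:
df$DrugApp1 <- df[,c(DrugApp1_string)]
What I actually had was:
DrugApp1 <- df[,c(DrugApp1_string)]
This caused the misalignment and, I think, also made my code run slowly. Very, very annoying.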
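The misalignment can be reproduced on a small example. This is a hedged sketch rather than the original script: the data and the column name `app_time_s` are hypothetical, and `bad` and `good` are illustrative names. When `DrugApp1` is not a column, dplyr falls back to the variable of that name in the calling environment, and inside a grouped step that full-length vector is compared against each group's shorter `time` vector.

```r
library(dplyr)

# Hypothetical toy data; `app_time_s` stands in for a dataset-specific column name
df <- data.frame(
  time           = rep(seq(0, 50, by = 10), times = 2),
  Neuron_ID      = rep(c("a", "b"), each = 6),
  Mean_Intensity = c(1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60),
  app_time_s     = rep(c(20, 40), each = 6)
)
DrugApp1_string <- "app_time_s"

# The bug: a free-standing vector of length nrow(df), not a column of df
DrugApp1 <- df[[DrugApp1_string]]
bad <- df %>%
  group_by(Neuron_ID) %>%
  summarise(F0 = mean(Mean_Intensity[time <= DrugApp1], na.rm = TRUE))

# The fix: store it as a column so it is subset together with each group.
# `.data$DrugApp1` refers to the column explicitly.
df$DrugApp1 <- df[[DrugApp1_string]]
good <- df %>%
  group_by(Neuron_ID) %>%
  summarise(F0 = mean(Mean_Intensity[time <= .data$DrugApp1], na.rm = TRUE))

print(bad)
print(good)
```

For group "b" the two results differ, because the stray vector supplies application times taken from rows that belong to the other group. Writing `.data$DrugApp1` is a defensive choice: it raises an error if the column is missing instead of silently using an environment variable. `df[[DrugApp1_string]]` is used here because it returns a plain vector for both data frames and tibbles.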
Thank you very much for your help!
1 answer
tf7tbtn21#
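You are running the code on different datasets: df in the first block and data_FINAL in the second. If you want to check the difference, run both sets of code on the same data, rename the result columns so you can tell which is which, and add flags marking which rows are included in each mean. Like this:
Then you can join them to compare.
This will give you any rows that are included in one mean but not in the other.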
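The flag-and-join comparison described above might be sketched as follows. This is not the answerer's original code, which is not preserved here. It is a minimal illustration on hypothetical toy data, and the names `F0_old`, `F0_new`, `in_old`, `in_new` and `mismatch` are assumptions made for the example.

```r
library(dplyr)

# Hypothetical toy data: 2 neurons sampled every 10 s from 0 to 1000 s
df <- data.frame(
  time      = rep(seq(0, 1000, by = 10), times = 2),
  Neuron_ID = rep(c("Neuron_1", "Neuron_2"), each = 101),
  DrugApp1  = 300
)
df$Mean_Intensity <- seq_len(nrow(df)) / 10

# Old approach: flag the rows that enter the baseline mean
old <- df %>%
  filter(time >= (DrugApp1 - 260), time <= (DrugApp1 + 750)) %>%
  group_by(Neuron_ID) %>%
  mutate(F0_old = mean(Mean_Intensity[time <= DrugApp1], na.rm = TRUE),
         in_old = time <= DrugApp1) %>%
  ungroup() %>%
  filter(in_old) %>%
  select(Neuron_ID, time, F0_old, in_old)

# New approach: every remaining row enters the mean
new <- df %>%
  filter(time >= (DrugApp1 - 260), time <= DrugApp1) %>%
  group_by(Neuron_ID) %>%
  mutate(F0_new = mean(Mean_Intensity, na.rm = TRUE), in_new = TRUE) %>%
  ungroup() %>%
  select(Neuron_ID, time, F0_new, in_new)

# Rows used by only one approach, or rows whose baselines disagree
mismatch <- full_join(old, new, by = c("Neuron_ID", "time")) %>%
  filter(is.na(in_old) | is.na(in_new) | abs(F0_old - F0_new) > 1e-9)

print(mismatch)
```

On correctly aligned data `mismatch` is empty. Any rows that do appear show which time points one mean includes and the other does not.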