Double loop iterating over many columns to find outliers in R

inkz8wg9  posted on 2023-04-03  in Other

I have a data frame containing each individual's "id" and two traits ("x" and "y"), as shown below:

id = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10","A11","A12","A13","A14","A15","A16","A17","A18","A19","A20","A21","A22","A23","A24")
x = c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28)
y = c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)
a = data.frame(id,x,y)

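I would like a loop that iterates over every trait and every individual, so that I can create a new data frame (or new columns in a) holding 1 if the individual is an outlier for that trait and 0 if not. An outlier is any point that falls outside the trait mean ± 3 SD.
In this example, the outlier for "x" is 28 and the outlier for "y" is 5. The desired result could look like this: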

id = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10","A11","A12","A13","A14","A15","A16","A17","A18","A19","A20","A21","A22","A23","A24")
x_out = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1)
y_out = c(0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
a_out = data.frame(id, x_out, y_out)

Do you know how to do this in a loop? The idea is that if I add new traits or individuals, I should not need to change the loop. Thanks!

rsl1atfo  #1

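No loop is needed. You can test all columns at once by checking whether the absolute z-score, abs(scale()), is >= 3. Note that the flags overwrite the x and y columns rather than creating x_out and y_out: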

a_out <- a
a_out[, -1] <- as.integer(abs(scale(a[, -1])) >= 3)
#> a_out
    id x y
1   A1 0 0
2   A2 0 0
3   A3 0 1
4   A4 0 0
5   A5 0 0
6   A6 0 0
7   A7 0 0
8   A8 0 0
9   A9 0 0
10 A10 0 0
11 A11 0 0
12 A12 0 0
13 A13 0 0
14 A14 0 0
15 A15 0 0
16 A16 0 0
17 A17 0 0
18 A18 0 0
19 A19 0 0
20 A20 0 0
21 A21 0 0
22 A22 0 0
23 A23 0 0
24 A24 1 0
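
To make explicit what abs(scale()) computes, here is a minimal sketch that works out the z-score by hand for each non-id column and stores the flags in new columns with an _out suffix, matching the layout asked for in the question. The helper name flag_outliers and its threshold argument are illustrative assumptions, not part of the original answer. Like scale(), sd() uses the n - 1 denominator.

```r
# Data from the question
id <- paste0("A", 1:24)
x <- c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28)
y <- c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)
a <- data.frame(id, x, y)

# Hypothetical helper: 1 if a value lies `threshold` or more SDs
# from the column mean, 0 otherwise (NA stays NA)
flag_outliers <- function(v, threshold = 3) {
  z <- (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
  as.integer(abs(z) >= threshold)
}

# Every column except the identifier is treated as a trait
trait_cols <- setdiff(names(a), "id")

# Apply the helper column by column and rename the results to <trait>_out
flags <- lapply(a[trait_cols], flag_outliers)
names(flags) <- paste0(trait_cols, "_out")

a_out <- data.frame(id = a$id, flags)
```

Because the trait columns are picked up by name, adding more traits or more individuals to a should not require any change to this code.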

Or with dplyr:

library(dplyr)

a_out <- a %>% 
  mutate(across(!id, \(x) as.integer(abs(scale(x)) >= 3)))
# same output as above
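
If the original values should be kept next to the flags, one possible variation is to pass the .names argument of across() so that the results go into new x_out and y_out columns instead of overwriting x and y. This is a sketch assuming dplyr 1.0 or later and R 4.1 or later (for the \(v) lambda syntax); the object name a_flags is illustrative.

```r
library(dplyr)

# Data from the question
id <- paste0("A", 1:24)
x <- c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28)
y <- c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)
a <- data.frame(id, x, y)

# Keep x and y, and add one <trait>_out flag column per trait
a_out <- a %>%
  mutate(across(!id, \(v) as.integer(abs(scale(v)) >= 3), .names = "{.col}_out"))

# Optional: keep only the id and the flag columns, as in the question's a_out
a_flags <- a_out %>%
  select(id, ends_with("_out"))
```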
