I have a data frame containing a person's "id" and two traits ("x" and "y"), as shown below:
id = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10","A11","A12","A13","A14","A15","A16","A17","A18","A19","A20","A21","A22","A23","A24")
x = c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28)
y = c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)
a = data.frame(id,x,y)
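I would like a loop that iterates over each trait and each individual, so that I can create a new data frame (or new columns in a) holding 1 if the individual is an outlier and 0 if not. Treat as an outlier any point that falls outside the trait mean ± 3 sd.
In this example, the outlier for "x" is 28 and the outlier for "y" is 5. The desired result could look like this: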
id = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10","A11","A12","A13","A14","A15","A16","A17","A18","A19","A20","A21","A22","A23","A24")
x_out = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1)
y_out = c(0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
a_out = data.frame(id, x_out, y_out)
Do you know how to implement this in a loop? The idea is that if I add new traits or individuals, I should not need to change the loop. Thanks!
1 Answer
You don't need a loop. You can test all columns at once by checking whether the absolute z-score, abs(scale()), is >= 3.
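A minimal base R sketch of this approach, assuming every column other than id is a numeric trait. The helper names traits, z, and out are illustrative and not part of the question:

```r
# Data from the question
id <- paste0("A", 1:24)
x <- c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28)
y <- c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)
a <- data.frame(id, x, y)

# Assumption: all columns except id are numeric traits to be checked
traits <- setdiff(names(a), "id")

# scale() centers each column on its mean and divides by its sd,
# so abs() of the result is the absolute z-score per column
z <- abs(scale(a[traits]))

# Unary plus turns the TRUE/FALSE matrix into 1/0
out <- +(z >= 3)
colnames(out) <- paste0(traits, "_out")

a_out <- data.frame(id = a$id, out)
```

Because traits is derived from names(a), adding new trait columns or new rows to a should not require changing this code.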
Alternatively, the same check can be done with dplyr.
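One possible dplyr version, sketched under the assumptions that dplyr 1.0.0 or later is installed (for across() and its .names argument) and that every column other than id is a numeric trait:

```r
library(dplyr)

# Data from the question
a <- data.frame(
  id = paste0("A", 1:24),
  x = c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28),
  y = c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)
)

a_out <- a %>%
  # For every column except id, flag rows whose absolute z-score is >= 3;
  # as.numeric() drops the one-column matrix that scale() returns
  mutate(across(-id,
                ~ as.integer(abs(as.numeric(scale(.x))) >= 3),
                .names = "{.col}_out")) %>%
  # Keep only id and the new indicator columns
  select(id, ends_with("_out"))
```

Dropping the final select() would instead keep the indicators as new columns alongside x and y in a.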