Finding outliers for a subset of values in a column in R

nhn9ugyo  posted on 2022-12-05  in Other

Given this data:

Color | Number
Green | 5.0
Red   | 20.0
Green | 5.0
Green | 15.0
Green | 100.0
Red   | 7.0
Red   | 10.0
Red   | 8.0
Green | 6.0


I want to take only the "Green" values, plot them, and find their outliers. How would you do that?


cyej8jka #1

We can subset the dataset to the rows where Color is "Green", select the Number column, pass it to boxplot, and extract the out component of the result:

boxplot(subset(Data, Color == "Green", select = Number)$Number)$out
# [1] 100

Data

Data <- structure(list(Color = c("Green", "Red", "Green", "Green", "Green", 
"Red", "Red", "Red", "Green"), Number = c(5L, 20L, 5L, 15L, 100L, 
7L, 10L, 8L, 6L)), class = "data.frame", row.names = c(NA, -9L
))
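
Since the question also asks for the plot, here is a minimal sketch that keeps the boxplot, stores its return value, and looks up which rows of the data frame hold the outlying values. The names green, bp and outlier_rows are illustrative and not part of the answer above. The data frame is rebuilt with data.frame() only so that the block runs on its own.

```r
# Same data as in the answer above, rebuilt so this block is self-contained
Data <- data.frame(
  Color  = c("Green", "Red", "Green", "Green", "Green",
             "Red", "Red", "Red", "Green"),
  Number = c(5, 20, 5, 15, 100, 7, 10, 8, 6)
)

# Keep only the Green rows
green <- subset(Data, Color == "Green")

# boxplot() draws the plot and invisibly returns the statistics it used
bp <- boxplot(green$Number, main = "Green", ylab = "Number")

# Values drawn as individual points beyond the whiskers
bp$out

# Rows of the original data frame that hold those Green outliers
outlier_rows <- Data[Data$Color == "Green" & Data$Number %in% bp$out, ]
outlier_rows
```

Matching with %in% returns every Green row whose value equals an outlier value, so duplicated outlier values would all be listed.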

qcuzuvrc #2

Get the rows where Color is Green:
GreenValues <- Data[Data$Color == 'Green', ]
Then use boxplot.stats to get the outliers, without drawing a plot:
boxplot.stats(GreenValues$Number)$out
Hope this helps.
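
To make the rule behind boxplot.stats visible, the sketch below recomputes it by hand for the five Green values. It assumes the default coef = 1.5 and relies on the documented behaviour that boxplot.stats uses the hinges from fivenum() rather than quantile(). The variable names are illustrative.

```r
# The Green values from the question
x <- c(5, 5, 15, 100, 6)

# Lower and upper hinge (elements 2 and 4 of Tukey's five-number summary)
hinges <- fivenum(x)[c(2, 4)]

# Distance between the hinges, then the fences 1.5 times that distance away
iqr   <- diff(hinges)
lower <- hinges[1] - 1.5 * iqr
upper <- hinges[2] + 1.5 * iqr

# Anything outside the fences is reported as an outlier
manual_out <- x[x < lower | x > upper]
manual_out

# Should agree with the built-in result
boxplot.stats(x)$out
```

Here the hinges are 5 and 15, so the fences are -10 and 30, and 100 is the only value outside them.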


7d7tgy0s #3

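Funnily enough, I needed to do the same thing a while ago. Along the way I found a great video that explains it: https://youtu.be/9aDHbRb4Bf8
Here is a JS function I wrote based on that YouTube video: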

function getOutliers(input) {
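  // return a sorted copy (ascending); sorting in place would reorder
  // `input` and make the returned indexes point at the wrong elements
  const asc = arr => [...arr].sort((a, b) => a - b);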

  const sum = arr => arr.reduce((a, b) => a + b, 0);

  const mean = arr => sum(arr) / arr.length;

  // sample standard deviation
  const std = (arr) => {
    const mu = mean(arr);
    const diffArr = arr.map(a => (a - mu) ** 2);
    return Math.sqrt(sum(diffArr) / (arr.length - 1));
  };

  const quantile = (arr, q) => {
    const sorted = asc(arr);
    const pos = (sorted.length - 1) * q;
    const base = Math.floor(pos);
    const rest = pos - base;
    if (sorted[base + 1] !== undefined) {
      return sorted[base] + rest * (sorted[base + 1] - sorted[base]);
    } else {
      return sorted[base];
    }
  };

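  // first and third quartile; `range` is the interquartile range (IQR)
  const q1 = quantile(input, .25);
  const q3 = quantile(input, .75);
  const range = q3 - q1;
  // fences: values more than 1.5 * IQR beyond the quartiles are outliers
  const min = q1 - 1.5 * range;
  const max = q3 + 1.5 * range;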

  let smallOutlierIndexes = [];
  let smallOutliers = [];
  let largeOutlierIndexes = [];
  let largeOutliers = [];

  for (let i = 0; i < input.length; i++) {
    if (input[i] > max) {
      largeOutlierIndexes.push(i);
      largeOutliers.push(input[i]);
    }
    if (input[i] < min) {
      smallOutlierIndexes.push(i);
      smallOutliers.push(input[i]);
    }
  }

  return { smallOutliers, smallOutlierIndexes, largeOutliers, largeOutlierIndexes, min, max };
}
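
For readers staying in R, the quantile helper in the function above interpolates at position (n - 1) * q, which is the same rule as the default type 7 of R's quantile(). The sketch below is a rough R counterpart under that assumption. get_outliers is a hypothetical helper name, and its list fields simply mirror the JS return value.

```r
# Hypothetical R counterpart of the JS function above
get_outliers <- function(x) {
  # Quartiles with quantile()'s default type 7 interpolation
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr   <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr

  # Positions of the values outside the fences
  small <- which(x < lower)
  large <- which(x > upper)

  list(
    smallOutliers = x[small], smallOutlierIndexes = small,
    largeOutliers = x[large], largeOutlierIndexes = large,
    min = lower, max = upper
  )
}

green <- c(5, 5, 15, 100, 6)
res <- get_outliers(green)
res$largeOutliers        # 100
res$largeOutlierIndexes  # 4 (R indexes start at 1, JS indexes at 0)
```

Quartiles from quantile() and the hinges used by boxplot.stats can differ for some sample sizes, so the two approaches may occasionally disagree. For the five Green values both give 5 and 15.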
