R: Selecting observations that lie close to observations of another group in a scatter plot

lyr7nygr, posted on 2023-03-05 in Other

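I have a dataset `ind` containing two populations, `fr2100` and `nr`. Each individual in a population has a unique number, and each individual has a pair of coordinates `Dim.1` and `Dim.2`, as shown below: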

> ind <- get_pca_ind(res_acp)
> ind
Principal Component Analysis Results for individuals
 ===================================================
  Name       Description                       
1 "$coord"   "Coordinates for the individuals" 
2 "$cos2"    "Cos2 for the individuals"        
3 "$contrib" "contributions of the individuals"

# isolate the population 'fr2100'
> fr2100 <- ind$coord[substr(rownames(ind$coord), 1, 7) == 'fr2100_', ]
> str(fr2100)
'data.frame':   6873 obs. of  3 variables:
 $ rowname: chr  "fr2100_72" "fr2100_73" "fr2100_74" "fr2100_75" ...
 $ Dim.1  : num  1.37 1.3 1.25 1.25 1.18 ...
 $ Dim.2  : num  -1.249 -1.028 -0.835 -0.624 -0.483 ...

# isolate the population 'nr'
> nr <- ind$coord[substr(rownames(ind$coord), 1, 3) == 'nr_', ]
> str(nr)
'data.frame':   4897 obs. of  3 variables:
 $ rowname: chr  "nr_174" "nr_175" "nr_176" "nr_177" ...
 $ Dim.1  : num  -3.74 -3.44 -3.26 -2.97 -3.88 ...
 $ Dim.2  : num  1.26 1.55 1.7 1.91 1.3 ...

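**My question:** I am trying to work out how to select, among the 6873 individuals of `fr2100`, only those whose `Dim.1` AND `Dim.2` lie within a distance of about 0.01 of the 4897 individuals of `nr`, represented in this point cloud.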

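In other words, I want every single `fr2100` individual that falls within the perimeter (radius 0.01) of any single `nr` individual, as illustrated in principle here.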
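Stated as a formula (notation introduced here, not the asker's): let $(x_i, y_i)$ be the `Dim.1`/`Dim.2` coordinates of the `fr2100` individual $i$ and $(u_j, v_j)$ those of the `nr` individual $j$. Individual $i$ is kept when its nearest `nr` point is within 0.01 in Euclidean distance:

$$
\min_{j} \sqrt{(x_i - u_j)^2 + (y_i - v_j)^2} \le 0.01
$$

Equivalently, $i$ lies inside the circle of radius 0.01 around at least one `nr` point.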

I'm interested in any answer. I can provide more information if needed. Thanks in advance.

Answer #1, by gc0ot86w

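I think `distance_semi_join()` from the `fuzzyjoin` package is a very direct and concise way to filter by Euclidean distance. Other variants such as `distance_left_join()` are also worth considering, because they can add an optional distance column to the resulting data frame.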

library(fuzzyjoin)
library(ggplot2)

# example datasets
set.seed(1)
nr <- data.frame(rowname = paste0("nr_", 1:100), Dim.1 = rnorm(100, -0.05, 0.03), Dim.2 = rnorm(100, 0, 0.02))
fr <- data.frame(rowname = paste0("fr_", 1:100), Dim.1 = rnorm(100,  0.05, 0.03), Dim.2 = rnorm(100, 0, 0.02))

# fr points that lie within max_dist of at least one nr point:
fr_in_dist <- distance_semi_join(fr, nr, 
                                 by = c("Dim.1","Dim.2"), 
                                 max_dist=0.01)

fr_in_dist
#>    rowname        Dim.1         Dim.2
#> 5     fr_5 -0.018557066  3.308291e-02
#> 14   fr_14  0.008893764 -1.311564e-02
#> 18   fr_18  0.012401307 -2.420202e-03
#> 25   fr_25  0.015302829  9.640590e-03
#> 28   fr_28  0.001834598  3.409789e-03
#> 32   fr_32 -0.036667620 -3.138164e-02
#> 38   fr_38  0.014406241  8.797409e-05
#> 46   fr_46 -0.010004948 -2.817701e-02
#> 57   fr_57 -0.022092886 -2.347154e-02
#> 68   fr_68  0.014326601  1.135904e-02
#> 77   fr_77 -0.018673719  2.577108e-03
#> 79   fr_79  0.010512645 -3.278219e-03
#> 84   fr_84  0.028963050  3.286837e-03
#> 86   fr_86  0.019967835 -1.130428e-03
#> 94   fr_94  0.007212280  6.132097e-03

# plot both populations and circle the fr points that were kept
ggplot() +
  geom_point(data = nr, aes(x = Dim.1, y = Dim.2, color = "nr"))+
  geom_point(data = fr, aes(x = Dim.1, y = Dim.2, color = "fr"))+
  geom_point(data = fr_in_dist, aes(x = Dim.1, y = Dim.2), shape = 1, size = 5 )+
  coord_fixed() +
  theme_bw()
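If installing `fuzzyjoin` is not an option, the same filter can be sketched in base R. This is a brute-force swap-in, not fuzzyjoin's implementation. It processes one `fr` row at a time, so it never holds the full 6873 × 4897 distance matrix in memory (about 33.7 million doubles, roughly 270 MB). The helper `near_points()` and the column `nearest_nr_dist` are hypothetical names introduced here. The sketch assumes both data frames have numeric `Dim.1` and `Dim.2` columns.

```r
# Hypothetical helper (not part of fuzzyjoin): keep the rows of `fr` whose
# nearest `nr` point lies within `max_dist` (Euclidean distance on Dim.1/Dim.2).
near_points <- function(fr, nr, max_dist = 0.01) {
  fr_xy <- as.matrix(fr[, c("Dim.1", "Dim.2")])
  nr_xy <- as.matrix(nr[, c("Dim.1", "Dim.2")])

  # One fr row at a time: squared distances to every nr row, keep the smallest.
  # This avoids building the full nrow(fr) x nrow(nr) distance matrix.
  nearest <- vapply(seq_len(nrow(fr_xy)), function(i) {
    d2 <- (nr_xy[, 1] - fr_xy[i, 1])^2 + (nr_xy[, 2] - fr_xy[i, 2])^2
    sqrt(min(d2))
  }, numeric(1))

  # Record the nearest-nr distance (hypothetical column name), then filter.
  fr$nearest_nr_dist <- nearest
  fr[nearest <= max_dist, , drop = FALSE]
}
```

Called as `near_points(fr, nr, max_dist = 0.01)` on the example data above, it is expected to keep the same `fr` rows as `distance_semi_join()`, assuming fuzzyjoin also counts a distance exactly equal to `max_dist` as a match. The extra `nearest_nr_dist` column is a per-point summary. `distance_left_join()` instead returns one row per matching `fr`/`nr` pair.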

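The original answer dealt with the case of a single reference point and its distances to the other points. For that case, `dist()` from base R (package `stats`) is also fairly straightforward: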

library(ggplot2)

# sample data, add point fr2100_xx that would fall outside of the perimeter
df <- read.csv(text = "rowname, Dim.1, Dim.2
fr2100_72, 0.003810163, 0.006935450
fr2100_73, 0.003433946, 0.004698691
fr2100_74, 0.003168248, 0.003097222
fr2100_xx, 0.015, 0.015")

# nr and threshold distance
nr <- c(0.0035, 0.005)
thr_dist <- 0.01

# insert nr point to first position to use it in distance matrix calculation
dist_m <- rbind(c(0.0035, 0.005),df[,c("Dim.1", "Dim.2")]) |> dist() |> as.matrix()

# distances: 
as.dist(dist_m)
#>              1            2            3            4
#> 2 0.0019601448                                       
#> 3 0.0003084643 0.0022681777                          
#> 4 0.0019314822 0.0038915356 0.0016233602             
#> 5 0.0152397507 0.0137930932 0.0154884012 0.0167829223

# column 1 holds distances from the nr point; drop row 1 ([1,1] = 0, nr to itself)
df$dist <- dist_m[-1, 1]
# flag points within the perimeter (TRUE = inside, FALSE = outside)
df$in_dist <- df$dist <= thr_dist
df
#>     rowname       Dim.1       Dim.2         dist in_dist
#> 1 fr2100_72 0.003810163 0.006935450 0.0019601448    TRUE
#> 2 fr2100_73 0.003433946 0.004698691 0.0003084643    TRUE
#> 3 fr2100_74 0.003168248 0.003097222 0.0019314822    TRUE
#> 4 fr2100_xx 0.015000000 0.015000000 0.0152397507   FALSE
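`dist()` computes every pairwise distance, including those among the `fr2100` points themselves. With 6873 points plus the reference point, that is about 23.6 million distances, although only 6873 are needed. A minimal sketch computes just those directly. The helper name `flag_within()` is hypothetical, and `ref` is assumed to be a length-2 vector `c(Dim.1, Dim.2)` like `nr` above.

```r
# Distances from one reference point, computed directly instead of via dist().
# `ref` is assumed to be c(Dim.1, Dim.2), like `nr` in the example above.
flag_within <- function(df, ref, thr_dist) {
  # Euclidean distance of every row to the reference point (vectorised).
  df$dist <- sqrt((df$Dim.1 - ref[1])^2 + (df$Dim.2 - ref[2])^2)
  # TRUE = inside the perimeter, FALSE = outside.
  df$in_dist <- df$dist <= thr_dist
  df
}
```

On the sample data, `flag_within(df[, c("rowname", "Dim.1", "Dim.2")], nr, thr_dist)` should reproduce the `dist` and `in_dist` columns printed above.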

That is: https://i.imgur.com/jiqHmXn.png
