Using terra and sf in R: why am I getting illogical distance measurements?

Posted by u5rb5r59 on 2022-12-20 in Other

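I am using terra to get "curvy" distances between points within a bounding polygon and comparing them with straight-line distances that ignore the polygon. The results I am getting do not make sense, and I am hoping you can help me figure out what is going on.
First, we load the U.S. congressional district map for Texas used for the 114th Congress: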

library(sf); library(terra); library(ggplot2) # packages used below
texas = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/texascongressmaps")
ggplot() + geom_sf(data = texas$geometry)

We also create a couple of storage objects:

longest.dist.district.straight = rep(NA, 36)
longest.dist.district.curved = rep(NA, 36)

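Then we go district by district (n = 36). For each district, we draw a sample of 100 random points within that district's polygon. We then ask, "What is the longest straight-line distance between any two of the 100 points?" Next, we rasterize the polygon, mask it, and go point by point, asking, "How far is this point from every other point, assuming we cannot travel outside the polygon?" This means we sometimes have to bend within the polygon to get between two points. We find the longest such distance between any two points, then compare the straight-line and curvy approaches, with the assumption that the curvy distance will always be longer by some amount...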

for(c in 1:36) { #Texas had 36 districts.
  if(c %% 3 == 0) {print(c)} # Progress bar

  this.district = texas[c, ] #Get the current district

  #We'll get a sample of 100 randomly placed points around the district.
  rand.ptsDistrict = sf::st_sample(this.district,
                                   size = 100,
                                   type = 'random',
                                   exact = TRUE)

  #What's the max straight-line distance between any two points?
  longest.dist.district.straight[c] = max(sf::st_distance(rand.ptsDistrict))

  #Now, calculate our 'as the politician would walk' distances (aka curvy distances). We need to do this for each of our 100 points separately, with each as the target point in turn, and save the longest value we get...
  current.raster = terra::ext(this.district) # Rasterizing
  current.raster = terra::rast(current.raster,
                               nrow=100, ncol=100,
                               crs = crs(this.district),
                               vals = 1)
  current.raster = terra::mask(current.raster, # Masking
                               terra::vect(this.district),
                               updatevalue = NA)
  point.locs = terra::cellFromXY(current.raster, # Getting point locations in the new grid
                                 sf::st_coordinates(rand.ptsDistrict))

  longest.dists.i = rep(NA, 100) # Storage object
  for(i in 1:100) {
    point.i.loc = cellFromXY(current.raster, #Focal point this time.
                             st_coordinates(rand.ptsDistrict[i]))
    point.noni.loc = cellFromXY(current.raster, #All other points
                                st_coordinates(rand.ptsDistrict[-i]))
    terra::values(current.raster)[point.i.loc] = 2 # Make focal point the target value
    all.dists = terra::gridDistance(current.raster, #Get all distances to the target value
                                    target = 2, scale = 1)
    longest.dists.i[i] = max(values(all.dists)[point.noni.loc], na.rm=TRUE) # Find the longest of these for this point and store it.
    terra::values(current.raster)[point.i.loc] = 1
  }
  longest.dist.district.curved[c] = max(longest.dists.i) # Find the longest curved distance between any two points in the current district.
}

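When I do this, I always get straight-line distances that are strictly longer than the curvy distances for the same district, which makes no logical sense: how can a straight line between two points be longer than a curvy path between them?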

> (cbind(longest.dist.district.straight, longest.dist.district.curved))
      longest.dist.district.straight longest.dist.district.curved
 [1,]                      239285.77                    121703.64
 [2,]                       63249.88                     48238.89
 [3,]                       49495.09                     24823.91
 [4,]                      290542.38                    147894.80
 [5,]                      213758.13                    108663.63
 [6,]                      129261.83                     68351.77
 [7,]                       36705.18                     22081.22
 [8,]                      165759.58                     87749.33
 [9,]                       38317.61                     19903.54
[10,]                      196211.38                    100959.66
[11,]                      505130.81                    261479.58
[12,]                       79502.87                     45134.11
[13,]                      604901.43                    313317.24
[14,]                      201724.57                    115286.81
[15,]                      414257.14                    208204.75
[16,]                       61867.34                     32115.77
[17,]                      193198.96                    103829.75
[18,]                       41693.26                     26462.02
[19,]                      433902.07                    225041.00
[20,]                       32201.45                     17060.41
[21,]                      212300.45                    119597.54
[22,]                       88143.49                     46720.59
[23,]                      777236.95                    394663.54
[24,]                       39692.06                     21192.98
[25,]                      299336.81                    153871.46
[26,]                       65901.64                     35200.83
[27,]                      272822.43                    158724.70
[28,]                      362477.84                    205297.74
[29,]                       40210.19                     30094.43
[30,]                       44693.37                     23430.33
[31,]                       93781.16                     50340.85
[32,]                       38941.81                     21047.40
[33,]                       52395.85                     31169.46
[34,]                      394586.71                    206545.50
[35,]                      138182.61                     73556.10
[36,]                      223351.15                    112601.38

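I can only guess that I have either messed up my code or found a bug. Please help! Thanks!
Edit: I just noticed after posting that if I multiply the curvy distances by 2, I get believable values (the curvy distances are then always longer, by a variable amount), but I do not see a coding reason why that would be needed... Can anyone else see one I am missing?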
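As a rough check on that observation, the sketch below recomputes the ratio of the two columns for the first six rows of the output above. The variable names are illustrative and the numbers are copied from the post. It only describes the symptom and does not explain it.

```r
# First six rows of the output above (values copied from the post).
straight <- c(239285.77, 63249.88, 49495.09, 290542.38, 213758.13, 129261.83)
curved   <- c(121703.64, 48238.89, 24823.91, 147894.80, 108663.63, 68351.77)

# Ratio of straight-line to curvy distance for each district.
ratio <- straight / curved
print(round(ratio, 2))

# Does doubling the curvy distance make it exceed the straight-line distance?
print(all(2 * curved > straight))
```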

Answer 1 (ccrfmcuu)

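You are comparing the shortest distance ("as the crow flies", for those who have never seen a crow fly) with the grid distance (moving from the center of one cell to the center of a neighboring cell), where only grid cells that fall inside a district may be used.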
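To make the difference concrete, here is a minimal sketch on a made-up 10 x 10 raster with a wall of NA cells. It is not the Texas data; the grid, the wall, and the UTM CRS are arbitrary assumptions chosen only to get planar units. It assumes a terra version that provides gridDist.

```r
library(terra)

# Toy planar raster: 10 x 10 cells of size 1 (the UTM CRS is an arbitrary choice).
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "+proj=utm +zone=15 +datum=NAD83 +units=m +no_defs", vals = 1)

# A wall of NA cells in column 5, rows 1 to 8; NA cells cannot be traversed.
r[1:8, 5] <- NA

# Two cell centers on opposite sides of the wall, both in the top row.
xy <- cbind(x = c(2.5, 7.5), y = c(9.5, 9.5))
cells <- cellFromXY(r, xy)

# Straight-line distance ignores the wall.
straight <- sqrt(sum((xy[1, ] - xy[2, ])^2))

# Grid distance: mark the first cell as the target, then read the value at the second.
r[cells[1]] <- 2
grid_d <- gridDist(r, target = 2)[cells[2]][[1]]

print(c(straight = straight, grid = grid_d))
```

Because the path has to go around the wall through cell centers, the grid distance here is longer than the straight-line distance.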
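When I run a condensed version of your code, I see that the distances are quite similar, and that the grid distance is always longer, as it should be, except for district 14, because that district is not contiguous.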

library(terra)
#terra 1.6.47

texas <- dget("https://raw.githubusercontent.com/BajczA475/random-data/main/texascongressmaps")
tex <- vect(texas)

# generate random points
set.seed(0)
b <- spatSample(tex[, "DISTRICT"], size = 100, method="random", strata=1:nrow(tex))

# max distance between any two random points by district.
pdist <- sapply(tex$DISTRICT, \(i) max( distance( b[b$DISTRICT == i, ])) )

# max grid distance between any two random points by district.
pgrid <- rep(NA, nrow(tex))
for (i in 1:nrow(tex)) {
    r <- rast(tex[i,], nrow=100, ncol=100)
    r <- rasterize(tex[i,], r)
    xy <- crds(b[b$DISTRICT==i, ])
    cells <- cellFromXY(r, xy)
    maxdists <- rep(NA, 100)
    for(j in 1:100) {
        r[cells[j]] <- 2
        dists <- gridDist(r, target=2)
        # Find the longest of these for this point
        maxdists[j] <- max( dists[ cells[-j] ], na.rm=TRUE)
        r[cells[j]] <- 1
    }
    pgrid[i] <- max(maxdists) 
}

The results look good:

head(cbind(pdist, pgrid))
#      pdist     pgrid
#1 217746.46 223906.22
#2  61707.87  99422.07
#3  50520.61  51479.98
#4 282744.13 293656.59
#5 196074.08 202014.45
#6 120913.60 126532.72

plot(pdist, pgrid)
abline(0, 1, col="red")

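If your results are different, you may be using an older version of "terra". I assume you are, because you call gridDistance, which still works but gives a warning because it was renamed to gridDist in the current version.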
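One way to check this on your own machine is sketched below. It only reports what is installed; the variable names are illustrative.

```r
# Report the installed terra version (the code above was run with 1.6.47).
terra_version <- packageVersion("terra")
print(terra_version)

# TRUE if this installation exports the newer function name, gridDist.
has_gridDist <- "gridDist" %in% getNamespaceExports("terra")
print(has_gridDist)
```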
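You are also using a different grid cell size for each district. I do not know what your goal is, but it may be more reasonable to use a single template raster for all of Texas. You could do something like this: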

# outside the loop
rr <- rast(tex, res=1/60, vals=1)
# inside the loop
r <- crop(rr, tex[i,], mask=TRUE)
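To see why the cell size varies when every district gets its own 100 x 100 grid, here is a small sketch with two made-up extents standing in for a small and a large district (these are not the actual Texas extents):

```r
library(terra)

# Two made-up extents standing in for a small and a large district.
small <- rast(ext(0, 1, 0, 1), nrows = 100, ncols = 100)
large <- rast(ext(0, 5, 0, 3), nrows = 100, ncols = 100)

# Cell size (x, y) follows from the extent divided by the number of cells.
print(res(small))
print(res(large))
```

With a shared template raster, the cell size is the same for every district, so the grid distances are computed at the same level of detail everywhere.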
