R: take n rows after a value in each column

jei2mxaa  posted on 2023-04-18 in Other

I have a data frame dt

         V1       V2       V3       V4       V5       V6
52  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
53  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
54 23.41610 27.74736  0.00000  0.00000  0.00000  0.00000
55 46.25229 26.80305 12.08680  0.00000  0.00000  0.00000
56 16.93179  0.00000 12.76963 12.21179  0.00000  0.00000
57  0.00000 24.35663  0.00000 15.47197 11.55125  0.00000
58 46.11487 14.91367  0.00000  0.00000 16.51914 12.40029
59 35.93963  0.00000  0.00000  0.00000 15.10201 13.44208
60  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000

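For each column, I want to find the first value greater than 1, select the rows around it (n-1:n+25, where n is the row of that first value), and put them into a new data table.
I tried using data.table: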

for (i in 1:ncol(df)) {df[i >1 | shift(i>1, n=1L, type = "lead") | shift(i>1, n=25L, type = "lag")]}

but clearly my column reference is wrong.
I also tried using seq_along in the same for-loop structure, just to get the 25 "after" rows:

output <- seq(min(which(df[i] > 1)), length.out = 25)

It only gave me a sequence of row numbers for the first column in which the threshold was met.
Thanks in advance for your help!

3zwjbxry  #1

Based on your sample table, is something like this what you need? For demonstration purposes I used target_row + 2 instead of + 25.

sapply(1:ncol(df), \(x) {
  target_row <- min(which(df[, x] > 1))
  df[(target_row - 1):(target_row + 2), x]
  })

         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
[1,]  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
[2,] 23.41610 27.74736 12.08680 12.21179 11.55125 12.40029
[3,] 46.25229 26.80305 12.76963 15.47197 16.51914 13.44208
[4,] 16.93179  0.00000  0.00000  0.00000 15.10201  0.00000
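With the full window of + 25 the range can run past the last row, and sapply can no longer build a matrix once the columns return different numbers of rows. A minimal sketch of one way to handle that, assuming the window should simply be clamped to the available rows. The helper name window_after_first and its arguments are hypothetical, not part of the answer above:

```r
# Hypothetical helper: rows from `before` rows ahead of the first value above
# `threshold` to `after` rows past it, clamped to the valid row range.
window_after_first <- function(x, threshold = 1, before = 1L, after = 25L) {
  hit <- which(x > threshold)
  if (length(hit) == 0L) return(x[0])  # no value above threshold: empty result
  first <- hit[1L]
  from <- max(first - before, 1L)
  to <- min(first + after, length(x))
  x[from:to]
}

# Three columns of the sample data are enough to show the effect
df <- data.frame(
  V1 = c(0, 0, 23.4161, 46.25229, 16.93179, 0, 46.11487, 35.93963, 0),
  V3 = c(0, 0, 0, 12.0868, 12.76963, 0, 0, 0, 0),
  V6 = c(0, 0, 0, 0, 0, 0, 12.40029, 13.44208, 0)
)

# Windows may differ in length, so keep them in a list rather than a matrix
windows <- lapply(df, window_after_first)
lengths(windows)
```

Here the windows have 8, 7 and 4 rows for V1, V3 and V6, because each one stops at row 9 of the sample data.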

Input

df <- structure(list(V1 = c(0, 0, 23.4161, 46.25229, 16.93179, 0, 46.11487, 
35.93963, 0), V2 = c(0, 0, 27.74736, 26.80305, 0, 24.35663, 14.91367, 
0, 0), V3 = c(0, 0, 0, 12.0868, 12.76963, 0, 0, 0, 0), V4 = c(0, 
0, 0, 0, 12.21179, 15.47197, 0, 0, 0), V5 = c(0, 0, 0, 0, 0, 
11.55125, 16.51914, 15.10201, 0), V6 = c(0, 0, 0, 0, 0, 0, 12.40029, 
13.44208, 0)), class = "data.frame", row.names = c("52", "53", 
"54", "55", "56", "57", "58", "59", "60"))

o2rvlv0m  #2

Using data.table (essentially the same as benson's):

dt[, lapply(.SD, function(i){ 
  x <- min(which(i > 1))
  i[ (x - 1):(x + 2) ]
  })]

#          V1       V2       V3       V4       V5       V6
# 1:  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
# 2: 23.41610 27.74736 12.08680 12.21179 11.55125 12.40029
# 3: 46.25229 26.80305 12.76963 15.47197 16.51914 13.44208
# 4: 16.93179  0.00000  0.00000  0.00000 15.10201  0.00000
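If the result has to stay a rectangular data.table even when a window runs off the end of a column, one option is to pad with NA instead of truncating. The following is only a sketch under that assumption. padded_window is a hypothetical helper, and after = 3L is chosen so that the padding is visible on the sample data:

```r
library(data.table)

dt <- data.table(
  V1 = c(0, 0, 23.4161, 46.25229, 16.93179, 0, 46.11487, 35.93963, 0),
  V5 = c(0, 0, 0, 0, 0, 11.55125, 16.51914, 15.10201, 0),
  V6 = c(0, 0, 0, 0, 0, 0, 12.40029, 13.44208, 0)
)

# Hypothetical helper: always returns before + after + 1 values, using NA
# wherever the window falls outside the column.
padded_window <- function(x, threshold = 1, before = 1L, after = 3L) {
  first <- which(x > threshold)[1L]       # NA if nothing exceeds threshold
  idx <- first + seq.int(-before, after)  # may run past either end
  idx[!is.na(idx) & idx < 1L] <- NA       # index 0 would be silently dropped
  x[idx]                                  # NA or out-of-range index gives NA
}

res <- dt[, lapply(.SD, padded_window)]
res
```

Every column then contributes five rows. For V6 the window would need row 10, which does not exist, so its last entry is NA.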

2g32fytz  #3

If there are many rows and the row we are looking for is usually found early, a loop will be more efficient than comparing the whole vector:
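find_first <- function(x) {
  i <- 1L
  n <- length(x)
  # Check the bound first so x[i] is never read past the end of x
  while (i <= n && x[i] <= 1) i <- i + 1L
  i  # n + 1 if no element is greater than 1
}
dt[, lapply(.SD, \(x) x[find_first(x) + (-1:2)])]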
#          V1       V2       V3       V4       V5       V6
#       <num>    <num>    <num>    <num>    <num>    <num>
# 1:  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
# 2: 23.41610 27.74736 12.08680 12.21179 11.55125 12.40029
# 3: 46.25229 26.80305 12.76963 15.47197 16.51914 13.44208
# 4: 16.93179  0.00000  0.00000  0.00000 15.10201  0.00000
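Base R's Position() also stops at the first element that satisfies a predicate, so it can stand in for the hand-written loop. This is a sketch rather than a benchmarked recommendation: the wrapper name first_above is hypothetical, and no claim is made here about its speed relative to find_first. Unlike find_first, it returns NA when no value exceeds the threshold:

```r
# Hypothetical wrapper around base R's Position(): index of the first element
# greater than `threshold`, or NA if there is none.
first_above <- function(x, threshold = 1) {
  Position(function(v) v > threshold, x)
}

x <- c(0, 0, 0, 12.0868, 12.76963, 0, 0, 0, 0)  # column V3 of the sample data

first_above(x)          # 4
first_above(rep(0, 5))  # NA

# Same window as in the answer: one row before through two rows after
x[first_above(x) + (-1:2)]
```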

Data:

library(data.table)

dt <- data.table(
  V1 = c(0, 0, 23.4161, 46.25229, 16.93179, 0, 46.11487, 35.93963, 0),
  V2 = c(0, 0, 27.74736, 26.80305, 0, 24.35663, 14.91367, 0, 0),
  V3 = rep(c(0, 12.0868, 12.76963, 0), c(3L, 1L, 1L, 4L)),
  V4 = rep(c(0, 12.21179, 15.47197, 0), c(4L, 1L, 1L, 3L)),
  V5 = c(0, 0, 0, 0, 0, 11.55125, 16.51914, 15.10201, 0),
  V6 = rep(c(0, 12.40029, 13.44208, 0), c(6L, 1L, 1L, 1L))
)
