R: Fixing a loop that checks a condition and updates data

mm9b1k5b  posted 12 months ago  in Other

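I have been trying to write code that checks a condition on a[i-1] and a[i]. If the condition holds, b[i] should be updated to b[i-1] + c[i]; if it fails, b[i] must be set to 0.
My current code is: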

#R
library(dplyr)
update_b <- function(data) {
  for (i in 2:nrow(data)) {
    if (!is.na(data$a[i]) & !is.na(data$a[i-1]) & data$a[i] < 60 & data$a[i-1] < 60) {
      data$b[i] <- data$c[i] + data$b[i-1]
    } else {
      data$b[i] <- 0
    }
  }
  return(data)
}

result <- data_frame %>%
  group_by(number) %>%
  arrange(date) %>%
  do(update_b(.))

It runs until it hits this error:

|=============                                                             | 13% ~40 s remaining
Error in `$<-`:
! Assigned data `*vtmp*` must be compatible with existing data.
x Existing data has 1 row.
x Assigned data has 2 rows.
i Row updates require a list value. Do you need `list()` or `as.list()`?
Caused by error in `vectbl_recycle_rhs_rows()`:
! Can't recycle input of size 2 to size 1.


Before that, I had been trying to solve this with data.table:

#R
library(data.table)
calculate_b <- function(x) {
  for (i in 2:nrow(x)) {
    if (x[i, a] < 60 & x[i - 1, a] < 60) {
      x[i, b:= x[i, c] + x[i - 1, b]]
    } else {
      x[i, b:= 0]
    }
  }
  return(x)
}

a[, b:= 0]
a <- a[, calculate_b(.SD), by = number]


which gave me this error:

.SD is locked. Using := in .SD's j is reserved for possible future use; a tortuously flexible way to modify by group. Use := in j directly to modify by group by reference.


How can I fix this error?
Edit: here is a sample of the data
| ID (number) | a | c | b (set to 0 at the start) |
| --|--|--|--|
| 123 | 30 | 0 | 0 |
| 123 | 25 | 45 | 45 |
| 123 | 18 | 8 | 53 |
| 123 | 80 | 15 | 0 |
| 123 | 45 | 63 | 0 |
| 123 | 15 | 75 | 75 |
| 123 | 70 | 12 | 0 |
| 456 | 65 | 0 | 0 |
| 456 | 45 | 75 | 0 |
| 456 | 30 | 26 | 26 |
| 456 | 58 | 95 | 121 |
| 456 | 53 | 41 | 162 |
| 456 | 50 | 32 | 194 |
| 789 | 45 | 0 | 0 |
| 789 | 90 | 14 | 0 |
| 789 | 89 | 65 | 0 |
| 789 | 75 | 78 | 0 |
| 789 | 80 | 59 | 0 |
| 789 | 50 | 32 | 0 |

ugmeyewa  #1

Attempt #3 :-)
We'll precompute the points where b should be reset to 0 (call this `reset`), then group by runs of consecutive equal values of `reset` (using `data.table::rleid`).
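As a small illustration of the grouping step (the flag vector below is made up, not taken from the question), `rleid` assigns one id per run of consecutive equal values, so each stretch of `reset == FALSE` ends up in its own group:

```r
library(data.table)

# Made-up reset flags for a single ID: "good" runs separated by resets.
reset <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)

# rleid() increments its counter every time the value differs from the previous one.
grp <- rleid(reset)
print(grp)
# [1] 1 1 2 2 3 4
```

Applied per ID to the question's data, the same idea gives: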

DT[, reset := is.na(a) | shift(is.na(a), fill=FALSE) | a >= 60 | shift(a >= 60, fill=FALSE), by = .(ID)
  ][, grp := rleid(reset), by = .(ID)
  ][, b := if (reset[1]) 0L else cumsum(c), by = .(ID, grp)]
#        ID     a     c expect  reset   grp     b
#     <int> <int> <int>  <int> <lgcl> <int> <int>
#  1:   123    30     0      0  FALSE     1     0
#  2:   123    25    45     45  FALSE     1    45
#  3:   123    18     8     53  FALSE     1    53
#  4:   123    80    15      0   TRUE     2     0
#  5:   123    45    63      0   TRUE     2     0
#  6:   123    15    75     75  FALSE     3    75
#  7:   123    70    12      0   TRUE     4     0
#  8:   456    65     0      0   TRUE     1     0
#  9:   456    45    75      0   TRUE     1     0
# 10:   456    30    26     26  FALSE     2    26
# 11:   456    58    95    121  FALSE     2   121
# 12:   456    53    41    162  FALSE     2   162
# 13:   456    50    32    194  FALSE     2   194
# 14:   789    45     0      0  FALSE     1     0
# 15:   789    90    14      0   TRUE     2     0
# 16:   789    89    65      0   TRUE     2     0
# 17:   789    75    78      0   TRUE     2     0
# 18:   789    80    59      0   TRUE     2     0
# 19:   789    50    32      0   TRUE     2     0

I kept `reset` and `grp` in the output only to show their values; drop them with `DT[, c("reset","grp") := NULL]`.
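As a quick sanity check, the computed `b` can be compared with the expected column before the helper columns are dropped. The sketch below rebuilds only ID 456 from the question's sample so that it runs on its own; it is an illustration, not part of the original solution.

```r
library(data.table)

# ID 456 from the question's sample, with the expected result alongside.
DT <- data.table(
  ID     = 456L,
  a      = c(65L, 45L, 30L, 58L, 53L, 50L),
  c      = c(0L, 75L, 26L, 95L, 41L, 32L),
  expect = c(0L, 0L, 26L, 121L, 162L, 194L)
)

DT[, reset := is.na(a) | shift(is.na(a), fill = FALSE) | a >= 60 | shift(a >= 60, fill = FALSE), by = .(ID)
  ][, grp := rleid(reset), by = .(ID)
  ][, b := if (reset[1]) 0L else cumsum(c), by = .(ID, grp)]

# Stops with an error if the computed column differs from the expected one.
stopifnot(identical(DT$b, DT$expect))

# Remove the helper columns by reference.
DT[, c("reset", "grp") := NULL]
print(names(DT))
```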
Data taken from your question, with columns renamed for simplicity:

DT <- data.table::as.data.table(structure(list(ID = c(123L, 123L, 123L, 123L, 123L, 123L, 123L, 456L, 456L, 456L, 456L, 456L, 456L, 789L, 789L, 789L, 789L, 789L, 789L), a = c(30L, 25L, 18L, 80L, 45L, 15L, 70L, 65L, 45L, 30L, 58L, 53L, 50L, 45L, 90L, 89L, 75L, 80L, 50L), c = c(0L, 45L, 8L, 15L, 63L, 75L, 12L, 0L, 75L, 26L, 95L, 41L, 32L, 0L, 14L, 65L, 78L, 59L, 32L), expect = c(0L, 45L, 53L, 0L, 0L, 75L, 0L, 0L, 0L, 26L, 121L, 162L, 194L, 0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, -19L), class = c("data.table", "data.frame")))
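If you would rather keep the explicit loop, here is a minimal sketch of one way to avoid both errors from the question. It rests on two assumptions. First, that the dplyr error comes from a group with a single row: there `2:nrow(data)` evaluates to `c(2, 1)`, so the loop writes to a non-existent `b[2]`, whereas `seq_len(n)[-1]` is empty. Second, that the data.table message can be addressed as it suggests, by using `:=` directly in `j` instead of on `.SD`. The helper name `running_b` and the single-row ID 999 are made up for illustration.

```r
library(data.table)

# Hypothetical helper: the question's loop, rewritten to work on plain vectors.
running_b <- function(a, c_val) {
  n <- length(a)
  b <- numeric(n)                     # b starts at 0, as in the question
  for (i in seq_len(n)[-1]) {         # empty when n == 1, unlike 2:n
    ok <- !is.na(a[i]) && !is.na(a[i - 1]) && a[i] < 60 && a[i - 1] < 60
    b[i] <- if (ok) c_val[i] + b[i - 1] else 0
  }
  b
}

# ID 123 from the question's sample plus a made-up single-row ID (999).
dt <- data.table(
  ID = c(rep(123L, 7), 999L),
  a  = c(30, 25, 18, 80, 45, 15, 70, 40),
  c  = c(0, 45, 8, 15, 63, 75, 12, 5)
)

# := is used directly in j, by group, so .SD itself is never modified.
dt[, b := running_b(a, c), by = ID]
print(dt$b)
# [1]  0 45 53  0  0 75  0  0
```

The loop runs once per group on ordinary vectors, which is usually slower than the `rleid` approach on large data but keeps the original logic easy to read.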
