R中的序列值迭代

68bkxrlz  于 2023-03-15  发布在  其他
关注(0)|答案(1)|浏览(110)

我目前正在阅读X1 E0 F1 X。特别是在第29页,他提出了玩具制造商的例子,有两个不同的策略1和2。每个策略都有转移概率矩阵和回报矩阵。

# Set up initial variables for Policy 1
P1 <- matrix(c(0.5, 0.5, 0.4, 0.6), nrow = 2, byrow = TRUE);rownames(P1)=c("1","2");P1
R1 <- matrix(c(9, 3, 3, -7), nrow = 2, byrow = TRUE);rownames(R1)=c("1","2");R1

# Set up initial variables for Policy 2
P2 <- matrix(c(0.8, 0.2, 0.7, 0.3), nrow = 2, byrow = TRUE);rownames(P2)=c("1","2");P2
R2 <- matrix(c(4, 4, 1, -19), nrow = 2, byrow = TRUE);rownames(R2)=c("1","2");R2

现在根据其策略绑定这4个矩阵:

library(dplyr)

P = as_tibble(rbind(P1,P2))%>%
  mutate(policy = rep(c(1,2),2))%>%
  arrange(policy)%>%
  rename(p1 =V1,p2 = V2);P
  
R = as_tibble(rbind(R1,R2))%>%
  mutate(policy2 = rep(c(1,2),2))%>%
  arrange(policy2)%>%
  rename(r1 =V1,r2 = V2);R
cbind(P,R)%>%
  select(-policy2)%>%
  relocate(.,policy,.after = r2)%>%
  group_by(policy)

结果是:

# A tibble: 4 × 5
# Groups:   policy [2]
     p1    p2    r1    r2 policy
  <dbl> <dbl> <dbl> <dbl>  <dbl>
1   0.5   0.5     9     3      1
2   0.8   0.2     4     4      1
3   0.4   0.6     3    -7      2
4   0.7   0.3     1   -19      2

q_{I}^k = \和_{j=1}^{N}p_{ij}^k r_{ij}^k
对于k = 1,2,其中k是两个策略,Sctuly是每个策略的P*R矩阵的逐元素行和。
我想实现他书中描述的顺序值迭代,一般来说我想从这个数据框架中计算出下图所示的3. 3方程,理想的输出是下图所示的表格。

在R里怎么做?

编辑(附加注解)

对于第19页所述的一个策略,如下所示

# Set up initial variables
> P = matrix(c(0.5, 0.5, 0.4, 0.6), nrow = 2, byrow = TRUE)
> R = matrix(c(9, 3, 3, -7), nrow = 2, byrow = TRUE)
> Q = rowSums(P*R)
> n = 5
> v = matrix(0, nrow = 2, ncol = n+1) # Initialize v(0) to be all zeros
> v[,1] = 0 # Initialize v(0) to be all zeros
> # total-value vector.
> # Calculate values for different n using matrix calculations
> for (i in 1:n) {
+   v[,i+1] = Q + P %*% v[,i]
+ }
> print(v)
     [,1] [,2] [,3]  [,4]   [,5]    [,6]
[1,]    0    6  7.5  8.55  9.555 10.5555
[2,]    0   -3 -2.4 -1.44 -0.444  0.5556
6kkfgxo0

6kkfgxo01#

使用一点递归。虽然你可以使用for循环或者甚至reduce:

q1 <- unname(rowSums(P1*R1))
q2 <- unname(rowSums(P2*R2))

v <- function(x, n=0){
  if(n == 0)  x
  else cbind(x, v(pmax(q1 + P1 %*% x, q2 + P2 %*% x), n-1))
}

v(c(0,0), 4)
  x              
1 0  6  8.2 10.22
2 0 -3 -1.7  0.23

使用for循环:

n <- 5
V <- matrix(0,2,n)

for (i in seq.int(2, n)){
  V[,i] <- pmax(q1 + P1 %*% V[,i-1], q2 + P2 %*% V[,i-1])
}
V

    [,1] [,2] [,3]  [,4]   [,5]
[1,]    0    6  8.2 10.22 12.222
[2,]    0   -3 -1.7  0.23  2.223

编辑:
使用tidyverse:

library(tidyverse)
seq_len(n) %>%
   set_names(str_c("v", .))%>%
   accumulate(~c(pmax(q1 + P1%*%.x, q2 + P2%*%.x)), .init = c(0,0))%>%
   bind_cols()
# A tibble: 2 × 6
  .init    v1    v2     v3    v4    v5
  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0     6   8.2 10.2   12.2  14.2 
2     0    -3  -1.7  0.230  2.22  4.22

相关问题