Generate random data frames (5 rows, 9 columns) where each row sums to 9

dphi5xsq posted on 2023-09-27 in Other

I'm trying to create 10 or more dummy data frames. Each data frame should have 9 columns and 5 rows (Mon, Tue, Wed, Thu, Fri), and **each row sum should be 9**, as shown below.

    Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
Mon       2       1       0       2       0       0       1       1       2
Tue       1       1       1       1       0       0       2       1       2
Wed       2       1       0       2       1       1       1       1       0
Thu       0       0       1       1       3       0       2       2       0
Fri       1       0       0       1       1       0       2       2       2

How can I generate multiple data frames that satisfy this condition?

py49o6xq1#

Here is a function that generates a random matrix to your specification.

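GenDF <- function() {
    # Start from a 5 x 9 matrix of zeros
    M <- matrix(0, nrow = 5, ncol = 9)
    for (i in 1:5) {
        # Pick a column for each of the nine items, with replacement
        S <- sample(9, 9, replace = TRUE)
        for (j in S) { M[i, j] <- M[i, j] + 1 }
    }
    rownames(M) <- c('Mon', 'Tue', 'Wed', 'Thu', 'Fri')
    colnames(M) <- paste('Factor', 1:9, sep = '')
    as.data.frame(M)
}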

GenDF()
    Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
Mon       3       3       1       1       0       0       0       0       1
Tue       3       1       0       1       0       2       0       2       0
Wed       1       0       1       1       0       1       2       1       2
Thu       1       2       0       1       1       1       3       0       0
Fri       0       1       1       2       2       0       0       3       0

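To elaborate on why each row sums to 9: the line that assigns S calls sample() to draw 9 numbers between 1 and 9 *with replacement*. The idea is that each drawn number represents one of nine items to be distributed across the nine columns, and the number tells you which column that item goes into. Because the draw is with replacement, a column will sometimes receive more than one of the nine items, while other columns receive none. Every item lands in exactly one column, so the counts in a row always add up to 9.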
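The reasoning above can be checked for a single row. This is only a sketch: the seed is arbitrary, and `tabulate()` is used here as a stand-in for the inner counting loop, not as part of the original answer.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Draw a column index (1..9) for each of the nine items, with replacement
S <- sample(9, 9, replace = TRUE)

# Count how many items landed in each of the nine columns
row_counts <- tabulate(S, nbins = 9)

length(row_counts)  # 9 columns
sum(row_counts)     # 9, because every item lands in exactly one column
```

Whatever the seed, the sum stays at 9; only the way the nine items are spread over the columns changes.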

aoyhnmkz2#

Using data.table

library(data.table)

dt <- fread("Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
2       1       0       2       0       0       1       1       2
1       1       1       1       0       0       2       1       2
2       1       0       2       1       1       1       1       0
0       0       1       1       3       0       2       2       0
1       0       0       1       1       0       2       2       2")

set.seed(123)
dt_list <- vector("list", 10)
for (i in 1:10) {
  dt_tmp <- dt[, sample(.SD), by = .(seq_len(nrow(dt)))][, -1]
  setnames(dt_tmp, names(dt))
  dt_list[[i]] <- dt_tmp
}

dt_list

[[1]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       0       0       2       1       1       0       1       2       2
2:       0       1       1       1       0       2       1       1       2
3:       0       2       1       0       1       1       1       1       2
4:       1       1       0       0       0       3       2       0       2
5:       2       1       2       0       0       1       2       1       0

[[2]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       0       2       1       0       2       2       0       1       1
2:       1       2       1       1       0       2       0       1       1
3:       2       1       1       0       2       1       0       1       1
4:       1       3       2       0       1       0       0       0       2
5:       2       2       0       0       1       2       1       0       1

[[3]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       0       0       0       2       1       1       2       2       1
2:       1       1       1       2       1       0       0       2       1
3:       2       1       2       1       0       0       1       1       1
4:       2       0       2       1       3       0       1       0       0
5:       2       0       0       1       0       2       1       2       1

[[4]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       0       2       1       0       1       1       2       0       2
2:       1       1       1       2       1       0       1       2       0
3:       0       1       1       0       2       1       1       2       1
4:       1       0       0       0       0       1       2       2       3
5:       2       0       1       2       0       0       1       2       1

[[5]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       2       1       2       1       1       2       0       0       0
2:       2       0       1       1       2       1       0       1       1
3:       0       1       1       1       1       2       1       0       2
4:       0       2       0       1       0       3       1       0       2
5:       1       0       2       0       2       1       0       1       2

[[6]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       1       1       2       0       2       0       2       0       1
2:       0       1       1       1       2       2       0       1       1
3:       1       1       2       0       1       2       1       0       1
4:       0       2       3       0       1       1       0       0       2
5:       0       1       2       1       1       0       2       2       0

[[7]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       2       0       1       1       0       0       1       2       2
2:       2       1       1       0       0       1       1       2       1
3:       1       1       1       2       1       2       1       0       0
4:       0       0       3       0       1       2       1       0       2
5:       2       1       0       2       2       0       1       0       1

[[8]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       0       2       2       1       0       2       0       1       1
2:       1       2       1       1       1       0       0       2       1
3:       0       2       1       1       1       1       2       1       0
4:       2       3       2       1       0       0       0       0       1
5:       0       0       1       0       2       1       2       2       1

[[9]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       2       0       1       1       1       2       0       0       2
2:       1       0       1       1       2       1       1       2       0
3:       1       0       2       2       1       1       0       1       1
4:       1       0       2       0       3       1       2       0       0
5:       1       1       1       2       0       2       0       2       0

[[10]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       0       1       1       0       2       0       1       2       2
2:       1       1       1       1       0       2       1       2       0
3:       1       1       2       2       0       0       1       1       1
4:       2       0       3       2       0       0       1       1       0
5:       2       0       1       0       2       2       1       0       1

# To validate they match the condition

lapply(dt_list, rowSums)

[[1]]
[1] 9 9 9 9 9

[[2]]
[1] 9 9 9 9 9

[[3]]
[1] 9 9 9 9 9

[[4]]
[1] 9 9 9 9 9

[[5]]
[1] 9 9 9 9 9

[[6]]
[1] 9 9 9 9 9

[[7]]
[1] 9 9 9 9 9

[[8]]
[1] 9 9 9 9 9

[[9]]
[1] 9 9 9 9 9

[[10]]
[1] 9 9 9 9 9

# To validate that they are different

lapply(dt_list, colSums)

[[1]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      3       5       6       2       2       7       7       5       8 

[[2]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      6      10       5       1       6       7       1       3       6 

[[3]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      7       2       5       7       5       3       5       7       4 

[[4]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      4       4       4       4       4       3       7       8       7 

[[5]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      5       4       6       4       6       9       2       2       7 

[[6]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      2       6      10       2       7       5       5       3       5 

[[7]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      7       3       6       5       4       5       5       4       6 

[[8]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      3       9       7       4       4       4       4       6       4 

[[9]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      6       1       7       6       7       7       3       5       3 

[[10]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      6       3       8       5       4       4       5       6       4
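
The data.table answer appears to work by shuffling the values within each row of the example table, which leaves every row sum unchanged. A base R sketch of the same idea follows; `shuffle_rows()` and `df_list` are hypothetical names introduced here for illustration, and the seed is arbitrary.

```r
# Example table from the question, as a plain matrix
m <- matrix(
  c(2, 1, 0, 2, 0, 0, 1, 1, 2,
    1, 1, 1, 1, 0, 0, 2, 1, 2,
    2, 1, 0, 2, 1, 1, 1, 1, 0,
    0, 0, 1, 1, 3, 0, 2, 2, 0,
    1, 0, 0, 1, 1, 0, 2, 2, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("Mon", "Tue", "Wed", "Thu", "Fri"), paste0("Factor", 1:9))
)

# Hypothetical helper: permute the values within each row
shuffle_rows <- function(mat) {
  out <- t(apply(mat, 1, sample))  # apply() stacks results as columns, so transpose
  dimnames(out) <- dimnames(mat)
  as.data.frame(out)
}

set.seed(123)
df_list <- replicate(10, shuffle_rows(m), simplify = FALSE)

sapply(df_list, rowSums)  # every entry is 9
```

One limitation of this approach is that each generated row is only a rearrangement of the corresponding example row, so values that never occur in that row cannot appear.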

fcipmucu3#

I think you can try rmultinom, as below.

> set.seed(0)

> (d <- as.data.frame(t(rmultinom(5, 9, rep(1, 9)))))
  V1 V2 V3 V4 V5 V6 V7 V8 V9
1  2  0  1  1  2  0  2  1  0
2  1  1  0  0  0  2  1  3  1
3  1  1  4  0  1  1  0  1  0
4  0  0  1  0  1  3  1  1  2
5  1  1  0  2  1  2  0  1  1

# verify the resulting dataframe
> rowSums(d)
[1] 9 9 9 9 9
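
Since the question asks for 10 or more data frames, the same `rmultinom` call can be repeated with `replicate()`. This is a minimal sketch rather than part of the original answer: the names `days` and `df_list`, the weekday row names, and the seed are assumptions added for illustration.

```r
set.seed(0)  # arbitrary seed

days <- c("Mon", "Tue", "Wed", "Thu", "Fri")

# Build a list of 10 data frames; each row is one multinomial draw of size 9
df_list <- replicate(10, {
  m <- t(rmultinom(length(days), size = 9, prob = rep(1, 9)))
  dimnames(m) <- list(days, paste0("Factor", 1:9))
  as.data.frame(m)
}, simplify = FALSE)

length(df_list)                     # 10
all(sapply(df_list, rowSums) == 9)  # TRUE
```

Passing `simplify = FALSE` keeps the result as a list of data frames instead of letting `replicate()` try to collapse it into an array.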

If you want to wrap the code in a function for convenience, you can try

f <- function(nrFcts, nrRows = 5) {
    setNames(
        as.data.frame(t(rmultinom(nrRows, nrFcts, rep(1, nrFcts)))),
        paste0("Factor", seq_len(nrFcts))
    )
}

which gives

> f(9)
  Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1       1       2       1       1       1       1       1       0       1
2       1       1       1       1       2       1       0       0       2
3       0       1       1       1       1       3       0       1       1
4       0       1       0       1       2       0       3       1       1
5       2       0       0       1       2       2       0       2       0

> f(4, 10)
   Factor1 Factor2 Factor3 Factor4
1        1       2       1       0
2        2       1       1       0
3        2       0       1       1
4        1       1       1       1
5        2       0       0       2
6        0       0       2       2
7        1       1       1       1
8        2       0       1       1
9        1       1       1       1
10       1       2       0       1
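
Note that f() passes nrFcts as the second argument of rmultinom, which is the total of each draw, so the rows of f(4, 10) sum to 4 rather than 9, as the output above shows. If the row sum should stay fixed regardless of the number of columns, one possible variant is sketched below; `gen_df()` and its `rowSum` argument are hypothetical names introduced here, not part of the original answer.

```r
# Hypothetical variant of f() that takes the row sum as its own argument
gen_df <- function(nrFcts, nrRows = 5, rowSum = 9) {
  setNames(
    as.data.frame(t(rmultinom(nrRows, size = rowSum, prob = rep(1, nrFcts)))),
    paste0("Factor", seq_len(nrFcts))
  )
}

set.seed(0)  # arbitrary seed
d <- gen_df(4, 10)

dim(d)      # 10 rows, 4 columns
rowSums(d)  # every row sums to 9
```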
