R: binning a categorical variable based on specific bin sizes

vmdwslir  posted on 2022-12-20  in  Other

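I have a data frame with one name per row, and I want to divide the names into bins of specific sizes. Each name can be eligible for several different bins, but in the final binning each name may be used only once.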

name <- c("James", "Terry", "Fred", "Scottie", "Clint", "Gary", "Kevin", "Harrison", "Patrick")
available_bins <- c("A/B", "A/B", "B", "C/D", "E", "A/D", "A/D", "D/C", "D/C")
init <- data.frame(name,available_bins)
init
#      name available_bins
#1    James            A/B
#2    Terry            A/B
#3     Fred              B
#4  Scottie            C/D
#5    Clint              E
#6     Gary            A/D
#7    Kevin            A/D
#8 Harrison            D/C
#9  Patrick            D/C

Each bin has a specific size, stored in another data frame.

bin_name <- c("A","B","C","D","E")
bin_size <- c(2,2,2,2,1)
binning_parameters <- data.frame(bin_name,bin_size)
binning_parameters
#  bin_name bin_size
#1        A        2
#2        B        2
#3        C        2
#4        D        2
#5        E        1
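
Before assigning anyone, a quick sanity check can catch inputs that cannot work. The sketch below assumes that bin labels in available_bins are separated by "/" as in the example; the names candidates, total_capacity, enough_seats and all_known are illustrative and not part of the original post.

```r
name <- c("James", "Terry", "Fred", "Scottie", "Clint", "Gary", "Kevin", "Harrison", "Patrick")
available_bins <- c("A/B", "A/B", "B", "C/D", "E", "A/D", "A/D", "D/C", "D/C")
bin_size <- c(A = 2, B = 2, C = 2, D = 2, E = 1)

# one character vector of candidate bins per person (assumes "/" as separator)
candidates <- setNames(strsplit(available_bins, "/", fixed = TRUE), name)

# necessary (but not sufficient) condition: at least as many seats as people
total_capacity <- sum(bin_size)
enough_seats <- total_capacity >= length(name)

# every label a person lists should be a known bin
all_known <- all(unlist(candidates) %in% names(bin_size))
```

Passing both checks does not guarantee that a valid assignment exists; it only rules out the obvious failures.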

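Every person should be placed in a bin, and there will always be enough bins of suitable size to hold everyone. For example, here there are 9 people, 4 bins of size 2, and 1 bin of size 1. Is there an efficient way to do this while respecting the bin sizes? There does not need to be a single correct answer; any assignment that puts every name into one of its allowed bins without exceeding the bin size will do.
Example result: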

final_bin <- c("A", "B", "B", "C", "E", "A", "D", "C", "D")
final_bin <- data.frame(name,final_bin)
final_bin
#      name final_bin
#1    James         A
#2    Terry         B
#3     Fred         B
#4  Scottie         C
#5    Clint         E
#6     Gary         A
#7    Kevin         D
#8 Harrison         C
#9  Patrick         D
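
The two constraints (each person lands in one of their allowed bins, and no bin exceeds its size) can be checked mechanically. The following is a minimal sketch that validates the example result; the helper names candidates, allowed, used, within_size and is_valid are hypothetical, and the "/" separator is assumed from the example data.

```r
name <- c("James", "Terry", "Fred", "Scottie", "Clint", "Gary", "Kevin", "Harrison", "Patrick")
available_bins <- c("A/B", "A/B", "B", "C/D", "E", "A/D", "A/D", "D/C", "D/C")
bin_size <- c(A = 2, B = 2, C = 2, D = 2, E = 1)
final_bin <- c("A", "B", "B", "C", "E", "A", "D", "C", "D")

# split "A/B" into c("A", "B") for each person
candidates <- strsplit(available_bins, "/", fixed = TRUE)

# constraint 1: the assigned bin is one of the person's allowed bins
allowed <- mapply(function(b, cand) b %in% cand, final_bin, candidates)

# constraint 2: no bin holds more people than its size
used <- table(final_bin)
within_size <- used <= bin_size[names(used)]

is_valid <- all(allowed) && all(within_size)
```

Any candidate solution, however it was produced, can be run through the same two checks.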

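I tried sorting by the bin with the fewest eligible people and then removing the placed people from the pool, but the overlap between available bins meant that I sometimes removed the wrong people between iterations.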

p1tboqfb #1

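I didn't find a "tabular" approach (manipulating data frames, e.g. tidy style), but here is a recursive solution that relies on list operations and {purrr} helpers: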

  • Load the packages:
library(dplyr)
library(purrr)
  • Create a list of bins, each item having a capacity (how many persons it can hold) and members (the persons collected so far):

[code for this step was garbled in the source and could not be recovered]
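
A possible reconstruction of the bins list, offered as an assumption rather than the answerer's code: it relies only on the two fields that collect_persons reads and writes (capacity and members) and takes the capacities from binning_parameters in the question. The capacities the answer actually used are not known. Base R is used here instead of {purrr} to keep the sketch dependency-free.

```r
bin_name <- c("A", "B", "C", "D", "E")
bin_size <- c(2, 2, 2, 2, 1)

# assumed structure: one named entry per bin, holding the free capacity
# and an (initially empty) character vector of member names
bins <- setNames(
  lapply(bin_size, function(n) list(capacity = n, members = character(0))),
  bin_name
)
```

With this shape, names(bins) yields the bin letters and bins[["A"]]$capacity the free seats in bin A, which is how the function below accesses them.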

  • Create a list of persons, each with its available bins and an attribute has_bin that is set to TRUE once the person has been given a bin:

[code for this step was garbled in the source and could not be recovered]
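
A possible reconstruction of the persons list, again an assumption based on how collect_persons uses it: the list is named by person, available_bins stays a single string such as "A/B" (the function matches the bin letter against it with grep), and has_bin starts as FALSE.

```r
name <- c("James", "Terry", "Fred", "Scottie", "Clint", "Gary", "Kevin", "Harrison", "Patrick")
available_bins <- c("A/B", "A/B", "B", "C/D", "E", "A/D", "A/D", "D/C", "D/C")

# assumed structure: one named entry per person with the raw
# available_bins string and a flag that flips to TRUE once placed
persons <- setNames(
  lapply(available_bins, function(b) list(available_bins = b, has_bin = FALSE)),
  name
)
```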

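  • A recursive function collect_persons that (by matching the bin letter) grabs persons for each bin until either all bins are filled or all persons have been placed: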
collect_persons <- function(bins, persons){
    ## exclude bins already filled:
    free_bins <- bins |> discard(~ {.x$capacity < 1})
    ## exclude persons already placed:
    free_persons <- persons |> discard(~ .x$has_bin)
    ## EXIT for lack of free bins or persons:
    if (length(free_persons) == 0 || length(free_bins) == 0) return(bins)
    ## what's the label (letter) of the current free bin?
    this_bin_letter <- names(free_bins)[1]
    this_free_bin <- free_bins[[this_bin_letter]]
    ## how many persons can this free bin accommodate?
    bin_capacity = this_free_bin$capacity
    ## find candidate persons for a free_bin:
    person_index = grep(this_bin_letter, map_chr(free_persons, 'available_bins'))
    ## limit matches to current bin capacity:
    person_index = na.omit(person_index[1:bin_capacity])
    ## index of the first n persons to fill the bin's capacity n:
    candidate_names = names(free_persons)[person_index]
    ## add the indexed persons as bin members: 
    bins[[this_bin_letter]]$members = candidate_names
    ## mark these persons as already placed:
    persons[candidate_names] <- persons[candidate_names] |> map(~ modify_in(.x, 'has_bin', ~ TRUE))
    ## print(persons[candidate_names])
    ## mark the current bin as full:
    bins[[this_bin_letter]]$capacity = 0
    ## repeat until either all bins are full or all persons are placed:
    collect_persons(bins, persons)
}
  • Call the function and reshape the result into a data frame:
data.frame(persons = collect_persons(bins, persons) |>
               map('members') |> unlist()
           ) |>
    tibble::rownames_to_column('bin') |> ## package tibble required
    mutate(bin = substr(bin, 1, 1))

Output:

  bin  persons
1   A    James
2   A    Terry
3   B     Fred
4   C  Scottie
5   C Harrison
6   C  Patrick
7   D     Gary
8   D    Kevin
9   E    Clint
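
Note that the listing above places three people in C, although the question gives C a size of 2. More generally, filling bins one after another can strand a person whose only allowed bins are already full. As an alternative (not the answerer's method), here is a minimal backtracking sketch in base R that tries each person's allowed bins in turn and undoes a choice when it leads to a dead end. The function name assign_bins and the variables candidates, remaining and chosen are hypothetical, and the capacities are taken from binning_parameters in the question.

```r
name <- c("James", "Terry", "Fred", "Scottie", "Clint", "Gary", "Kevin", "Harrison", "Patrick")
available_bins <- c("A/B", "A/B", "B", "C/D", "E", "A/D", "A/D", "D/C", "D/C")
bin_size <- c(A = 2, B = 2, C = 2, D = 2, E = 1)

# allowed bins per person, assuming "/" separates the labels
candidates <- strsplit(available_bins, "/", fixed = TRUE)

# Depth-first search over persons: place person i in an allowed bin that
# still has room, recurse on person i + 1, and backtrack on failure.
assign_bins <- function(i, remaining, chosen) {
  if (i > length(candidates)) return(chosen)        # everyone is placed
  for (b in candidates[[i]]) {
    if (remaining[[b]] > 0) {
      remaining[[b]] <- remaining[[b]] - 1          # take a seat in bin b
      chosen[i] <- b
      result <- assign_bins(i + 1, remaining, chosen)
      if (!is.null(result)) return(result)
      remaining[[b]] <- remaining[[b]] + 1          # give the seat back
    }
  }
  NULL                                              # no valid placement from here
}

solution <- assign_bins(1, bin_size, character(length(name)))
final_bin <- data.frame(name, final_bin = solution)
```

The search returns NULL when no valid assignment exists. Its worst case grows exponentially with the number of people, so for large inputs a bipartite matching or maximum-flow formulation would likely scale better.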
