R: Playing "Guess Who" in R

ff29svar, posted on 2022-12-25 in Other

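I am working with the R programming language.
I am trying to create a game similar to "Guess Who?" (https://en.wikipedia.org/wiki/Guess_Who%3F), in which players narrow down the set of possible characters through a series of guesses.
Below is a dataset I simulated that contains "counts" of athletes with different characteristics: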

hair_color = factor(c("black", "brown", "blonde", "bald"))
glasses = factor(c("yes", "no", "contact lenses"))
sport = factor(c("football", "basketball", "tennis"))
gender = factor(c("male", "female", "other"))

problem = expand.grid(var1 = hair_color, var2 = glasses, var3 = sport, var4 = gender)
problem$counts = as.integer(rnorm(108, 20,5))
dataset = problem

    var1 var2     var3 var4 counts
1  black  yes football male     22
2  brown  yes football male     16
3 blonde  yes football male     12
4   bald  yes football male     22
5  black   no football male     14
6  brown   no football male     19

I then wrote a function that lets the user select the rows of this dataset that match a specific profile of characteristics:

my_function <- function(dataset, var1 = NULL, var2 = NULL, var3 = NULL, var4 = NULL) {
    
    # Create a logical vector to store the rows that match the specified criteria
    selection <- rep(TRUE, nrow(dataset))
    
    # Filter rows based on the specified levels of var1
    if (!is.null(var1)) {
        selection <- selection & dataset$var1 %in% var1
    }
    
    # Filter rows based on the specified levels of var2
    if (!is.null(var2)) {
        selection <- selection & dataset$var2 %in% var2
    }
    
    # Filter rows based on the specified levels of var3
    if (!is.null(var3)) {
        selection <- selection & dataset$var3 %in% var3
    }
    
    # Filter rows based on the specified levels of var4
    if (!is.null(var4)) {
        selection <- selection & dataset$var4 %in% var4
    }
    
    # Select the rows that match the specified criteria
    selected_rows <- dataset[selection, ]
    
    # Return the selected rows
    return(selected_rows)
}

Now, calling the function to select all rows where hair is "black" or "brown" and glasses is "yes":

head(my_function(dataset, var1 = c("black", "brown"), var2 = c("yes")))

    var1 var2       var3   var4 counts
1  black  yes   football   male     22
2  brown  yes   football   male     16
13 black  yes basketball   male     14
14 brown  yes basketball   male      9
25 black  yes     tennis   male     13

Another example: calling the function to select all rows where hair is "black" or "brown" and glasses is "no":

head(my_function(dataset, var1 = c("black", "brown"), var2 = c("no")))

     var1 var2       var3   var4 counts
5   black   no   football   male     14
6   brown   no   football   male     19
17  black   no basketball   male     17
18  brown   no basketball   male     27
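
This brings me to my question. Suppose I want to know the following: given that an athlete has black or brown hair, what is the (conditional) probability that they wear glasses?

Manually, I can answer this question like this: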

a = my_function(dataset, var1 = c("black", "brown"), var2 = c("yes"))
b = my_function(dataset, var1 = c("black", "brown"), var2 = c("no"))
prob_yes = sum(a$counts) / (sum(a$counts) + sum(b$counts))
prob_no = sum(b$counts) / (sum(a$counts) + sum(b$counts))

> prob_yes
[1] 0.481203

> prob_no
[1] 0.518797
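
The manual steps above can also be sketched in a single pass with xtabs(), which sums the counts per level of var2. This is only a minimal sketch on a small hypothetical data frame (`toy`, with made-up fixed counts, not the simulated dataset above). Like the manual calculation, it keeps only the "yes" and "no" levels and leaves "contact lenses" out of the denominator.

```r
# Hypothetical toy data with fixed counts (illustration only)
toy <- data.frame(
  var1   = c("black", "brown", "black", "brown", "black", "blonde"),
  var2   = c("yes", "yes", "no", "no", "contact lenses", "yes"),
  counts = c(10L, 20L, 30L, 40L, 50L, 60L)
)

# Keep athletes with black or brown hair, and only the yes/no levels of var2
sub <- toy[toy$var1 %in% c("black", "brown") & toy$var2 %in% c("yes", "no"), ]

# Sum the counts per level of var2, then normalise to proportions
totals <- xtabs(counts ~ var2, data = sub)
probs  <- prop.table(totals)
probs
```

For this toy data the totals are 30 for "yes" and 70 for "no", so the proportions are 0.3 and 0.7.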
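
I would like to know whether I can extend this function to the general case. Suppose I want my function to take the following inputs:

  • Which variables, and which levels of those variables (it should not be necessary to select every variable)
  • Which variable (a single variable) the conditional probabilities should be calculated for (e.g. "glasses")

And as output:

  • The probabilities of all levels of that variable

For example, the desired function could be called like this: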

my_function(dataset, input_var_list = list(var1 = c("black", "brown"), var3 = c("football")),  conditional_var = c("var2"))

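This function would return:

  • The probability of wearing glasses, given that the athlete has black/brown hair and plays football
  • The probability of not wearing glasses, given that the athlete has black/brown hair and plays football

Can someone help me rewrite this function?
Thanks!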

jvidinwx #1

Not the most efficient approach, but I stuck with the path you were on... Is this what you are looking for?

my_function <- function(dataset, var1 = NULL, var2 = NULL, var3 = NULL, var4 = NULL,
                        conditional_var = NULL) {
  
  # Create a logical vector to store the rows that match the specified criteria
  selection <- rep(TRUE, nrow(dataset))
  
  # Filter rows based on the specified levels of var1
  if (!is.null(var1)) {
    selection <- selection & dataset$var1 %in% var1
  }
  
  # Filter rows based on the specified levels of var2
  if (!is.null(var2)) {
    selection <- selection & dataset$var2 %in% var2
  }
  
  # Filter rows based on the specified levels of var3
  if (!is.null(var3)) {
    selection <- selection & dataset$var3 %in% var3
  }
  
  # Filter rows based on the specified levels of var4
  if (!is.null(var4)) {
    selection <- selection & dataset$var4 %in% var4
  }
  
  # Select the rows that match the specified criteria
  selected_rows <- dataset[selection, ]
  
  # Return the selected rows, or the proportions of conditional_var if it is given
  if (is.null(conditional_var)) {
    return(selected_rows) 
  } else {
    return(prop.table(table(selected_rows[conditional_var])))
  }
}

my_function(dataset, var1 = c("black", "brown"), var3 = c("football"), conditional_var = c("var2"))

Output:

var2
contact lenses             no            yes 
     0.3333333      0.3333333      0.3333333
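
The proportions above come out at exactly 1/3 each because table() counts rows, and in a full expand.grid() every level of var2 appears in the same number of rows; the counts column is not used. Below is a minimal sketch of a count-weighted variant that also accepts a named list of filters, as in the call the question asks for. The function name `cond_prob` and the argument `count_var` are hypothetical and not part of the original answer, and the toy data uses made-up counts.

```r
# Hypothetical helper (not from the original answer): conditional
# probabilities of `conditional_var`, weighted by the count column.
cond_prob <- function(dataset, input_var_list = list(),
                      conditional_var, count_var = "counts") {
  # Start with all rows selected, then narrow down per filter variable
  selection <- rep(TRUE, nrow(dataset))
  for (v in names(input_var_list)) {
    selection <- selection & dataset[[v]] %in% input_var_list[[v]]
  }
  selected_rows <- dataset[selection, , drop = FALSE]

  # Sum the counts within each level of the conditional variable
  totals <- tapply(selected_rows[[count_var]],
                   selected_rows[[conditional_var]], sum)
  totals[is.na(totals)] <- 0  # levels with no matching rows

  # Normalise so the probabilities sum to 1
  totals / sum(totals)
}

# Toy data with fixed counts so the result is easy to check by hand
toy <- expand.grid(var1 = c("black", "brown", "blonde"),
                   var2 = c("yes", "no"),
                   var3 = c("football", "tennis"))
toy$counts <- 1:12

res <- cond_prob(toy,
                 input_var_list = list(var1 = c("black", "brown"),
                                       var3 = "football"),
                 conditional_var = "var2")
res
```

On the toy data the matching rows have counts 1 and 2 for "yes" and 4 and 5 for "no", so the result is 0.25 and 0.75. Called on the dataset from the question with conditional_var = "var2", it would return one probability for each of the three levels, including "contact lenses".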
