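I am working with the R programming language.
I am trying to build a game similar to "Guess Who?" (https://en.wikipedia.org/wiki/Guess_Who%3F), in which players narrow down the set of possible characters through a series of guesses.
Below is a dataset I simulated, containing "counts" of athletes with different characteristics: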
hair_color = factor(c("black", "brown", "blonde", "bald"))
glasses = factor(c("yes", "no", "contact lenses"))
sport = factor(c("football", "basketball", "tennis"))
gender = factor(c("male", "female", "other"))
problem = expand.grid(var1 = hair_color, var2 = glasses, var3 = sport, var4 = gender)
problem$counts = as.integer(rnorm(108, 20,5))
dataset = problem
var1 var2 var3 var4 counts
1 black yes football male 22
2 brown yes football male 16
3 blonde yes football male 12
4 bald yes football male 22
5 black no football male 14
6 brown no football male 19
I then wrote a function that lets the user select the rows of this dataset that match a given profile of characteristics:
my_function <- function(dataset, var1 = NULL, var2 = NULL, var3 = NULL, var4 = NULL) {
# Create a logical vector to store the rows that match the specified criteria
selection <- rep(TRUE, nrow(dataset))
# Filter rows based on the specified levels of var1
if (!is.null(var1)) {
selection <- selection & dataset$var1 %in% var1
}
# Filter rows based on the specified levels of var2
if (!is.null(var2)) {
selection <- selection & dataset$var2 %in% var2
}
# Filter rows based on the specified levels of var3
if (!is.null(var3)) {
selection <- selection & dataset$var3 %in% var3
}
# Filter rows based on the specified levels of var4
if (!is.null(var4)) {
selection <- selection & dataset$var4 %in% var4
}
# Select the rows that match the specified criteria
selected_rows <- dataset[selection, ]
# Return the selected rows
return(selected_rows)
}
Now, calling the function to select all rows where hair is "black" or "brown" and glasses is "yes":
head(my_function(dataset, var1 = c("black", "brown"), var2 = c("yes")))
var1 var2 var3 var4 counts
1 black yes football male 22
2 brown yes football male 16
13 black yes basketball male 14
14 brown yes basketball male 9
25 black yes tennis male 13
Another example, selecting all rows where hair is "black" or "brown" and glasses is "no":
head(my_function(dataset, var1 = c("black", "brown"), var2 = c("no")))
var1 var2 var3 var4 counts
5 black no football male 14
6 brown no football male 19
17 black no basketball male 17
18 brown no basketball male 27
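This brings me to my question. Suppose I want to know the following: given that an athlete has black or brown hair, what is the (conditional) probability that they wear glasses?
Manually, I can answer this question as follows: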
a = my_function(dataset, var1 = c("black", "brown"), var2 = c("yes"))
b = my_function(dataset, var1 = c("black", "brown"), var2 = c("no"))
prob_yes = sum(a$counts) / (sum(a$counts) + sum(b$counts))
prob_no = sum(b$counts) / (sum(a$counts) + sum(b$counts))
> prob_yes
[1] 0.481203
> prob_no
[1] 0.518797
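As a cross-check on the manual calculation, here is a minimal sketch of the same idea using `tapply` and `prop.table`. The `toy` data frame and its counts are made up for illustration and are not the simulated data above. Note that dividing by "yes" + "no" only, as in the manual version, leaves the "contact lenses" rows out of the denominator, so the sketch shows both variants.

```r
# Toy data with the same layout as `dataset` (counts are hypothetical)
toy <- expand.grid(
  var1 = c("black", "brown", "blonde", "bald"),
  var2 = c("yes", "no", "contact lenses"),
  stringsAsFactors = TRUE
)
toy$counts <- c(10, 20, 5, 5,   # var2 = "yes"
                30, 40, 5, 5,   # var2 = "no"
                60, 40, 5, 5)   # var2 = "contact lenses"

# Keep only the conditioning rows: black or brown hair
sub <- toy[toy$var1 %in% c("black", "brown"), ]

# Total count per level of var2, then normalise to proportions
totals <- tapply(sub$counts, sub$var2, sum)
probs_all    <- prop.table(totals)                   # denominator: all three levels
probs_yes_no <- prop.table(totals[c("yes", "no")])   # denominator: yes + no only

print(probs_all)
print(probs_yes_no)
```

Which denominator is appropriate depends on whether "contact lenses" should count as a separate outcome of the glasses variable.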
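I would like to know whether this function can be extended to the general case. Suppose I want my function to accept the following inputs:
- Which variables, and which levels of those variables, to condition on (not every variable has to be specified)
- The single variable for which the conditional probabilities should be computed (e.g. "glasses")
And as output:
- The probability of every level of that variable
For example, the desired function could be called like this: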
my_function(dataset, input_var_list = c(var1 = c("black", "brown"), var3 = c("football")), conditional_var = c("var2"))
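This function would return:
- The probability of wearing glasses, given that the athlete has black/brown hair and plays football
- The probability of not wearing glasses, given that the athlete has black/brown hair and plays football
Can someone help me rewrite this function?
Thanks!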
1 Answer
jvidinwx (#1)
Not the most efficient approach, but I stuck with the path you were already on... Is this what you were looking for?
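One possible sketch along these lines, reusing the row-filtering logic from the question. It is not necessarily the original answer's code: the function name `conditional_probs`, the `count_col` argument, and the `demo` data are assumptions for illustration. It also assumes the conditions are passed as a named `list()`, because `c(var1 = c("black", "brown"), var3 = "football")` flattens into a single character vector with names `var11`, `var12`, `var3`.

```r
# Hypothetical helper (name and arguments are assumptions, not from the question).
# input_var_list : named list, e.g. list(var1 = c("black", "brown"), var3 = "football")
# conditional_var: name of the single column whose distribution is wanted
conditional_probs <- function(dataset, input_var_list = list(),
                              conditional_var, count_col = "counts") {
  stopifnot(is.list(input_var_list),
            all(names(input_var_list) %in% names(dataset)),
            length(conditional_var) == 1,
            conditional_var %in% names(dataset))

  # Start with every row selected, then narrow down one variable at a time
  selection <- rep(TRUE, nrow(dataset))
  for (v in names(input_var_list)) {
    selection <- selection & dataset[[v]] %in% input_var_list[[v]]
  }
  selected <- dataset[selection, , drop = FALSE]

  # Sum the counts within each level of the conditional variable
  totals <- tapply(selected[[count_col]], selected[[conditional_var]], sum)
  totals[is.na(totals)] <- 0  # levels with no matching rows

  data.frame(level = names(totals),
             count = as.vector(totals),
             prob  = as.vector(totals) / sum(totals))
}

# Demo data with the same columns as the question (placeholder counts)
demo <- expand.grid(
  var1 = c("black", "brown", "blonde", "bald"),
  var2 = c("yes", "no", "contact lenses"),
  var3 = c("football", "basketball", "tennis"),
  var4 = c("male", "female", "other")
)
demo$counts <- seq_len(nrow(demo))

res <- conditional_probs(demo,
                         input_var_list = list(var1 = c("black", "brown"),
                                               var3 = "football"),
                         conditional_var = "var2")
print(res)
```

Unlike the manual yes/no calculation in the question, this sketch returns one row per level of the conditional variable (including "contact lenses"), so the probabilities sum to 1 across all levels.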