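I am trying to compute weighted percentages that exclude missing values. Any suggestions on how to modify the code below so that NA values are not part of the analysis?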
df <- data.frame(
  wave = rep(c(1, 2), 6),
  gender = rep(c("m", "f", NA), 4),
  exp = rep(c("c", "e", NA), 4),
  weights = rnorm(12, 1, .5)
)
> head(df)
  wave gender  exp   weights
1    1      m    c 0.6556222
2    2      f    e 0.6462524
3    1   <NA> <NA> 1.1822910
4    2      m    c 1.3842665
5    1      f    e 0.9438269
6    2   <NA> <NA> 1.4405539
library(srvyr)
library(tidyverse)
df %>%
  gather(key, value, gender, exp) %>%
  as_survey_design(1, weight = weights) %>%
  dplyr::group_by(wave, key, value) %>%
  dplyr::summarise(prop_weighted = srvyr::survey_mean(na.rm = TRUE) * 100)
# A tibble: 12 × 5
# Groups:   wave, key [4]
    wave key    value prop_weighted prop_weighted_se
   <dbl> <chr>  <chr>         <dbl>            <dbl>
 1     1 exp    c              27.0             18.1
 2     1 exp    e              38.7             21.7
 3     1 exp    NA             34.3             20.3
 4     1 gender f              38.7             21.7
 5     1 gender m              27.0             18.1
 6     1 gender NA             34.3             20.3
 7     2 exp    c              27.6             19.8
 8     2 exp    e              20.3             14.8
 9     2 exp    NA             52.1             23.0
10     2 gender f              20.3             14.8
11     2 gender m              27.6             19.8
12     2 gender NA             52.1             23.0
1 Answer
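I think the simplest approach is to drop those rows before doing the calculation:
df %>% drop_na()
This removes every row that contains an NA in any column. To limit the check to specific columns, name them explicitly:
df %>% drop_na(gender, exp)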
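For reference, here is a minimal sketch of how the drop could slot into the pipeline from the question. It is one possible arrangement, not the only one. It calls `drop_na(value)` after `gather()`, so only the missing answer for a given variable is removed. By contrast, `drop_na(gender, exp)` before stacking would discard a respondent who is missing either column. In the example data the two give the same result, because `gender` and `exp` are always missing together. The `set.seed()` call and the switch from `rnorm()` to `runif()` are my own assumptions, added so the sketch is reproducible and the weights are guaranteed positive.

```r
library(dplyr)
library(tidyr)
library(srvyr)

set.seed(1)  # assumption: seed added only for reproducibility

df <- data.frame(
  wave    = rep(c(1, 2), 6),
  gender  = rep(c("m", "f", NA), 4),
  exp     = rep(c("c", "e", NA), 4),
  # assumption: runif() instead of rnorm() so no weight can be negative
  weights = runif(12, 0.5, 1.5)
)

result <- df %>%
  # stack gender and exp into key/value pairs, as in the question
  gather(key, value, gender, exp) %>%
  # drop the missing answers before the survey design is built
  drop_na(value) %>%
  as_survey_design(ids = 1, weights = weights) %>%
  group_by(wave, key, value) %>%
  # weighted share of each value within its wave/key combination
  summarise(prop_weighted = survey_mean() * 100)

result
```

With the NA rows gone, the percentages within each `wave` and `key` combination are computed over non-missing responses only, so they sum to 100.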