我有一个数据集,看起来像这样:
working_df %>%
select(user_type, ride_length, day_of_week) %>%
as.tibble() %>%
print(n = 3)
# A tibble: 4,324,766 × 3
user_type ride_length day_of_week
<chr> <dbl> <ord>
1 Casual 34355 Saturday
2 Member 32035 Monday
3 Casual 29271 Saturday
我想创建一个表,提供每组用户每天的平均骑行长度。我使用SQL的CASE WHEN将一周中的天数划分为单独的列。请参见下面:
SELECT
COALESCE(user_type,'combined') AS user_type,
AVG(CASE WHEN DAYNAME(started_at) = 'Monday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_monday,
AVG(CASE WHEN DAYNAME(started_at) = 'Tuesday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_tuesday,
AVG(CASE WHEN DAYNAME(started_at) = 'Wednesday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_wednesday,
AVG(CASE WHEN DAYNAME(started_at) = 'Thursday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_thursday,
AVG(CASE WHEN DAYNAME(started_at) = 'Friday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_friday,
AVG(CASE WHEN DAYNAME(started_at) = 'Saturday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_saturday,
AVG(CASE WHEN DAYNAME(started_at) = 'Sunday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_sunday,
AVG(TIMESTAMPDIFF(MINUTE,started_at, ended_at)) AS grand_total
FROM
bikes.work
GROUP BY
user_type WITH ROLLUP;
在我的研究中,我发现了dplyr::case_when(),但我只看到过在一列中计算值的例子,我希望一周中的每一天都能展开。R是否有能力将结果推送到几列中?
编辑:感谢您的建议,我找到了这个潜在的答案:
working_df %>%
mutate_at(vars(c(user_type, day_of_week)), funs(as.character(.))) %>%
bind_rows(mutate(., user_type = "Combined")) %>%
bind_rows(mutate(., day_of_week = "Grand_Total")) %>%
group_by(user_type, day_of_week) %>%
summarize(avg_ride_length_min = mean(ride_length)) %>%
pivot_wider(names_from = day_of_week,
values_from = avg_ride_length_min,
names_prefix = "avg_ride_length_")
user_type avg_ride_length_Friday avg_ride_length_Grand_Total avg_ride_length_Monday avg_ride_length_Saturday avg_ride_length_Sunday
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Casual 22.6 24.2 25.0 27.0 27.5
2 Combined 16.5 17.3 16.8 20.9 20.8
3 Member 12.4 12.6 12.2 14.2 14.0
avg_ride_length_Thursday avg_ride_length_Tuesday avg_ride_length_Wednesday
<dbl> <dbl> <dbl>
1 21.6 21.6 20.9
2 15.5 15.1 14.9
3 12.2 11.9 12.0
2条答案
按热度按时间oogrdqng1#
下面是更简洁的答案:
由于某种原因,
bind_rows
按字母顺序更改了行顺序。我无法更改,所以我将最后一行从“Combined”更改为“United”ezykj2lf2#
我们可以尝试这样的东西:
在这里,我们首先计算长格式的avg_ride_length,然后使用pivot_wider()函数将其转换为宽格式:
数据: