SQL的CASE WHEN x THEN 1 ELSE NULL END在R中的等价物

jjhzyzn0  于 2023-04-27  发布在  其他
关注(0)|答案(2)|浏览(140)

我有一个数据集,看起来像这样:

working_df %>% 
  select(user_type, ride_length, day_of_week) %>%
  as.tibble() %>% 
  print(n = 3)
    # A tibble: 4,324,766 × 3
      user_type ride_length day_of_week
      <chr>           <dbl> <ord>      
    1 Casual          34355 Saturday   
    2 Member          32035 Monday     
    3 Casual          29271 Saturday

我想创建一个表,提供每组用户每天的平均骑行长度。我使用SQL的CASE WHEN将一周中的天数划分为单独的列。请参见下面:

SELECT 
    COALESCE(user_type,'combined') AS user_type,
    AVG(CASE WHEN DAYNAME(started_at) = 'Monday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_monday,
    AVG(CASE WHEN DAYNAME(started_at) = 'Tuesday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_tuesday,
    AVG(CASE WHEN DAYNAME(started_at) = 'Wednesday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_wednesday,
    AVG(CASE WHEN DAYNAME(started_at) = 'Thursday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_thursday,
    AVG(CASE WHEN DAYNAME(started_at) = 'Friday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_friday,
    AVG(CASE WHEN DAYNAME(started_at) = 'Saturday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_saturday,
    AVG(CASE WHEN DAYNAME(started_at) = 'Sunday' THEN TIMESTAMPDIFF(MINUTE,started_at, ended_at) ELSE NULL END) AS avg_ride_length_sunday,
    AVG(TIMESTAMPDIFF(MINUTE,started_at, ended_at)) AS grand_total
FROM 
    bikes.work
GROUP BY 
    user_type WITH ROLLUP;

在我的研究中,我发现了dplyr::case_when(),但我只看到过在一列中计算值的例子,我希望一周中的每一天都能展开。R是否有能力将结果推送到几列中?
编辑:感谢您的建议,我找到了这个潜在的答案:

working_df %>%
  mutate_at(vars(c(user_type, day_of_week)), funs(as.character(.))) %>% 
  bind_rows(mutate(., user_type = "Combined")) %>% 
  bind_rows(mutate(., day_of_week = "Grand_Total")) %>% 
  group_by(user_type, day_of_week) %>% 
  summarize(avg_ride_length_min = mean(ride_length))  %>% 
  pivot_wider(names_from = day_of_week,
              values_from = avg_ride_length_min,
              names_prefix = "avg_ride_length_")

 user_type avg_ride_length_Friday avg_ride_length_Grand_Total avg_ride_length_Monday avg_ride_length_Saturday avg_ride_length_Sunday
  <chr>                      <dbl>                       <dbl>                  <dbl>                    <dbl>                  <dbl>
1 Casual                      22.6                        24.2                   25.0                     27.0                   27.5
2 Combined                    16.5                        17.3                   16.8                     20.9                   20.8
3 Member                      12.4                        12.6                   12.2                     14.2                   14.0
  avg_ride_length_Thursday avg_ride_length_Tuesday avg_ride_length_Wednesday
                     <dbl>                   <dbl>                     <dbl>
1                     21.6                    21.6                      20.9
2                     15.5                    15.1                      14.9
3                     12.2                    11.9                      12.0
oogrdqng

oogrdqng1#

下面是更简洁的答案:

working_df %>%
  mutate(across(c(user_type, day_of_week), as.character)) %>% 
  bind_rows(mutate(., user_type = "United")) %>% 
  bind_rows(mutate(., day_of_week = "gt")) %>% 
  group_by(user_type, day_of_week) %>% 
  summarize(avg_ride_length_min = mean(ride_length))  %>% 
  pivot_wider(names_from = day_of_week,
              values_from = avg_ride_length_min,
              names_prefix = "avg_ride_length_") %>% 
  select("user_type", "avg_ride_length_Monday", "avg_ride_length_Tuesday", "avg_ride_length_Wednesday", "avg_ride_length_Thursday", "avg_ride_length_Friday", "avg_ride_length_Saturday", "avg_ride_length_Sunday", "avg_ride_length_gt")
    
    user_type avg_ride_length_Monday avg_ride_length_Tuesday avg_ride_length_Wednesday avg_ride_length_Thursday avg_ride_length_Friday
  <chr>                      <dbl>                   <dbl>                     <dbl>                    <dbl>                  <dbl>
1 Casual                      25.0                    21.6                      20.9                     21.6                   22.6
2 Member                      12.2                    11.9                      12.0                     12.2                   12.4
3 United                      16.8                    15.1                      14.9                     15.5                   16.5
  avg_ride_length_Saturday avg_ride_length_Sunday avg_ride_length_gt
                     <dbl>                  <dbl>              <dbl>
1                     27.0                   27.5               24.2
2                     14.2                   14.0               12.6
3                     20.9                   20.8               17.3

由于某种原因,bind_rows按字母顺序更改了行顺序。我无法更改,所以我将最后一行从“Combined”更改为“United”

ezykj2lf

ezykj2lf2#

我们可以尝试这样的东西:
在这里,我们首先计算长格式的avg_ride_length,然后使用pivot_wider()函数将其转换为宽格式:

library(dplyr)
library(tidyr)

working_df %>%
  group_by(user_type, day_of_week) %>%
  summarise(avg_ride_length = mean(ride_length)) %>% 
  pivot_wider(names_from = day_of_week, 
              values_from = avg_ride_length,
              names_prefix = "avg_rige_length_")

数据:

working_df <- structure(list(user_type = c("Casual", "Member", "Casual"), ride_length = c(34355L, 
32035L, 29271L), day_of_week = c("Saturday", "Monday", "Saturday"
)), class = "data.frame", row.names = c("1", "2", "3"))

相关问题