R plot | Monthly headcount by start date and end date

bq9c1y66 posted on 2023-07-31 in Other

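I'm having trouble writing code that produces a chart showing, month by month, how many employees the company has, based on each employee's original hire date and termination date: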

structure(list(original_hire_date = c("7/8/2019", "7/15/2019", 
"7/29/2019", "8/5/2019", "8/12/2019", "8/19/2019", "8/26/2019", 
"8/26/2019", "8/26/2019", "9/3/2019", "9/9/2019", "9/9/2019", 
"10/8/2019", "9/30/2019", "9/30/2019", "9/30/2019", "9/30/2019", 
"9/30/2019", "9/30/2019", "9/30/2019", "9/30/2019", "9/30/2019", 
"9/30/2019", "10/14/2019", "10/28/2019"), termination_date = c(NA, 
NA, NA, "8/21/2020", NA, "6/30/2020", NA, "7/25/2020", NA, NA, 
NA, NA, "8/21/2020", "6/30/2020", NA, "6/30/2020", NA, "6/30/2020", 
"6/5/2020", "6/30/2020", "6/30/2020", NA, "3/2/2020", "8/27/2021", 
NA)), row.names = c(NA, -25L), class = c("tbl_df", "tbl", "data.frame"
))

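The goal is a chart by year and month (e.g. July 2019) showing the number of employees in that period. Because some employees never left, this really amounts to adding and subtracting employees over time, starting from when each one joined or left the company.
Example chart: [image not included]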

Answer 1 (q9rjltbz)

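Here I reshape the data to long format, count each hire as +1 and each termination as -1, sum by date, and then compute the headcount as the cumulative sum of hires and terminations. (One possible variation would be to shift termination dates one day later: for example, if someone worked a single day, we might treat them as +1 in the morning and -1 in the afternoon, so that they contribute +1 for that day rather than zero. I have not made that adjustment here.)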

library(tidyverse)

# df1 is the data frame from the question
df1 |>
  # one row per event: `name` holds the column name, `value` the date string
  pivot_longer(1:2) |>
  mutate(change = if_else(name == "original_hire_date", 1, -1),
         date = lubridate::mdy(value)) |>
  # employees who never left have no termination date
  filter(!is.na(date)) |>
  arrange(date) |>
  # net change (hires minus terminations) per day
  count(date, wt = change, name = "change") |>
  complete(date = seq.Date(min(date), max(date), by = "day"),  # to fill in all days,
           fill = list(change = 0)) |> # so count doesn't drift between observations
  mutate(count = cumsum(change)) |>
  ggplot(aes(date, count)) +
  geom_line()
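
The variation mentioned above (moving each termination to the following day) could look roughly like the sketch below. It is only an illustration under stated assumptions: `toy` is a hypothetical three-row data frame in the same shape as the question's data, not the real data, and the grouping step uses `group_by()` plus `summarise()` instead of the weighted `count()` call in the answer.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical toy data (assumption): same columns as the question's data frame.
# The second employee is hired and terminated on the same day.
toy <- tibble(
  original_hire_date = c("7/8/2019", "7/8/2019", "7/10/2019"),
  termination_date   = c(NA, "7/8/2019", "7/12/2019")
)

daily <- toy |>
  pivot_longer(everything(), values_drop_na = TRUE) |>
  mutate(
    date = mdy(value),
    # shift terminations one day later so the last working day still counts
    date = if_else(name == "termination_date", date + 1, date),
    change = if_else(name == "original_hire_date", 1, -1)
  ) |>
  group_by(date) |>
  summarise(change = sum(change)) |>
  # fill in days without events so the running total covers every day
  complete(date = seq(min(date), max(date), by = "day"),
           fill = list(change = 0)) |>
  mutate(count = cumsum(change))

print(daily)
```

With this toy data the employee who worked only on 2019-07-08 is included in that day's headcount (2 instead of 1) and drops out on 2019-07-09.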


Answer 2 (7d7tgy0s)

Here's one approach:

library(tidyverse)
library(lubridate)  # mdy() and floor_date(); attached automatically only by tidyverse >= 2.0.0

# df is the data frame from the question
grouped_df <- df %>% 
  # put the hiring and termination dates into one column
  pivot_longer(cols = everything(), names_to = "status", values_to = "date", values_drop_na = TRUE) %>%

  # parse the dates and truncate each one to the first day of its month
  mutate(date = floor_date(mdy(date), "month")) %>%
  count(status, date) %>%

  # pivot it wider again, so hires and terminations each get their own column
  pivot_wider(names_from = status, values_from = n, values_fill = 0) %>%

  # sort by month so the cumulative sum runs in chronological order
  arrange(date) %>%

  # get the net change and the cumulative sum of workers
  mutate(`Net Change` = original_hire_date - termination_date, 
         `Total Workers` = cumsum(`Net Change`))

grouped_df %>%
  ggplot(aes(x = date, y = `Total Workers`)) +
  geom_line() + 
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(title = "Total Workers")
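
Since the question asks for month labels such as "Jul 2019", here is a minimal sketch of one way to get them while also filling in months that have no hires or terminations. It is not part of the answer above: `toy` is a hypothetical data frame in the same shape as the question's data, and the choice of `geom_col()` with `scale_x_date()` is an assumption about how the chart should look. Note that `%b` month names depend on the locale.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)

# Hypothetical toy data (assumption): same columns as the question's data frame
toy <- tibble(
  original_hire_date = c("7/8/2019", "7/15/2019", "9/3/2019"),
  termination_date   = c(NA, "11/21/2019", NA)
)

monthly <- toy |>
  pivot_longer(everything(), names_to = "status", values_to = "date",
               values_drop_na = TRUE) |>
  mutate(month = floor_date(mdy(date), "month"),
         change = if_else(status == "original_hire_date", 1, -1)) |>
  group_by(month) |>
  summarise(change = sum(change)) |>
  # add months with no events so every month gets its own bar
  complete(month = seq(min(month), max(month), by = "month"),
           fill = list(change = 0)) |>
  # headcount after applying each month's hires and terminations
  mutate(headcount = cumsum(change))

p <- ggplot(monthly, aes(month, headcount)) +
  geom_col() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  labs(x = NULL, y = "Employees")

print(monthly)
```

Printing `p` draws the chart. In this sketch an employee terminated during a month is no longer counted in that month's bar; shift the termination month forward if the month of departure should still count.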
