Mutate a sentiment indicator using date conditions instead of quarter/month (dplyr)

Posted by ldioqlga on 2023-03-10 in Other

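I have a Reddit dataset in which each row is one Reddit post. For each post I have a sentiment score along with the username that wrote it, and I also have a variable that captures the average sentiment of all posts written by the same username.
I am trying to build a sentiment indicator tied to the timeline of a minimum wage policy, where I want to classify each username's sentiment into three periods:
1 - Before the policy announcement, which we assume happened on "2021-03-01".
2 - After the announcement but before implementation, so on or after "2021-03-01" but before "2021-09-01".
3 - After the policy was implemented on "2021-09-01".
I have been able to compute each username's sentiment by month or by quarter, as shown below, but I want to compute it per username for the specific policy periods above, and I don't know how to do that.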

Load packages

library(tidyverse)
library(lubridate)
library(zoo)

Print a sample of the data for selected columns

dput(df[1:5,c(3,4,21, 22, 23)])

Output:

structure(list(date = structure(c(15149, 15150, 15150, 15150, 
15150), class = "Date"), username = c("ax", "aa", 
"cartman", "abc", "aff"
), quarter_yr = c("2011 Q2", "2011 Q2", "2011 Q2", "2011 Q2", 
"2011 Q2"), sentiment_score = c("0", "-1", "1", "-1", "-1"), 
    avg_sentiment = c(0.0666666666666667, -0.777777777777778, 
    1, -1, -1)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), groups = structure(list(username = c("ax", 
"cartman", "abc", "aff"), .rows = structure(list(5L, 4L, 1L, 2L, 3L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), .drop = TRUE))

Create the quarter/year variable

# Add quarter_yr to the raw post-level data (df), which is summarised below
df <- df %>% 
  mutate(date = ymd(date),
         quarter_yr = paste(year(date), quarters(date)))

Compute the average sentiment score for each username across their many comments/posts:

sentiment_df <-
df %>% group_by(username, quarter_yr) %>% summarise(avg_sentiment = mean(as.numeric(sentiment_score)))

Quarterly sentiment per username:

dput(sentiment_df[1:2,c(1,8)])

Output

structure(list(username = c("cartman","aa"
), `2014 Q2` = c(NA_real_, NA_real_)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L), groups = structure(list(
    username = c("cartman","aa"), .rows = structure(list(
        1L, 2L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))

Answer #1 (xa9qqrwz)

# Add the phase column to the post-level data (df) so the summary below can group by it
df <- df %>% 
  mutate(date = ymd(date),
         quarter_yr = paste(year(date), quarters(date)),
         phase = case_when(date < ymd(20210301) ~ "1 Before announcement",
                           date < ymd(20210901) ~ "2 Before implementation",
                           TRUE ~ "3 After implementation"))

sentiment_df <-
df %>% 
  group_by(username, phase) %>% 
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)))
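
One detail worth checking is where the cut-off dates themselves land. Because `case_when()` takes the first condition that matches, a post dated exactly 2021-03-01 falls into phase 2 and one dated exactly 2021-09-01 falls into phase 3. Here is a minimal sketch of that check; the four dates and the name `boundary_check` are made up for illustration and are not from the question's data.

```r
library(dplyr)
library(lubridate)

# Hypothetical dates chosen to sit on and just before the two cut-offs
boundary_check <- tibble(
  date = ymd(c("2021-02-28", "2021-03-01", "2021-08-31", "2021-09-01"))
) %>%
  mutate(phase = case_when(
    # Conditions are tested in order; the first TRUE one wins
    date < ymd("2021-03-01") ~ "1 Before announcement",
    date < ymd("2021-09-01") ~ "2 Before implementation",
    TRUE ~ "3 After implementation"
  ))

boundary_check
```

If the announcement day should count as "before", changing `<` to `<=` in the first condition would move it.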

Answer #2 (klsxnrf1)

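It looks like you just need to create a new variable with mutate() and case_when(), then group by that new variable. Below is my attempt. Is this what you wanted?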

library(dplyr)
library(lubridate)
library(zoo)
sentiment_df <- structure(list(date = structure(c(15149, 15150, 15150, 15150, 
15150), class = "Date"), username = c("ax", "aa", 
"cartman", "abc", "aff"
), quarter_yr = c("2011 Q2", "2011 Q2", "2011 Q2", "2011 Q2", 
"2011 Q2"), sentiment_score = c("0", "-1", "1", "-1", "-1"), 
    avg_sentiment = c(0.0666666666666667, -0.777777777777778, 
    1, -1, -1)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), groups = structure(list(username = c("ax", 
"cartman", "abc", "aff"), .rows = structure(list(5L, 4L, 1L, 2L, 3L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), .drop = TRUE))
sentiment_df <- sentiment_df %>%  mutate(date = ymd(date),
         quarter_yr = paste(year(date), quarters(date)),
         implementation_period = case_when(date < as.Date("2021-03-01") ~ "Before",
                            date >= as.Date("2021-03-01") & date < as.Date("2021-09-01") ~ "Pre_Implementation",
                            TRUE ~ "After"))

sentiment_df <-
  sentiment_df %>% group_by(username, implementation_period) %>% summarise(avg_sentiment = mean(as.numeric(sentiment_score)))

A quick note: the data you provided only contains dates that fall in the "Before" period, but I think this should work on the full dataset.
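
To see all three periods in action, here is a small sketch on made-up posts (the `toy` data, its dates, and its scores are invented for illustration, not taken from the question). It also assumes you might want one column per period, similar to the wide quarterly output in the question, which is what `tidyr::pivot_wider()` does at the end.

```r
library(dplyr)
library(tidyr)

# Made-up posts (not from the question's data) covering all three periods
toy <- tibble(
  date = as.Date(c("2021-01-15", "2021-02-10", "2021-05-20",
                   "2021-10-05", "2021-04-01", "2021-12-24")),
  username = c("aa", "aa", "aa", "aa", "ax", "ax"),
  sentiment_score = c("1", "-1", "1", "-1", "0", "1")
)

phase_sentiment <- toy %>%
  mutate(implementation_period = case_when(
    date < as.Date("2021-03-01") ~ "Before",
    date < as.Date("2021-09-01") ~ "Pre_Implementation",
    TRUE ~ "After"
  )) %>%
  group_by(username, implementation_period) %>%
  # sentiment_score is stored as character, so convert before averaging
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)),
            .groups = "drop") %>%
  # One row per username, one column per policy period
  pivot_wider(names_from = implementation_period,
              values_from = avg_sentiment)

phase_sentiment
```

A username with no posts in a given period (here "ax" has none before the announcement) ends up with NA in that column, so keep that in mind when comparing periods.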
