Get the average of two rows within a group in a dataframe in R

wnvonmuf posted on 2023-01-28 in Other

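I am trying to insert a new average row between the rows of the same month group, creating a new item for that row. For example, these three rows:

  • 3MTH_IMPVOL_90%MNY_DF: 23.9
  • 3MTH_IMPVOL_100%MNY_DF: 18.0
  • 3MTH_IMPVOL_110%MNY_DF: 15.0

would get a row inserted between each pair of entries, producing a new item:

  • 3MTH_IMPVOL_90%MNY_DF: 23.9
      • 3MTH_IMPVOL_95%MNY_DF: 20.95
  • 3MTH_IMPVOL_100%MNY_DF: 18.0
      • 3MTH_IMPVOL_105%MNY_DF: 16.5
  • 3MTH_IMPVOL_110%MNY_DF: 15.0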
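
To make the arithmetic concrete, here is a minimal base R sketch of the midpoint calculation for the three 3-month values above. The vector names `pc` and `value` are illustrative only and are not columns of the original data.

```r
# Percent levels and values of the three 3-month items
pc    <- c(90, 100, 110)
value <- c(23.9, 18.0, 15.0)

# Midpoint of each neighbouring pair:
# head(x, -1) drops the last element, tail(x, -1) drops the first
mid_pc    <- (head(pc, -1) + tail(pc, -1)) / 2
mid_value <- (head(value, -1) + tail(value, -1)) / 2

print(mid_pc)     # 95 105
print(mid_value)  # 20.95 16.50
```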

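Below is some code I tried myself that does not work; maybe it helps.
The problem here is that I want the whole dataset to contain the extra rows, rather than only the summarized values: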

df %>% group_by(datestamp) %>% 
  mutate(id = rep(1:n(), length.out = n(), each = 4)) %>%
  mutate(item
  )%>% 
  summarize(item    = unique(datestamp),
            item       = paste0(item[1], '_', item[2]),
            value_mean = mean(value))

I found this, but it did not work for me either:

do.call(rbind,
        lapply(seq(1, nrow(df), 3), function(i){
          x <- jse_vol[ i:(i + 1), , drop = FALSE]
          x <- x %>% group_by(item)
          res <- rbind(x['value'], colSums(x['value'])/2)
          rownames(res)[ nrow(res) ] <- paste(rownames(x), collapse = "_")
          res
        }))

And finally:

df%>%
  data.frame() %>%
  rownames_to_column() %>%
  mutate_if(is.factor, as.numeric) %>%
  group_by(group = gl(n()/2, 2)) %>%
  group_map(~ bind_rows(.x, tibble(rowname = paste(.x$rowname, collapse = "_"), 
                                   t1 = mean(.x$value),
  )))

Dataframe:

datestamp  entity        short_name                       item                     value source
   <date>     <chr>         <chr>                            <chr>                    <dbl> <chr> 
 1 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 3MTH_IMPVOL_90%MNY_DF     23.9 PSEC  
 2 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 6MTH_IMPVOL_100.0%MNY_DF  18.0 PSEC  
 3 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 9MTH_IMPVOL_110%MNY_DF    16.2 PSEC  
 4 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 12MTH_IMPVOL_90%MNY_DF    21.0 PSEC  
 5 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 3MTH_IMPVOL_100%MNY_DF    18   PSEC  
 6 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 6MTH_IMPVOL_110.0%MNY_DF  15.8 PSEC  
 7 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 9MTH_IMPVOL_90%MNY_DF     21.5 PSEC  
 8 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 12MTH_IMPVOL_100%MNY_DF   18.5 PSEC  
 9 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 3MTH_IMPVOL_110%MNY_DF    15.0 PSEC  
10 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 6MTH_IMPVOL_90.0%MNY_DF   22.0 PSEC  
11 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 9MTH_IMPVOL_100%MNY_DF    18.3 PSEC  
12 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 12MTH_IMPVOL_110%MNY_DF   16.7 PSEC  
13 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 3MTH_IMPVOL_90%MNY_DF     23.9 PSEC  
14 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 6MTH_IMPVOL_100.0%MNY_DF  18.0 PSEC  
15 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 9MTH_IMPVOL_110%MNY_DF    16.2 PSEC  
16 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 12MTH_IMPVOL_90%MNY_DF    21.0 PSEC  
17 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 3MTH_IMPVOL_100%MNY_DF    18   PSEC  
18 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 6MTH_IMPVOL_110.0%MNY_DF  15.8 PSEC  
19 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 9MTH_IMPVOL_90%MNY_DF     21.5 PSEC  
20 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 12MTH_IMPVOL_100%MNY_DF   18.5 PSEC  
21 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 3MTH_IMPVOL_110%MNY_DF    15.0 PSEC  
22 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 6MTH_IMPVOL_90.0%MNY_DF   22.0 PSEC  
23 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 9MTH_IMPVOL_100%MNY_DF    18.3 PSEC  
24 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 12MTH_IMPVOL_110%MNY_DF   16.7 PSEC
nqwrtyyt #1

One solution might look something like the following:

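df <- data.frame(Name = paste0("Row", 1:10),
                 Value = runif(10, 10, 100),
                 stringsAsFactors = FALSE)

# Pre-allocate a data frame (not a matrix) so that Value stays numeric
df2 <- data.frame(Name = character(2 * nrow(df) - 1),
                  Value = numeric(2 * nrow(df) - 1),
                  stringsAsFactors = FALSE)

for (i in seq_len(nrow(df2))) {
  if (i %% 2 != 0) {
    # Odd rows: copy original row (i + 1) / 2
    k <- (i + 1) / 2
    df2$Name[i]  <- df$Name[k]
    df2$Value[i] <- df$Value[k]
  } else {
    # Even rows: average of original rows i / 2 and i / 2 + 1
    k <- i / 2
    df2$Name[i]  <- paste0("Average of ", df$Name[k], " and ", df$Name[k + 1])
    df2$Value[i] <- (df$Value[k] + df$Value[k + 1]) / 2
  }
}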
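
The same interleaving can also be done without a loop. The following is a sketch in base R, not part of the original answer: it builds the averaged rows in one step and then sorts them in between the originals using a fractional sort key. The fixed seed and the helper names `avg` and `both` are assumptions made for the example.

```r
# Example data, same shape as in the answer above
set.seed(1)  # assumption: fixed seed only so the example is reproducible
df <- data.frame(Name = paste0("Row", 1:10),
                 Value = runif(10, 10, 100),
                 stringsAsFactors = FALSE)

n <- nrow(df)

# One averaged row per neighbouring pair (n - 1 rows in total)
avg <- data.frame(
  Name  = paste0("Average of ", df$Name[-n], " and ", df$Name[-1]),
  Value = (df$Value[-n] + df$Value[-1]) / 2,
  stringsAsFactors = FALSE
)

# Stack originals and averages, then interleave them:
# originals get sort keys 1, 2, 3, ...; averages get 1.5, 2.5, ...
both <- rbind(df, avg)
df2  <- both[order(c(seq_len(n), seq_len(n - 1) + 0.5)), ]
rownames(df2) <- NULL
```

This does not yet respect groups; for grouped data the same steps would have to be applied per group.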
rqmkfv5c #2

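Your dataset does not lend itself easily to this calculation, because two important variables (month and percent) are hidden inside a string.
The first step should be to extract them (I used str_extract() from stringr to pull out the first number and the number before the "%").
From there, there are several options. I pivoted the percent values into columns, computed the averages as new columns, and then pivoted back to long format: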

library(tidyverse)

set.seed(1337) # I generate random numbers - this is for consistency

# Make an analogue of your dataframe
df <- crossing(
  datestamp = as.Date(c("2006-01-03", "2006-01-04")),
  mnth = c("3MTH", "6MTH", "9MTH", "12MTH"),
  impvol = "_IMPVOL_",
  pc = c("90%", "100%", "110%"),
  str = "MNY_DF"
) %>% 
  transmute(datestamp, item = paste0(mnth, impvol, pc, str),
            value = round(rnorm(24, 20, 2), 1))

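new_rows <- df %>%
  # month: first number in the string
  # pc: number just before "%", allowing decimals such as "100.0%"
  mutate(month = as.numeric(str_extract(item, "[0-9]+")),
         pc = as.numeric(str_extract(item, "[0-9.]+(?=%)"))) %>%
  arrange(datestamp, month, pc) %>%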
  select(-item) %>% 
  pivot_wider(
    id_cols = c(datestamp, month),
    names_from = pc,
    values_from = value
  ) %>%
  mutate(`95` = (`90` + `100`) / 2,
         `105` = (`100` + `110`) / 2) %>%
  pivot_longer(
    cols = c(`90`, `95`, `100`, `105`, `110`),
    names_to = "pc",
    values_to = "value"
  )
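
The result of the pivot approach has `month` and `pc` columns but no `item` column. As a sketch of one possible final step, the item string can be rebuilt with `paste0()`. The example below is self-contained and uses a three-row stand-in for the question's data. The stand-in, the name `result`, and the assumption that every item follows the pattern `<month>MTH_IMPVOL_<pc>%MNY_DF` are mine, not the original answer's. Spellings such as "100.0%" come out normalized to "100%".

```r
library(dplyr)
library(tidyr)
library(stringr)

# Small stand-in for the question's data: one date, one tenor,
# including the "100.0%" spelling that appears in the real items
df <- tibble(
  datestamp = as.Date("2006-01-03"),
  item  = c("3MTH_IMPVOL_90%MNY_DF",
            "3MTH_IMPVOL_100.0%MNY_DF",
            "3MTH_IMPVOL_110%MNY_DF"),
  value = c(23.9, 18.0, 15.0)
)

result <- df %>%
  # Extract the month and the percent level from the item string
  mutate(month = as.numeric(str_extract(item, "[0-9]+")),
         pc    = as.numeric(str_extract(item, "[0-9.]+(?=%)"))) %>%
  select(-item) %>%
  # One column per percent level: `90`, `100`, `110`
  pivot_wider(id_cols = c(datestamp, month),
              names_from = pc, values_from = value) %>%
  # Midpoints between neighbouring percent levels
  mutate(`95`  = (`90` + `100`) / 2,
         `105` = (`100` + `110`) / 2) %>%
  # Back to long format, in ascending percent order
  pivot_longer(cols = c(`90`, `95`, `100`, `105`, `110`),
               names_to = "pc", values_to = "value") %>%
  # Rebuild the item label (assumed naming pattern)
  mutate(item = paste0(month, "MTH_IMPVOL_", pc, "%MNY_DF")) %>%
  select(datestamp, item, value)

print(result)
```

Other columns from the original data (entity, short_name, source) are dropped here; if they are constant within each group they could be added to `id_cols` to carry them through.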
