Multiple functions in a single tapply or aggregate statement

Posted by yqkkidmi on 2023-04-27 in Other

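Is it possible to include two functions within a single tapply or aggregate statement?
Below I use two tapply statements and two aggregate statements: one for the mean and one for the SD.
I would prefer to combine the statements.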

my.Data = read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", sep = "", header = TRUE)

with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x)}))
with(my.Data, tapply(weight, list(age, sex), function(x) {sd(x)  }))

with(my.Data, aggregate(weight ~ age + sex, FUN = mean))
with(my.Data, aggregate(weight ~ age + sex, FUN =   sd))

# this does not work:

with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x) ; sd(x)}))

# I would also prefer that the output be formatted similarly to what is
# shown below.  `aggregate` formats the output perfectly.  I just cannot figure 
# out how to implement two functions in one statement.

  age    sex   mean        sd
adult female   97.5  3.535534
adult   male     90        NA
young female   80.0        NA
young   male     75        NA

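I can always run two separate statements and merge the output. I was just hoping there might be a slightly more convenient solution.
I found the answer below here: Apply multiple functions to column using tapply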

f <- function(x) c(mean(x), sd(x))
do.call( rbind, with(my.Data, tapply(weight, list(age, sex), f)) )

However, neither the rows nor the columns are labeled.

     [,1]     [,2]
[1,] 97.5 3.535534
[2,] 80.0       NA
[3,] 90.0       NA
[4,] 75.0       NA

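I would prefer a base R solution. A plyr solution was posted at the link above. If I could add the correct row and column headings to the output above, it would be perfect.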

Answer 1 (vsnjm48y)

These, however, should work:

with(my.Data, aggregate(weight, list(age, sex), function(x) { c(MEAN=mean(x), SD=sd(x) )}))

with(my.Data, tapply(weight, list(age, sex), function(x) { c(mean(x) , sd(x) )} ))
# Not a nice structure but the results are in there

with(my.Data, aggregate(weight ~ age + sex, FUN =  function(x) c( SD = sd(x), MN= mean(x) ) ) )
    age    sex weight.SD weight.MN
1 adult female  3.535534 97.500000
2 young female        NA 80.000000
3 adult   male        NA 90.000000
4 young   male        NA 75.000000

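The principle to follow is to have your function return "one thing", which can be either a vector or a list, but cannot be two function calls evaluated one after the other.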
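The following is a minimal base R sketch of that principle, not part of the original answer. The helper names `last_only`, `both`, `flat`, and `labelled` are hypothetical and chosen only for illustration. It shows why the braced version returns just the SD, and two possible ways to get labeled columns, assuming every age/sex combination occurs in the data.

```r
my.Data <- read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", header = TRUE)

# A braced block returns only its last expression, so mean(x) is discarded.
last_only <- function(x) { mean(x); sd(x) }

# Wrapping both results in c() returns "one thing": a named vector.
both <- function(x) c(mean = mean(x), sd = sd(x))

# aggregate() stores the vector results in a single matrix column;
# do.call(data.frame, ...) splits it into ordinary columns.
res  <- aggregate(weight ~ age + sex, data = my.Data, FUN = both)
flat <- do.call(data.frame, res)

# tapply() returns a list-matrix; its dimnames can supply the labels.
# Assumption: no age/sex cell is empty, otherwise the rows would misalign.
tab <- with(my.Data, tapply(weight, list(age, sex), both))
labelled <- cbind(
  expand.grid(age = rownames(tab), sex = colnames(tab),
              stringsAsFactors = FALSE),
  do.call(rbind, tab)
)

print(flat)
print(labelled)
```

Naming the elements inside `c()` is what gives the result columns their labels; without the names, the columns fall back to positional labels.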

Answer 2 (dbf7pr2w)

If you would like to use data.table, it has `with` and `by` built right in.

library(data.table)
myDT <- data.table(my.Data, key="animal")

myDT[, c("mean", "sd") := list(mean(weight), sd(weight)), by=list(age, sex)]

myDT[, list(mean_Aggr=sum(mean(weight)), sd_Aggr=sum(sd(weight))), by=list(age, sex)]
     age    sex mean_Aggr   sd_Aggr
1: adult female     96.0  3.6055513
2: young   male     76.5  2.1213203
3: adult   male     91.0  1.4142136
4: young female     84.5  0.7071068
Note: I used a slightly different data set so as not to have NA values for sd.
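For comparison, here is a small sketch on the question's original five-row data, assuming the data.table package is installed. The object names `per_group` and `per_row` are hypothetical. It contrasts the two forms used above: a list in `j` returns one row per group, while `:=` adds the group statistics to every row.

```r
library(data.table)

DT <- as.data.table(read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", header = TRUE))

# One row per age/sex group; .() is data.table's alias for list().
per_group <- DT[, .(mean = mean(weight), sd = sd(weight)), by = .(age, sex)]

# := keeps all five rows and attaches the group statistics as new columns.
# copy() is used so that DT itself is left unmodified.
per_row <- copy(DT)[, c("mean", "sd") := list(mean(weight), sd(weight)),
                    by = .(age, sex)]

print(per_group)
print(per_row)
```

With this data, groups that contain a single animal have an sd of NA, which is why the answer switched to a different data set.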

Answer 3 (wlzqhblo)

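In the spirit of sharing, if you are familiar with SQL, you might also consider the "sqldf" package. (Emphasis added because you do need to know, for example, that mean is avg in order to get the results you want.)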

library(sqldf)
sqldf("select age, sex, 
      avg(weight) `Wt.Mean`, 
      stdev(weight) `Wt.SD` 
      from `my.Data` 
      group by age, sex")
    age    sex Wt.Mean    Wt.SD
1 adult female    97.5 3.535534
2 adult   male    90.0 0.000000
3 young female    80.0 0.000000
4 young   male    75.0 0.000000

Answer 4 (1yjd4xko)

reshape lets you pass two functions; reshape2 does not.

library(reshape)
my.Data = read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", sep = "", header = TRUE)
my.Data[,1]<- NULL
(a1<-  melt(my.Data, id=c("age", "sex"), measured=c("weight")))
(cast(a1, age + sex ~ variable, c(mean, sd), fill=NA))

#     age    sex weight_mean weight_sd
# 1 adult female        97.5  3.535534
# 2 adult   male        90.0        NA
# 3 young female        80.0        NA
# 4 young   male        75.0        NA

I owe this one to @Ramnath, who pointed it out just yesterday.

Answer 5 (q0qdq0h2)

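The function aggregate_multiple_fun in the SSBtools package is a wrapper around aggregate that allows multiple functions and functions of several variables. In this case there are two possibilities: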

library(SSBtools)
my.Data = read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", sep = "", header = TRUE)

aggregate_multiple_fun(my.Data, my.Data[c("age", "sex")], 
                       vars = c(mean = "weight", sd = "weight"))

#     age    sex weight_mean weight_sd
# 1 adult female        97.5  3.535534
# 2 young female        80.0        NA
# 3 adult   male        90.0        NA
# 4 young   male        75.0        NA

aggregate_multiple_fun(my.Data, my.Data[c("age", "sex")], 
                       vars = "weight", fun = c("mean", "sd"))

#     age    sex mean       sd
# 1 adult female 97.5 3.535534
# 2 young female 80.0       NA
# 3 adult   male 90.0       NA
# 4 young   male 75.0       NA
