R: multiple functions in aggregate

Posted by esbemjvw on 2023-04-27 in Other

Given the data frame df1 below:

Branch Loan_Amount TAT
      A         100 2.0
      A         120 4.0
      A         300 9.0
      B         150 1.5
      B         200 2.0

Can I use the aggregate function to get the following output as data frame df2?

Branch Number_of_loans Loan_Amount Total_TAT
      A               3         520      15.0
      B               2         350       3.5

I know I can use nrow to compute Number_of_loans and then merge, but I am looking for a better approach.

Answer #1 by ryhaxcpt

Base R:

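# Sum every remaining column within each Branch
df1 <- aggregate(. ~ Branch, df, FUN = sum)
# Count the rows per Branch and keep only the count column
df2 <- setNames(aggregate(Loan_Amount ~ Branch, df, length)[2], "Number_of_loans")
cbind(df1, df2)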

Output:

  Branch Loan_Amount  TAT Number_of_loans
1      A         520 15.0               3
2      B         350  3.5               2
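
The two aggregate() calls above can also be folded into a single call by letting FUN return a named vector. This is only a minimal sketch, assuming the same data as in this answer; the labels n and sum inside the anonymous function are my own choice, and the do.call(data.frame, ...) step is there because aggregate() stores such results as matrix columns.

```r
# Same data as in the question, rebuilt here so the snippet is self-contained
df <- data.frame(
  Branch      = c("A", "A", "A", "B", "B"),
  Loan_Amount = c(100, 120, 300, 150, 200),
  TAT         = c(2, 4, 9, 1.5, 2)
)

# One aggregate() call: FUN returns both the count and the sum per column
res <- aggregate(cbind(Loan_Amount, TAT) ~ Branch, data = df,
                 FUN = function(x) c(n = length(x), sum = sum(x)))

# Each value column is a two-column matrix; flatten them into ordinary columns
res <- do.call(data.frame, res)
names(res)
```

The flattened columns are named Loan_Amount.n, Loan_Amount.sum, TAT.n and TAT.sum, so the count appears once per variable and one of the copies can be dropped or renamed.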

Using the sqldf package:

library(sqldf)
sqldf("SELECT Branch, COUNT(Loan_Amount) Number_of_loans, SUM(Loan_Amount) Loan_Amount, SUM(TAT) TAT 
      FROM df 
      GROUP BY Branch")

Output:

  Branch Number_of_loans Loan_Amount  TAT
1      A               3         520 15.0
2      B               2         350  3.5

Data:

df <- structure(list(Branch = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), Loan_Amount = c(100L, 120L, 300L, 150L, 
200L), TAT = c(2, 4, 9, 1.5, 2)), .Names = c("Branch", "Loan_Amount", 
"TAT"), class = "data.frame", row.names = c(NA, -5L))

Answer #2 by cfh9epnr

With dplyr you can do this:

library(dplyr)
group_by(d,Branch) %>% 
  summarize(Number_of_loans = n(),
            Loan_Amount = sum(Loan_Amount),
            TAT = sum(TAT))

Output:

Source: local data frame [2 x 4]

  Branch Number_of_loans Loan_Amount   TAT
  (fctr)           (int)       (int) (dbl)
1      A               3         520  15.0
2      B               2         350   3.5

Data:

d <- read.table(text="Branch Loan_Amount TAT
A         100 2.0
A         120 4.0
A         300 9.0
B         150 1.5
B         200 2.0", header = TRUE)
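
If there are many columns to sum, writing one sum() per column gets repetitive. A possible variation, assuming dplyr 1.0.0 or later (where across() is available), keeps n() for the count and applies sum to a selection of columns. This is a sketch on the same data, not part of the original answer.

```r
library(dplyr)

# Same data as above, rebuilt so the snippet runs on its own
d <- data.frame(
  Branch      = c("A", "A", "A", "B", "B"),
  Loan_Amount = c(100, 120, 300, 150, 200),
  TAT         = c(2, 4, 9, 1.5, 2)
)

res <- d %>%
  group_by(Branch) %>%
  summarise(Number_of_loans = n(),                 # rows per group
            across(c(Loan_Amount, TAT), sum))      # sum each selected column
res
```

The summed columns keep their original names (Loan_Amount, TAT), so rename TAT afterwards if Total_TAT is wanted.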

Answer #3 by xe55xuns

Using data.table:

library(data.table)
setDT(df)[,list(Number_of_loans=.N, 
                Loan_Amount    =sum(Loan_Amount), 
                Total_TAT      =sum(TAT)), by=Branch]
#    Branch Number_of_loans Loan_Amount Total_TAT
# 1:      A               3         520      15.0
# 2:      B               2         350       3.5

Answer #4 by ldfqzlk8

This is an old post, but in my opinion such a common operation should have an easier solution.
Here is a possibly simpler one-line alternative.

> aggregate2(df, x = c('Loan_Amount', 'TAT'), by = 'Branch', 
             FUN = list(total = sum, number = length))

  Branch Loan_Amount.total TAT.total Loan_Amount.number TAT.number
1      A               520      15.0                  3          3
2      B               350       3.5                  2          2

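aggregate2() is a function in the jumbled repo that I just built on top of the base function aggregate. It calls aggregate once for each function in FUN and does some work before and after.
Unlike aggregate, it accepts multiple functions. Unlike the dplyr solution, it applies all of these functions to all x variables, without needing a separate expression such as Loan_Amount = sum(Loan_Amount) for each variable.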
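
For readers without access to that repo, the idea described above (one aggregate() call per function, then stitching the results together) can be sketched roughly as follows. This is not the actual aggregate2() source: the name aggregate_multi and its internals are hypothetical, written only to illustrate the approach.

```r
# Hypothetical helper, NOT the aggregate2() from the jumbled repo
aggregate_multi <- function(data, x, by, FUN) {
  # Call aggregate() once per function and suffix the value columns
  # with that function's name from the FUN list
  pieces <- lapply(names(FUN), function(nm) {
    res <- aggregate(data[x], by = data[by], FUN = FUN[[nm]])
    names(res)[names(res) %in% x] <- paste(x, nm, sep = ".")
    res
  })
  # Join the per-function results on the grouping column(s)
  Reduce(function(a, b) merge(a, b, by = by), pieces)
}

df <- data.frame(
  Branch      = c("A", "A", "A", "B", "B"),
  Loan_Amount = c(100, 120, 300, 150, 200),
  TAT         = c(2, 4, 9, 1.5, 2)
)

res <- aggregate_multi(df, x = c("Loan_Amount", "TAT"), by = "Branch",
                       FUN = list(total = sum, number = length))
res
```

Because aggregate() runs once per function, the cost grows with the number of functions, which is the trade-off for the simpler call.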

Answer #5 by h6my8fg2

This is a hacky and inefficient approach, but it works and it is fun (it uses aggregate()):

d <- read.table(text="Branch Loan_Amount TAT
A         100 2.0
A         120 4.0
A         300 9.0
B         150 1.5
B         200 2.0", header = TRUE)

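library(stringr)
# Pack the count and the sum into one string per group, e.g. "3|520"
df = aggregate(.~Branch, data=d, FUN=function(x) paste0(length(x), '|',sum(x)))
# fixed('|') splits on a literal pipe; a bare '|' would be read as regex alternation
df_ = cbind(str_split_fixed(df$Loan_Amount, fixed('|'), 2), str_split_fixed(df$TAT, fixed('|'), 2)[,2])
df_ = apply(df_, 2, as.numeric)
colnames(df_) = c('Number_of_loans','Loan_Amount','Total_TAT')
cbind(df[,'Branch',drop=FALSE], df_)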

This produces the desired data frame:

  Branch Number_of_loans Loan_Amount Total_TAT
1      A               3         520      15.0
2      B               2         350       3.5

Answer #6 by ctrmrzij

The function aggregate_multiple_fun in the SSBtools package is a wrapper around aggregate that allows multiple functions as well as functions of several variables.

library(SSBtools)
aggregate_multiple_fun(df, by = df["Branch"], 
            vars = c(sum = "Loan_Amount", sum = "TAT", length = "TAT"))

Output:

  Branch Loan_Amount_sum TAT_sum TAT_length
1      A             520    15.0          3
2      B             350     3.5          2

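In addition, there are several ways to specify the output variable names, either directly or via the function names. Note that aggregate is called only once.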
