Mutating multiple columns in a data frame

ukxgm1gy  posted on 2023-09-27  in Other

I have a dataset that looks like this.

bankname    bankid   year    totass    cash    bond    loans
Bank A      1        1881    244789    7250    20218   29513
Bank B      2        1881    195755    10243   185151  2800
Bank C      3        1881    107736    13357   177612  NA
Bank D      4        1881    170600    35000   20000   5000
Bank E      5        1881    32000000  351266  314012  NA

I want to calculate some ratios from the banks' balance sheets, so that the dataset looks like this:

bankname    bankid   year    totass    cash    bond    loans    CashtoAsset   BondtoAsset    LoanstoAsset
Bank A      1        1881    2447890   7250    202100  951300   0.002         0.082          0.388
Bank B      2        1881    195755    10243   185151  2800     0.052         0.945          0.014
Bank C      3        1881    107736    13357   177612  NA       0.123         1.648585431    NA
Bank D      4        1881    170600    35000   20000   5000     0.205         0.117          0.029
Bank E      5        1881    32000000  351266  314012  NA       0.0109        0.009          NA

Here is the code to reproduce the data:

bankname <- c("Bank A","Bank B","Bank C","Bank D","Bank E")
bankid <- c( 1, 2,  3,  4,  5)
year<- c( 1881, 1881,   1881,   1881,   1881)
totass  <- c(244789,    195755, 107736, 170600, 32000000)
cash<-c(7250,10243,13357,35000,351266)
bond<-c(20218,185151,177612,20000,314012)
loans<-c(29513,2800,NA,5000,NA)
bankdata<-data.frame(bankname, bankid,year,totass, cash, bond, loans)

First, I replaced the NAs in the balance sheet columns with 0.

cols <- c("totass", "cash", "bond", "loans")
bankdata[cols][is.na(bankdata[cols])] <- 0

Then I calculate the ratios:

library(dplyr)
bankdata<-mutate(bankdata,CashtoAsset = cash/totass)
bankdata<-mutate(bankdata,BondtoAsset = bond/totass)
bankdata<-mutate(bankdata,loanstoAsset =loans/totass)

However, I don't want to compute all these ratios line by line; I'd like to write a loop that does them all at once. In Stata I would do:

foreach x of varlist cash bond loans {
by bankid: gen `x'toAsset = `x'/ totass
}

How can I do this in R?

jchrr9hc1#

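Update (as of March 18, 2019)

Things have changed. We used to pass funs() to .funs, as in funs(name = f(.)). That changed in dplyr 0.8.0 and later: we now use list() instead of funs(), as in list(name = ~f(.)). See the new examples below.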

bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = vars(cash:loans))

bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = c("cash", "bond", "loans"))

bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = 5:7)
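
As a hedged aside that is not part of the original answer: in dplyr 1.0.0 and later, scoped verbs such as mutate_at() are superseded by across(). The following is a minimal sketch of the same calculation, assuming dplyr >= 1.0.0 is installed; the data frame is rebuilt from the question and the name out is only illustrative.

```r
library(dplyr)

# Rebuild the example data from the question
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = 1:5,
  year     = 1881,
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

# Divide each of cash, bond and loans by totass.
# .names builds the new column names, so no renaming step is needed.
out <- bankdata %>%
  mutate(across(cash:loans, ~ .x / totass, .names = "{.col}toAsset"))

names(out)
```

Because .names controls the output names directly, the gsub() clean-up described further down should not be needed with this variant.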

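Update (as of December 2, 2017)

Since I answered this question, I have noticed that some SO users keep checking this answer. The dplyr package has changed since then, so I am leaving the following update. I hope it helps some R users learn how to use mutate_at().
mutate_each() is now deprecated; use mutate_at() instead. You specify the columns to which the function is applied in .vars. One way is to use vars(). Another is to pass a character vector of column names; the custom function itself goes in .funs. A third is to specify the columns by number (e.g., 5:7 in this case). Note that if you use a column in group_by(), you need to adjust the column position numbers. Have a look at this question.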

bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = vars(cash:loans))

bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = c("cash", "bond", "loans"))

bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = 5:7)

#  bankname bankid year   totass   cash   bond loans cash_toAsset bond_toAsset loans_toAsset
#1   Bank A      1 1881   244789   7250  20218 29513   0.02961734  0.082593581    0.12056506
#2   Bank B      2 1881   195755  10243 185151  2800   0.05232561  0.945830247    0.01430359
#3   Bank C      3 1881   107736  13357 177612    NA   0.12397899  1.648585431            NA
#4   Bank D      4 1881   170600  35000  20000  5000   0.20515826  0.117233294    0.02930832
#5   Bank E      5 1881 32000000 351266 314012    NA   0.01097706  0.009812875            NA

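I purposely named the custom function toAsset in .funs, since this helps me arrange the new column names. Previously, I used rename(), but I think that with the present approach it is much easier to clean up the column names with gsub(). If the result above is saved as out, run the following code to remove the _ from the column names: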

names(out) <- gsub(names(out), pattern = "_", replacement = "")
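
One caveat worth sketching: gsub() with the pattern "_" removes every underscore in every column name. That is harmless for this dataset, but it would also alter unrelated names. The example below uses a made-up vector of names (bank_id is hypothetical and not in the question's data) and anchors the pattern so that only the _toAsset suffix is rewritten.

```r
# Hypothetical column names, including one unrelated name with an underscore
nms <- c("bank_id", "cash", "cash_toAsset", "bond_toAsset", "loans_toAsset")

# Anchor the pattern at the end of the name so only the suffix is rewritten
cleaned <- sub("_toAsset$", "toAsset", nms)
cleaned
```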

Original answer

I think you can save some typing this way with dplyr. The downside is that you overwrite cash, bond, and loans.

bankdata %>%
    group_by(bankname) %>%
    mutate_each(funs(whatever = ./totass), cash:loans)

#  bankname bankid year   totass       cash        bond      loans
#1   Bank A      1 1881   244789 0.02961734 0.082593581 0.12056506
#2   Bank B      2 1881   195755 0.05232561 0.945830247 0.01430359
#3   Bank C      3 1881   107736 0.12397899 1.648585431         NA
#4   Bank D      4 1881   170600 0.20515826 0.117233294 0.02930832
#5   Bank E      5 1881 32000000 0.01097706 0.009812875         NA

If you prefer your expected outcome, I think some typing is necessary. The renaming part seems to be something you have to do.

bankdata %>%
    group_by(bankname) %>%
    summarise_each(funs(whatever = ./totass), cash:loans) %>%
    rename(cashtoAsset = cash, bondtoAsset = bond, loanstoAsset = loans) -> ana;
    ana %>%
    merge(bankdata,., by = "bankname")

#  bankname bankid year   totass   cash   bond loans cashtoAsset bondtoAsset loanstoAsset
#1   Bank A      1 1881   244789   7250  20218 29513  0.02961734 0.082593581   0.12056506
#2   Bank B      2 1881   195755  10243 185151  2800  0.05232561 0.945830247   0.01430359
#3   Bank C      3 1881   107736  13357 177612    NA  0.12397899 1.648585431           NA
#4   Bank D      4 1881   170600  35000  20000  5000  0.20515826 0.117233294   0.02930832
#5   Bank E      5 1881 32000000 351266 314012    NA  0.01097706 0.009812875           NA

muk1a3rh2#

Using apply() and cbind():

bankdata <- cbind(bankdata, apply(bankdata[, 5:7], 2, function(x) x / bankdata$totass))
names(bankdata)[8:10] <- paste0(names(bankdata)[5:7], 'toAssest')

> bankdata
  bankname bankid year   totass   cash   bond loans cashtoAssest bondtoAssest loanstoAssest
1   Bank A      1 1881   244789   7250  20218 29513   0.02961734  0.082593581    0.12056506
2   Bank B      2 1881   195755  10243 185151  2800   0.05232561  0.945830247    0.01430359
3   Bank C      3 1881   107736  13357 177612    NA   0.12397899  1.648585431            NA
4   Bank D      4 1881   170600  35000  20000  5000   0.20515826  0.117233294    0.02930832
5   Bank E      5 1881 32000000 351266 314012    NA   0.01097706  0.009812875            NA
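
A related base R sketch, not taken from the answer above: dividing a data frame by a vector whose length equals the number of rows works column by column, so apply() is not strictly needed, and selecting the columns by name avoids relying on positions 5:7. The names cols and ratios are illustrative.

```r
# Rebuild the example data from the question
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = 1:5,
  year     = 1881,
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

cols <- c("cash", "bond", "loans")

# A data frame divided by a vector of length nrow(bankdata) is divided
# down each column, so every row is divided by its own totass
ratios <- bankdata[cols] / bankdata$totass
names(ratios) <- paste0(cols, "toAsset")

bankdata <- cbind(bankdata, ratios)
```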

mw3dktmi3#

Here is a data.table solution.

library(data.table)
setDT(bankdata)
bankdata[, paste0(names(bankdata)[5:7], "toAsset") := 
           lapply(.SD, function(x) x/totass), .SDcols=5:7]
bankdata
#    bankname bankid year   totass   cash   bond loans cashtoAsset bondtoAsset loanstoAsset
# 1:   Bank A      1 1881   244789   7250  20218 29513  0.02961734 0.082593581   0.12056506
# 2:   Bank B      2 1881   195755  10243 185151  2800  0.05232561 0.945830247   0.01430359
# 3:   Bank C      3 1881   107736  13357 177612     0  0.12397899 1.648585431   0.00000000
# 4:   Bank D      4 1881   170600  35000  20000  5000  0.20515826 0.117233294   0.02930832
# 5:   Bank E      5 1881 32000000 351266 314012     0  0.01097706 0.009812875   0.00000000
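
The same idea can be written with column names instead of positions, which keeps working if the column order changes. This is a sketch that assumes the data.table package is installed; cols and new_cols are illustrative names, and the NAs in loans are left in place here rather than replaced with 0.

```r
library(data.table)

# Rebuild the example data from the question as a data.table
bankdata <- data.table(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = 1:5,
  year     = 1881,
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

cols     <- c("cash", "bond", "loans")
new_cols <- paste0(cols, "toAsset")

# (new_cols) tells := to use the contents of the variable as column names;
# .SDcols restricts .SD to the columns being divided
bankdata[, (new_cols) := lapply(.SD, function(x) x / totass), .SDcols = cols]
```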

sshcrbum4#

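This is one of the big drawbacks of dplyr: as far as I'm aware, there is no straightforward way to use it programmatically rather than interactively without some kind of "hack" such as the dreaded eval(parse(text=foo)) idiom.
The simplest approach is the same as the Stata method, but string manipulation is more verbose in R than in Stata (or any other scripting language, for that matter).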

for (x in c("cash", "bond", "loans")) {
  bankdata[sprintf("%stoAsset", x)] <- bankdata[x] / bankdata$totass  # or, equivalently, bankdata["totass"] for a consistent "look"
  ## can also replace `sprintf("%stoAsset", x)` with `paste0(x, "toAsset")` or even `paste(x, "toAsset", sep = "")`, depending on what makes more sense to you.
}
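
If the loop is needed more than once, it can be wrapped in a small function. The sketch below is one possible way to do that; add_ratios() and its arguments are hypothetical names introduced here, not part of any package.

```r
# add_ratios() is a hypothetical helper: for each column named in `cols`,
# it appends a new column holding that column divided by `denom`
add_ratios <- function(df, cols, denom = "totass", suffix = "toAsset") {
  for (x in cols) {
    # [[ ]] extracts plain vectors, so each new column is an ordinary numeric vector
    df[[paste0(x, suffix)]] <- df[[x]] / df[[denom]]
  }
  df
}

# Rebuild the example data from the question
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = 1:5,
  year     = 1881,
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

bankdata <- add_ratios(bankdata, c("cash", "bond", "loans"))
```

The function returns a modified copy, so the original columns are kept and the result has to be assigned back, as in the last line.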

To make the whole thing more Stata-like, you can wrap it in within(), like so:

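bankdata <- within(bankdata, {
  for (x in c("cash", "bond", "loans")) {
    assign(x, get(x) / totass)  # overwrites cash, bond and loans with their ratios
  }
  rm(x)  # drop the loop variable so within() does not add it as a column
})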

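But this requires some hacking with the get() and assign() functions, which are not generally safe to use, although in your case it is probably not a big deal. I wouldn't recommend trying similar tricks with dplyr, for example, because dplyr abuses R's non-standard evaluation features and it is probably going to be more trouble than it's worth. For a faster and probably superior solution, check out the data.table package, which (I think) lets you use Stata-like looping syntax but with dplyr-like speed. Check out the package vignette on CRAN.
Also, are you really, really sure you want to reassign NA entries to 0?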
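
To illustrate why that choice matters, here is a small sketch using the loans and totass values from the question. With NA kept, a missing loan figure yields a missing ratio; with 0 substituted, the bank is treated as having no loans at all, which pulls summary statistics toward zero. The variable names are illustrative.

```r
loans  <- c(29513, 2800, NA, 5000, NA)
totass <- c(244789, 195755, 107736, 170600, 32000000)

# Keeping NA: the ratio is missing wherever loans is missing
ratio_na <- loans / totass

# Replacing NA with 0: the ratio becomes exactly 0 for those banks
loans_zero <- ifelse(is.na(loans), 0, loans)
ratio_zero <- loans_zero / totass

mean(ratio_na, na.rm = TRUE)  # average over the three banks that report loans
mean(ratio_zero)              # average over all five banks, two of them forced to 0
```

Which version is appropriate depends on whether a missing entry means "no loans" or "not reported"; the data alone does not say.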

6psbrbz95#

Try:

for(i in 5:7){
     bankdata[,(i+3)] = bankdata[,i]/bankdata[,4]
}
names(bankdata)[(5:7)+3] =  paste0(names(bankdata)[5:7], 'toAssest')

Output:

bankdata
  bankname bankid year   totass   cash   bond loans cashtoAssest bondtoAssest loanstoAssest
1   Bank A      1 1881   244789   7250  20218 29513   0.02961734  0.082593581    0.12056506
2   Bank B      2 1881   195755  10243 185151  2800   0.05232561  0.945830247    0.01430359
3   Bank C      3 1881   107736  13357 177612     0   0.12397899  1.648585431    0.00000000
4   Bank D      4 1881   170600  35000  20000  5000   0.20515826  0.117233294    0.02930832
5   Bank E      5 1881 32000000 351266 314012     0   0.01097706  0.009812875    0.00000000

2sbarzqh6#

You might be overcomplicating things. Try this and see if it yields what you need.

bankdata$CashtoAsset <- bankdata$cash / bankdata$totass
bankdata$BondtoAsset <- bankdata$bond / bankdata$totass
bankdata$loantoAsset <- bankdata$loans / bankdata$totass
bankdata

Which yields the following:

bankname bankid year   totass   cash   bond   loans  CashtoAsset BondtoAsset loantoAsset
1   Bank A    1     1881     244789   7250  20218 29513  0.02961734  0.082593581 0.12056506
2   Bank B    2     1881     195755  10243 185151  2800  0.05232561  0.945830247 0.01430359
3   Bank C    3     1881     107736  13357 177612     0  0.12397899  1.648585431 0.00000
4   Bank D    4     1881     170600  35000  20000  5000  0.20515826  0.117233294 0.02930832
5   Bank E    5     1881   32000000 351266 314012     0  0.01097706  0.009812875 0.00000000

This should get you started in the right direction.
