Data subsetting in Stata with integers and strings vs. R

ukdjmx9f posted on 2022-12-06 in Other


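I have never used Stata before and know very little about it. I have been trying to collapse a dataset of bilateral information by year, country1, and country2, taking the mean of all other information. In R, I tried running: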

aggregate(dataset,by=list(dataset$year,dataset$country1,dataset$country2),FUN=mean,na.rm=TRUE)

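The dataset is so large that my computer's RAM cannot handle the collapse in R (a separate problem I have not been able to solve). When a colleague ran the code, the other data did not come out as means: in some cases only a single row from a particular dyad-year was selected, and in others I am not even sure what happened. Smaller subsets of the dataset give the correct results.
Because of these problems in R, I wanted to try doing this in Stata, but when I previously tried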

collapse (mean) <every variable I wanted a ``mean'' of, or otherwise wanted to remove from the dataset>, by(year country1 country2)

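Stata did not know what to do with the strings. I know too little Stata to work out how to get around this. Could someone provide code? I need to use collapse on a large number of variables, many of which are strings (for the strings, I need NA returned).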

Source: https://stackoverflow.com/questions/22363310/data-subsetting-in-stata-with-integers-and-strings-vs-r

2 answers


yeotifhr1#

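findname (Stata Journal) is a user-written successor to ds, with more functionality (a fact) and a friendlier syntax (the author's opinion, although the same author was also the last author of ds).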

. sysuse auto
(1978 Automobile Data)

. ds, has(type numeric)
price         rep78         trunk         length        displacement  foreign
mpg           headroom      weight        turn          gear_ratio

. findname, type(numeric)
price         rep78         trunk         length        displacement  foreign
mpg           headroom      weight        turn          gear_ratio

In both cases, the names of the numeric variables are returned in r(varlist):

. di "`r(varlist)'"
price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign

so you can pass that list to collapse:

. collapse `r(varlist)',  by(year country1 country2)

More generally, there is no substitute for reading the help for collapse and its manual entry.
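Putting the pieces above together, here is a minimal sketch rather than a definitive recipe. It uses the auto data with foreign standing in for the grouping variables year country1 country2, and the local macro names byvars, numvars, and strvars are made up for illustration. The by-variables are removed from the list of variables to average, and because collapse drops the string variables, they are recreated afterwards as empty strings (Stata's missing value for strings, the closest analogue to NA).

```stata
sysuse auto, clear

* grouping variables (foreign stands in for year country1 country2)
local byvars foreign

* numeric variables, excluding the grouping variables
ds, has(type numeric)
local numvars `r(varlist)'
local numvars : list numvars - byvars

* string variables, excluding the grouping variables
ds, has(type string)
local strvars `r(varlist)'
local strvars : list strvars - byvars

* group means of every numeric variable; string variables are dropped here
collapse (mean) `numvars', by(`byvars')

* recreate the string variables as empty (missing) strings
foreach v of local strvars {
    generate str1 `v' = ""
}

list
```

If the string variables are not needed in the result, the final loop can simply be left out.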


r1zhe5dt2#

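If the string variables you want means of are really numbers stored as strings, e.g. "1", "2", and so on, you can use real() or destring to convert them to a numeric type. String variables that are not of this form, e.g. "alligator", "lizard", "snake", for which a mean would be meaningless, will be dropped by collapse.
Example: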

clear all
set more off

* some example data
input ///
str4 numstr num str11 reptiles
"234" 234 "alligator"
"2135" 2135 "lizard"
"324" 324 "snake"
end

list

* create numeric variable from string
destring numstr, generate(num2)

* the collapse
collapse (mean) num num2

list
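With many variables, converting them one at a time gets tedious. The following is a sketch under stated assumptions: the grouping variable grp and the macro names are hypothetical additions to the example data. It relies on destring, when given no variable list, trying every variable and leaving alone any string variable that contains nonnumeric characters.

```stata
clear

* example data with a hypothetical grouping variable grp
input str4 numstr num str11 reptiles byte grp
"234"  234  "alligator" 1
"2135" 2135 "lizard"    1
"324"  324  "snake"     2
end

* convert every string variable that holds only numbers;
* reptiles contains nonnumeric characters and stays a string
destring, replace

* collect the numeric variables, excluding the grouping variable
local byvars grp
ds, has(type numeric)
local numvars `r(varlist)'
local numvars : list numvars - byvars

* group means; the remaining string variable is dropped
collapse (mean) `numvars', by(`byvars')

list
```

Using replace overwrites numstr in place; generate() is the safer choice when the original strings need to be kept.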
