Replacing two convoluted for loops with vectorization

Posted by 9o685dep on 2021-08-25 in Java

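This is my first question, so please let me know if I am doing anything wrong.
I have two convoluted for loops that work but are far too slow. I know I should use vectorization to speed them up, but I can't see how to do that in my case. Any help would be much appreciated. The background: I need to compute the sum of squared differences (SSD) of one year of daily price data for different stocks. I need the SSD for every possible pair of stocks in a list of 1046 stocks, which are stored (along with their price data) in a pandas DataFrame.
So far, my two for loops only compute the SSD for every combination of the first stock in the list with each of the other stocks. For now I would be happy just to vectorize these two loops to make them faster. I have already tried while loops and wrapping the loops in a function, but neither gave the speedup I need. If there is a better approach than vectorization, please tell me that I am on the wrong track.
My DataFrame "formation_period_1_1991", from which I take the price data, looks basically like this (where "PERMNO" is the identifier of an individual stock):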

import numpy as np
import pandas as pd
data = [['99000', 10], ['99000', 11], ['99000', 12], ['98000', 3], ['98000', 2], ['98000', 5], ['97000', 9], ['97000', 11], ['97000', 10]]
formation_period_1_1991 = pd.DataFrame(data, columns=['PERMNO', 'Price'])

Then I define a matrix to hold the computed SSD values:

Axis_for_SSD_Matrix = formation_period_1_1991["PERMNO"].unique().tolist()
SSD_Matrix = pd.DataFrame(index=np.arange(formation_period_1_1991["PERMNO"].nunique()), columns=np.arange(formation_period_1_1991["PERMNO"].nunique()))
# Reassign instead of inplace=True: set_axis no longer accepts inplace in pandas 2.x.
SSD_Matrix = SSD_Matrix.set_axis(Axis_for_SSD_Matrix, axis="index")
SSD_Matrix = SSD_Matrix.set_axis(Axis_for_SSD_Matrix, axis="columns")

Finally, I compute the SSD values for the first row of SSD_Matrix with two for loops:

x=3# is equal to number of trading days
no_of_considered_shares =(formation_period_1_1991["PERMNO"].nunique())
j=1

for j in range(1,no_of_considered_shares):
    SSD_calc = 0
    i=0
    for i in range(0,x): #x is no_of_trading_days
        SSD_calc = SSD_calc + (formation_period_1_1991.iloc[i]["Price"]-formation_period_1_1991.iloc[i+x*j]["Price"])**2 
    SSD_Matrix.loc[formation_period_1_1991.iloc[0]["PERMNO"],formation_period_1_1991.iloc[x*j]["PERMNO"]]=SSD_calc

After running the code, SSD_Matrix looks like this:

       99000  98000  97000
99000    NaN    179      5
98000    NaN    NaN    NaN
97000    NaN    NaN    NaN

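So far this works as I want, but my real DataFrame "formation_period_1_1991" contains 1046 stocks with 253 trading days each, so I would be grateful for any help on how to speed up these two for loops substantially (by vectorization, I assume). Many thanks!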

python pandas numpy vectorization

Source: https://stackoverflow.com/questions/68302821/replacing-two-convoluted-for-loops-with-vectorization

1 Answer

w9apscun1#

This:

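import numpy as np
import pandas as pd

# Number the rows within each stock 0..x-1 so that prices line up by trading day.
# (Taking the index modulo the number of stocks only lines up when that number
# happens to equal the number of trading days, as in the 3x3 example.)
day = formation_period_1_1991.groupby('PERMNO').cumcount()
df = formation_period_1_1991.assign(day=day).pivot(index='day', columns='PERMNO', values='Price')
arr = df.to_numpy()  # shape: (trading days, stocks)

def combinations(arr):
    # Return every pair (arr[i], arr[j]) with i < j as two flat arrays.
    n = arr.shape[0]
    upper = np.tri(n, n, -1, dtype='bool').T  # strict upper-triangle mask
    a, b = np.meshgrid(arr, arr)
    return b[upper].reshape(-1), a[upper].reshape(-1)

n = arr.shape[1]
a, b = combinations(np.arange(n))

out = np.zeros((n, n))
out[a, b] = ((arr[:, a] - arr[:, b])**2).sum(axis=0)  # fill the upper triangle
out[b, a] = out[a, b]                                  # mirror into the lower triangle
out_df = pd.DataFrame(out)
out_df.columns = df.columns
out_df.index = df.columns.values
out_df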

gives me:

PERMNO  97000  98000  99000
97000     0.0  142.0    5.0
98000   142.0    0.0  179.0
99000     5.0  179.0    0.0

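Note that I actually only compute the upper triangle of the matrix. I simply assume that the lower triangle mirrors the upper one and that the diagonal is always zero. That holds here because a squared difference is the same in either order, and the SSD of a stock with itself is 0.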
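
If memory becomes a concern at full size, the same matrix can be obtained without materializing one column per pair. With 1046 stocks there are 1046 * 1045 / 2 = 546,535 pairs, so an intermediate array of 253 days by 546,535 pairs in float64 would need roughly 1.1 GB. The sketch below is not part of the answer above; it relies on the identity sum((p - q)**2) = sum(p**2) + sum(q**2) - 2 * sum(p * q), which turns the whole computation into a single matrix product. The helper name ssd_matrix and the variable prices_long are made up for illustration, and the code assumes every PERMNO has the same number of rows in the same day order. Because the identity subtracts large numbers from each other, the results may be slightly less accurate in floating point than the direct difference.

```python
import numpy as np
import pandas as pd

# Toy data from the question: three stocks, three trading days each.
data = [['99000', 10], ['99000', 11], ['99000', 12],
        ['98000', 3], ['98000', 2], ['98000', 5],
        ['97000', 9], ['97000', 11], ['97000', 10]]
prices_long = pd.DataFrame(data, columns=['PERMNO', 'Price'])

def ssd_matrix(prices_long):
    """Pairwise SSD via one matrix product (hypothetical helper, not from the answer).

    Assumes every PERMNO has the same number of rows, in the same day order.
    """
    # Number the rows within each stock so prices align by trading day.
    day = prices_long.groupby('PERMNO').cumcount()
    wide = prices_long.assign(day=day).pivot(index='day', columns='PERMNO', values='Price')
    p = wide.to_numpy(dtype=float)      # shape: (days, stocks)
    sq = (p ** 2).sum(axis=0)           # sum of squared prices per stock
    # sum((p_a - p_b)**2) = sq[a] + sq[b] - 2 * dot(p_a, p_b)
    ssd = sq[:, None] + sq[None, :] - 2.0 * (p.T @ p)
    ssd = np.maximum(ssd, 0.0)          # clip tiny negative values caused by rounding
    np.fill_diagonal(ssd, 0.0)          # SSD of a stock with itself is exactly 0
    return pd.DataFrame(ssd, index=wide.columns, columns=wide.columns)

ssd = ssd_matrix(prices_long)
print(ssd)
```

On the toy data this should reproduce the 142, 179 and 5 shown above; single values can be read with, for example, ssd.loc['99000', '98000'].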
