numpy: Moving average with non-uniform domain using sparse matrices

Asked by xghobddn 11 months ago, posted in Other

The approach from "Moving average with non-uniform domain" works fine:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def convolve(s, f):
    """Compute the convolution of series S with a universal function F
    (see https://numpy.org/doc/stable/reference/ufuncs.html).
    This amounts to a moving average of S with weights F based on S.index."""
    index_v = s.index.values
    weight_mx = f(index_v - index_v[:, np.newaxis])
    weighted_sum = np.sum(s.values[:, np.newaxis] * weight_mx, axis=0)
    normalization = np.sum(weight_mx, axis=0)
    return pd.Series(weighted_sum/normalization, index=s.index)

size = 1000
df = pd.DataFrame({"x":np.random.normal(size=size)},
                  index=np.random.exponential(size=size)).sort_index()
def f(x):
    return np.exp(-x*x*30)
df["avg"] = convolve(df.x, f)
plt.scatter(df.index, df.avg, s=1, label="average")
plt.scatter(df.index, df.x, s=1, label="random")
plt.title("Moving Average for random data")
plt.legend()


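However, this allocates an O(size^3) array:

MemoryError: Unable to allocate 22.0 TiB for an array with shape (14454, 14454, 14454) and data type float64

Is it possible to "sparsify" the function convolve?

Specifically, f will usually return non-zero values only over a fairly narrow range of inputs.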

zd287kbt  #1

weighted_sum = np.sum(s.values[:, np.newaxis] * weight_mx, axis=0)

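is equivalent to the following, provided f is even (as it is here), so that weight_mx is symmetric; for a general f the matching product is s.values @ weight_mx: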

weighted_sum = weight_mx @ s.values


which is computed without creating a large intermediate array.
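A quick way to convince yourself of the equivalence is to compare the two forms on a small random input. The sketch below is only an illustration: the sample size, the seed and the one-sided weight function g are assumptions made for the check, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
index_v = np.sort(rng.exponential(size=n))
values = rng.normal(size=n)

def f(x):
    # Even weight function, as in the question
    return np.exp(-x * x * 30)

def g(x):
    # Hypothetical one-sided weight function, used only to show the asymmetric case
    return np.exp(-x * x * 30) * (x >= 0)

weight_mx = f(index_v - index_v[:, np.newaxis])

# Broadcast-and-sum form from the question
broadcast_sum = np.sum(values[:, np.newaxis] * weight_mx, axis=0)

# Matrix-vector forms: they agree with each other because weight_mx is symmetric
same_as_matvec = np.allclose(broadcast_sum, weight_mx @ values)
same_as_vecmat = np.allclose(broadcast_sum, values @ weight_mx)

# With an asymmetric weight matrix only values @ w matches the broadcast form
w_asym = g(index_v - index_v[:, np.newaxis])
asym_broadcast = np.sum(values[:, np.newaxis] * w_asym, axis=0)
asym_vecmat_ok = np.allclose(asym_broadcast, values @ w_asym)
asym_matvec_ok = np.allclose(asym_broadcast, w_asym @ values)

print(same_as_matvec, same_as_vecmat, asym_vecmat_ok, asym_matvec_ok)
```

The first three comparisons should agree; the last one is expected to differ, which is why the order of the operands matters as soon as f is not even.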
The proof is in the pudding:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def convolve(s, f):
    """Compute the convolution of series S with a universal function F
    (see https://numpy.org/doc/stable/reference/ufuncs.html).
    This amounts to a moving average of S with weights F based on S.index."""
    index_v = s.index.values
    weight_mx = f(index_v - index_v[:, np.newaxis])
    weighted_sum = weight_mx @ s.values
    normalization = np.sum(weight_mx, axis=0)
    return pd.Series(weighted_sum/normalization, index=s.index)

size = 20_000 # 1000
df = pd.DataFrame({"x":np.random.normal(size=size)},
                  index=np.random.exponential(size=size)).sort_index()
def f(x):
    return np.exp(-x*x*30)
df["avg"] = convolve(df.x, f)
plt.scatter(df.index, df.avg, s=1, label="average")
plt.scatter(df.index, df.x, s=1, label="random")
plt.title("Moving Average for random data")
plt.legend()
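The matrix product above still builds the full size x size weight matrix. If f really is negligible outside a narrow window, one possible way to "sparsify" the computation, as the question asks, is to store only the weights of pairs closer than some cutoff in a scipy.sparse matrix. The following is a minimal sketch under stated assumptions, not a drop-in replacement: the function name convolve_sparse and the radius parameter are hypothetical, the index is assumed to be sorted in ascending order, and the caller is assumed to pick radius so that f is effectively zero beyond it.

```python
import numpy as np
import pandas as pd
from scipy import sparse

def convolve_sparse(s, f, radius):
    """Moving average of series S with weights F, ignoring pairs of points
    whose index values differ by more than RADIUS (hypothetical cutoff).
    Assumes s.index is sorted in ascending order."""
    x = s.index.values
    n = len(x)
    # Because x is sorted, the neighbours of point i form the slice [lo[i], hi[i])
    lo = np.searchsorted(x, x - radius, side="left")
    hi = np.searchsorted(x, x + radius, side="right")
    counts = hi - lo
    # Row index of every stored entry: i repeated counts[i] times
    rows = np.repeat(np.arange(n), counts)
    # Column index of every stored entry: lo[i], lo[i] + 1, ..., hi[i] - 1
    starts = np.cumsum(counts) - counts
    cols = np.arange(counts.sum()) - np.repeat(starts, counts) + np.repeat(lo, counts)
    # Weight of input point `cols` for output point `rows`
    weights = f(x[rows] - x[cols])
    weight_mx = sparse.csr_matrix((weights, (rows, cols)), shape=(n, n))
    weighted_sum = weight_mx @ s.values
    normalization = np.asarray(weight_mx.sum(axis=1)).ravel()
    return pd.Series(weighted_sum / normalization, index=s.index)

rng = np.random.default_rng(0)
size = 2000
s = pd.Series(rng.normal(size=size), index=np.sort(rng.exponential(size=size)))
avg = convolve_sparse(s, lambda x: np.exp(-x * x * 30), radius=1.0)
```

How much memory this saves depends on how many points fall within radius of each other. With an exponentially distributed index most points cluster near zero, so a wide window keeps a large share of the pairs and the gain is modest; a narrower f (and hence a smaller radius) or a more spread-out index makes the matrix much sparser.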
