numpy 一维最小绝对差/偏差(LAD)的子梯度

wnavrhmk 于 12个月前发布在其他

关注(0)|答案(4)|浏览(114)

我正在尝试解决以下一维最小绝对差（LAD）优化问题

的数据
我使用二分法来寻找最佳beta（标量）。我有以下代码：

import numpy as np
x = np.random.randn(10)
y = np.sign(np.random.randn(10))  + np.random.randn(10)

def find_best(x, y):
    best_obj = 1000000
    for beta in np.linspace(-100,100,1000000):
        if np.abs(beta*x - y).sum() < best_obj:
            best_obj = np.abs(beta*x - y).sum()
            best_beta = beta
    return best_obj, best_beta

def solve_bisect(x, y):
    it = 0
    u = 100
    l = -100
    while True:
        it +=1

        if it > 40:
            # maxIter reached
            return obj, beta, subgrad

        # select the mid-point
        beta = (l + u)/2

        # subgrad calculation. \partial |x*beta - y| = sign(x*beta - y) * x. np.abs(x * beta -y) > 1e-5 is to avoid numerical issue
        subgrad = (np.sign(x * beta - y) * (np.abs(x * beta - y) > 1e-5) * x).sum()
        obj = np.sum(np.abs(x * beta - y))
        print 'obj = %f, subgrad = %f, current beta = %f' % (obj, subgrad, beta)

        # bisect. check subgrad to decide which part of the space to cut out
        if np.abs(subgrad) <1e-3:
            return obj, beta, subgrad
        elif subgrad > 0:
            u = beta + 1e-3
        else:
            l = beta - 1e-3
brute_sol = find_best(x,y)
bisect_sol = solve_bisect(x,y)
print 'brute_sol: obj = %f, beta = %f' % (brute_sol[0], brute_sol[1])                                                                                                                                         
print 'bisect_sol: obj = %f, beta = %f, subgrad = %f' % (bisect_sol[0], bisect_sol[1], bisect_sol[2])

字符串
正如你所看到的，我也有一个蛮力的实现，它在空间上搜索以得到预言的答案（直到一些数字错误）。每次运行都可以找到最优的最佳和最小的目标值。但是，subgrad不是0（甚至不接近）。例如，我的一次运行得到了以下结果：

brute_sol: obj = 10.974381, beta = -0.440700
bisect_sol: obj = 10.974374, beta = -0.440709, subgrad = 0.840753

型
目标值和最佳值都是正确的，但subgrad根本不接近0。所以问题是：
1.为什么次微分不接近于0？，次微分的最优条件是0不也是最优的当且仅当它是最优的？
1.我们应该用什么停止标准来代替呢？

numpy

来源：https://stackoverflow.com/questions/42959598/sub-gradient-of-one-dimensional-least-absolute-difference-deviation-lad

4条答案

按热度按时间

biswetbf1#

我不太熟悉次梯度这个术语，但是为了理解为什么你计算的subgrad通常不为0，让我们看一下下面的简单例子：x1= 1000，x2 = 1，y1 = 0，y2 = 1。
最小值显然是1，在beta=0时达到，但subgrad等于-1。但请注意，在beta=0+eps时梯度为999，在beta=0-eps时梯度为-1001，这表明了正确的标准：
lim beta->beta0-0 nabla_beta < 0 and lim beta->beta0+0 nabla_beta > 0

赞(0）回复(0）举报 12个月前

x8goxv8g2#

我不是次导数方面的Maven，但我的理解是，在不可微点，函数通常会有许多次导数。|M| < 1都是次切线。显然最小值在0，但1肯定是有效的次梯度。
至于你的问题，我认为有一些更快的方法。你知道解一定会出现在其中一个结点上（即函数不可微的点）。这些不可微点出现在n个点beta = y_i / x_i。第一种方法是只计算n个点中每个点的目标，并取最小值。（这是O（n^2））。第二种方法是对候选解列表进行排序（n*log（n）），然后执行二分法（log（n））。代码显示了蛮力，尝试所有不可微点，二分法（可能有一些角落的情况下，我还没有想到所有的方式通过）我还绘制了目标的一个示例，以便您可以验证次梯度不需要为零。

import numpy as np
import matplotlib.pyplot as plt
import pdb

def brute_force(x,y):
    optimum = np.inf
    beta_optimal = 0

    for b in np.linspace(-5,5,1000):
        obj = np.sum(np.abs(b * x - y))
        if obj < optimum:
            beta_optimal = b
            optimum = obj
    return beta_optimal, optimum

def faster_solve(x, y):
    soln_candidates = y / (x + 1e-8) # hack for div by zero
    optimum = np.inf
    beta_optimal = 0
    for b in soln_candidates.squeeze():
        obj = np.sum(np.abs(b * x - y))
        if obj < optimum:
            beta_optimal = b
            optimum = obj

    return beta_optimal, optimum

def bisect_solve(x,y):
    soln_candidates = (y / (x + 1e-8)).squeeze() # avoid div by zero
    sorted_solns = np.sort(soln_candidates)
    indx_l = 0
    indx_u = x.shape[0] - 1

    while True:
        if (indx_l + 1 >= indx_u):
            beta_l = sorted_solns[indx_l]
            beta_u = sorted_solns[indx_u]

            obj_l = np.sum(np.abs(beta_l * x - y))
            obj_u = np.sum(np.abs(beta_u * x - y))

            if obj_l < obj_u:
                return beta_l, obj_l
            else:
                return beta_u, obj_u

        mid = int((indx_l + indx_u)/2)
        beta_mid = sorted_solns[mid]
        diff_mid = beta_mid * x - y
        subgrad_mid = np.sum(np.sign(diff_mid) * x)
        if subgrad_mid > 0:
            indx_u = mid
        elif subgrad_mid < 0:
            indx_l = mid

def main():
  np.random.seed(70963)
  N = 10
  x = np.random.randint(-9,9, (N,1))
  y = np.random.randint(-9,9, (N,1))

  num_plot_pts = 1000
  beta = np.linspace(-5,5, num_plot_pts)
  beta = np.expand_dims(beta, axis=1)
  abs_diff_mat = np.abs(np.dot(x, beta.T) - y)  # broadcasting!!
  abs_dev = np.sum(abs_diff_mat, axis=0)  # sum the rows together

  brute_optimal_beta, brute_optimum = brute_force(x,y)
  fast_beta, fast_optimum = faster_solve(x,y)
  bisect_beta, bisect_optimum = bisect_solve(x,y)

  print('Brute force beta: {0:.4f} with objective value {1:.4f}'.format(brute_optimal_beta, brute_optimum))
  print('Faster solve beta: {0:.4} with objective value {1:.4}'.format(fast_beta, fast_optimum))
  print('Bisection solve beta: {0:4} with objective value {1:4}'.format(bisect_beta, bisect_optimum))

  plt.plot(beta, abs_dev)
  plt.grid()
  plt.xlabel(r'$\beta$')
  plt.ylabel(r'$\sum_{i=1}^N |\beta x_i - y_i|$')
  plt.title(r'Absolute Deviation as function of $\beta$')
  plt.tight_layout()
  plt.show()

if __name__ == '__main__':
  main()

字符串
x1c 0d1x的数据

赞(0）回复(0）举报 12个月前

zte4gxcn3#

把你的问题写成矩阵形式：
$$ f \left（\beta \right）= {\left| X \beta - y \right|}_{2} $$
x1c 0d1x的数据
然后又道：
$$ \frac{\partial f \left（\beta \right）}{\partial \beta} = {X}^{T} \operatorname{sign} \left（X \beta - y \right）$$

的
这就是函数的次梯度。

赞(0）回复(0）举报 12个月前

hpcdzsge4#

由于论证是标度的，这个问题可以很容易地用一维方法来解决，但也可以用解析方法来解决。
Defining $ f \left( a \right) = {\left| a \boldsymbol{x} - \boldsymbol{y} \right|}{1} $ then $ {f}^{'} \left( a \right) = \sum {x}{i} \operatorname{sign} \left( a {x}{i} - {y}{i} \right) $. By defining $ {s}{i} \left( a \right) = \operatorname{sign} \left( a {x}{i} - {y}{i} \right) $ we can write $ {f}^{'} \left( a \right) = \boldsymbol{s}^{T} \left( a \right) \boldsymbol{x} $ which a linear function of $ \boldsymbol{x} $. Moreover, the function is piece wise constant with breaking points / junction points at $ a = \frac{ {y}{i} }{ {x}{i} } $.
Then there exist $ k $ such that $ \boldsymbol{s}^{T} \left( {a}{k} - \epsilon \right) \boldsymbol{x} < 0 $ and $ \boldsymbol{s}^{T} \left( {a}{k} + \epsilon \right) \boldsymbol{x} > 0 $ which is the minimizer of $ f \left( a \right) = {\left| a \boldsymbol{x} - \boldsymbol{y} \right|}{1} $.

的数据
在上图中，我们可以看到每个“连接点”都有一个垂直范围。最佳参数是在其范围内包含零的连接点。

的
在上图中，我们可以看到客观价值的值，实际上，它的最小值是在上面标出的接合点上获得的。
我在MATLAB中实现了这两种方法，并与CVX进行了代码验证。matlab代码在我的StackExchange Mathematics Q3674241 GitHub Repository中是可以访问的。
请参阅：