使用Python和numpy的梯度下降

5jvtdoz2 于 9个月前发布在 Python

关注(0)|答案(6)|浏览(138)

def gradient(X_norm,y,theta,alpha,m,n,num_it):
    temp=np.array(np.zeros_like(theta,float))
    for i in range(0,num_it):
        h=np.dot(X_norm,theta)
        #temp[j]=theta[j]-(alpha/m)*(  np.sum( (h-y)*X_norm[:,j][np.newaxis,:] )  )
        temp[0]=theta[0]-(alpha/m)*(np.sum(h-y))
        temp[1]=theta[1]-(alpha/m)*(np.sum((h-y)*X_norm[:,1]))
        theta=temp
    return theta


X_norm,mean,std=featureScale(X)
#length of X (number of rows)
m=len(X)
X_norm=np.array([np.ones(m),X_norm])
n,m=np.shape(X_norm)
num_it=1500
alpha=0.01
theta=np.zeros(n,float)[:,np.newaxis]
X_norm=X_norm.transpose()
theta=gradient(X_norm,y,theta,alpha,m,n,num_it)
print theta

字符串
上面代码中的theta是100.2 100.2，但在matlab中应该是100.2 61.09，这是正确的。

numpy

来源：https://stackoverflow.com/questions/17784587/gradient-descent-using-python-and-numpy

6条答案

按热度按时间

iqih9akk1#

我认为你的代码有点太复杂了，它需要更多的结构，因为否则你会迷失在所有的方程和运算中。最后，这个回归归结为四个运算：
1.计算假设h = X * θ
1.计算损失= h - y，也许是成本的平方（损失^2）/2 m
1.计算梯度= X' * 损失/ m
1.更新参数theta = theta - alpha * gradient
在你的例子中，我猜你混淆了m和n。这里m表示训练集中的样本数量，而不是特征的数量。
让我们来看看我的代码变体：

import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations= 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)

字符串
首先，我创建一个小的随机数据集，它应该看起来像这样：
x1c 0d1x的数据
正如你所看到的，我还添加了生成的回归线和Excel计算的公式。
你需要注意使用梯度下降回归的直觉。当你对数据X进行一个完整的批处理时，你需要将每个例子的m损失减少到一个单一的权重更新。在这种情况下，这是梯度总和的平均值，因此除以m。
接下来你需要注意的是跟踪收敛和调整学习率，为此你应该总是跟踪每次迭代的成本，甚至可能绘制它。
如果你运行我的例子，返回的theta看起来像这样：

Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368   1.01108458]

型
这实际上非常接近excel计算的公式（y = x + 30）。注意，当我们将偏倚传递到第一列时，第一个theta值表示偏倚权重。

赞(0）回复(0）举报 9个月前

643ylb082#

下面你可以找到我对线性回归问题的梯度下降的实现。
首先，你计算像X.T * (X * w - y) / N这样的梯度，并同时用这个梯度更新你当前的theta。

X：特征矩阵
y：目标值
w：权重/值
N：训练集的大小

下面是Python代码：

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import random

def generateSample(N, variance=100):
    X = np.matrix(range(N)).T + 1
    Y = np.matrix([random.random() * variance + i * 10 + 900 for i in range(len(X))]).T
    return X, Y

def fitModel_gradient(x, y):
    N = len(x)
    w = np.zeros((x.shape[1], 1))
    eta = 0.0001

    maxIteration = 100000
    for i in range(maxIteration):
        error = x * w - y
        gradient = x.T * error / N
        w = w - eta * gradient
    return w

def plotModel(x, y, w):
    plt.plot(x[:,1], y, "x")
    plt.plot(x[:,1], x * w, "r-")
    plt.show()

def test(N, variance, modelFunction):
    X, Y = generateSample(N, variance)
    X = np.hstack([np.matrix(np.ones(len(X))).T, X])
    w = modelFunction(X, Y)
    plotModel(X, Y, w)

test(50, 600, fitModel_gradient)
test(50, 1000, fitModel_gradient)
test(100, 200, fitModel_gradient)

字符串
100d 1xx 1c 1d 1xx 1c 2d 1x的字符串

赞(0）回复(0）举报 9个月前

sirbozc53#

这些答案中的大多数都遗漏了一些关于线性回归的解释，并且代码有点复杂。
问题是，如果你有一个“m”个样本的数据集，每个样本称为“x^i”（n维向量），一个结果向量y（m维向量），你可以构造以下矩阵：
x1c 0d1x的数据
现在，目标是找到“w”（n+1维向量），它描述了线性回归的直线，“w_0”是常数项，“w_1”等是输入样本中每个维度（特征）的系数。所以本质上，你想找到“w”，使得“X*w”尽可能接近“y”，也就是说，你的线预测将尽可能接近原始结果。
还要注意的是，我们在每个“x^i”的开头添加了一个额外的分量/维度，它只是“1”，以说明常数项。此外，“X”只是通过将每个结果“堆叠”为一行而得到的矩阵，因此它是一个（m × n+1）矩阵。
一旦你构建了它，用于梯度下降的Python和Numpy代码实际上非常简单：

def descent(X, y, learning_rate = 0.001, iters = 100):
    w = np.zeros((X.shape[1], 1))
    for i in range(iters):
        grad_vec = -(X.T).dot(y - X.dot(w))
        w = w - learning_rate*grad_vec
    return w

字符串
瞧！这将返回向量“w”，或者说是对预测线的描述。

**但它是如何工作的？**在上面的代码中，我正在寻找成本函数的梯度向量（在这种情况下，平方差），然后我们将“逆流”，以找到最佳“w”给出的最小成本。实际使用的公式在行中

grad_vec = -(X.T).dot(y - X.dot(w))

型
有关完整的数学解释，以及包括创建矩阵的代码，请参阅how to implement gradient descent in Python上的这篇文章。
编辑：为了说明，上面的代码估计了一条线，您可以使用它来进行预测。下图显示了“学习”的梯度下降线（红色）的示例，以及来自Kaggle的“鱼市场”数据集的原始数据样本（蓝色散点）。

的

赞(0）回复(0）举报 9个月前

v09wglhw4#

我知道这个问题已经得到了回答，但我对GD功能做了一些更新：

### COST FUNCTION

def cost(theta,X,y):
     ### Evaluate half MSE (Mean square error)
     m = len(y)
     error = np.dot(X,theta) - y
     J = np.sum(error ** 2)/(2*m)
     return J

cost(theta,X,y)


def GD(X,y,theta,alpha):
    
    cost_histo = [0]
    theta_histo = [0]

    # an arbitrary gradient, to pass the initial while() check
    delta = [np.repeat(1,len(X))]
    # Initial theta
    old_cost = cost(theta,X,y)

    while (np.max(np.abs(delta)) > 1e-6):
        error = np.dot(X,theta) - y
        delta = np.dot(np.transpose(X),error)/len(y)
        trial_theta = theta - alpha * delta
        trial_cost = cost(trial_theta,X,y)
        while (trial_cost >= old_cost):
            trial_theta = (theta +trial_theta)/2
            trial_cost = cost(trial_theta,X,y)
            cost_histo = cost_histo + trial_cost
            theta_histo = theta_histo +  trial_theta
        old_cost = trial_cost
        theta = trial_theta
    Intercept = theta[0] 
    Slope = theta[1]  
    return [Intercept,Slope]
 
res = GD(X,y,theta,alpha)

字符串
这个函数在迭代过程中减少了alpha，使函数收敛得更快，参见R中的Estimating linear regression with Gradient Descent (Steepest Descent)示例。我在Python中应用了相同的逻辑。

赞(0）回复(0）举报 9个月前

vc9ivgsu5#

在@thomas-jungblut的python实现之后，我对Octave也做了同样的事情。如果你发现什么问题，请告诉我，我会修复+更新。
数据来自具有以下行的txt文件：
把它看作是一个非常粗略的样本，用于特征[卧室数量] [MTS 2]和最后一列[租金价格]，这是我们想要预测的。
以下是Octave的实现：

%
% Linear Regression with multiple variables
%

% Alpha for learning curve
alphaNum = 0.0005;

% Number of features
n = 2;

% Number of iterations for Gradient Descent algorithm
iterations = 10000

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% No need to update after here
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

DATA = load('CHANGE_WITH_DATA_FILE_PATH');

% Initial theta values
theta = ones(n + 1, 1);

% Number of training samples
m = length(DATA(:, 1));

% X with one mor column (x0 filled with '1's)
X = ones(m, 1);
for i = 1:n
  X = [X, DATA(:,i)];
endfor

% Expected data must go always in the last column  
y = DATA(:, n + 1)

function gradientDescent(x, y, theta, alphaNum, iterations)
  iterations = [];
  costs = [];

  m = length(y);

  for iteration = 1:10000
    hypothesis = x * theta;

    loss = hypothesis - y;

    % J(theta)    
    cost = sum(loss.^2) / (2 * m);

    % Save for the graphic to see if the algorithm did work
    iterations = [iterations, iteration];
    costs = [costs, cost];

    gradient = (x' * loss) / m; % /m is for the average

    theta = theta - (alphaNum * gradient);
  endfor    

  % Show final theta values
  display(theta)

  % Show J(theta) graphic evolution to check it worked, tendency must be zero
  plot(iterations, costs);

endfunction

% Execute gradient descent
gradientDescent(X, y, theta, alphaNum, iterations);

字符串

赞(0）回复(0）举报 9个月前

bq3bfh9z6#

here显示了autograd的GD &与sklearn的解决方案相比，* 稳健线性回归 *

#-------------------------------------------------------------------------------
# Name:        grad_intro
# Purpose:     1. Gradients Introduction
# Author: https://www.tomasbeuzen.com/deep-learning-with-pytorch/chapters/appendixC_computing-derivatives.html
#-------------------------------------------------------------------------------
import numpy as np
import sklearn
import sklearn.linear_model
import scipy.optimize
from scipy.optimize import minimize
import autograd # pip install autograd
from autograd import grad
import autograd.numpy as anp

## try robust regression with the Huber loss

d = 10
n = 1000    # !!!!!!!!!! the bigger - the better

# generate random data
X = anp.random.randn(n, d)
w_true = anp.random.randn(d)
y = X @ w_true
# add random outliers
Noutliers = 50
y[:Noutliers] += 100 * anp.random.randn(Noutliers)
print(w_true, '\n')

############################

from sklearn.linear_model import HuberRegressor

hr = HuberRegressor(fit_intercept=False, alpha=0)
hr.fit(X, y)
print(hr.coef_, '\n')

############################

huber = lambda z: 0.5 * z ** 2 * (anp.abs(z) <= 1) + (anp.abs(z) - 0.5) * (anp.abs(z) > 1)
f = lambda w: anp.sum(huber(X @ w - y))

df_dw = grad(f) # differentiate through matrix multiplications, etc.
w = np.zeros(d)
alpha = 0.001
while anp.linalg.norm(df_dw(w)) > 0.0001:   # <<<<<<<< G.D.
    w -= alpha * df_dw(w)

print(w)

字符串

赞(0）回复(0）举报 9个月前

我来回答

使用Python和numpy的梯度下降

6条答案

相关问题

热门标签

最新问答