numpy getting infinity values from gradient descent

aiazj4mn · asked 2023-01-05 · Other

I'm trying to implement multivariate linear regression with gradient descent, but when I try this:

# Starting values
w = np.ones(3) # The number of features is 3
b = float(0)

def gradient_descent():
  global w
  global b

  learning_rate = 0.0001

  for i in range(x_train.shape[0]):
    prediction = np.dot(x_train[i], w) + b
    error = x_train[i] - prediction
    for j in range(w.shape[0]):
      w[j] = w[j] - (error * x_train[i][j] * learning_rate)
    b = b - (error * learning_rate)

def train():
  for i in range(10_000):
    gradient_descent()
    print(i, ':', w, b)

train()

The output is:

0 : [inf inf inf] inf
1 : [inf inf inf] inf
2 : [inf inf inf] inf
3 : [inf inf inf] inf
4 : [inf inf inf] inf
5 : [inf inf inf] inf
6 : [inf inf inf] inf
....

So what am I doing wrong? I tried lowering the learning rate, but nothing changed.
Data sample:

total_rooms,population,households,bedrooms(target)
5612.0,1015.0,472.0,1283.0
7650.0,1129.0,463.0,1901.0
720.0,333.0,117.0,174.0
1501.0,515.0,226.0,337.0
1454.0,624.0,262.0,326.0

Here total_rooms, population and households make up x_train with shape (17000, 3), and bedrooms is y_train with shape (17000, 1).
When I try scaling the data with sklearn.preprocessing.StandardScaler before splitting it:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
train_data = scaler.fit_transform(train_data)
x_train = train_data[:, :3]
y_train = train_data[:, -1]

I get nan instead of inf!
Note: the data works fine with sklearn.linear_model.LinearRegression, whether scaled or not.

dhxwm5r4 #1

As suggested in the comments: feature scaling is a good idea (scikit-learn includes StandardScaler, but subtracting each column's mean and dividing by its standard deviation is also very simple).
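For instance, that manual column-wise scaling fits in a couple of lines (a sketch; standardize is just an illustrative name, not part of the question's code):

import numpy as np

def standardize(x):
    # subtract each column's mean and divide by its standard deviation
    return (x - x.mean(axis=0)) / x.std(axis=0)
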
Also, the error term appears to be backwards; the residual is usually prediction - true:

error = prediction - y[i]
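
Put back into the question's own loop, one corrected pass could look like the sketch below (keeping the asker's variable names; sgd_epoch is just an illustrative name, and it assumes the features have already been scaled):

import numpy as np

def sgd_epoch(x_train, y_train, w, b, learning_rate=0.0001):
    # one pass over the (already scaled) data, updating w and b sample by sample
    for i in range(x_train.shape[0]):
        prediction = np.dot(x_train[i], w) + b
        error = prediction - y_train[i]              # residual: prediction minus target
        w = w - learning_rate * error * x_train[i]   # update all three weights at once
        b = b - learning_rate * error
    return w, b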
zbsbpyhn #2

Without any optimization or guarantees: normalizing and applying the gradient-descent update correctly leads to something like:

import numpy as np

def gradient_descent(x_train, y_train, w=np.ones(3), b=float(0), learning_rate=0.001):
    # one full-batch step: predict, compute the residual, move w and b against it
    predictions = x_train @ w + b
    error = predictions - y_train
    w = w - learning_rate * error @ x_train
    b = b - learning_rate * sum(error)
    return w, b

def train():
    # data with last column being the target
    data = np.array(
        [
            [5612.0, 1015.0, 472.0, 1283.0],
            [7650.0, 1129.0, 463.0, 1901.0],
            [720.0, 333.0, 117.0, 174.0],
            [1501.0, 515.0, 226.0, 337.0],
            [1454.0, 624.0, 262.0, 326.0],
        ]
    )
    # normalize with a single global mean/std (features and target together),
    # so the back-transform below needs only one offset o and one factor f
    norm_offset = np.mean(data)
    norm_factor = 1 / np.std(data)
    data_normalized = (data - norm_offset) * norm_factor
    x_train = data_normalized[:, :-1]
    y_train = data_normalized[:, -1]

    # start values
    w = np.ones(x_train.shape[1])
    b = float(0)

    # train
    for i in range(10_000):
        w, b = gradient_descent(x_train, y_train, w, b)
        # o = offset, f = factor, w'/b' normalized parameters, w/b original parameters
        #   y' = w' * x' + b'
        #   f * (y - o) = w' * f * (x - o) + b'
        #   y = w' * (x - o) + b' / f + o
        #   y = w' * x - o * sum(w') + b' / f + o
        #   => w = w', b = b' / f + o - o * sum(w')
        b_orig = b / norm_factor + norm_offset - sum(w) * norm_offset
        ssr = np.sum((data[:, :3] @ w + b_orig - data[:, 3]) ** 2)
        print(i, ':', w, b_orig, ssr)

if __name__ == "__main__":
    train()
...
9999 : [0.13503938 0.69644619 0.75400302] -386.71116671360414 71015.11748640954
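
As a sanity check (the question notes that sklearn.linear_model.LinearRegression handles this data fine), the closed-form least-squares fit on the same five rows gives a reference point to compare w and b_orig against:

import numpy as np
from sklearn.linear_model import LinearRegression

data = np.array(
    [
        [5612.0, 1015.0, 472.0, 1283.0],
        [7650.0, 1129.0, 463.0, 1901.0],
        [720.0, 333.0, 117.0, 174.0],
        [1501.0, 515.0, 226.0, 337.0],
        [1454.0, 624.0, 262.0, 326.0],
    ]
)
reg = LinearRegression().fit(data[:, :3], data[:, 3])
print(reg.coef_, reg.intercept_)  # compare with w and b_orig printed above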
