Why can't I predict a third-degree equation in TensorFlow?

Posted by 6za6bjd0 on 2023-01-13 in Other


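I'm new to TensorFlow. I was able to make a simple prediction, but when I changed the problem it stopped working. Why, and how can I fix it?
I used this demo. I can solve an equation like this:
y=2x-1
using this code: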

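import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model=Sequential([Dense(units=1,input_shape=[1])])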
model.compile(optimizer='sgd',loss='mean_squared_error')

xs=np.array([-1.0,0.0,1.0,2.0])
ys=np.array([-3.0,-1.0,1.0,3.0])

model.fit(xs,ys,epochs=400)

print(model.predict(np.array([11.0])))

Then I tried to use the same approach for this equation:
3x^3+2x^2+10
This is the new code:

model=Sequential([Dense(units=1,input_shape=[1])])
model.compile(optimizer='sgd',loss='mean_squared_error')

xs=np.array([5.0,6.0,7.0,8.0,10.0])
ys=np.array([435.0,730.0,1137.0,1674.0,3210.0])

model.fit(xs,ys,epochs=1000)

print(model.predict(np.array([11.0])))

My question is: how do I change my code so that it solves this correctly?

tensorflow

Source: https://stackoverflow.com/questions/75060695/why-cannot-predict-in-tensorflow-a-equation-of-third-degree

1 Answer


e0uiprwp1#

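This may seem a bit strange to a newcomer, but you need more degrees of freedom than in the initial task, and you need a lot more data to train your model.
For the equation y = 2x - 1, a single weight (the coefficient of x) and a single bias (the constant term) are enough to fit the model. For 3x^3 + 2x^2 + 10, however, you need at least four parameters: a weight for each power of x (x^3, x^2, and x, since a general cubic has all three) and a bias for the constant term. Even that is too hard for the model, because many combinations of weights and biases can fit these 5 data points (for example, a model can fit the data perfectly with a completely unrelated curve that passes through all 5 points) and still fail to generalize to other data points. So you need more data points to train your model. I suggest a dataset of at least 1000 data points, so that the model has more constraints to satisfy when fitting the data and can therefore generalize to other data points.
Even then the problem remains, because 3x^3 + 2x^2 + 10 is not a linear equation, so a linear model cannot fit it. You need more layers, with nonlinear activations, to model terms such as x^3.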
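To see why extra layers alone are not enough, here is a minimal NumPy sketch (not part of the original answer) showing that two stacked layers with linear activation collapse into a single linear layer. The layer sizes, the random weights, and the helper names `two_linear_layers` and `one_linear_layer` are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "Dense" layers with linear activation: 1 -> 4 -> 1 (hypothetical sizes)
W1, b1 = rng.normal(size=(1, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

def two_linear_layers(x):
    """Apply both layers in sequence, with no nonlinearity in between."""
    hidden = x @ W1 + b1
    return hidden @ W2 + b2

# The same map written as ONE linear layer:
# (x W1 + b1) W2 + b2 = x (W1 W2) + (b1 W2 + b2)
W = W1 @ W2        # shape (1, 1)
b = b1 @ W2 + b2   # shape (1,)

def one_linear_layer(x):
    return x @ W + b

x = np.linspace(-5.0, 5.0, 11).reshape(-1, 1)
stacked = two_linear_layers(x)
collapsed = one_linear_layer(x)

# Both outputs agree up to floating-point rounding
print(np.allclose(stacked, collapsed))
```

However many linear layers are stacked, the result is still a straight line in x, so a nonlinear activation such as 'relu' or 'tanh' is what actually lets the network bend toward a cubic.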
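Even if you work around that (for example, by feeding the model x^3 instead of x), there is still a problem, because the terms of 3x^3 + 2x^2 + 10 span very different ranges. For example, in the ideal case the +10 term needs up to 10 / learning_rate batches to be reached. The default learning rate of SGD is 0.01, so it takes at least 1000 batches to reach +10 from an initial value close to 0. The 3x^3 term, on the other hand, has a smaller range, so it can be reached in a few batches. You therefore get convergence problems: the model is still trying to fit the +10 term, which is far from its initial value, while the other terms may already be close to their correct values. To overcome this, you need an over-parameterized model, in which each term is represented by many small sub-terms, so that the model can fit each term within a few batches.
Finally, you still have a problem because the ranges of the input x and the target y are very large. SGD, like other optimization algorithms, works better when inputs and targets have a small range, so you need to normalize them. For example, you can normalize the input x to the range [0, 1] and the target y to the range [-1, 1]. The gradients are then much smaller in magnitude, so the model converges faster.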
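As a rough illustration of this normalization step, the sketch below scales the question's five data points by their maximum absolute value and shows how to convert between raw and normalized units. The helper names `fit_scale`, `to_model_input`, and `to_original_units` are hypothetical, and max-abs scaling is only one possible choice; it maps the data into [-1, 1], which for this all-positive data means (0, 1].

```python
import numpy as np

def fit_scale(values):
    """Return the max-abs scale factor of the training data (hypothetical helper)."""
    return float(np.max(np.abs(values)))

# Training data from the question
xs = np.array([5.0, 6.0, 7.0, 8.0, 10.0])
ys = np.array([435.0, 730.0, 1137.0, 1674.0, 3210.0])

x_scale = fit_scale(xs)
y_scale = fit_scale(ys)

# Normalized copies to train on; the raw arrays are left untouched
xs_norm = xs / x_scale
ys_norm = ys / y_scale

def to_model_input(x_raw):
    """Scale a raw x with the SAME factor that was used for training."""
    return np.asarray(x_raw, dtype=float) / x_scale

def to_original_units(y_norm):
    """Map a normalized model output back to the original y units."""
    return np.asarray(y_norm, dtype=float) * y_scale

print(xs_norm.max(), ys_norm.max())
print(to_model_input(11.0))
```

The point to keep in mind is that a model trained on normalized data also expects normalized inputs at prediction time, and its outputs have to be multiplied by `y_scale` before they can be compared with the original targets.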
Putting all of this together, I suggest a model like this:

import numpy as np
import tensorflow as tf  # needed for tf.keras.callbacks below
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

def f(x):
  # target cubic: 3x^3 + 2x^2 + 10
  return 3.0 * x ** 3.0 + 2.0 * x ** 2.0 + 10.0

x_train = np.linspace(-5, 5, 100_000) # very big training set
X = x_train.copy() # keep an unnormalized copy for later use (x_train is scaled in place below)
y_train = f(x_train)

# calculate the normalization factor for the x and y data
# simple scaling to [-1, 1] range
x_max = np.max(np.abs(x_train))
y_max = np.max(np.abs(y_train))

# normalize the data
x_train /= x_max
y_train /= y_max

# create test data that lies slightly outside the training range,
# so we can see how the model generalizes to unseen data ([-6, -5] and [5, 6])
x_test = np.concatenate([
  np.linspace(-6, -5, 1000),
  np.linspace(5, 6, 1000)
])
y_test = f(x_test)
# normalize the data by the same factor
x_test /= x_max
y_test /= y_max
###################################
activation = 'linear' # options: 'linear', 'relu', 'tanh', 'sigmoid'; with 'linear' the stacked layers stay a linear model
NDims = 256 # number of neurons in each layer
dropoutRate = 0.0 # dropout rate. 0.0 means no dropout, try up to ~0.5
layers = [
  Dense(NDims, input_shape=[1], activation=activation), # input layer
]
for _ in range(3): # N hidden layers
  if 0.0 < dropoutRate:
    layers.append(Dropout(dropoutRate))
  layers.append(Dense(NDims, activation=activation))
layers.append(Dense(1)) # output layer

model = Sequential(layers)
model.compile(optimizer='sgd', loss='mean_squared_error')

model.fit(
  x_train, y_train,
  validation_data=(x_test, y_test),
  batch_size=32,
  shuffle=True, # shuffle the training data before each epoch
  epochs=10,
  # for restoring the best model after training
  callbacks=[
    tf.keras.callbacks.ModelCheckpoint(
      'model.h5',
      save_best_only=True,
      monitor='val_loss',
      verbose=1,
    ),
  ]
)
model.load_weights('model.h5') # load the best model
# evaluate the model on the In Distribution data, i.e. data that is very close to the training data
# data from the same distribution as the training data but with noise
noiseStd = np.diff(X).mean() * 1.0
x_idd = X + np.random.normal(0, noiseStd, size=X.shape)
y_idd = f(x_idd)
# normalize the data by the same factor
x_idd /= x_max
y_idd /= y_max
evaluation = model.evaluate(x_idd, y_idd, verbose=1)
# it should be very good
print('Evaluation on ID data: ', evaluation)
########################################################
# evaluate the model on the OOD data, i.e. data that is very far from the training data
x_ood = np.linspace(-100, 100, 100000)
y_ood = f(x_ood)
# normalize the data by the same factor
x_ood /= x_max
y_ood /= y_max
evaluation = model.evaluate(x_ood, y_ood, verbose=1)
# it would be very painful :D NNs typically don't generalize well to OOD data
print('Evaluation on OOD data: ', evaluation)

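I strongly encourage you to play with this code/model and see how it behaves. For example, try changing the activation function, the number of neurons per layer, the number of layers, the dropout rate, and so on. You are especially encouraged to try the 'relu' activation function.
As you can see, (simple) neural networks are not well suited to "low-dimensional" problems that have exact solutions. They shine on high-dimensional problems that cannot be solved by exact methods. For example, there is no exact equation that converts an RGB image into a probability distribution over cat or dog, but a neural network can learn this mapping from training data. It is also more efficient there, because each image is represented by many pixels rather than by a single number.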
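For comparison, here is what an exact method looks like on the question's own data. This is a sketch using NumPy's least-squares polynomial fit, swapped in for illustration rather than taken from the original answer, and it assumes you already know the target is a cubic.

```python
import numpy as np

# The five data points from the question, generated by 3x^3 + 2x^2 + 10
xs = np.array([5.0, 6.0, 7.0, 8.0, 10.0])
ys = np.array([435.0, 730.0, 1137.0, 1674.0, 3210.0])

# Least-squares fit of a degree-3 polynomial; coefficients come back
# ordered from the highest power (x^3) down to the constant term
coeffs = np.polyfit(xs, ys, deg=3)

# Evaluate the fitted polynomial at x = 11
prediction = np.polyval(coeffs, 11.0)

print(np.round(coeffs, 3))
print(round(float(prediction), 3))
```

Because the data lie exactly on a cubic, the fit should recover coefficients close to 3, 2, 0, and 10 and a prediction close to 3 * 11^3 + 2 * 11^2 + 10 = 4245, up to floating-point rounding, with no training loop at all.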

2023-01-13
