Keras deep learning model is not training

gcuhipw9 · asked 2023-08-06 · in: Other
Follow (0) | Answers (2) | Views (124)

My deep learning model does not get trained at all: the predicted values are completely out of range, even on the training dataset, and it gives a hugely negative R2.
If I run the code below, I get a hugely negative R2. The dataset is large, and in particular has many features. Here is a simplified data file that still reproduces the problem: text. If I run the same data through simple linear regression or an SVR algorithm, I get a good value (see the second code block below).
I played with the parameters, mainly the number of layers, the units per layer, and the learning rate, without success. I also tried standardizing the data.
Similar code works for other, smaller problems. Do you know what the issue might be? Or maybe this is just not a good problem for DL.
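For context on the "hugely negative R2": R2 compares the model's squared error against the error of simply predicting the mean, so a model whose outputs are on the wrong scale can score arbitrarily far below zero. A minimal numpy sketch with made-up numbers (not the actual data):

```python
import numpy as np

# True targets, and predictions that are wildly off-scale (hypothetical values)
y_true = np.array([10.0, 12.0, 11.0, 13.0])
y_pred = np.array([500.0, -300.0, 250.0, -100.0])

# R2 = 1 - SS_res / SS_tot, where SS_tot is the error of predicting the mean
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(r2)  # a huge negative number
```

So a large negative R2 does not mean the metric is broken, only that the model is (much) worse than a constant mean predictor.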
Here is the deep learning model:

import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from scikeras.wrappers import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

var='target'

matrixpd = pd.read_csv('data.csv', index_col=0)
md=matrixpd[var]
matrixpd = matrixpd.drop(columns=var)

def build_model():
  model = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=[len(matrixpd.keys())]),
#    layers.Dense(16, activation='relu'),
    layers.Dense(1)
  ])

  optimizer = tf.keras.optimizers.RMSprop(0.01)

  model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae', 'mse'])
  return model

EPOCHS = 100

estimator = KerasRegressor(model=build_model, epochs=EPOCHS, verbose=0)
kfold = KFold(n_splits=3)
results = cross_val_score(estimator, matrixpd, md, cv=kfold, scoring='r2')
print("R2: %.2f (%.2f)" % (results.mean(), results.std()))

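Since the question mentions trying to standardize the data: with cross_val_score, scaling is easiest to get right inside a Pipeline, so each fold is scaled using only its own training split (no leakage from the held-out fold). A sketch with synthetic stand-in data, shown with Ridge so it runs standalone; the same wrapper would go around KerasRegressor:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for data.csv (assumption: many features, mixed scales)
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50)) * rng.uniform(1, 100, size=50)
y = 0.1 * X.sum(axis=1) + rng.normal(size=120)

# StandardScaler is re-fit on each training fold, then applied to the
# held-out fold before scoring
model = make_pipeline(StandardScaler(), Ridge())
scores = cross_val_score(model, X, y, cv=KFold(n_splits=3), scoring='r2')
print("R2: %.2f (%.2f)" % (scores.mean(), scores.std()))
```

Standardizing the whole matrix up front before cross-validation leaks test-fold statistics into training; the pipeline form avoids that.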
The linear regression model:

import pandas as pd
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

var='target'

matrixpd = pd.read_csv('data.csv', index_col=0)
md=matrixpd[var]
matrixpd = matrixpd.drop(columns=var)

model = linear_model.LinearRegression()

kfold = KFold(n_splits=3)
results = cross_val_score(model, matrixpd, md, cv=kfold, scoring='r2')
print("R2: %.2f (%.2f)" % (results.mean(), results.std()))


Thank you!

xghobddn1#

I was able to train a model on this data in PyTorch. The main differences from your architecture are:
1. I found it helps to taper the channels down gradually, rather than jumping straight to a 16-unit layer. With 500 input features, I used:
Dense(out=500), ReLU, BN
Dense(250), ReLU, BN
Dense(125), ReLU, BN
Dense(64), ReLU, BN
Dense(1)
You can remove some of the batch norms (BN) and, based on a few quick experiments, it still works fine.
2. I found RMSprop struggled to converge, whereas Adam with a learning rate of 0.01 converged well.
3. I ran 200 epochs. The training MSE progressed as follows:
epoch 0 | loss: 8250
epoch 20 | loss: 7486
epoch 40 | loss: 5677
epoch 60 | loss: 2932
epoch 80 | loss: 699
epoch 100 | loss: 16
epoch 120 | loss: 6
epoch 140 | loss: 0.7
epoch 160 | loss: 0.07
epoch 180 | loss: 0.01
As you can see, the loss stays relatively high for the first 60-80 epochs, then drops sharply.
I standardized the data as below, but found it did not affect convergence much, probably because BN performs its own normalization.

matrixpd_scaled = (matrixpd - matrixpd.mean(axis=0)) / matrixpd.std(axis=0)
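The remark that BN performs its own normalization can be seen in a minimal numpy sketch of what BatchNorm1d computes in training mode (ignoring the learnable scale/shift parameters and the running averages):

```python
import numpy as np

def batchnorm_forward(x, eps=1e-5):
    """Per-feature batch normalization, as BatchNorm1d does in training
    mode (with gamma=1, beta=0, and running statistics ignored)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=(32, 4))  # un-scaled batch
out = batchnorm_forward(x)
print(out.mean(axis=0))  # ~0 per feature
print(out.std(axis=0))   # ~1 per feature
```

Each layer's inputs are re-centered and re-scaled per batch, which is why a separate up-front standardization of the features mattered less here.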

For reference, my PyTorch code is:

import torch, torch.nn as nn
import pandas as pd  # needed for read_csv below

#load data and split into features/target
data = pd.read_csv('data.csv', index_col=0)
y = data['target'].to_numpy().ravel()
x = data.drop(columns='target')

#standardise features per channel
x_n = (x - x.mean(axis=0)) / x.std(axis=0)

torch.manual_seed(100) #for reproducible results

#Define the model. A tapered dense network.
model = nn.Sequential(nn.Linear(x.shape[1], 500), nn.ReLU(), nn.BatchNorm1d(500),
                      nn.Linear(500, 250), nn.ReLU(),  nn.BatchNorm1d(250),
                      nn.Linear(250, 125), nn.ReLU(),  nn.BatchNorm1d(125),
                      nn.Linear(125, 64), nn.ReLU(),  nn.BatchNorm1d(64),
                      nn.Linear(64, 1))
#optim = torch.optim.RMSprop(model.parameters(), 0.01)
optim = torch.optim.Adam(model.parameters(), 0.01)

for epoch in range(200):
    #prediction and MSE loss
    yhat = model(torch.tensor(x_n.values).to(torch.float32))
    loss = nn.MSELoss()(yhat.ravel(), torch.tensor(y).to(torch.float32))
    
    #Backprop
    optim.zero_grad()
    loss.backward()
    optim.step()
    if epoch % 20 == 0:
        print('epoch', epoch, '| loss:', loss.item())

uz75evzq2#

Thank you, this worked for me too, even on a larger dataset. I tried to implement something equivalent in Keras; it works on this simplified dataset but not on the larger one. Also, PyTorch trains faster. I wonder what PyTorch is doing that I am not doing in Keras.
Here are the final PyTorch and Keras implementations, for comparison:
PyTorch:

import torch, torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

var='target'

#load data and split into features/target
data = pd.read_csv('dataFULL.csv', index_col=0)

train_dataset = data.sample(frac=0.7,random_state=1)
test_dataset = data.drop(train_dataset.index)

train_labels = train_dataset.pop(var)
test_labels = test_dataset.pop(var)

#Define the model. A tapered dense network.
model = nn.Sequential(nn.Linear(train_dataset.shape[1], 512), nn.ReLU(), nn.BatchNorm1d(512),
                      nn.Linear(512, 256), nn.ReLU(),  nn.BatchNorm1d(256),
                      nn.Linear(256, 128), nn.ReLU(),  nn.BatchNorm1d(128),
                      nn.Linear(128, 64), nn.ReLU(),  nn.BatchNorm1d(64),
                      nn.Linear(64, 1))
#optim = torch.optim.RMSprop(model.parameters(), 0.01)
optim = torch.optim.Adam(model.parameters(), 0.01)

for epoch in range(200):
    #prediction and MSE loss
    yhat = model(torch.tensor(train_dataset.values).to(torch.float32))
    loss = nn.MSELoss()(yhat.ravel(), torch.tensor(train_labels.values).to(torch.float32))
    
    #Backprop
    optim.zero_grad()
    loss.backward()
    optim.step()
    #per-epoch check on the test set; only the last score survives the loop
    #(note: model.eval() would freeze the BatchNorm statistics here; as
    #written, batch statistics from the test batch are used)
    yhatt = model(torch.tensor(test_dataset.values).to(torch.float32))
    yhatt = yhatt.detach().numpy()
    score = np.corrcoef(test_labels, yhatt.reshape(test_labels.shape))

yhat = model(torch.tensor(test_dataset.values).to(torch.float32))

yhat = yhat.detach().numpy()
plt.scatter(test_labels, yhat)
plt.xlabel('True Values')
plt.ylabel('Predictions')
plt.axis('equal')
plt.axis('square')
_ = plt.plot([-1000, 1000], [-1000, 1000])

score = np.corrcoef(test_labels, yhat.reshape(test_labels.shape))
print('R=', score[0,1])
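One caveat on the scoring here: np.corrcoef reports the Pearson correlation R, which only measures linear association, whereas the R2 from cross_val_score is the coefficient of determination. The two can disagree badly when predictions are biased. A small numpy illustration:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = y_true + 10.0  # perfectly correlated, but badly biased

# Pearson correlation ignores the constant offset entirely
r = np.corrcoef(y_true, y_pred)[0, 1]

# Coefficient of determination penalizes the offset heavily
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("Pearson R:", r)   # 1.0 -- looks perfect
print("R2:", r2)         # strongly negative -- predictions are far off
```

So a high R printed by these scripts is not directly comparable to the negative R2 that the original cross-validation reported.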

Keras:

import matplotlib.pyplot as plt
import numpy as np  # needed for np.corrcoef below
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

var='target'

#load data and split into features/target
data = pd.read_csv('dataFULL.csv', index_col=0)

train_dataset = data.sample(frac=0.7,random_state=0)
test_dataset = data.drop(train_dataset.index)
mean = train_dataset.mean(axis=0)
train_dataset -= mean
std = train_dataset.std(axis=0)
train_dataset /= std
test_dataset -= mean
test_dataset /= std

train_labels = train_dataset.pop(var)
test_labels = test_dataset.pop(var)

def build_model():
  model = keras.Sequential([
    layers.Dense(512, activation='relu', input_shape=[len(train_dataset.keys())]),
    layers.Dense(256, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
  ])

#  optimizer = tf.keras.optimizers.RMSprop(0.01)
  optimizer = tf.keras.optimizers.Adam(0.01)
    
  model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mse'])
  return model

model = build_model()

EPOCHS = 200

history = model.fit(
  train_dataset, train_labels,
  epochs=EPOCHS, verbose=0)

test_predictions = model.predict(test_dataset).flatten()

test_labels *= std[var]
test_labels += mean[var]
test_predictions *= std[var]
test_predictions += mean[var]

plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values')
plt.ylabel('Predictions')
plt.axis('equal')
plt.axis('square')
_ = plt.plot([-1000, 1000], [-1000, 1000])

score = np.corrcoef(test_labels, test_predictions)
print("R =",score[0,1])
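One concrete difference between the two final scripts above: the Keras version omits the batch normalization layers that the PyTorch answer found helpful. A sketch of the same tapered architecture with BatchNormalization added, assuming that is the missing ingredient (not verified on the larger dataset):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_features):
    # Same taper as the PyTorch model, with a BatchNormalization after
    # each ReLU, mirroring nn.BatchNorm1d in the answer above
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(512, activation='relu'), layers.BatchNormalization(),
        layers.Dense(256, activation='relu'), layers.BatchNormalization(),
        layers.Dense(128, activation='relu'), layers.BatchNormalization(),
        layers.Dense(64, activation='relu'), layers.BatchNormalization(),
        layers.Dense(1),
    ])
    model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(0.01))
    return model

model = build_model(500)  # 500 features, as in the answer above
model.summary()
```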
