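I built a test TensorFlow LSTM model with two heads (inputs) and two outputs, one of which is probabilistic. That model works well. I then did the same thing with more layers, following the same procedure... but this one fails: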
2023-07-15 09:18:24.407504: W tensorflow/core/common_runtime/bfc_allocator.cc:491] ***********________***********________************______________________________________************
2023-07-15 09:18:24.408219: E tensorflow/stream_executor/dnn.cc:868] OOM when allocating tensor with shape[2866176000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2023-07-15 09:18:24.409011: W tensorflow/core/framework/op_kernel.cc:1780] OP_REQUIRES failed at cudnn_rnn_ops.cc:1564 : INTERNAL: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 240, 240, 1, 120, 19904, 240]
Traceback (most recent call last):
File "E:\Anaconda3\envs\tf2.7_bigData\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "E:\Anaconda3\envs\tf2.7_bigData\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Exception encountered when calling layer "Extracteur_feature2" " f"(type LSTM).
{{function_node __wrapped__CudnnRNN_device_/job:localhost/replica:0/task:0/device:GPU:0}} Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 240, 240, 1, 120, 19904, 240] [Op:CudnnRNN]
Call arguments received by layer "Extracteur_feature2" " f"(type LSTM):
• inputs=tf.Tensor(shape=(19904, 120, 240), dtype=float32)
• mask=None
• training=False
• initial_state=None
Process finished with exit code 1
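For scale, the numbers in the log can be checked by hand. The sketch below only redoes the arithmetic on figures copied from the error message; the variable names are illustrative, and it does not claim to know what cuDNN uses the buffer for.

```python
# Figures copied from the log above.
oom_elements = 2_866_176_000              # shape[2866176000], type float
batch_size, seq_len, num_units = 19904, 120, 240

bytes_per_float32 = 4
oom_bytes = oom_elements * bytes_per_float32
oom_gib = oom_bytes / 2**30               # about 10.7 GiB in a single allocation

# Size of one (batch, time, units) tensor for the failing LSTM layer.
activation_elements = batch_size * seq_len * num_units
ratio = oom_elements / activation_elements

print(f"requested allocation: {oom_gib:.2f} GiB")
print(f"one (batch, time, units) tensor: {activation_elements} elements")
print(f"requested / one tensor: {ratio}")
```

The requested buffer is exactly five times the size of one (19904, 120, 240) tensor, so it scales linearly with the batch dimension. Since the layer was called with `training=False` on 19904 samples at once, one thing worth trying (an assumption about how the model was invoked, not something the log confirms) is feeding the data in smaller batches, for example through the `batch_size` argument of `model.predict`.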
The model is built like this:
import tensorflow as tf
import tensorflow_probability as tfp


def build_model(num_timesteps_in, nb_features, nb_attributs, nb_lstm_units, probalistic_model=True):
    """
    Build a model with TensorFlow.
    :param num_timesteps_in: number of observation days used as input, including the day to forecast
    :param nb_features: number of features used as input (excludes inflows if they are not assimilated)
    :param nb_attributs: number of physiographic attributes used as input
    :param nb_lstm_units: number of units per LSTM layer
    :param probalistic_model: if True, add the probabilistic output next to the deterministic one
    :return: the model, plus the optimizer and loss used to compile it
    """
    # allocate GPU memory on demand instead of reserving all of it up front
    gpu_devices = tf.config.experimental.list_physical_devices("GPU")
    for device in gpu_devices:
        tf.config.experimental.set_memory_growth(device, True)

    def negative_loglikelihood(targets, estimated_distribution):
        return -estimated_distribution.log_prob(targets)

    tfd = tfp.distributions

    timeseries_input = tf.keras.Input(shape=(num_timesteps_in, nb_features))
    attrib_input = tf.keras.Input(shape=(nb_attributs,))

    xy = tf.keras.layers.LSTM(nb_lstm_units,  # activation='softsign'
                              kernel_initializer=tf.keras.initializers.glorot_uniform(),
                              return_sequences=True, stateful=False,
                              name='Extracteur_feature1')(timeseries_input)
    xy = tf.keras.layers.Dropout(0.2)(xy)
    xy = tf.keras.layers.LSTM(nb_lstm_units,  # activation='softsign'
                              kernel_initializer=tf.keras.initializers.glorot_uniform(),
                              return_sequences=True, stateful=False,
                              name='Extracteur_feature2')(xy)
    xy = tf.keras.layers.Dropout(0.2)(xy)
    xy = tf.keras.layers.LSTM(nb_lstm_units,  # activation='softsign'
                              kernel_initializer=tf.keras.initializers.glorot_uniform(),
                              return_sequences=False, stateful=False,
                              name='Extracteur_feature3')(xy)
    xy = tf.keras.layers.Dropout(0.2)(xy)

    # merge the LSTM features with the static attributes
    allin_input = tf.keras.layers.Concatenate(axis=1, name='merged_head')([xy, attrib_input])
    allin_input = tf.keras.layers.Dense(nb_attributs, activation='softsign',
                                        kernel_initializer=tf.keras.initializers.he_uniform(),
                                        name='Dense111')(allin_input)
    allin_input = tf.keras.layers.Dropout(0.2)(allin_input)
    allin_input = tf.keras.layers.Dense(nb_attributs, activation='softsign',
                                        kernel_initializer=tf.keras.initializers.he_uniform(),
                                        name='Dense222')(allin_input)
    outputs = tf.keras.layers.Dropout(0.2)(allin_input)

    if probalistic_model:
        ################### probability block ##########################
        prevision = tf.keras.layers.Dense(1, activation='linear', name='deterministe_1')(outputs)
        probabilist = tf.keras.layers.Dense(2, activation='linear', name='probabilist_2')(outputs)
        probabilist = tfp.layers.DistributionLambda(
            lambda t: tfd.Normal(loc=t[..., :1],
                                 scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])),
            name='normal_dist')(probabilist)  # note this
        # 1e-3 avoids numerical problems
        # the 0.05 factor is unclear; it possibly speeds up optimization and helps avoid local minima...
        # https://github.com/tensorflow/probability/issues/703
        ################### end of probability block ##########################
        model = tf.keras.Model(inputs=[timeseries_input, attrib_input], outputs=[prevision, probabilist])
        model.summary()
        # with Adam, learning rates in [.001, .0005] gave the best results AND speed
        optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
        loss = {'deterministe_1': 'mse', 'normal_dist': negative_loglikelihood}
        model.compile(optimizer=optimizer, loss=loss,
                      loss_weights=[1, 1])
    else:
        outputs = tf.keras.layers.Dense(1, activation='linear', name='deterministe')(allin_input)
        model = tf.keras.Model(inputs=[timeseries_input, attrib_input], outputs=outputs)
        model.summary()
        # with Adam, learning rates in [.001, .0005] gave the best results AND speed
        optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
        loss = 'mse'
        model.compile(optimizer=optimizer, loss=loss)
    return model, optimizer, loss
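To make the probabilistic head concrete, here is a minimal pure-Python sketch of the quantity that `negative_loglikelihood` returns for a single sample, assuming the `Normal` parameterization used in `build_model` (`loc` from the first output, `scale = 1e-3 + softplus(0.05 * second output)`). The function names `softplus` and `normal_nll` are illustrative and not part of TensorFlow Probability; this is a scalar re-derivation, not the library's implementation.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def normal_nll(target: float, raw_loc: float, raw_scale: float) -> float:
    """Negative log-density of `target` under Normal(loc, scale).

    `raw_loc` and `raw_scale` stand for the two linear outputs of the
    'probabilist_2' layer; the scale transform mirrors the one in build_model.
    """
    scale = 1e-3 + softplus(0.05 * raw_scale)   # always strictly positive
    z = (target - raw_loc) / scale
    return 0.5 * z * z + math.log(scale) + 0.5 * math.log(2.0 * math.pi)


# The loss is smallest when the predicted mean matches the target.
print(normal_nll(target=1.0, raw_loc=1.0, raw_scale=0.0))
print(normal_nll(target=2.0, raw_loc=1.0, raw_scale=0.0))
```

The `1e-3` floor keeps `log(scale)` finite even when the softplus term gets very small, which matches the "numerical problems" remark in the code comment.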
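I train in several stages, so I reload the best iteration and recompile it, because the custom loss function causes problems if I do it any other way. loss_fct and the optimizer are defined beforehand as copies of the ones used in build_model.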
def negative_loglikelihood(targets, estimated_distribution):
    return -estimated_distribution.log_prob(targets)

# same loss dictionary as in build_model
loss_fct = {'deterministe_1': 'mse', 'normal_dist': negative_loglikelihood}
# load without compiling, then recompile with the custom loss
model = tf.keras.models.load_model('path_to_model/model.h5',
                                   compile=False)
model.compile(optimizer=optimizer, loss=loss_fct)
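I don't understand this error. The model was tested many times without tf.probability and worked fine (the LSTM input shape is OK...). What is new is the second output using tf.probability (which works well in the simple version) and reloading with compile=False followed by a recompile (which also works well in the simple model).
I have been working on this problem for 3 weeks and I don't know what to do.
tensorflow 2.10, tensorflow-probability 0.14.0, Windows/Anaconda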
1 Answer
I finally got it working by updating to tf=2.13 and updating tfp as well.
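The answer does not say why the upgrade helped. One plausible reading, which is an assumption and not something the poster confirmed, is that the pair listed in the question (tensorflow 2.10 with tensorflow-probability 0.14.0) was mismatched, since each TensorFlow Probability release is built and tested against a specific TensorFlow release. A quick way to see which pair an environment actually has is sketched below; `installed_version` is an illustrative helper, not part of either library.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as published on PyPI.
for name in ("tensorflow", "tensorflow-probability"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

The printed pair can then be compared with the TensorFlow version named in the release notes of the installed TensorFlow Probability version.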