Cannot run probabilistic TensorFlow model

ntjbwcob · posted 2023-10-23 in Other

I built a test TensorFlow LSTM model with two input heads and two outputs, one of which is probabilistic. That model works fine. I then do the same thing but with more layers, following the same procedure... and this one fails:

2023-07-15 09:18:24.407504: W tensorflow/core/common_runtime/bfc_allocator.cc:491] ***********________***********________************______________________________________************
2023-07-15 09:18:24.408219: E tensorflow/stream_executor/dnn.cc:868] OOM when allocating tensor with shape[2866176000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2023-07-15 09:18:24.409011: W tensorflow/core/framework/op_kernel.cc:1780] OP_REQUIRES failed at cudnn_rnn_ops.cc:1564 : INTERNAL: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 240, 240, 1, 120, 19904, 240] 
Traceback (most recent call last):
  File "E:\Anaconda3\envs\tf2.7_bigData\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "E:\Anaconda3\envs\tf2.7_bigData\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Exception encountered when calling layer "Extracteur_feature2" "                 f"(type LSTM).

{{function_node __wrapped__CudnnRNN_device_/job:localhost/replica:0/task:0/device:GPU:0}} Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 240, 240, 1, 120, 19904, 240]  [Op:CudnnRNN]

Call arguments received by layer "Extracteur_feature2" "                 f"(type LSTM):
  • inputs=tf.Tensor(shape=(19904, 120, 240), dtype=float32)
  • mask=None
  • training=False
  • initial_state=None

Process finished with exit code 1
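The numbers in the log already explain the failure; a quick back-of-envelope check (my own arithmetic, not from the original post):

```python
# Size of the tensor the BFC allocator failed on (shape[2866176000], float32):
n_elements = 2_866_176_000
gib = n_elements * 4 / 2**30                  # 4 bytes per float32
print(f"failed allocation: ~{gib:.1f} GiB")   # ~10.7 GiB in one block

# The cuDNN config shows the LSTM received the whole dataset as one batch:
batch, seq_len, features = 19904, 120, 240    # batch_size, max_seq_length, input_size
print(batch * seq_len * features)             # elements in a single activation tensor
```

A single ~10.7 GiB allocation exceeds the free memory of most GPUs, which is why a smaller model (fewer layers, smaller intermediate tensors) can train fine while the deeper one hits OOM.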

The model is built like this:

def build_model(num_timesteps_in, nb_features, nb_attributs, nb_lstm_units, probalistic_model=True):
    """
    Build a model with TensorFlow.
    :param num_timesteps_in: number of observation days used as input, including the day to forecast
    :param nb_features: number of input features (excludes inflows if not assimilated)
    :param nb_attributs: number of physiographic attributes used as input
    :param nb_lstm_units: number of neurons per layer
    :return: the model, the optimizer and the loss, to (re)compile for training
    """

    # Allocate GPU memory on demand instead of reserving it all up front
    gpu_devices = tf.config.experimental.list_physical_devices("GPU")
    for device in gpu_devices:
        tf.config.experimental.set_memory_growth(device, True)

    def negative_loglikelihood(targets, estimated_distribution):
        return -estimated_distribution.log_prob(targets)

    tfd = tfp.distributions

    timeseries_input = tf.keras.Input(shape=(num_timesteps_in, nb_features))
    attrib_input = tf.keras.Input(shape=(nb_attributs,))

    xy = tf.keras.layers.LSTM(nb_lstm_units,  # activation='softsign'
                              kernel_initializer=tf.keras.initializers.glorot_uniform(),
                              return_sequences=True, stateful=False,
                              name='Extracteur_feature1')(timeseries_input)

    xy = tf.keras.layers.Dropout(0.2)(xy)

    xy = tf.keras.layers.LSTM(nb_lstm_units,  # activation='softsign'
                              kernel_initializer=tf.keras.initializers.glorot_uniform(),
                              return_sequences=True, stateful=False,
                              name='Extracteur_feature2')(xy)

    xy = tf.keras.layers.Dropout(0.2)(xy)

    xy = tf.keras.layers.LSTM(nb_lstm_units,  # activation='softsign'
                              kernel_initializer=tf.keras.initializers.glorot_uniform(),
                              return_sequences=False, stateful=False,
                              name='Extracteur_feature3')(xy)

    xy = tf.keras.layers.Dropout(0.2)(xy)

    allin_input = tf.keras.layers.Concatenate(axis=1, name='merged_head')([xy, attrib_input])

    allin_input = tf.keras.layers.Dense(nb_attributs, activation='softsign',
                                   kernel_initializer=tf.keras.initializers.he_uniform(),
                                   name='Dense111')(allin_input)

    allin_input = tf.keras.layers.Dropout(0.2)(allin_input)

    allin_input = tf.keras.layers.Dense(nb_attributs, activation='softsign',
                                   kernel_initializer=tf.keras.initializers.he_uniform(),
                                   name='Dense222')(allin_input)

    outputs = tf.keras.layers.Dropout(0.2)(allin_input)
    if probalistic_model:

        ################### block probability ##########################
        prevision = tf.keras.layers.Dense(1, activation='linear', name='deterministe_1')(outputs)
        probabilist = tf.keras.layers.Dense(2, activation='linear', name='probabilist_2')(outputs)

        probabilist = tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t[..., :1],
                                                                         scale=1e-3 + tf.math.softplus(
                                                                             0.05 * t[..., 1:])),
                                                    name='normal_dist')(probabilist)  # note this
        # 1e-3 avoids numerical problems when the scale approaches zero
        # the 0.05 factor is not fully understood; it possibly helps speed up
        # optimization and avoid local minima
        # https://github.com/tensorflow/probability/issues/703

        ################### fin block probability ##########################

        model = tf.keras.Model(inputs=[timeseries_input, attrib_input], outputs=[prevision, probabilist])

        model.summary()
        # with Adam, learning rates from 0.001 to 0.0005 give the best results AND speed
        optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
        loss = {'deterministe_1': 'mse', 'normal_dist': negative_loglikelihood}

        model.compile(optimizer=optimizer, loss=loss,
                      loss_weights=[1, 1])

    else:
        outputs = tf.keras.layers.Dense(1, activation='linear', name='deterministe')(allin_input)

        model = tf.keras.Model(inputs=[timeseries_input, attrib_input], outputs=outputs)

        model.summary()

        optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)  # with Adam, learning rates from 0.001 to 0.0005 give the best results AND speed
        loss = 'mse'
        model.compile(optimizer=optimizer, loss=loss)

    return model, optimizer, loss
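For reference, `negative_loglikelihood` combined with the `DistributionLambda` head above simply evaluates -log p(target) under a Normal whose `loc` is the first raw unit and whose `scale` is `1e-3 + softplus(0.05 * t)` of the second. A pure-Python sketch for one sample (my own illustration; the values are made up):

```python
import math

def softplus(x):
    return math.log1p(math.exp(x))

def normal_log_prob(x, loc, scale):
    # log density of N(loc, scale) evaluated at x
    return -0.5 * math.log(2 * math.pi) - math.log(scale) - 0.5 * ((x - loc) / scale) ** 2

t = [1.2, 0.8]                          # raw 2-unit Dense output for one sample
loc = t[0]
scale = 1e-3 + softplus(0.05 * t[1])    # same transform as the DistributionLambda
target = 1.0
nll = -normal_log_prob(target, loc, scale)
print(round(nll, 3))                    # ~0.622
```

The `1e-3` floor is what keeps the density finite when the raw scale output goes very negative, matching the comment in the model code.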

I train in several stages, so I reload the best iteration and recompile it, because the custom loss function causes problems if it is done any other way. loss_fct and the optimizer are defined beforehand as copies of those used in build_model.

def negative_loglikelihood(targets, estimated_distribution):
    return -estimated_distribution.log_prob(targets)

loss_fct = {'deterministe_1': 'mse', 'normal_dist': negative_loglikelihood}

model = tensorflow.keras.models.load_model('path_to_model/model.h5',
                                           compile=False)
model.compile(optimizer=optimizer, loss=loss_fct)
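Independently of the reload, the traceback shows `batch_size: 19904`, i.e. the whole dataset reached the cuDNN kernel in one call. If the failing call is `predict`/`evaluate`/`fit` on raw tensors, capping `batch_size` keeps each cuDNN workspace small. A self-contained toy sketch with the same two-input signature (dimensions and names are mine, not the original model):

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in with the same (timeseries, attributes) two-input signature.
ts_in = tf.keras.Input(shape=(120, 240))
at_in = tf.keras.Input(shape=(12,))
x = tf.keras.layers.LSTM(8)(ts_in)
x = tf.keras.layers.Concatenate()([x, at_in])
out = tf.keras.layers.Dense(1)(x)
toy_model = tf.keras.Model([ts_in, at_in], out)

x_series = np.zeros((100, 120, 240), np.float32)
x_attribs = np.zeros((100, 12), np.float32)

# batch_size caps how many samples hit the LSTM kernel at once,
# instead of the whole dataset arriving as a single giant batch.
pred = toy_model.predict([x_series, x_attribs], batch_size=32, verbose=0)
print(pred.shape)   # (100, 1)
```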

I don't understand this error. This model has been tested many times without tensorflow-probability and works fine (the LSTM input shape is OK...). What is new is the second output added with tensorflow-probability (which works fine in the simple version) and the reload with compile=False followed by a recompile (which also works fine with the simple model).
I have been working on this problem for 3 weeks and I don't know what to do.
tensorflow 2.10, tensorflow-probability 0.14.0, Windows/Anaconda

laximzn5 #1

I finally got it working by updating to tf=2.13 and updating tfp to match.
