tensorflow BERT预训练- masked_lm_accuracy始终为零

y4ekin9u  于 2023-05-23  发布在  其他
关注(0)|答案(1)|浏览(301)

我正在尝试使用官方tensorflow github repository在特定领域的数据集上从头开始训练BERT
我使用文档中的this部分来使脚本适应我的用例,但我遇到了一个问题。首先,我使用create_pretraining_data.py脚本将.txt文件处理为.tfrecord。这里一切正常,但是当我运行train.py脚本开始训练BERT模型时,next_sentence_accuracy在一些步骤后增加,但masked_lm_accuracy始终保持0。
这是提供给train.py脚本的config.yaml文件:

task:
  init_checkpoint: ''
  model:
    cls_heads: [{activation: tanh, cls_token_idx: 0, dropout_rate: 0.1, inner_dim: 768, name: next_sentence, num_classes: 2}]
    encoder:
      type: bert
      bert:
        attention_dropout_rate: 0.1
        dropout_rate: 0.1
        hidden_activation: gelu
        hidden_size: 768
        initializer_range: 0.02
        intermediate_size: 3072
        max_position_embeddings: 512
        num_attention_heads: 12
        num_layers: 12
        type_vocab_size: 2
        vocab_size: 50000
  train_data:
    drop_remainder: true
    global_batch_size: 32
    input_path: 'test_clean_tfrecord/2014/*'
    is_training: true
    max_predictions_per_seq: 20
    seq_length: 128
    use_next_sentence_label: true
    use_position_id: false
    use_v2_feature_names: false
  validation_data:
    drop_remainder: false
    global_batch_size: 32
    input_path: 'test_clean_tfrecord/2014/*'
    is_training: false
    max_predictions_per_seq: 20
    seq_length: 128
    use_next_sentence_label: true
    use_position_id: false
    use_v2_feature_names: false
trainer:
  checkpoint_interval: 5
  max_to_keep: 5
  optimizer_config:
    learning_rate:
      polynomial:
        cycle: false
        decay_steps: 1000000
        end_learning_rate: 0.0
        initial_learning_rate: 0.0001
        power: 1.0
      type: polynomial
    optimizer:
      type: adamw
    warmup:
      polynomial:
        power: 1
        warmup_steps: 10000
      type: polynomial
  steps_per_loop: 1
  summary_interval: 1
  train_steps: 200
  validation_interval: 5
  validation_steps: 64

这是train.py经过5个训练步骤后的输出:

2022-12-10 13:21:48.184678: W tensorflow/core/framework/dataset.cc:769] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
C:\Users\Iulian\AppData\Roaming\Python\Python39\site-packages\keras\engine\functional.py:637:
UserWarning: Input dict contained keys ['masked_lm_positions',
'masked_lm_ids', 'masked_lm_weights', 'next_sentence_labels']
which did not match any model input. They will be ignored by the model.
  inputs = self._flatten_to_reference_inputs(inputs)
WARNING:tensorflow:Gradients do not exist for variables ['pooler_transform/kernel:0', 'pooler_transform/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
W1210 13:21:52.408583 13512 utils.py:82] Gradients do not exist for variables ['pooler_transform/kernel:0', 'pooler_transform/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
WARNING:tensorflow:Gradients do not exist for variables ['pooler_transform/kernel:0', 'pooler_transform/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
W1210 13:21:58.768023 19348 utils.py:82] Gradients do not exist for variables ['pooler_transform/kernel:0', 'pooler_transform/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
train | step:      2 | steps/sec:    0.0 | output:
    {'learning_rate': 1.9799998e-08,
     'lm_example_loss': 10.961581,
     'masked_lm_accuracy': 0.0,
     'next_sentence_accuracy': 0.5625,
     'next_sentence_loss': 0.73979986,
     'training_loss': 11.701381}
train | step:      3 | steps/sec:    0.0 | output:
    {'learning_rate': 2.97e-08,
     'lm_example_loss': 10.981846,
     'masked_lm_accuracy': 0.0,
     'next_sentence_accuracy': 0.5,
     'next_sentence_loss': 0.75065744,
     'training_loss': 11.732503}
train | step:      4 | steps/sec:    0.0 | output:
    {'learning_rate': 3.9599996e-08,
     'lm_example_loss': 10.988701,
     'masked_lm_accuracy': 0.0,
     'next_sentence_accuracy': 0.5625,
     'next_sentence_loss': 0.69400764,
     'training_loss': 11.682709}
train | step:      5 | steps/sec:    0.0 | output:
    {'learning_rate': 4.9500002e-08,
     'lm_example_loss': 11.004994,
     'masked_lm_accuracy': 0.0,
     'next_sentence_accuracy': 0.75,
     'next_sentence_loss': 0.5528765,
     'training_loss': 11.557871}

我试着在源代码中查找masked_lm_accuracy的使用位置(我认为需要一个特殊的标志来使用它),我发现这个精度默认添加在模型的指标列表中:

def build_metrics(self, training=None):
    del training
    metrics = [
        tf.keras.metrics.SparseCategoricalAccuracy(name='masked_lm_accuracy'),
        tf.keras.metrics.Mean(name='lm_example_loss')
    ]
    # TODO(hongkuny): rethink how to manage metrics creation with heads.
    if self.task_config.train_data.use_next_sentence_label:
      metrics.append(
          tf.keras.metrics.SparseCategoricalAccuracy(
              name='next_sentence_accuracy'))
      metrics.append(tf.keras.metrics.Mean(name='next_sentence_loss'))
    return metrics

  def process_metrics(self, metrics, labels, model_outputs):
    with tf.name_scope('MaskedLMTask/process_metrics'):
      metrics = dict([(metric.name, metric) for metric in metrics])
      if 'masked_lm_accuracy' in metrics:
        metrics['masked_lm_accuracy'].update_state(
            labels['masked_lm_ids'], model_outputs['mlm_logits'],
            labels['masked_lm_weights'])
      if 'next_sentence_accuracy' in metrics:
        metrics['next_sentence_accuracy'].update_state(
            labels['next_sentence_labels'], model_outputs['next_sentence'])
wwwo4jvm

wwwo4jvm1#

在BERT预训练中,masked_lm_accuracy始终为零可能有几个原因。
1.**您的数据集可能不够大。**BERT是一个非常大的语言模型,它需要大量的数据来正确训练。如果数据集太小,模型可能无法学习单词及其上下文之间的关系。
1.**您的数据可能不干净。**BERT是一个非常敏感的模型,它很容易被数据中的噪声所欺骗。如果您的数据包含错误,例如拼写错误或语法错误,模型可能无法正确地从中学习。
1.**您的超参数可能设置不正确。**BERT有许多超参数会影响其性能。如果这些超参数设置不正确,模型可能无法有效学习。
如果你仍然无法提高masked_lm_accuracy,你可能需要尝试使用更大的数据集,清理数据或调整超参数。
以下是一些可以帮助您提高masked_lm_准确性的其他提示:

  • 使用不同的数据集。你的数据集越多样化,BERT就能更好地学习单词及其上下文之间的关系。
  • 清理数据。请确保您的数据没有错误,如拼写错误或语法错误。
  • 使用正确的超参数。BERT的超参数对其性能有很大的影响.请确保您为数据集和训练目标使用了正确的超参数。
  • 要有耐心。BERT可能需要很长时间来训练。如果您没有看到masked_lm_accuracy的改进,请耐心等待并继续训练模型。

相关问题