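I am trying to train BERT from scratch on a domain-specific dataset using the official tensorflow GitHub repository.
I followed this section of the documentation to adapt the scripts to my use case, but I ran into a problem. First, I use the create_pretraining_data.py script to convert my .txt files into .tfrecord files. Everything works at this stage, but when I run the train.py script to start training the BERT model, next_sentence_accuracy increases after a few steps while masked_lm_accuracy always stays at 0.
This is the config.yaml file passed to the train.py script: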
task:
  init_checkpoint: ''
  model:
    cls_heads: [{activation: tanh, cls_token_idx: 0, dropout_rate: 0.1, inner_dim: 768, name: next_sentence, num_classes: 2}]
    encoder:
      type: bert
      bert:
        attention_dropout_rate: 0.1
        dropout_rate: 0.1
        hidden_activation: gelu
        hidden_size: 768
        initializer_range: 0.02
        intermediate_size: 3072
        max_position_embeddings: 512
        num_attention_heads: 12
        num_layers: 12
        type_vocab_size: 2
        vocab_size: 50000
  train_data:
    drop_remainder: true
    global_batch_size: 32
    input_path: 'test_clean_tfrecord/2014/*'
    is_training: true
    max_predictions_per_seq: 20
    seq_length: 128
    use_next_sentence_label: true
    use_position_id: false
    use_v2_feature_names: false
  validation_data:
    drop_remainder: false
    global_batch_size: 32
    input_path: 'test_clean_tfrecord/2014/*'
    is_training: false
    max_predictions_per_seq: 20
    seq_length: 128
    use_next_sentence_label: true
    use_position_id: false
    use_v2_feature_names: false
trainer:
  checkpoint_interval: 5
  max_to_keep: 5
  optimizer_config:
    learning_rate:
      polynomial:
        cycle: false
        decay_steps: 1000000
        end_learning_rate: 0.0
        initial_learning_rate: 0.0001
        power: 1.0
      type: polynomial
    optimizer:
      type: adamw
    warmup:
      polynomial:
        power: 1
        warmup_steps: 10000
      type: polynomial
  steps_per_loop: 1
  summary_interval: 1
  train_steps: 200
  validation_interval: 5
  validation_steps: 64
This is the output of train.py after 5 training steps:
2022-12-10 13:21:48.184678: W tensorflow/core/framework/dataset.cc:769] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
C:\Users\Iulian\AppData\Roaming\Python\Python39\site-packages\keras\engine\functional.py:637:
UserWarning: Input dict contained keys ['masked_lm_positions',
'masked_lm_ids', 'masked_lm_weights', 'next_sentence_labels']
which did not match any model input. They will be ignored by the model.
inputs = self._flatten_to_reference_inputs(inputs)
WARNING:tensorflow:Gradients do not exist for variables ['pooler_transform/kernel:0', 'pooler_transform/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
W1210 13:21:52.408583 13512 utils.py:82] Gradients do not exist for variables ['pooler_transform/kernel:0', 'pooler_transform/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
WARNING:tensorflow:Gradients do not exist for variables ['pooler_transform/kernel:0', 'pooler_transform/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
W1210 13:21:58.768023 19348 utils.py:82] Gradients do not exist for variables ['pooler_transform/kernel:0', 'pooler_transform/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
train | step: 2 | steps/sec: 0.0 | output:
{'learning_rate': 1.9799998e-08,
'lm_example_loss': 10.961581,
'masked_lm_accuracy': 0.0,
'next_sentence_accuracy': 0.5625,
'next_sentence_loss': 0.73979986,
'training_loss': 11.701381}
train | step: 3 | steps/sec: 0.0 | output:
{'learning_rate': 2.97e-08,
'lm_example_loss': 10.981846,
'masked_lm_accuracy': 0.0,
'next_sentence_accuracy': 0.5,
'next_sentence_loss': 0.75065744,
'training_loss': 11.732503}
train | step: 4 | steps/sec: 0.0 | output:
{'learning_rate': 3.9599996e-08,
'lm_example_loss': 10.988701,
'masked_lm_accuracy': 0.0,
'next_sentence_accuracy': 0.5625,
'next_sentence_loss': 0.69400764,
'training_loss': 11.682709}
train | step: 5 | steps/sec: 0.0 | output:
{'learning_rate': 4.9500002e-08,
'lm_example_loss': 11.004994,
'masked_lm_accuracy': 0.0,
'next_sentence_accuracy': 0.75,
'next_sentence_loss': 0.5528765,
'training_loss': 11.557871}
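I searched the source code for where masked_lm_accuracy is used (I thought a special flag might be needed to enable it) and found that this accuracy is added to the model's list of metrics by default: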
def build_metrics(self, training=None):
    del training
    metrics = [
        tf.keras.metrics.SparseCategoricalAccuracy(name='masked_lm_accuracy'),
        tf.keras.metrics.Mean(name='lm_example_loss')
    ]
    # TODO(hongkuny): rethink how to manage metrics creation with heads.
    if self.task_config.train_data.use_next_sentence_label:
        metrics.append(
            tf.keras.metrics.SparseCategoricalAccuracy(
                name='next_sentence_accuracy'))
        metrics.append(tf.keras.metrics.Mean(name='next_sentence_loss'))
    return metrics

def process_metrics(self, metrics, labels, model_outputs):
    with tf.name_scope('MaskedLMTask/process_metrics'):
        metrics = dict([(metric.name, metric) for metric in metrics])
        if 'masked_lm_accuracy' in metrics:
            metrics['masked_lm_accuracy'].update_state(
                labels['masked_lm_ids'], model_outputs['mlm_logits'],
                labels['masked_lm_weights'])
        if 'next_sentence_accuracy' in metrics:
            metrics['next_sentence_accuracy'].update_state(
                labels['next_sentence_labels'], model_outputs['next_sentence'])
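To make the masked_lm_accuracy update above easier to reason about, here is a minimal pure-Python sketch of a weighted sparse categorical accuracy: the weighted fraction of prediction slots whose highest logit matches the label, with zero-weight (padding) slots excluded. The function name weighted_sparse_accuracy and the toy batch are hypothetical, and the zero-total-weight behaviour is an assumption meant to mirror a safe division, not code copied from the library.

```python
from typing import Sequence


def weighted_sparse_accuracy(
    label_ids: Sequence[int],
    logits: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Weighted fraction of slots whose argmax logit equals the label."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # assumption: no real masked positions yields 0.0
    correct = 0.0
    for label, row, weight in zip(label_ids, logits, weights):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += weight * (predicted == label)
    return correct / total_weight


# Toy batch: 3 prediction slots over a 4-token vocabulary.
labels = [2, 1, 0]
logits = [
    [0.1, 0.2, 0.9, 0.0],  # argmax is 2, matches label 2
    [0.8, 0.1, 0.0, 0.1],  # argmax is 0, label is 1
    [0.9, 0.0, 0.0, 0.1],  # argmax is 0, matches, but weight 0 excludes it
]
weights = [1.0, 1.0, 0.0]  # last slot is padding

print(weighted_sparse_accuracy(labels, logits, weights))  # 0.5
```

Under this reading, the metric is exactly 0.0 whenever no weighted slot has its top logit on the correct token, which is likely for a randomly initialised model over a 50000-token vocabulary.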
1 Answer
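There are several possible reasons why masked_lm_accuracy stays at zero during BERT pretraining.
1. **Your dataset may not be large enough.** BERT is a very large language model and needs a large amount of data to train properly. If the dataset is too small, the model may not learn the relationships between words and their contexts.
2. **Your data may not be clean.** BERT is sensitive to noise in the data. If your data contains errors such as spelling or grammar mistakes, the model may not learn from it properly.
3. **Your hyperparameters may be set incorrectly.** BERT has many hyperparameters that affect its performance. If they are set incorrectly, the model may not learn effectively.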
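As one concrete hyperparameter check, the learning rates in the question's log (about 1.98e-08 at step 2 and 4.95e-08 at step 5) can be reproduced from the values in config.yaml. The sketch below is an assumption reconstructed to match those logged numbers, not the library's implementation: it treats warmup as scaling the decayed rate at warmup_steps by (step / warmup_steps) ** power. The function names polynomial_decay and warmed_up_lr are hypothetical.

```python
def polynomial_decay(step, initial_lr, end_lr, decay_steps, power):
    """Polynomial decay from initial_lr to end_lr over decay_steps."""
    step = min(step, decay_steps)
    return (initial_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


def warmed_up_lr(
    step,
    warmup_steps=10_000,      # trainer.optimizer_config.warmup
    warmup_power=1.0,
    initial_lr=1e-4,          # trainer.optimizer_config.learning_rate
    end_lr=0.0,
    decay_steps=1_000_000,
    decay_power=1.0,
):
    """Assumed schedule: ramp towards the decayed rate at warmup_steps."""
    if step >= warmup_steps:
        return polynomial_decay(step, initial_lr, end_lr, decay_steps, decay_power)
    target = polynomial_decay(
        warmup_steps, initial_lr, end_lr, decay_steps, decay_power)
    return target * (step / warmup_steps) ** warmup_power


for step in (2, 5, 200):
    print(step, warmed_up_lr(step))
```

If this reading is right, then with train_steps: 200 and warmup_steps: 10000 the run ends while the learning rate is still around 2e-06, roughly 2% of initial_learning_rate, so very little movement in the masked LM metrics would be expected within the configured run.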
If you still cannot improve masked_lm_accuracy, you may need to try a larger dataset, clean the data, or tune the hyperparameters.
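Before changing anything, it may help to compare the logged numbers with what uniform guessing over the vocabulary would give. The sketch below uses only values from the question's config.yaml (vocab_size, global_batch_size, max_predictions_per_seq); treating the freshly initialised model as a near-uniform guesser is an assumption.

```python
import math

vocab_size = 50_000            # task.model.encoder.bert.vocab_size
global_batch_size = 32         # task.train_data.global_batch_size
max_predictions_per_seq = 20   # task.train_data.max_predictions_per_seq

# Cross-entropy of a uniform guess over the whole vocabulary.
chance_loss = math.log(vocab_size)

# Probability that a uniform guess picks the right token.
chance_accuracy = 1 / vocab_size

# Upper bound on masked positions scored in one batch.
max_masked = global_batch_size * max_predictions_per_seq

# Expected number of correct masked predictions per batch at chance level.
expected_hits = max_masked * chance_accuracy

print(f"chance loss: {chance_loss:.2f}")
print(f"expected correct predictions per batch: {expected_hits:.4f}")
```

The logged lm_example_loss of about 10.96 to 11.00 is close to the chance-level value of about 10.82, and at that level far fewer than one correct masked prediction per batch is expected, so a masked_lm_accuracy of exactly 0.0 over the first five steps does not by itself indicate a bug.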
Here are some other tips that may help you improve masked_lm_accuracy: