How to get accuracy per epoch or step for the huggingface.transformers Trainer?

xwbd5t1u · asked 2023-02-16 · 5 answers

I am using the Hugging Face Trainer with a BertForSequenceClassification.from_pretrained("bert-base-uncased") model.
Simplified, it looks like this:

from transformers import BertForSequenceClassification, BertTokenizer, Trainer, TrainingArguments

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

training_args = TrainingArguments(
        output_dir="bert_results",
        num_train_epochs=3,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=32,
        warmup_steps=500,
        weight_decay=0.01,
        logging_dir="bert_results/logs",
        logging_steps=10
        )

trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        compute_metrics=compute_metrics
        )

The logs contain the loss for every 10 steps, but I can't seem to find the training accuracy. Does anyone know how to get the accuracy, for example by changing the verbosity of the logger? I can't seem to find anything about it online.
Thanks!

ruarlubt #1

You can load the accuracy metric and make it work with your compute_metrics function. For example, it would look like this:

import numpy as np
from datasets import load_metric

metric = load_metric('accuracy')

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

This compute_metrics example is based on Hugging Face's text classification tutorial, and it worked fine in my tests.
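Note that in newer versions of the datasets library, load_metric is deprecated in favor of the separate evaluate package; an equivalent sketch, assuming evaluate is installed, would be:

import numpy as np
import evaluate

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)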

ztmd8pv5 #2

I ran into the same problem and solved it by adding a custom callback that calls the evaluate() method with the train_dataset at the end of every epoch.

from copy import deepcopy

from transformers import TrainerCallback

class CustomCallback(TrainerCallback):

    def __init__(self, trainer) -> None:
        super().__init__()
        self._trainer = trainer

    def on_epoch_end(self, args, state, control, **kwargs):
        # When the trainer is about to evaluate on the validation set, also run an
        # evaluation pass over the training set, prefixing its metrics with "train".
        if control.should_evaluate:
            control_copy = deepcopy(control)
            self._trainer.evaluate(eval_dataset=self._trainer.train_dataset, metric_key_prefix="train")
            # Return the saved control so the normal validation-set evaluation still runs.
            return control_copy

trainer = Trainer(
    model=model,                         # the instantiated Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=valid_dataset,          # evaluation dataset
    compute_metrics=compute_metrics,     # the callback that computes metrics of interest
    tokenizer=tokenizer
)
trainer.add_callback(CustomCallback(trainer)) 
train = trainer.train()

This gives train metrics that look like the following:

{'train_loss': 0.7159061431884766, 'train_accuracy': 0.4, 'train_f1': 0.5714285714285715, 'train_runtime': 6.2973, 'train_samples_per_second': 2.382, 'train_steps_per_second': 0.159, 'epoch': 1.0}
{'eval_loss': 0.8529007434844971, 'eval_accuracy': 0.0, 'eval_f1': 0.0, 'eval_runtime': 2.0739, 'eval_samples_per_second': 0.964, 'eval_steps_per_second': 0.482, 'epoch': 1.0}

Another way to get training accuracy is to extend the base Trainer class and override its compute_loss() method, like this:

import torch
from sklearn.metrics import accuracy_score
from transformers import Trainer

class CustomTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def compute_loss(self, model, inputs, return_outputs=False):
        """
        How the loss is computed by Trainer. By default, all models return the loss in the first element.
        Subclass and override for custom behavior.
        """
        if self.label_smoother is not None and "labels" in inputs:
            labels = inputs.pop("labels")
        else:
            labels = None
        outputs = model(**inputs)

        # code for calculating accuracy
        if "labels" in inputs:
            preds = outputs.logits.detach()
            expected = inputs["labels"].reshape(1, len(inputs["labels"]))[0]
            # accuracy via scikit-learn (tensors moved to CPU for sklearn)
            acc1 = accuracy_score(expected.cpu(), preds.argmax(axis=1).cpu())
            self.log({'accuracy_score': acc1})
            # accuracy computed directly with torch
            acc = (
                (preds.argmax(axis=-1) == expected)
                .type(torch.float)
                .mean()
                .item()
            )
            self.log({"train_accuracy": acc})
        # end code for calculating accuracy

        # Save past state if it exists
        # TODO: this needs to be fixed and made cleaner later.
        if self.args.past_index >= 0:
            self._past = outputs[self.args.past_index]

        if labels is not None:
            loss = self.label_smoother(outputs, labels)
        else:
            # We don't use .loss here since the model may return tuples instead of ModelOutput.
            loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]

        return (loss, outputs) if return_outputs else loss

Then use CustomTrainer instead of Trainer, like this:

trainer = CustomTrainer(
    model=model,                         # the instantiated Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=valid_dataset,          # evaluation dataset
    compute_metrics=compute_metrics,     # the callback that computes metrics of interest
    tokenizer=tokenizer
)

a1o7rhls #3

You can control how often the Trainer evaluates via the evaluation_strategy training argument, which currently accepts 3 values:
"no": no evaluation is done during training.
"steps": evaluation is done (and logged) every eval_steps.
"epoch": evaluation is done at the end of each epoch.
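For example, a minimal sketch that evaluates (and logs) every 50 steps; the 50-step interval and the output_dir are just illustrative values:

training_args = TrainingArguments(
    output_dir="bert_results",
    evaluation_strategy="steps",   # run evaluation every eval_steps
    eval_steps=50,                 # evaluate (and log) every 50 training steps
    logging_strategy="steps",
    logging_steps=50,
)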

5jvtdoz2 #4

You can define your compute_metrics function to return whatever metrics you need. Below is a function I wrote that returns a whole list of metrics (the more the better, right?):

import numpy as np
from datasets import load_metric

def compute_metrics(eval_pred):
    metrics = ["accuracy", "recall", "precision", "f1"]  # list of metrics to return
    metric = {}
    for met in metrics:
        metric[met] = load_metric(met)
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    metric_res = {}
    for met in metrics:
        metric_res[met] = metric[met].compute(predictions=predictions, references=labels)[met]
    return metric_res

Also, if you need the metrics computed every epoch, you have to set this in the training arguments:

training_args = TrainingArguments(
    ...,
    evaluation_strategy = "epoch", #To calculate metrics per epoch
    logging_strategy="epoch", #Extra: to log training data stats for loss 
)

The last step is to add it to the trainer:

trainer = Trainer(
    ...,
    compute_metrics=compute_metrics,
)

57hvy0tb #5

A bit late to the party, but for anyone the previous answers didn't work for, here is another approach: override the evaluate method of the Trainer class from the Transformers library. The idea is to run the evaluation loop over the training set as well and add those metrics to the logs. Make sure to merge the eval and train dictionaries into a single dictionary before returning.
Extend the Trainer class and override evaluate as follows:

import math
import time
from typing import Dict, List, Optional

from torch.utils.data import Dataset
from transformers import Trainer
from transformers.debug_utils import DebugOption
from transformers.trainer_utils import speed_metrics

# Note: xm and met.metrics_report() in the TPU branch below come from torch_xla
# and are only needed when running on TPU.

class CTCTrainer(Trainer):
        def evaluate(
            self,
            eval_dataset: Optional[Dataset] = None,
            ignore_keys: Optional[List[str]] = None,
            metric_key_prefix: str = "eval",
        ) -> Dict[str, float]:
            """
            Run evaluation and returns metrics.
            The calling script will be responsible for providing a method to compute metrics, as they are task-dependent
            (pass it to the init `compute_metrics` argument).
            You can also subclass and override this method to inject custom behavior.
            Args:
                eval_dataset (`Dataset`, *optional*):
                    Pass a dataset if you wish to override `self.eval_dataset`. If it is a [`~datasets.Dataset`], columns
                    not accepted by the `model.forward()` method are automatically removed. It must implement the `__len__`
                    method.
                ignore_keys (`Lst[str]`, *optional*):
                    A list of keys in the output of your model (if it is a dictionary) that should be ignored when
                    gathering predictions.
                metric_key_prefix (`str`, *optional*, defaults to `"eval"`):
                    An optional prefix to be used as the metrics key prefix. For example the metrics "bleu" will be named
                    "eval_bleu" if the prefix is "eval" (default)
            Returns:
                A dictionary containing the evaluation loss and the potential metrics computed from the predictions. The
                dictionary also contains the epoch number which comes from the training state.
            """
            # memory metrics - must set up as early as possible
            self._memory_tracker.start()
    
            eval_dataloader = self.get_eval_dataloader(eval_dataset)
            train_dataloader = self.get_train_dataloader()
            start_time = time.time()
    
            eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
            eval_output = eval_loop(
                eval_dataloader,
                description="Evaluation",
                # No point gathering the predictions if there are no metrics, otherwise we defer to
                # self.args.prediction_loss_only
                prediction_loss_only=True if self.compute_metrics is None else None,
                ignore_keys=ignore_keys,
                metric_key_prefix=metric_key_prefix,
            )
    
            train_output = eval_loop(
                train_dataloader,
                description='Training Evaluation',
                prediction_loss_only=True if self.compute_metrics is None else None,
                ignore_keys=ignore_keys,
                metric_key_prefix="train",
            )
    
            total_batch_size = self.args.eval_batch_size * self.args.world_size
            if f"{metric_key_prefix}_jit_compilation_time" in eval_output.metrics:
                start_time += eval_output.metrics[f"{metric_key_prefix}_jit_compilation_time"]
            eval_output.metrics.update(
                speed_metrics(
                    metric_key_prefix,
                    start_time,
                    num_samples=eval_output.num_samples,
                    num_steps=math.ceil(eval_output.num_samples / total_batch_size),
                )
            )
    
            train_n_samples = len(self.train_dataset)
            train_output.metrics.update(speed_metrics('train', start_time, train_n_samples))
            self.log(train_output.metrics | eval_output.metrics)
    
            if DebugOption.TPU_METRICS_DEBUG in self.args.debug:
                # tpu-comment: Logging debug metrics for PyTorch/XLA (compile, execute times, ops, etc.)
                xm.master_print(met.metrics_report())
    
            self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, train_output.metrics)
            self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, eval_output.metrics)
    
            self._memory_tracker.stop_and_update_metrics(eval_output.metrics)
            self._memory_tracker.stop_and_update_metrics(train_output.metrics)
    
            # only works in Python >= 3.9
            return train_output.metrics | eval_output.metrics
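
For reference, a minimal usage sketch, assuming train_dataset, valid_dataset, compute_metrics and training_args are defined as in the earlier answers:

trainer = CTCTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)
trainer.train()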

Remember to train your model using your custom extended class, i.e. trainer = CTCTrainer(args) and trainer.train(). The code above will produce output like the following in your log history:

"log_history": [
    {
      "epoch": 0.67,
      "learning_rate": 6.428571428571429e-05,
      "loss": 2.1279,
      "step": 5
    },
    {
      "epoch": 0.67,
      "eval_accuracy": 0.13333334028720856,
      "eval_loss": 2.1077311038970947,
      "eval_runtime": 10.683,
      "eval_samples_per_second": 5.616,
      "eval_steps_per_second": 1.404,
      "step": 5,
      "train_accuracy": 0.13333334028720856,
      "train_loss": 2.086669921875,
      "train_runtime": 10.683,
      "train_samples_per_second": 5.616
    }
]
