Does the default "Trainer" class in HuggingFace transformers use PyTorch or TensorFlow?

ws51t4hk · asked 2023-03-30 · 2 answers · 335 views

Question

According to the official documentation, the Trainer class "provides an API for feature-complete training in PyTorch for most standard use cases".
However, when I actually try to use Trainer in practice, I get the following log message, which seems to indicate that TensorFlow is being used under the hood:

tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

So which one is it? Does the HuggingFace transformers library use PyTorch or TensorFlow for Trainer's internals? And can I switch it to use only PyTorch? I can't seem to find a relevant parameter in TrainingArguments.

Why does my script keep printing TensorFlow-related messages? Shouldn't Trainer be using only PyTorch?

Source code

from transformers import GPT2Tokenizer
from transformers import GPT2LMHeadModel
from transformers import TextDataset
from transformers import DataCollatorForLanguageModeling
from transformers import Trainer
from transformers import TrainingArguments

import torch

# Load the GPT-2 tokenizer and LM head model
tokenizer    = GPT2Tokenizer.from_pretrained('gpt2')
lmhead_model = GPT2LMHeadModel.from_pretrained('gpt2')

# Load the training text and split it into fixed-size blocks
train_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path='./datasets/tinyshakespeare.txt',
    block_size=64
)

# Create a data collator for preprocessing batches
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

# Defining the training arguments
training_args = TrainingArguments(
    output_dir='./models/tinyshakespeare', # output directory for checkpoints
    overwrite_output_dir=True,             # overwrite any existing content

    per_device_train_batch_size=4,         # sample batch size for training
    dataloader_num_workers=1,              # number of workers for dataloader
    max_steps=100,                         # maximum number of training steps
    save_steps=50,                         # after # steps checkpoints are saved
    save_total_limit=5,                    # maximum number of checkpoints to save

    prediction_loss_only=True,             # only compute loss during prediction
    learning_rate=3e-4,                    # learning rate
    fp16=False,                            # use 16-bit (mixed) precision

    optim='adamw_torch',                   # define the optimizer for training
    lr_scheduler_type='linear',            # define the learning rate scheduler

    logging_steps=5,                       # after # steps logs are printed
    report_to='none',                      # report to wandb, tensorboard, etc.
)

if __name__ == '__main__':
    torch.multiprocessing.freeze_support()

    trainer = Trainer(
        model=lmhead_model,
        args=training_args,
        data_collator=data_collator,
        train_dataset=train_dataset,
    )

    trainer.train()

hzbexzde 1#

I believe the default Trainer class in the Hugging Face transformers library is built on top of PyTorch.
When you create an instance of the Trainer class, it initializes a PyTorch model and optimizer behind the scenes. During training it then uses PyTorch to run the forward and backward passes and updates the model's weights with the optimizer.
Hugging Face also provides a TFTrainer class for TensorFlow users who want the same training loop and utilities as the Trainer class, but with TensorFlow instead of PyTorch as the backend.
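
As a quick sanity check (a minimal sketch, not part of the original answer), you can confirm that the GPT-2 model from the question's script is an ordinary PyTorch module, which is exactly what Trainer trains:

import torch
from transformers import GPT2LMHeadModel

# The model handed to Trainer in the question is a plain torch.nn.Module,
# so the forward/backward passes and optimizer steps all run in PyTorch.
lmhead_model = GPT2LMHeadModel.from_pretrained('gpt2')

print(isinstance(lmhead_model, torch.nn.Module))   # True
print(type(next(lmhead_model.parameters())))       # <class 'torch.nn.parameter.Parameter'>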

kr98yfug 2#

It depends on how the model was trained and how it is loaded. The most popular models on transformers support both PyTorch and TensorFlow (and sometimes JAX as well).

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import TFAutoModelForSeq2SeqLM

model_name = "google/flan-t5-large"

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# This would work if the model's backend is PyTorch.
print(type(next(model.parameters())))

tf_model = TFAutoModelForSeq2SeqLM.from_pretrained(model_name)

# `model.parameters()` would not work for the TensorFlow model;
# instead you can try `.summary()`
tf_model.summary()

[out]:

<class 'torch.nn.parameter.Parameter'>

Model: "tft5_for_conditional_generation"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 shared (Embedding)          multiple                  32899072  
                                                                 
 encoder (TFT5MainLayer)     multiple                  341231104 
                                                                 
 decoder (TFT5MainLayer)     multiple                  441918976 
                                                                 
 lm_head (Dense)             multiple                  32899072  
                                                                 
=================================================================
Total params: 783,150,080
Trainable params: 783,150,080
Non-trainable params: 0
_________________________________________________________________

So maybe something like this:

def which_backend(model):
  try:
    model.parameters()        # only torch.nn.Module exposes .parameters()
    return 'torch'
  except AttributeError:
    try:
      model.summary()         # only Keras models expose .summary()
      return 'tensorflow'
    except AttributeError:
      return 'I have no idea... Maybe JAX?'
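
With the two models loaded above, this heuristic would give something like the following (a hedged usage example, not part of the original answer):

which_backend(model)      # 'torch'
which_backend(tf_model)   # 'tensorflow' (calling .summary() also prints the table again)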

Q: If I use Trainer, then it's PyTorch?

A: Yes, most likely the model has a PyTorch backend and the training loop (optimizer, loss, etc.) runs in PyTorch. But Trainer() is not the model itself; it is a wrapper object around it.

Q: If I want to train a model with a TensorFlow backend, should I use TFTrainer?

A: In recent versions of transformers, the TFTrainer object has been deprecated, see https://github.com/huggingface/transformers/pull/12706
If you are using a model with a TensorFlow backend, it is recommended to train it with Keras' sklearn-style .fit(), as sketched below.
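
For example, a minimal Keras-style fine-tuning sketch for a TF-backend GPT-2 could look like the following. The text samples are placeholders, and it assumes a reasonably recent transformers version where TF models compute their loss internally when labels are passed and no loss is given to compile():

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
tf_model = TFAutoModelForCausalLM.from_pretrained('gpt2')

texts = ["To be, or not to be", "All the world's a stage"]   # placeholder training data
enc = tokenizer(texts, padding=True, truncation=True, max_length=64, return_tensors='tf')

# For causal LM training the labels are just the input ids again
# (for real training you would mask padded positions in the labels with -100).
features = dict(enc)
features['labels'] = enc['input_ids']
dataset = tf.data.Dataset.from_tensor_slices(features).batch(2)

tf_model.compile(optimizer=tf.keras.optimizers.Adam(3e-5))   # no explicit loss: the internal loss is used
tf_model.fit(dataset, epochs=1)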

Q: Why does my script keep printing TensorFlow-related messages? Shouldn't Trainer be using only PyTorch?

A: Check your transformers version; most likely you are on an outdated release that still uses deprecated objects such as TextDataset (see "How to resolve the 'only integer tensors of a single element can be converted to an index' error when creating a Dataset to fine-tune a GPT2 model?").
On a more recent version, most likely pip install transformers>=4.26.1, Trainer should not trigger the TensorFlow warning, and using TFTrainer raises a warning advising users to switch to Keras.
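
As a side note on the deprecated TextDataset, a hedged sketch of the usual replacement is to build the training set with the datasets library instead. The file path and block size are taken from the question, and the grouping step is a simplified version of the common language-modeling preprocessing:

from datasets import load_dataset
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
block_size = 64

# Load the raw text file as a datasets.Dataset (one example per line).
raw = load_dataset('text', data_files={'train': './datasets/tinyshakespeare.txt'})

def tokenize(batch):
    return tokenizer(batch['text'])

tokenized = raw['train'].map(tokenize, batched=True, remove_columns=['text'])

def group_texts(batch):
    # Concatenate all token ids and split them into fixed-size blocks, dropping the remainder.
    ids = sum(batch['input_ids'], [])
    total = (len(ids) // block_size) * block_size
    chunks = [ids[i:i + block_size] for i in range(0, total, block_size)]
    return {'input_ids': chunks, 'attention_mask': [[1] * block_size for _ in chunks]}

train_dataset = tokenized.map(group_texts, batched=True, remove_columns=tokenized.column_names)
# train_dataset can now be passed to Trainer together with DataCollatorForLanguageModeling(mlm=False).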
