Paddle: error when using adversarial training with BERT

68de4m5k · posted 2021-11-30
Follow (0) | Answers (6) | Views (491)


Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search the existing GitHub issues in case a similar one was submitted or resolved before.
If there is no solution, please make sure this is a training issue and include the following details:

System information

- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBlas/MKLDNN version
- GPU: including CUDA/CUDNN version
- OS platform (e.g. Mac OS 10.14)
- Other information: distributed training / operator information / graphics card memory
Note: You can get most of the information by running summary_env.py.

To Reproduce

- Steps to reproduce the behavior
- Describe your current behavior
- Code to reproduce the issue
- Other info / logs

class FGM():
    """Fast Gradient Method (FGM): adversarial training that perturbs the embedding layer along the gradient."""

    def __init__(self, model):
        self.model = model
        self.backup = {}

    def attack(self, epsilon=1., emb_name='emb'):
        # Replace emb_name with the name of the embedding parameter in your model
        for name, param in self.model.named_parameters():
            if not param.stop_gradient and emb_name in name:  # trainable and belongs to the embedding
                self.backup[name] = param.numpy()  # back up the original values
                grad_tensor = paddle.to_tensor(param.grad)  # param.grad is a numpy array here
                norm = paddle.norm(grad_tensor)  # L2 norm of the gradient
                if norm != 0:
                    r_at = epsilon * grad_tensor / norm
                    param.set_value(param + r_at)  # apply the perturbation in place (Tensor.add is not in-place)

    def restore(self, emb_name='emb'):
        # Replace emb_name with the name of the embedding parameter in your model
        for name, param in self.model.named_parameters():
            if not param.stop_gradient and emb_name in name:
                assert name in self.backup
                param.set_value(self.backup[name])  # restore the original embedding values
        self.backup = {}
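For context, a single training step with this class typically looks like the sketch below; model, criterion, optimizer and the batch tensors are placeholders rather than names from this post:

fgm = FGM(model)

# one training step (batch variables are placeholders)
logits = model(input_ids, token_type_ids)
loss = criterion(logits, labels)
loss.backward()        # gradients on the clean inputs
fgm.attack()           # perturb the embedding weights along the gradient
loss_adv = criterion(model(input_ids, token_type_ids), labels)
loss_adv.backward()    # accumulate the adversarial gradients
fgm.restore()          # put the original embedding weights back
optimizer.step()
optimizer.clear_grad()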

41zrol4v #1

Hi! We've received your issue; please be patient while we respond. We will arrange technicians to answer your questions as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment & version info, and the error message. You may also look for an answer in the API documentation, FAQ, historical GitHub Issues, and the AI community. Have a nice day!

4si2a6ki #2

Hi, to debug the error we need code that reproduces it. Please provide the code and your environment information, and we will get back to you as soon as we have reproduced the issue.

krcsximq #3

from paddlenlp.datasets import load_dataset
from paddlenlp.transformers import BertForSequenceClassification, BertTokenizer
from paddlenlp.transformers import LinearDecayWithWarmup
import paddle
import time
import os
from functools import partial
import pandas as pd
import numpy as np

def read(data_path):
    df = pd.read_csv(data_path)
    for idx, row in df.iterrows():
        words = row['text']
        labels = row['class']
        yield {'text': words, 'label': labels}

# data_path is forwarded to read() by load_dataset

train_ds = load_dataset(read, data_path='data/train_data_public.csv', lazy=False)

# Function that converts an example into token ids

def convert_example(example, tokenizer):
    encoded_inputs = tokenizer(text=example["text"], max_seq_len=512, pad_to_max_seq_len=True)
    return tuple([np.array(x, dtype="int64") for x in [
            encoded_inputs["input_ids"], encoded_inputs["token_type_ids"], [example["label"]]]])

# Load the BERT tokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

# Convert the training set into ids

train_ds = train_ds.map(partial(convert_example, tokenizer=tokenizer))

# Build the dataloader for the training set

train_batch_sampler = paddle.io.BatchSampler(dataset=train_ds, batch_size=32, shuffle=True)
train_data_loader = paddle.io.DataLoader(dataset=train_ds, batch_sampler=train_batch_sampler, return_list=True)

num_classes = 3
model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_classes=num_classes)

num_train_epochs = 3
num_training_steps = len(train_data_loader) * num_train_epochs

# Define the learning rate scheduler that adjusts the lr during training

lr_scheduler = LinearDecayWithWarmup(5E-5, num_training_steps, 0.0)

# Generate parameter names needed to perform weight decay.

# All bias and LayerNorm parameters are excluded.

decay_params = [
    p.name for n, p in model.named_parameters()
    if not any(nd in n for nd in ["bias", "norm"])
]

# Define the optimizer

optimizer = paddle.optimizer.AdamW(
    learning_rate=lr_scheduler,
    parameters=model.parameters(),
    weight_decay=0.0,
    apply_decay_param_fun=lambda x: x in decay_params)

# Cross-entropy loss

criterion = paddle.nn.loss.CrossEntropyLoss()

# Use accuracy as the evaluation metric

metric = paddle.metric.Accuracy()

class FGM():
    """Fast Gradient Method (FGM): adversarial training that perturbs the embedding layer along the gradient."""

    def __init__(self, model):
        self.model = model
        self.backup = {}

    def attack(self, epsilon=1., emb_name='emb'):
        # Replace emb_name with the name of the embedding parameter in your model
        for name, param in self.model.named_parameters():
            if not param.stop_gradient and emb_name in name:  # trainable and belongs to the embedding
                self.backup[name] = param.numpy()  # back up the original values
                grad_tensor = paddle.to_tensor(param.grad)  # param.grad is a numpy array here
                norm = paddle.norm(grad_tensor)  # L2 norm of the gradient
                if norm != 0:
                    r_at = epsilon * grad_tensor / norm
                    param.set_value(param + r_at)  # apply the perturbation in place (Tensor.add is not in-place)

    def restore(self, emb_name='emb'):
        # Replace emb_name with the name of the embedding parameter in your model
        for name, param in self.model.named_parameters():
            if not param.stop_gradient and emb_name in name:
                assert name in self.backup
                param.set_value(self.backup[name])  # restore the original embedding values
        self.backup = {}

# Train the model; training takes a while, so this part can be commented out

def do_adversarial_train(model, train_data_loader):

    fgm = FGM(model)
    global_step = 0
    tic_train = time.time()

    # Checkpoint settings (assumed defaults; they were not defined in the original post)
    save_steps = 100
    last_step = num_training_steps
    output_dir = "checkpoints"
    os.makedirs(output_dir, exist_ok=True)

    for epoch in range(1, num_train_epochs + 1):
        for step, batch in enumerate(train_data_loader, start=1):

            input_ids, token_type_ids, labels = batch
            probs = model(input_ids=input_ids, token_type_ids=token_type_ids)
            loss = criterion(probs, labels)
            correct = metric.compute(probs, labels)
            metric.update(correct)
            acc = metric.accumulate()

            global_step += 1

            # Log training metrics every 100 steps
            if global_step % 100 == 0:
                print(
                    "global step %d, epoch: %d, batch: %d, loss: %.5f, accu: %.5f, speed: %.2f step/s"
                    % (global_step, epoch, step, loss, acc,
                        100 / (time.time() - tic_train)))
                tic_train = time.time()
            loss.backward()

            # Adversarial training
            fgm.attack()  # add the adversarial perturbation to the embeddings
            logits_adv = model(input_ids=input_ids, token_type_ids=token_type_ids)
            loss_adv = criterion(logits_adv, labels)  # the model returns logits, so compute the loss before backward()
            loss_adv.backward()  # back-propagate, accumulating the adversarial gradients on top of the normal ones
            fgm.restore()  # restore the embedding parameters

            optimizer.step()
            lr_scheduler.step()
            optimizer.clear_grad()

            if global_step % save_steps == 0 or global_step == last_step:
                model_path = os.path.join(output_dir, "model_classification_%d.pdparams" % global_step)
                paddle.save(model.state_dict(), model_path)

do_adversarial_train(model, train_data_loader)

Environment: AI Studio, paddle-gpu 2.0.2

qqrboqgw #4

paddlehub 2.0.4
paddlenlp 2.0.7
paddlepaddle-gpu 2.1.2.post101

qxgroojn #5

Hi, on AI Studio we recommend using paddle 2.2.0rc; that version resolves this problem.
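For reference, after upgrading (e.g. pip install paddlepaddle-gpu==2.2.0rc0; the exact wheel for your CUDA setup may differ), the runtime can be sanity-checked like this:

import paddle

print(paddle.__version__)  # should report a 2.2.0 release candidate or later
paddle.utils.run_check()   # verifies that the installation (incl. GPU) works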

eqoofvh9 #6

OK, I just tried it and it works now.
