Python: How to replace a PyTorch model layer's tensor with another tensor of the same shape in a Huggingface model?

izj3ouym  posted on 2023-01-24  in Python

Given a Huggingface model, e.g.

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased", num_labels=2)

I can access a layer's tensor like this:

# Shape [1024, 1024]
model.state_dict()["bert.encoder.layer.0.attention.self.query.weight"]

[out]:

tensor([[ 0.0167, -0.0422, -0.0425,  ...,  0.0302, -0.0341,  0.0251],
        [ 0.0323,  0.0347, -0.0041,  ..., -0.0722,  0.0031, -0.0351],
        [ 0.0387, -0.0293, -0.0694,  ...,  0.0492,  0.0201, -0.0727],
        ...,
        [ 0.0035,  0.0081, -0.0337,  ...,  0.0460,  0.0268,  0.0747],
        [ 0.0513,  0.0131,  0.0735,  ..., -0.0127,  0.0144, -0.0400],
        [ 0.0385,  0.0013, -0.0272,  ...,  0.0148,  0.0399,  0.0339]])

Suppose I have another tensor of the same shape that is already defined. For illustration I create a random tensor here, but it could be any predefined tensor.

import torch
replacement_layer = torch.rand([1024, 1024])

Note: the goal is not to replace the layer with a random tensor, but with a predefined one.

When I try to replace the layer's tensor through state_dict(), it does not seem to work:

import torch
from transformers import AutoModelForSequenceClassification

# The model with a layer that we want to replace.
model = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased", num_labels=2)

# A replacement layer.
replacement_layer = torch.rand([1024, 1024])

# Replacing the layer in the statedict.
model.state_dict()["bert.encoder.layer.0.attention.self.query.weight"] = replacement_layer

# Check that the layer is replaced. No, it is not =(
assert torch.equal(
    model.state_dict()["bert.encoder.layer.0.attention.self.query.weight"], 
    replacement_layer)
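
The same behaviour can be reproduced without downloading BERT. The sketch below assumes a toy torch.nn.Linear as a stand-in for the model (the names toy and replacement are illustrative only): each call to state_dict() builds a fresh dictionary, so assigning to a key changes that temporary dictionary and nothing else.

```python
import torch

# Toy stand-in for the real model (an assumption for illustration only).
toy = torch.nn.Linear(4, 4)

# All-ones tensor: the default Linear init for 4 inputs stays within
# [-0.5, 0.5], so the initial weights can never equal it by accident.
replacement = torch.ones(4, 4)

# Every call to state_dict() builds a new dictionary object.
first = toy.state_dict()
second = toy.state_dict()
dicts_are_distinct = first is not second

# Rebinding a key only changes that one dictionary...
first["weight"] = replacement

# ...so the module, and any later state_dict() call, is unaffected.
weight_unchanged = not torch.equal(toy.state_dict()["weight"], replacement)

print(dicts_are_distinct, weight_unchanged)
```

Both flags come out true here, which matches the failing assert in the snippet above.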

How can I replace a PyTorch model layer's tensor with another tensor of the same shape in a Huggingface model?

pxyaymoc1#

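state_dict() is a bit special: each call builds a new dictionary on the fly rather than handing back the model's own storage, so assigning a new tensor to one of its keys only rebinds that entry in the temporary dictionary and leaves the module untouched.
You can access the model's layers directly with dot notation instead. Note that the 0 in the key is an index into the layer list rather than an attribute name, so it becomes layer[0]. You also need to wrap the tensor in a torch.nn.Parameter for it to work inside the model.
So this should work: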

model.bert.encoder.layer[0].attention.self.query.weight = torch.nn.Parameter(replacement_layer)

Or, in full:

# Note I used the base model for testing
import torch
from transformers import AutoModelForSequenceClassification

# The model with a layer that we want to replace.
model: torch.nn.Module = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# A replacement layer.
replacement_layer = torch.rand([768, 768])

model.bert.encoder.layer[0].attention.self.query.weight = torch.nn.Parameter(replacement_layer)

# Check that the layer is replaced
assert torch.equal(
    model.state_dict()["bert.encoder.layer.0.attention.self.query.weight"],
    replacement_layer)

assert torch.equal(
    model.bert.encoder.layer[0].attention.self.query.weight,
    replacement_layer)

print("Success!")
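
An alternative that keeps the existing Parameter object (which matters if an optimizer already holds a reference to it) is to copy the values in place. The sketch below is not part of the answer above and rests on a few assumptions: it uses a toy torch.nn.Sequential in place of BERT, and the helper name overwrite_parameter is hypothetical. Module.get_parameter accepts the same dotted name that state_dict() uses, so the 0 does not have to be turned into an index.

```python
import torch


def overwrite_parameter(model: torch.nn.Module, name: str, new_values: torch.Tensor) -> None:
    """Copy new_values into the parameter called `name`, in place.

    Hypothetical helper for illustration; `name` is a dotted path such as
    "bert.encoder.layer.0.attention.self.query.weight".
    """
    param = model.get_parameter(name)
    if param.shape != new_values.shape:
        raise ValueError(
            f"shape mismatch: {tuple(param.shape)} vs {tuple(new_values.shape)}"
        )
    # no_grad so the in-place write is not tracked by autograd.
    with torch.no_grad():
        param.copy_(new_values)


# Toy stand-in for the real model (an assumption for illustration only).
toy = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
replacement = torch.rand(4, 4)
param_before = toy.get_parameter("0.weight")

overwrite_parameter(toy, "0.weight", replacement)

# Same Parameter object as before, now holding the new values.
same_object = toy.get_parameter("0.weight") is param_before
values_match = torch.equal(toy.state_dict()["0.weight"], replacement)
```

With the real model, the call would presumably look like overwrite_parameter(model, "bert.encoder.layer.0.attention.self.query.weight", replacement_layer).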

waxmsbnn2#

Update a copy of the state_dict (an ordered dict), then load the updated copy back into the model with load_state_dict().

import torch
from transformers import AutoModelForSequenceClassification

# The model with a layer that we want to replace.
model = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased", num_labels=2)

# A replacement layer.
replacement_layer = torch.rand([1024, 1024])

# Get a (shallow) copy of the state_dict.
state_dict_copy = model.state_dict().copy()

# Replace the specific entry with new data of the same shape as the old layer.
state_dict_copy["bert.encoder.layer.0.attention.self.query.weight"] = replacement_layer

# Load the updated values back into the existing model.
model.load_state_dict(state_dict_copy)

# Check that the layer is replaced. This time it is.
assert torch.equal(
    model.state_dict()["bert.encoder.layer.0.attention.self.query.weight"],
    replacement_layer)
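
Copying the whole state_dict is not strictly required. As a sketch under a toy-model assumption (a small torch.nn.Sequential instead of BERT, with illustrative variable names), load_state_dict also accepts a dictionary containing only the key to overwrite when strict=False is passed. The returned missing_keys then lists the entries that were left untouched.

```python
import torch

# Toy stand-in for the real model (an assumption for illustration only).
toy = torch.nn.Sequential(
    torch.nn.Linear(4, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 2),
)
replacement = torch.rand(4, 4)

# Keep a copy of another entry to confirm it is not modified.
old_bias = toy.state_dict()["0.bias"].clone()

# Load a partial state dict; strict=False tolerates the absent keys.
result = toy.load_state_dict({"0.weight": replacement}, strict=False)

weight_replaced = torch.equal(toy.state_dict()["0.weight"], replacement)
bias_untouched = torch.equal(toy.state_dict()["0.bias"], old_bias)

# Keys of the model that the partial dict did not provide.
print(result.missing_keys)
```

Note that strict=False only relaxes the check on missing or unexpected keys. A tensor whose shape does not match the existing parameter still raises a RuntimeError.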
