AutoGPTQ: the factorization could not be completed because the input is not positive-definite

Posted by y4ekin9u 6 months ago in Other


Describe the bug

A clear and concise description of the bug.
model.quantize(examples, use_triton=False)
raises the error:
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 16383 is not positive-definite).

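Hardware details

Information about the CPU and GPU, such as the amount of RAM.
GPU A40

Software versions

Versions of the relevant software, such as the operating system, CUDA toolkit, Python, auto-gptq, PyTorch, transformers, accelerate, etc.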

AutoGPTQ

Source: https://github.com/AutoGPTQ/AutoGPTQ/issues/98
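
For readers unfamiliar with the error in the report above: `linalg.cholesky` only succeeds on a positive-definite matrix. The sketch below is an illustration in plain Python, not AutoGPTQ's code and not a diagnosis of this issue (the thread never identifies the root cause). It builds a Gram matrix from a single input row, which is only positive semi-definite, shows that a textbook Cholesky factorization fails on it, and shows that adding a small multiple of the mean diagonal makes it succeed. The helper names `gram`, `cholesky`, and `damp` are hypothetical, and the damping formula is an assumption modeled on the kind of damping GPTQ-style quantizers apply.

```python
import math


def gram(rows):
    """Return H = X^T X for a list of equal-length rows (hypothetical helper)."""
    d = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]


def cholesky(a):
    """Return lower-triangular L with L L^T = a, or raise ValueError if a is not positive-definite."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    raise ValueError(
                        f"leading minor of order {i + 1} is not positive-definite"
                    )
                l[i][j] = math.sqrt(pivot)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def damp(h, percent=0.01):
    """Add percent * mean(diag(h)) to the diagonal (assumed damping formula)."""
    n = len(h)
    shift = percent * sum(h[i][i] for i in range(n)) / n
    return [[h[i][j] + (shift if i == j else 0.0) for j in range(n)] for i in range(n)]


# One sample in three dimensions gives a rank-1 Gram matrix.
H = gram([[1.0, 2.0, 3.0]])

try:
    cholesky(H)
except ValueError as err:
    print("undamped:", err)

L = cholesky(damp(H))
print("damped: factorization succeeded, diagonal =", [row[i] for i, row in enumerate(L)])
```

A rank-deficient matrix is only one way to hit this error; NaN or overflowing activations can trigger it as well, so treat this as background on the message rather than an explanation of the report.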

8 answers


vkc1a9a2 #1

Please post the complete code you are running so we can see which model, model type, etc. you are using.

6 months ago

yhived7q #2

from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "/home/models/Phoenix/"
quantized_model_dir = "/home/models/Phoenix_int8/"
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
examples = [
    tokenizer("你叫啥名字")
]
quantize_config = BaseQuantizeConfig(
    bits=8,  # quantize the model to 8 bits
    group_size=128,  # 128 is the recommended value
    desc_act=False,  # False significantly speeds up inference, but perplexity may be slightly worse
)
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples, use_triton=False)
model.save_quantized(quantized_model_dir)
model.save_quantized(quantized_model_dir, use_safetensors=True)
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0", use_triton=False)
print(tokenizer.decode(model.generate(**tokenizer("auto_gptq is", return_tensors="pt").to("cuda:0"))[0]))
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline("auto-gptq is")[0]["generated_text"])

6 months ago

h5qlskok #3

Same here, using the example code with LLaMA-30B.

6 months ago

64jmpszr #4

I ran the code on torch 2.0.1.

6 months ago

bf1o4zei #5

Same problem.

6 months ago

eagi6jfj #6

I tried again, and this time it worked.

from transformers import AutoTokenizer, TextGenerationPipeline
from transformers import LlamaForCausalLM, LlamaTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s", level=logging.INFO, datefmt="%Y-%m-%d %H:%M:%S"
)
pretrained_model_dir = "belle13bzh_v1"
quantized_model_dir = "belle13bzh_v1_4bit"
tokenizer = LlamaTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
examples = [
    tokenizer("auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm.")
]
quantize_config = BaseQuantizeConfig(
    bits=4,  # quantize the model to 4 bits
    group_size=128,  # 128 is the recommended value
    desc_act=False,  # False significantly speeds up inference, but perplexity may be slightly worse
)

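# Load the unquantized model; by default it is always loaded into CPU memory
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)

# Quantize the model; examples should be a list of dicts whose keys can only be "input_ids" and "attention_mask"
model.quantize(examples)

# Save the quantized model
model.save_quantized(quantized_model_dir)

# Save the quantized model again; use_safetensors=False here, so safetensors is not used (pass True for that)
model.save_quantized(quantized_model_dir, use_safetensors=False)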

6 months ago
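
The comment in the script above says the calibration examples should be a list of dicts whose only keys are "input_ids" and "attention_mask". A minimal sketch of a pre-flight check for that shape follows, in plain Python. The function name `check_examples` is hypothetical, and the equal-length check and the returned token count are my own additions rather than anything AutoGPTQ is known to enforce.

```python
from collections.abc import Mapping

ALLOWED_KEYS = {"input_ids", "attention_mask"}


def check_examples(examples):
    """Validate calibration examples and return the total number of tokens.

    Raises ValueError if examples is not a non-empty list of mappings whose
    keys are exactly "input_ids" and "attention_mask" with matching lengths.
    """
    if not isinstance(examples, list) or not examples:
        raise ValueError("examples must be a non-empty list")
    total_tokens = 0
    for index, example in enumerate(examples):
        if not isinstance(example, Mapping):
            raise ValueError(f"example {index} is not a mapping")
        keys = set(example.keys())
        if keys != ALLOWED_KEYS:
            raise ValueError(f"example {index} has keys {sorted(keys)}")
        # Assumed sanity check: the mask should cover every token.
        if len(example["input_ids"]) != len(example["attention_mask"]):
            raise ValueError(f"example {index} has mismatched lengths")
        total_tokens += len(example["input_ids"])
    return total_tokens


# Hand-written token ids stand in for real tokenizer output here.
examples = [{"input_ids": [101, 2023, 102], "attention_mask": [1, 1, 1]}]
print("calibration tokens:", check_examples(examples))
```

Printing the token count also makes it obvious when the calibration set is very small, as it is in both scripts in this thread, which each pass a single short sentence.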

e1xvtsh3 #7

Why does it work now? Did you change the machine or the environment?

6 months ago

weylhg0b #8