text-generation-inference produces meaningless text with deepseek-ai/deepseek-coder-6.7b-base

btqmn9zl posted 4 months ago in Other

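System Info

TGI version: tried 2.0.3, 2.0.4 and 2.1.1. None of them work correctly, but 2.0.2 works fine.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction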

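from text_generation import Client  # TGI's Python client (pip install text-generation)

def main():
    # Point the client at the local TGI container (host port 8080 -> container port 80)
    client = Client(base_url="http://127.0.0.1:8080")
    response = client.generate(
        prompt="def hel",
    )
    print(response.generated_text)

if __name__ == "__main__":
    main()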
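To rule out the Python client, the same request can be sent straight to the server's HTTP API. This is a minimal stdlib sketch, assuming TGI's `/generate` route, which accepts a JSON body with `inputs` and `parameters` and returns the completion in `generated_text`. The helper names `build_generate_request` and `generate` are hypothetical, and `max_new_tokens=20` with `do_sample=False` are chosen only to get a short, deterministic completion.

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str, max_new_tokens: int = 20) -> urllib.request.Request:
    """Hypothetical helper: build a POST to TGI's /generate route."""
    payload = {
        "inputs": prompt,
        # Greedy decoding, so repeated runs should give the same text
        "parameters": {"max_new_tokens": max_new_tokens, "do_sample": False},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, prompt: str, max_new_tokens: int = 20) -> str:
    """Send the request and return the generated text (assumes a running server)."""
    req = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # The /generate response carries the completion under "generated_text"
    return body["generated_text"]
```

For example, call `generate("http://127.0.0.1:8080", "def hel")`. If the raw request also returns garbage, the problem lies on the server side rather than in the `text_generation` client.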

Output

metryryryryryryryryryryryryryryryryryryry

Expected behavior

It should be helloworld, or at least something more meaningful.


kwvwclae1#

Found a similar issue here: #1957. It looks like every version after 2.0.2 is broken. I tried all of them, and every one failed.


hfyxw5xn2#

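Hi!
Thanks for reporting the issue 👍. Could you share how to reproduce it?
For example, which model are you using, and what command did you use to start the Docker container?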


bxjv4tth3#

Hi ErikKaum,
Thanks for your reply. The model is deepseek-ai/deepseek-coder-6.7b-base, and you can reproduce this with the official Docker command:

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.1.1 --model-id $model


pn9klfpd4#

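Thanks.
Yes, I can reproduce this on my machine. A quick check also suggests that deepseek-ai/deepseek-coder-6.7b-base works correctly with the transformers library, so the problem is most likely on our side.
Unfortunately, I don't have the bandwidth to start debugging this right now.
I do see a lot of warnings like these: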

2024-07-16T15:00:17.309591Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'õ' was expected to have ID '32000' but was given ID 'None'
2024-07-16T15:00:17.309615Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '÷' was expected to have ID '32001' but was given ID 'None'
2024-07-16T15:00:17.309618Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'Á' was expected to have ID '32002' but was given ID 'None'
2024-07-16T15:00:17.309621Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ý' was expected to have ID '32003' but was given ID 'None'
2024-07-16T15:00:17.309624Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'À' was expected to have ID '32004' but was given ID 'None'
2024-07-16T15:00:17.309626Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ÿ' was expected to have ID '32005' but was given ID 'None'
2024-07-16T15:00:17.309629Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ø' was expected to have ID '32006' but was given ID 'None'
2024-07-16T15:00:17.309631Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ú' was expected to have ID '32007' but was given ID 'None'
2024-07-16T15:00:17.309641Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'þ' was expected to have ID '32008' but was given ID 'None'
2024-07-16T15:00:17.309643Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ü' was expected to have ID '32009' but was given ID 'None'
2024-07-16T15:00:17.309646Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ù' was expected to have ID '32010' but was given ID 'None'
2024-07-16T15:00:17.309648Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ö' was expected to have ID '32011' but was given ID 'None'
2024-07-16T15:00:17.309651Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'û' was expected to have ID '32012' but was given ID 'None'
2024-07-16T15:00:17.309653Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|begin▁of▁sentence|>' was expected to have ID '32013' but was given ID 'None'
2024-07-16T15:00:17.309656Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|end▁of▁sentence|>' was expected to have ID '32014' but was given ID 'None'
2024-07-16T15:00:17.309658Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|fim▁hole|>' was expected to have ID '32015' but was given ID 'None'
2024-07-16T15:00:17.309661Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|fim▁begin|>' was expected to have ID '32016' but was given ID 'None'
2024-07-16T15:00:17.309664Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|fim▁end|>' was expected to have ID '32017' but was given ID 'None'
2024-07-16T15:00:17.309666Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<pad>' was expected to have ID '32018' but was given ID 'None'
2024-07-16T15:00:17.309669Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|User|>' was expected to have ID '32019' but was given ID 'None'
2024-07-16T15:00:17.309672Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|Assistant|>' was expected to have ID '32020' but was given ID 'None'
2024-07-16T15:00:17.309674Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|EOT|>' was expected to have ID '32021' but was given ID 'None'
2024-07-16T15:00:17.310063Z  INFO text_generation_router: router/src/main.rs:330: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205

This makes me think it might be a tokenization issue.
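One way to follow up on the tokenizer suspicion is to inspect the model's `tokenizer.json` directly. The sketch below is a hypothetical diagnostic, not the check that `tokenizers` performs when it prints these warnings. It assumes the usual `tokenizers` serialization layout (a top-level `added_tokens` list of objects with `id` and `content`, and a `model.vocab` mapping for BPE models) and flags only two obvious kinds of inconsistency: added tokens that share an id, and added tokens whose string already sits in the base vocabulary under a different id. The names `check_added_tokens` and `check_file` are illustrative.

```python
import json
from typing import Any


def check_added_tokens(tokenizer_json: dict[str, Any]) -> list[str]:
    """Hypothetical diagnostic: report added tokens that look inconsistent.

    Assumes a top-level "added_tokens" list of {"id": int, "content": str, ...}
    objects and, for BPE/WordLevel models, a "model" -> "vocab" token->id dict.
    """
    vocab = tokenizer_json.get("model", {}).get("vocab", {})
    if not isinstance(vocab, dict):
        # Unigram models store vocab as a list of [token, score]; skip that comparison
        vocab = {}
    problems: list[str] = []
    seen_ids: dict[int, str] = {}
    for entry in tokenizer_json.get("added_tokens", []):
        token_id, content = entry["id"], entry["content"]
        # Two added tokens claiming the same id cannot both resolve to it
        if token_id in seen_ids:
            problems.append(f"id {token_id} used by both {seen_ids[token_id]!r} and {content!r}")
        seen_ids[token_id] = content
        # The same string already exists in the base vocab under another id
        if content in vocab and vocab[content] != token_id:
            problems.append(f"{content!r}: added as {token_id}, but vocab has {vocab[content]}")
    return problems


def check_file(path: str) -> list[str]:
    """Load a tokenizer.json from disk and run the checks above."""
    with open(path, encoding="utf-8") as fh:
        return check_added_tokens(json.load(fh))
```

Point `check_file` at the `tokenizer.json` in the downloaded model snapshot. An empty list only means neither of these two inconsistencies was found. It does not by itself rule out a tokenizer problem.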


ymzxtsji5#

How can this be fixed?
