#### What is the issue?
```
ollama-1 | time=2024-08-20T02:46:33.204Z level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=25 layers.offload=0 layers.split="" memory.available="[22.2 GiB]" memory.required.full="820.5 MiB" memory.required.partial="0 B" memory.required.kv="48.0 MiB" memory.required.allocations="[820.5 MiB]" memory.weights.total="625.2 MiB" memory.weights.repeating="584.0 MiB" memory.weights.nonrepeating="41.3 MiB" memory.graph.full="128.0 MiB" memory.graph.partial="128.0 MiB"
ollama-1 | time=2024-08-20T02:46:33.206Z level=INFO source=server.go:393 msg="starting llama server" cmd="/tmp/ollama1960294902/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-85df6dbe02a3bfb67f24400c4d56ba8bd1a8a19a14450761b65ce17fe1d5064a --ctx-size 8192 --batch-size 512 --embedding --log-disable --no-mmap --parallel 4 --port 46451"
ollama-1 | time=2024-08-20T02:46:33.207Z level=INFO source=sched.go:445 msg="loaded runners" count=1
ollama-1 | time=2024-08-20T02:46:33.207Z level=INFO source=server.go:593 msg="waiting for llama runner to start responding"
ollama-1 | time=2024-08-20T02:46:33.207Z level=INFO source=server.go:627 msg="waiting for server to become available" status="llm server error"
ollama-1 | INFO [main] build info | build=1 commit="1e6f655" tid="127020122728320" timestamp=1724121993
ollama-1 | INFO [main] system info | n_threads=16 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 " | tid="127020122728320" timestamp=1724121993 total_threads=32
ollama-1 | INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="31" port="46451" tid="127020122728320" timestamp=1724121993
ollama-1 | llm_load_print_meta: n_layer = 24
ollama-1 | llm_load_print_meta: n_head = 16
ollama-1 | llm_load_print_meta: n_head_kv = 16
ollama-1 | llm_load_print_meta: n_rot = 64
ollama-1 | llm_load_print_meta: n_swa = 0
ollama-1 | llm_load_print_meta: n_embd_head_k = 64
ollama-1 | llm_load_print_meta: n_embd_head_v = 64
ollama-1 | llm_load_print_meta: n_gqa = 1
ollama-1 | llm_load_print_meta: n_embd_k_gqa = 1024
ollama-1 | llm_load_print_meta: n_embd_v_gqa = 1024
ollama-1 | llm_load_print_meta: f_norm_eps = 1.0e-12
ollama-1 | llm_load_print_meta: f_norm_rms_eps = 0.0e+00
ollama-1 | llm_load_print_meta: f_clamp_kqv = 0.0e+00
ollama-1 | llm_load_print_meta: f_max_alibi_bias = 0.0e+00
ollama-1 | llm_load_print_meta: f_logit_scale = 0.0e+00
ollama-1 | llm_load_print_meta: n_ff = 4096
ollama-1 | llm_load
```
7 Answers
bqjvbblv1#
Can you provide a link to where you got the model?
nr7wwzry2#
> Can you provide a link to where you got the model?

https://ollama.com/search?q=xiaobu
I tried both of these models, and they both report the same error.
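For reference, a minimal sketch of the failing call in Python, using Ollama's `/api/embeddings` endpoint. The default port 11434 is assumed, and the `MODEL` value is a placeholder, not a confirmed tag; substitute whatever tag was actually pulled from the search results above.

```python
# Minimal repro sketch. Assumptions: Ollama is listening on its default
# port (11434) and MODEL is the tag actually pulled from ollama.com --
# the name below is a placeholder, not a confirmed tag.
import requests

MODEL = "xiaobu-embedding-v2"  # placeholder tag

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": MODEL, "prompt": "hello"},
    timeout=60,
)
resp.raise_for_status()  # raises on the 5xx returned when the runner crashed
print(len(resp.json()["embedding"]))  # embedding dimensionality on success
```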
7vux5j2d3#
Thanks for the link. I was able to reproduce the issue and will keep you posted.
0ve6wy6x4#
Same for me with llama3.1 on Docker.
00jrzges5#
After some investigation, it seems to be an issue specific to this model (xiaobu embedding v2). For some reason, llama.cpp segfaults accessing `inp_embd` data around 50% of the time. Not sure what the root cause is. The tensor seems to be initialized correctly. You might have some luck cross posting this to the llama.cpp GitHub.
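Since the crash reportedly hits only about half of requests, one hedged way to observe the intermittency is to repeat the same embedding request in a loop and count failures. Same assumptions as the sketch above: default port, placeholder `MODEL` tag.

```python
# Failure-rate probe (sketch): issue the same embedding request repeatedly
# and count how often the runner dies. Assumes Ollama on localhost:11434;
# MODEL is a placeholder for the tag actually pulled.
import requests

MODEL = "xiaobu-embedding-v2"  # placeholder tag
ok = failed = 0

for i in range(20):
    try:
        r = requests.post(
            "http://localhost:11434/api/embeddings",
            json={"model": MODEL, "prompt": f"probe sentence {i}"},
            timeout=60,
        )
        r.raise_for_status()
        ok += 1
    except requests.RequestException:
        # A segfaulted runner shows up here as a 500 or a dropped connection.
        failed += 1

print(f"ok={ok} failed={failed}")
```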
nqwrtyyt6#
> After some investigation, it seems to be an issue specific to this model (xiaobu embedding v2). For some reason, llama.cpp segfaults accessing `inp_embd` data around 50% of the time. Not sure what the root cause is. The tensor seems to be initialized correctly. You might have some luck cross posting this to the llama.cpp GitHub.
Thanks for the reply; I'll try to submit this to the llama.cpp GitHub.
n3schb8v7#
> After some investigation, it seems to be an issue specific to this model (xiaobu embedding v2). For some reason, llama.cpp segfaults accessing `inp_embd` data around 50% of the time. Not sure what the root cause is. The tensor seems to be initialized correctly. You might have some luck cross posting this to the llama.cpp GitHub.

After installing an NVIDIA GPU, it stopped reporting the error.