Troubleshooting Qwen3-Coder Inference with llama.cpp on DCU BW1000

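Summary (AI-generated): This post records an attempt to run inference of the Qwen3-Coder-30B model with llama.cpp and Transformers on a DCU BW1000 accelerator card. The main problems were: missing shared-library dependencies when running the freshly built llama.cpp; a mismatch between the model's download path and the path it was loaded from; and Transformers failing to load the AWQ-quantized model with an error asking for gptqmodel, a package that could not be installed in this environment with either pip or conda. Because of these dependency conflicts and model compatibility issues, inference did not succeed; the debugging notes are kept here for reference.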

moshang, published 2026/4/6, updated 2026/5/2725 views

This post records the problems encountered, and how they were investigated, while trying to run Qwen3-Coder-30B inference with llama.cpp and Transformers on a DCU BW1000 card.

Model Information

Use llmfit to inspect the model:

llmfit info stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

Output summary:

Provider: stelterlab
Parameters: 4.6B (MoE)
Quantization: Q4_K_M / Q8_0
Context Length: 262144 tokens
Runtime: llama.cpp (est. ~17.2 tok/s)
Fit Analysis: CPU+GPU, Memory Utilization 0.6%
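The fit analysis above is about whether the weights fit in memory. As a rough cross-check, weight memory is approximately parameter count × bits per weight ÷ 8. The sketch below assumes exactly that simple formula (it ignores KV cache, activations, and quantization metadata such as scales); estimate_weight_gib is a hypothetical helper, and the 30B / 4-bit figures in the usage line are illustrative assumptions taken from the model name and the usual 4-bit AWQ setting, not llmfit output.

```python
def estimate_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate in GiB: params * bits / 8 bytes."""
    # Hypothetical helper: ignores KV cache, activations and quantization metadata
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Illustrative assumption: ~30e9 total parameters stored at 4 bits per weight
    print(f"{estimate_weight_gib(30e9, 4):.2f} GiB")  # prints "13.97 GiB"
```

For a MoE model, all expert weights generally have to be resident in memory even though only a subset (about 3B parameters, per the A3B suffix) is active for each token, so the total parameter count is the one that matters for this estimate.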

Installing llama.cpp

Clone the source and build:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

Add the build output directory to PATH:

export PATH=/root/llama.cpp/build/bin:$PATH

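If running the binaries directly fails because libmtmd.so.0 or libggml-base.so.0 cannot be found, check that the paths were added correctly. Note that PATH only tells the shell where to find executables; shared libraries are resolved by the dynamic loader (rpath, LD_LIBRARY_PATH, or the ldconfig cache). Once the paths were corrected, the problem went away.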
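A minimal sketch for checking the libraries named above and building an LD_LIBRARY_PATH value, assuming the shared libraries sit in /root/llama.cpp/build/bin next to the binaries (the usual layout for a shared build, but verify it on your machine). missing_libs and prepend_to_pathlist are hypothetical helpers, not part of llama.cpp.

```python
import os


def missing_libs(lib_dir: str, names: list[str]) -> list[str]:
    """Return the library file names from `names` that are not present in lib_dir."""
    return [n for n in names if not os.path.isfile(os.path.join(lib_dir, n))]


def prepend_to_pathlist(current: str, directory: str) -> str:
    """Put `directory` first in an os.pathsep-separated list, dropping empty and duplicate entries."""
    parts = [p for p in current.split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + parts)


if __name__ == "__main__":
    # Assumed location of the shared libraries; adjust to your build tree
    lib_dir = "/root/llama.cpp/build/bin"
    wanted = ["libmtmd.so.0", "libggml-base.so.0"]
    print("missing:", missing_libs(lib_dir, wanted))
    new_value = prepend_to_pathlist(os.environ.get("LD_LIBRARY_PATH", ""), lib_dir)
    print(f"export LD_LIBRARY_PATH={new_value}")
```

The printed export line can be pasted into the shell before running llama-cli; if the missing list is non-empty, the libraries are elsewhere in the build tree and lib_dir needs adjusting.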

Model Download

Install the modelscope library:

pip install modelscope

Download the model:

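from modelscope import snapshot_download
# cache_dir="models" is relative to the current working directory; the call returns the local model directory
model_dir = snapshot_download('tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ', cache_dir="models")
print(model_dir)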
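Because the download location, the relative path later passed to llama-cli, and the absolute path used with Transformers all differ, it helps to confirm what actually landed on disk before loading anything. A minimal sketch, assuming the directory layout implied by the download call (replace it with the path snapshot_download printed): the hypothetical helpers count_suffixes and scan_model_dir tally file extensions, which shows at a glance whether the repository holds .safetensors shards (typical of Hugging Face style AWQ checkpoints) or .gguf files (what llama.cpp loads).

```python
from collections import Counter
from pathlib import Path


def count_suffixes(paths) -> Counter:
    """Count paths by lowercase file extension (names without one count as '')."""
    return Counter(Path(p).suffix.lower() for p in paths)


def scan_model_dir(model_dir) -> Counter:
    """Count every regular file under model_dir by extension."""
    root = Path(model_dir)
    return count_suffixes(p for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Path assumed from the download call above (relative to the working directory)
    model_dir = Path("models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ")
    if model_dir.is_dir():
        counts = scan_model_dir(model_dir)
        print(dict(counts))
        print("has .gguf:", counts[".gguf"] > 0, "| has .safetensors:", counts[".safetensors"] > 0)
    else:
        print("directory not found:", model_dir)
```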

Inference Tests

1. Using llama-cli

llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

Error:

Loading model... |gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: ... failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
Failed to load the model

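On inspection, the intended model was stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ, but that version could not be found in the public repository, so the downloaded path did not match. In addition, llama.cpp loads models in GGUF format and -m expects a single .gguf file, whereas this path is a directory of AWQ weights, which is consistent with the "failed to read magic" error.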
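The "failed to read magic" message comes from the GGUF loader: a GGUF file begins with the four ASCII bytes GGUF. A minimal Python check, assuming only that magic-number convention (looks_like_gguf is a hypothetical helper, not part of llama.cpp), can tell whether a path is worth passing to -m at all:

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def looks_like_gguf(path: str) -> bool:
    """Return True if `path` is a regular file that starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        # Directories (such as a downloaded AWQ repo) and missing paths cannot be GGUF files
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


if __name__ == "__main__":
    target = "models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"
    print(target, "->", looks_like_gguf(target))
```

For the AWQ path used above this prints False, since it is a directory (or, if the relative path is wrong, does not exist).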

2. Using transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Absolute path to the downloaded model (llama-cli above was given a relative path)
model_name = "/root/models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Loading the AWQ checkpoint is the step that raised the ImportError below
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Build a chat-formatted prompt and move the tokenized input to the model's device
prompt = "Write a quick sort algorithm."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then keep only the newly generated tokens (drop the prompt tokens)
generated_ids = model.generate(**model_inputs, max_new_tokens=65536)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)

This also failed, with the following error:

ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`
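Transformers decides which quantization backend it needs from the quantization_config entry in the checkpoint's config.json; for an AWQ checkpoint, quant_method is normally "awq". A small sketch, assuming that standard config layout (read_quant_method is a hypothetical helper), confirms this before from_pretrained is called:

```python
import json
from pathlib import Path
from typing import Optional


def read_quant_method(model_dir: str) -> Optional[str]:
    """Return quantization_config.quant_method from config.json, or None if absent."""
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return None
    config = json.loads(config_path.read_text(encoding="utf-8"))
    quant = config.get("quantization_config") or {}
    return quant.get("quant_method")


if __name__ == "__main__":
    # Same directory as in the Transformers example above
    model_dir = "/root/models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"
    print("quant_method:", read_quant_method(model_dir))
```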

Dependency Debugging

Try installing gptqmodel:

pip install gptqmodel

Error:

Exception: Unable to detect torch version via uv/pip/conda/importlib...
ERROR: Failed to build 'gptqmodel' when getting requirements to build wheel
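The gptqmodel build aborts because it cannot determine the installed torch version. One possible contributor, not verified in this environment, is pip's build isolation: by default pip builds a wheel in a temporary environment that does not see the torch already installed. To see what the standard packaging metadata reports, the sketch below uses importlib.metadata; find_dist_version is a hypothetical helper, and whether a vendor-built torch for DCU registers under the distribution name "torch" is an assumption to check locally.

```python
from importlib import metadata
from typing import Iterable, Optional, Tuple


def find_dist_version(names: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (name, version) for the first installed distribution in `names`, else None."""
    for name in names:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    # "torch" is the standard distribution name; a vendor build may differ (assumption)
    print("torch metadata:", find_dist_version(["torch"]))
    try:
        import torch  # the module may import even when package metadata is missing
        print("torch.__version__:", torch.__version__)
    except (ImportError, OSError) as exc:
        print("import torch failed:", exc)
```

If the module imports but the metadata lookup returns None, tools that rely on package metadata will not see torch, which would match the "Unable to detect torch version" message.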

Try installing with conda:

conda install gptqmodel

Error:

PackagesNotFoundError: The following packages are not available from current channels:
 - gptqmodel

Summary

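The environment issues remain unresolved, so this is on hold for now.

llama.cpp: the model could not be loaded because of the model repository/path mismatch.
Transformers: inference failed because gptqmodel, the library required to load the AWQ-quantized model, could not be installed in this environment (pip could not detect the torch version, and conda's current channels do not provide the package).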
