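Running Qwen3-Coder-30B Inference with llama.cpp on DCU BW1000: Practice and Troubleshooting

This experiment was carried out on a DCU BW1000 accelerator card. The available container images for this environment are limited, but the compute itself is usable. The goal is to verify whether the Qwen3-Coder-30B model can be run on this hardware.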

Model evaluation

First, use llmfit to inspect the model parameters and confirm the resource requirements.

llmfit info stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

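The output shows that the model uses an MoE architecture, so its active VRAM footprint is relatively low, but there is not enough VRAM for expert offloading, and quantizing to Q8_0 is recommended. The estimated speed is about 17.2 tok/s.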
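To make the resource requirement more concrete, the following is a rough back-of-the-envelope sketch, not llmfit's actual calculation. It assumes about 30 billion total parameters (taken from the model name) and counts weights only. The helper name `estimate_weight_gb` is hypothetical. Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, which works out to 8.5 bits per weight.

```shell
# Rough weight-only memory estimate: parameters (in billions) x bits per weight / 8.
# The result is in GB (10^9 bytes); KV cache and runtime buffers are NOT included.
estimate_weight_gb() {
    params_b="$1"   # parameter count in billions (assumption: ~30 for this model)
    bpw="$2"        # average bits per weight of the storage format
    awk -v p="$params_b" -v b="$bpw" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Q8_0: 34 bytes per block of 32 weights = 8.5 bits per weight.
estimate_weight_gb 30 8.5
# Unquantized fp16/bf16 weights, for comparison.
estimate_weight_gb 30 16
```

Although an MoE model activates only a fraction of its parameters per token, all experts still have to be stored somewhere, which is why the total parameter count is used here rather than the active one.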

llama.cpp deployment attempt

First, build and install llama.cpp.

git clone https://gitcode.com/GitHub_Trending/ll/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
export PATH=/root/llama.cpp/build/bin:$PATH

Alternatively, run make install directly:

cd build
make install

Running the tools right after installation failed with errors about missing shared libraries:

llama-cli: error while loading shared libraries: libmtmd.so.0: cannot open shared object file: No such file or directory
llama-gguf: error while loading shared libraries: libggml-base.so.0: cannot open shared object file: No such file or directory

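Investigation showed that the environment variable had not taken effect; after PATH was set again, the problem went away. Presumably the shell had been picking up a copy of the binaries that could not locate the shared libraries built alongside them in build/bin.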
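A quick way to confirm this kind of diagnosis is to check which binary the shell resolves and which shared libraries the dynamic linker fails to find for it. The snippet below is a minimal sketch using standard Linux tools; the helper name `missing_libs` is hypothetical.

```shell
# Print the names of shared libraries that the dynamic linker cannot resolve
# for the given binary. Prints nothing if everything resolves.
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Which llama-cli does the shell actually pick up?
bin="$(command -v llama-cli || true)"
if [ -n "$bin" ]; then
    echo "llama-cli resolves to: $bin"
    missing_libs "$bin"
else
    echo "llama-cli is not on PATH"
fi
```

If libraries such as libmtmd.so.0 are listed, two common remedies are to add the directory that contains them to LD_LIBRARY_PATH (for the build above this would likely be /root/llama.cpp/build/bin, which is an assumption about where the libraries ended up), or to run ldconfig after make install so that the linker cache picks up the installed copies.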

Next, try to load the model:

llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

It fails with the following error:

Loading model... |gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
...
Failed to load the model

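Judging from the log, the model file path or format does not match what llama-cli expects. The -m option takes a GGUF file, whereas the path here refers to an AWQ checkpoint rather than a .gguf file, which is most likely why the GGUF magic bytes could not be read. The target model does not appear to be available in the default repository.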
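The "failed to read magic" message can be checked independently of llama.cpp, because a GGUF file starts with the four ASCII bytes GGUF. The following is a minimal sketch of such a check; the helper name `is_gguf` is hypothetical, and the path is the one from the failing command above.

```shell
# Succeeds only if the argument is a regular file whose first four bytes are "GGUF".
# NUL bytes are stripped so that binary headers do not upset command substitution.
is_gguf() {
    [ -f "$1" ] && [ "$(head -c 4 "$1" 2>/dev/null | tr -d '\0')" = "GGUF" ]
}

model="models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"   # path from the failing command
if is_gguf "$model"; then
    echo "$model looks like a GGUF file"
else
    echo "$model is not a GGUF file (directory, missing file, or another format)"
fi
```

Running a check like this before llama-cli separates a wrong path or wrong format from genuine loader problems.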
