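Notes from trying to run inference on the Qwen3-Coder-30B model with llama.cpp and the Transformers framework on a DCU BW1000 accelerator card, recording the problems encountered and how they were investigated.
Model information analysis
Inspect the model with llmfit: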
llmfit info stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ
Output summary:
- Provider: stelterlab
- Parameters: 4.6B (MoE)
- Quantization: Q4_K_M / Q8_0
- Context Length: 262144 tokens
- Runtime: llama.cpp (est. ~17.2 tok/s)
- Fit Analysis: CPU+GPU, Memory Utilization 0.6%
Installing llama.cpp
Clone the source and build it:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
Add the build output directory to PATH:
export PATH=/root/llama.cpp/build/bin:$PATH
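If running the binary directly fails because libmtmd.so.0 or libggml-base.so.0 is missing, check that the path was added correctly. Once the path was corrected, the problem went away.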
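The check described above can be sketched as a small script. This is only an illustration under assumptions: the helper name missing_files is hypothetical, and it assumes the build placed the shared libraries next to the binaries in the directory that was added to PATH. Note also that on Linux the dynamic loader normally resolves .so files through the rpath, LD_LIBRARY_PATH and the system library cache rather than through PATH, so this only confirms that the files exist where expected.

```python
from pathlib import Path


def missing_files(directory, names):
    """Return the entries of `names` that do not exist inside `directory`."""
    d = Path(directory)
    return [n for n in names if not (d / n).exists()]


if __name__ == "__main__":
    # Directory taken from the PATH export in these notes; adjust as needed.
    bin_dir = "/root/llama.cpp/build/bin"
    # Assumed file names: the binary plus the two libraries from the error.
    wanted = ["llama-cli", "libmtmd.so.0", "libggml-base.so.0"]
    print("missing:", missing_files(bin_dir, wanted))
```

An empty list means all three files are present in that directory; anything listed points to a wrong path or an incomplete build.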
Downloading the model
Install the modelscope library:
pip install modelscope
Download the model:
from modelscope import snapshot_download
snapshot_download('tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ', cache_dir="models")
Inference tests
1. Using llama-cli
llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
Error:
Loading model... |gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: ... failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
Failed to load the model
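On inspection, the intended model was stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ, but that version could not be found in the public repository, so the path does not match. It is also worth noting that "failed to read magic" means the loader could not read the GGUF magic bytes at the start of the file: the path passed to -m here is a directory holding an AWQ checkpoint, not a single .gguf file, which is the format llama.cpp loads.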
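Before passing a path to llama-cli, it can help to confirm that it is really a GGUF file. The sketch below is a minimal check, assuming only that a GGUF file begins with the four ASCII bytes "GGUF"; the helper names is_gguf_file and find_gguf_files are hypothetical and not part of llama.cpp.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def is_gguf_file(path):
    """Return True if `path` is a regular file that starts with the GGUF magic."""
    p = Path(path)
    if not p.is_file():
        # A directory (such as a downloaded model folder) is never a GGUF file.
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


def find_gguf_files(root):
    """Recursively list files under `root` that start with the GGUF magic."""
    return sorted(p for p in Path(root).rglob("*") if is_gguf_file(p))


if __name__ == "__main__":
    model_path = "models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"
    print("is GGUF file:", is_gguf_file(model_path))
    if Path(model_path).is_dir():
        print("GGUF files inside:", find_gguf_files(model_path))
```

If the first check is False and the second list is empty, the download contains no file in the format that llama-cli expects.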
2. Using transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "/root/models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name, torch_dtype="auto", device_map="auto"
)
prompt = "Write a quick sort algorithm."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=65536)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
This run also failed, with the following error:
ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`
Dependency debugging
Tried installing gptqmodel:
pip install gptqmodel
Error:
Exception: Unable to detect torch version via uv/pip/conda/importlib...
ERROR: Failed to build 'gptqmodel' when getting requirements to build wheel
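Since the build step reports that it cannot detect torch, a first diagnostic is to see which versions the current interpreter can actually find. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical. It does not reproduce gptqmodel's own detection logic, and because pip builds packages in an isolated environment by default, a package that is visible here may still be invisible to the build.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Packages relevant to the failing install in these notes.
    for name in ("torch", "transformers", "gptqmodel"):
        print(f"{name}: {installed_version(name) or 'not found'}")
```

Run it with the same Python interpreter that pip uses, so that the result reflects the environment the install targets.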
Tried installing via conda:
conda install gptqmodel
Error:
PackagesNotFoundError: The following packages are not available from current channels:
- gptqmodel
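Summary
The environment configuration problem remains unresolved and is shelved for now.
- llama.cpp could not load the model because the model repository path did not match.
- Transformers could not run inference because gptqmodel, the dependency it requires for AWQ-quantized models, could not be installed in the current environment (torch version detection failed under pip, and the package is not available in the configured conda channels).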

