DCU BW1000 使用 llama.cpp 推理 Qwen3-Coder-30B 模型失败记录 | 极客日志

PythonAI算法

DCU BW1000 使用 llama.cpp 推理 Qwen3-Coder-30B 模型失败记录

综述由AI生成记录了在 DCU BW1000 环境下尝试使用 llama.cpp 和 transformers 库推理 Qwen3-Coder-30B-A3B-Instruct-AWQ 模型的过程。通过 llmfit 分析模型资源需求后，安装 llama.cpp 遇到共享库加载问题，修正环境变量后解决。模型下载阶段发现指定路径模型文件缺失。使用 transformers 推理时因 AWQ 量化需要 gptqmodel 依赖，但该库在当前环境中无法通过 pip 或 conda 安装，导致推理失败。最终结论为环境兼容性不足，暂时搁置。

协议工匠发布于 2026/4/5更新于 2026/5/2329 浏览

使用 llmfit 查看模型情况

llmfit info stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

=== stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ ===
Provider: stelterlab
Parameters: 4.6B
Quantization: Q4_K_M
Best Quant: Q8_0
Context Length: 262144 tokens
Use Case: Code generation and completion
Category: Coding
Released: 2025-07-31
Runtime: llama.cpp (est. ~17.2 tok/s)

Score Breakdown:
  Overall Score: 66.7 / 100
  Quality: 68  Speed: 43  Fit: 61  Context: 100
  Estimated Speed: 17.2 tok/s

Resource Requirements:
  Min VRAM: 2.4 GB
  Min RAM: 2.6 GB (CPU inference)
  Recommended RAM: 4.3

相关免费在线工具

加密/解密文本
使用加密算法（如AES、TripleDES、Rabbit或RC4）加密和解密文本明文。在线工具，加密/解密文本在线工具，online
RSA密钥对生成器
生成新的随机RSA私钥和公钥pem证书。在线工具，RSA密钥对生成器在线工具，online
Mermaid 预览与可视化编辑
基于 Mermaid.js 实时预览流程图、时序图等图表，支持源码编辑与即时渲染。在线工具，Mermaid 预览与可视化编辑在线工具，online
随机西班牙地址生成器
随机生成西班牙地址（支持马德里、加泰罗尼亚、安达卢西亚、瓦伦西亚筛选），支持数量快捷选择、显示全部与下载。在线工具，随机西班牙地址生成器在线工具，online
Gemini 图片去水印
基于开源反向 Alpha 混合算法去除 Gemini/Nano Banana 图片水印，支持批量处理与下载。在线工具，Gemini 图片去水印在线工具，online
curl 转代码
解析常见 curl 参数并生成 fetch、axios、PHP curl 或 Python requests 示例代码。在线工具，curl 转代码在线工具，online

git clone https://gitcode.com/GitHub_Trending/ll/llama.cpp

cd llama.cpp cmake -B build cmake --build build --config Release

export PATH=/root/llama.cpp/build/bin:$PATH

cd build make install

root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-cli
llama-cli: error while loading shared libraries: libmtmd.so.0: cannot open shared object file: No such file or directory
root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-gguf
llama-gguf: error while loading shared libraries: libggml-base.so.0: cannot open shared object file: No such file or directory

export PATH=/root/llama.cpp/build/bin:$PATH

pip install modelscope

from modelscope import snapshot_download
snapshot_download('tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ', cache_dir="models")

llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

root@crdnotebook-2027598444851879937-denglf-12859:~# llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
Loading model... |gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'
srv    load_model: failed to load model, 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'

Failed to load the model

from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "/root/models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    **model_inputs, max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py:48, in AwqQuantizer.validate_environment(self, **kwargs)
46 def validate_environment(self, **kwargs):
47 if not is_gptqmodel_available():
---> 48 raise ImportError(
49     "Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`"
50 )
52 if not is_accelerate_available():
53 raise ImportError("Loading an AWQ quantized model requires accelerate (`pip install accelerate`)")
ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`

pip install gptqmodel

Exception: Unable to detect torch version via uv/pip/conda/importlib. Please install torch >= 2.7.1 [end of output] note: This error originates from a subprocess, and is likely not a problem with pip. ERROR: Failed to build 'gptqmodel' when getting requirements to build wheel

conda install gptqmodel

DCU BW1000 使用 llama.cpp 推理 Qwen3-Coder-30B 模型失败记录

使用 llmfit 查看模型情况

更多推荐文章

相关免费在线工具

安装 llama.cpp

模型下载

推理

用 llama-cli 推理

尝试用 transformers 推理

总结

调试

报错 ImportError: Loading an AWQ quantized model requires gptqmodel.

更多推荐文章

相关免费在线工具

DCU BW1000 使用 llama.cpp 推理 Qwen3-Coder-30B 模型失败记录

使用 llmfit 查看模型情况

微信扫一扫，关注极客日志

更多推荐文章

相关免费在线工具

安装 llama.cpp

模型下载

推理

用 llama-cli 推理

尝试用 transformers 推理

总结

调试

报错 ImportError: Loading an AWQ quantized model requires gptqmodel.

微信扫一扫，关注极客日志

更多推荐文章

相关免费在线工具