Use llmfit to check the model's resource requirements
llmfit info stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ
===
This post records an attempt to run inference with the Qwen3-Coder-30B-A3B-Instruct-AWQ model in a DCU BW1000 environment, using llama.cpp and the transformers library. After analyzing the model's resource requirements with llmfit, installing llama.cpp hit a shared-library loading problem, resolved by fixing an environment variable. At the model-download stage, the model files were found to be missing from the specified repository. Inference with transformers failed because the AWQ quantization requires the gptqmodel dependency, which could not be installed via pip or conda in this environment. Conclusion: the environment is not compatible enough yet, so this is shelved for now.
Download the llama.cpp source code
git clone https://gitcode.com/GitHub_Trending/ll/llama.cpp
Build llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
Add the binaries to PATH
export PATH=/root/llama.cpp/build/bin:$PATH
Alternatively, you can just run make install:
cd build
make install
But after installation, running the binaries errors out:
root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-cli
llama-cli: error while loading shared libraries: libmtmd.so.0: cannot open shared object file: No such file or directory
root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-gguf
llama-gguf: error while loading shared libraries: libggml-base.so.0: cannot open shared object file: No such file or directory
The cause was that the directory holding the shared libraries built alongside the binaries was not on the dynamic loader's search path (PATH only affects executable lookup, not library loading). Pointing LD_LIBRARY_PATH at the build output directory fixed it:
export LD_LIBRARY_PATH=/root/llama.cpp/build/bin:$LD_LIBRARY_PATH
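To see exactly which shared objects a binary fails to resolve (the libraries behind the errors above), the dynamic loader's view can be inspected with ldd. A minimal sketch, assuming ldd is available on the system:

```python
import subprocess

def missing_shared_libs(binary: str) -> list[str]:
    """List the shared objects the dynamic loader cannot resolve for `binary`.

    These are the libraries behind "error while loading shared libraries".
    """
    out = subprocess.run(["ldd", binary], capture_output=True, text=True).stdout
    # ldd prints "libfoo.so.0 => not found" for each unresolved dependency
    return [line.split()[0] for line in out.splitlines() if "not found" in line]
```

For example, `missing_shared_libs("/root/llama.cpp/build/bin/llama-cli")` should name libmtmd.so.0 and friends before the library directory is added to the search path, and return an empty list afterwards.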
Install modelscope
pip install modelscope
Download the model
from modelscope import snapshot_download
snapshot_download('tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ', cache_dir="models")
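It is worth checking what was actually fetched into the cache directory before handing it to llama.cpp: an AWQ repo typically ships safetensors shards, not a single .gguf file. A quick sketch (the helper name is mine):

```python
from pathlib import Path

def list_checkpoint_files(snapshot_dir: str) -> list[str]:
    """List the files fetched into the snapshot directory, to see whether it
    holds safetensors shards or a single .gguf file."""
    return sorted(p.name for p in Path(snapshot_dir).rglob("*") if p.is_file())
```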
llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
This errors out:
root@crdnotebook-2027598444851879937-denglf-12859:~# llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
Loading model... |gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'
srv load_model: failed to load model, 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'
Failed to load the model
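The "failed to read magic" line means whatever llama-cli opened does not start with the 4-byte GGUF magic (the ASCII bytes "GGUF"). A quick sketch to check whether a given path is actually a GGUF model:

```python
def is_gguf(path: str) -> bool:
    """Check the 4-byte magic that every GGUF file starts with (b"GGUF")."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == b"GGUF"
    except (IsADirectoryError, FileNotFoundError):
        # llama-cli was pointed at a directory here, which can never pass
        return False
```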
On inspection, the repository should have been stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ; the expected model files do not exist in the repository specified here. More fundamentally, llama.cpp can only load GGUF files, and the AWQ checkpoint downloaded above is in safetensors format, which is why reading the GGUF magic fails. Next, try inference with the transformers library instead:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "/root/models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name, torch_dtype="auto", device_map="auto"
)
# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
**model_inputs, max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
This also failed:
File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py:48, in AwqQuantizer.validate_environment(self, **kwargs)
46 def validate_environment(self, **kwargs):
47 if not is_gptqmodel_available():
---> 48 raise ImportError(
49 "Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`"
50 )
52 if not is_accelerate_available():
53 raise ImportError("Loading an AWQ quantized model requires accelerate (`pip install accelerate`)")
ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`
Following the suggestion in the error message:
pip install gptqmodel
The install fails:
Exception: Unable to detect torch version via uv/pip/conda/importlib. Please install torch >= 2.7.1 [end of output] note: This error originates from a subprocess, and is likely not a problem with pip. ERROR: Failed to build 'gptqmodel' when getting requirements to build wheel
Trying conda instead:
conda install gptqmodel
This also fails:
PackagesNotFoundError: The following packages are not available from current channels:

Neither route worked, so this is shelved for now. With llama.cpp, the specified repository did not contain the model, so the files that were downloaded could not be loaded. With transformers, loading the AWQ checkpoint requires gptqmodel, which in turn demands torch >= 2.7.1; that would mean reinstalling torch and related libraries in this environment, so the needed dependency could not be installed and inference failed.