Running stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ with llama.cpp on the DCU BW1000 in the OpenI (启智) community (failed)

Study notes from quality articles

07 Apr 2026 — 4 min read

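The OpenI (启智) community has just added the BW1000 DCU accelerator card. It costs no credits, so you can use it as much as you like!

However, only one image is provided, which makes it rather cumbersome to use.

Checking the model with llmfit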

llmfit info stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

=== stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ ===

Provider: stelterlab
Parameters: 4.6B
Quantization: Q4_K_M
Best Quant: Q8_0
Context Length: 262144 tokens
Use Case: Code generation and completion
Category: Coding
Released: 2025-07-31
Runtime: llama.cpp (est. ~17.2 tok/s)

Score Breakdown:
Overall Score: 66.7 / 100
Quality: 68 Speed: 43 Fit: 61 Context: 100
Estimated Speed: 17.2 tok/s

Resource Requirements:
Min VRAM: 2.4 GB
Min RAM: 2.6 GB (CPU inference)
Recommended RAM: 4.3 GB

MoE Architecture:
Experts: 8 active / 128 total per token
Active VRAM: 0.5 GB (vs 2.4 GB full model)

Fit Analysis:
Status: 🟡 Good
Run Mode: CPU+GPU
Memory Utilization: 0.6% (2.6 / 405.5 GB)

Notes:
MoE: insufficient VRAM for expert offloading
Spilling entire model to system RAM
Performance will be significantly reduced
Best quantization for hardware: Q8_0 (model default: Q4_K_M)
Estimated speed: 17.2 tok/s
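
The memory figures in this report can be sanity-checked with a back-of-the-envelope estimate: weight storage is roughly the parameter count times the bits per weight, divided by 8. The sketch below assumes a flat 4 bits per weight (real 4-bit formats also store scales, so actual files are somewhat larger) and ignores the KV cache and runtime overhead. The helper name `weight_size_gb` is hypothetical.

```python
def weight_size_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: every weight uses exactly `bits_per_weight` bits;
    KV cache, activations and quantization scales are ignored.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Parameter count as listed by llmfit above
    print(weight_size_gb(4.6, 4))
    # Total parameter count suggested by the "30B" in the model name
    print(weight_size_gb(30, 4))
```

With the 4.6B listed by llmfit this gives about 2.3 GB, close to the reported 2.4 GB minimum. The model name, however, suggests roughly 30B total parameters, for which the same formula gives about 15 GB, so the report may be underestimating the real requirement.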

Installing llama.cpp

Download the llama.cpp source code

git clone https://gitcode.com/GitHub_Trending/ll/llama.cpp

Build llama.cpp

cd llama.cpp
cmake -B build
cmake --build build --config Release

Add the binaries to PATH

export PATH=/root/llama.cpp/build/bin:$PATH

Alternatively, you can run make install directly

cd build
make install

After installing, however, the tools failed to start:

root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-cli
llama-cli: error while loading shared libraries: libmtmd.so.0: cannot open shared object file: No such file or directory
root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-gguf
llama-gguf: error while loading shared libraries: libggml-base.so.0: cannot open shared object file: No such file or directory

It turned out the build directory had not been added to PATH. Adding it solved the problem:

export PATH=/root/llama.cpp/build/bin:$PATH
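
One plausible explanation, which the original post did not verify, is that the copies under build/bin can locate their shared libraries while the copies placed by make install cannot, so which copy comes first on PATH matters. A minimal sketch for checking which executable would actually be picked up; the helper name `resolve_tool` is hypothetical, and only the standard-library `shutil.which` is used.

```python
import os
import shutil


def resolve_tool(name, search_path=None):
    """Return the absolute path of the first executable called `name`.

    `search_path` is an os.pathsep-separated directory list; when it is
    None, the current PATH environment variable is used.
    Returns None if no matching executable is found.
    """
    found = shutil.which(name, path=search_path)
    return os.path.abspath(found) if found else None


if __name__ == "__main__":
    for tool in ("llama-cli", "llama-gguf"):
        print(tool, "->", resolve_tool(tool))
```

If the printed path is not under /root/llama.cpp/build/bin, the shell is still resolving to another copy of the tool.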

Downloading the model

Install modelscope

pip install modelscope

Download

from modelscope import snapshot_download
snapshot_download('tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ', cache_dir="models")

Inference

Inference with llama-cli

llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

It fails with:

root@crdnotebook-2027598444851879937-denglf-12859:~# llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

Loading model... |gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'
srv load_model: failed to load model, 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'

Failed to load the model
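
The `failed to read magic` line comes from the GGUF loader: a GGUF file starts with the four ASCII bytes `GGUF`, and the path passed to -m here appears to be the download directory rather than a single .gguf file. A minimal sketch for checking a path before handing it to llama-cli; the helper name `looks_like_gguf` is hypothetical.

```python
import os

# The first four bytes of every GGUF file
GGUF_MAGIC = b"GGUF"


def looks_like_gguf(path):
    """Return True only if `path` is a regular file starting with b'GGUF'.

    Directories (such as a downloaded repository folder) and files in
    other formats return False.
    """
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as handle:
        return handle.read(4) == GGUF_MAGIC


if __name__ == "__main__":
    print(looks_like_gguf("models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"))
```

This only inspects the header; it does not validate the rest of the file.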

After checking, the model it should be is this one: stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

The problem is that ModelScope (魔搭) does not host this model....

Trying inference with transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/root/models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)

This failed as well:

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py:48, in AwqQuantizer.validate_environment(self, **kwargs)
     46 def validate_environment(self, **kwargs):
     47     if not is_gptqmodel_available():
---> 48         raise ImportError(
     49             "Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`"
     50         )
     52     if not is_accelerate_available():
     53         raise ImportError("Loading an AWQ quantized model requires accelerate (`pip install accelerate`)")

ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`
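
The traceback shows transformers checking for gptqmodel and then accelerate before it loads an AWQ model. A minimal sketch for running the same kind of presence check up front, using only the standard library; the helper name `missing_packages` is hypothetical, and including torch in the list is an assumption based on the model-loading code above.

```python
from importlib import util


def missing_packages(names):
    """Return the entries of `names` that cannot be located as top-level modules.

    This only checks that the module can be found; it does not prove that
    importing it will succeed on this hardware.
    """
    return [name for name in names if util.find_spec(name) is None]


if __name__ == "__main__":
    print(missing_packages(["gptqmodel", "accelerate", "torch"]))
```

An empty list means all three modules can be located; anything listed would need to be installed before retrying.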

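Summary

I could not get it working, so I am shelving this for now.

llama.cpp failed because ModelScope does not host that model, so the model I downloaded did not match; the log shows the path passed to -m being rejected at the GGUF magic check.

transformers failed because of library issues: loading the AWQ model requires gptqmodel, which in turn requires reinstalling torch and related libraries, so the required library could not be installed and inference failed.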

Debugging

The error is ImportError: Loading an AWQ quantized model requires gptqmodel.

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py:48, in AwqQuantizer.validate_environment(self, **kwargs)
     46 def validate_environment(self, **kwargs):
     47     if not is_gptqmodel_available():
---> 48         raise ImportError(
     49             "Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`"
     50         )
     52     if not is_accelerate_available():
     53         raise ImportError("Loading an AWQ quantized model requires accelerate (`pip install accelerate`)")

ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`

Following the hint, I ran:

pip install gptqmodel

The installation failed:

Exception: Unable to detect torch version via uv/pip/conda/importlib. Please install torch >= 2.7.1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed to build 'gptqmodel' when getting requirements to build wheel
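
The build error says gptqmodel wants torch >= 2.7.1 and could not detect an installed version. A minimal sketch for seeing what the environment itself reports, assuming the installed torch is visible to `importlib.metadata`. The helper names are hypothetical, and the comparison is deliberately simple: any local suffix after `+` is dropped and pre-release tags are ignored.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a version string such as '2.7.1+local' into (2, 7, 1)."""
    public = version.split("+", 1)[0]  # drop the local build suffix
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="2.7.1"):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(dist_name="torch"):
    """Return the installed version string of `dist_name`, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("torch")
    print("torch:", found)
    if found is not None:
        print("meets 2.7.1:", meets_minimum(found))
```

If `installed_version` returns None even though `import torch` works, the package metadata is probably missing, which would be consistent with the "Unable to detect torch version" message.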

Trying conda instead

conda install gptqmodel

That failed too.

PackagesNotFoundError: The following packages are not available from current channels:

- gptqmodel
