Running stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ with llama.cpp on a DCU BW1000 in the OpenI community (failed)

Study notes on quality articles

06 Apr 2026 — 4 min read

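The OpenI community has added a new DCU compute card, the BW1000. It costs no points, so you can use it as much as you like!

However, only one image is provided, which makes it rather cumbersome to use.

Checking the model with llmfit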

llmfit info stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

=== stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ ===

Provider: stelterlab
Parameters: 4.6B
Quantization: Q4_K_M
Best Quant: Q8_0
Context Length: 262144 tokens
Use Case: Code generation and completion
Category: Coding
Released: 2025-07-31
Runtime: llama.cpp (est. ~17.2 tok/s)

Score Breakdown:
Overall Score: 66.7 / 100
Quality: 68 Speed: 43 Fit: 61 Context: 100
Estimated Speed: 17.2 tok/s

Resource Requirements:
Min VRAM: 2.4 GB
Min RAM: 2.6 GB (CPU inference)
Recommended RAM: 4.3 GB

MoE Architecture:
Experts: 8 active / 128 total per token
Active VRAM: 0.5 GB (vs 2.4 GB full model)

Fit Analysis:
Status: 🟡 Good
Run Mode: CPU+GPU
Memory Utilization: 0.6% (2.6 / 405.5 GB)

Notes:
MoE: insufficient VRAM for expert offloading
Spilling entire model to system RAM
Performance will be significantly reduced
Best quantization for hardware: Q8_0 (model default: Q4_K_M)
Estimated speed: 17.2 tok/s
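
The figures above look small for a model whose name advertises roughly 30B total and 3B active parameters. A back-of-the-envelope estimate of the weight size can be sketched as follows. It covers the weights only and ignores quantization scales, the KV cache and runtime overhead. The parameter counts are assumptions read from the model name, not from llmfit, and `approx_weight_gb` is a hypothetical helper.

```python
def approx_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed from the name "30B-A3B": ~30e9 total parameters, ~3e9 active per token.
total_4bit = approx_weight_gb(30e9, 4)
active_4bit = approx_weight_gb(3e9, 4)

print(f"all experts, 4-bit: ~{total_4bit:.1f} GB")
print(f"active per token, 4-bit: ~{active_4bit:.1f} GB")
```

At 4 bits per weight this gives roughly 15 GB for all experts, so the 2.4 GB reported by llmfit is worth double-checking before relying on it.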

Installing llama.cpp

Download the llama.cpp source code

git clone https://gitcode.com/GitHub_Trending/ll/llama.cpp

Build llama.cpp

cd llama.cpp
cmake -B build
cmake --build build --config Release

Add the build directory to PATH

export PATH=/root/llama.cpp/build/bin:$PATH

Alternatively, you can install directly with make install

cd build
make install

But after installing, running the binaries fails:

root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-cli
llama-cli: error while loading shared libraries: libmtmd.so.0: cannot open shared object file: No such file or directory
root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-gguf
llama-gguf: error while loading shared libraries: libggml-base.so.0: cannot open shared object file: No such file or directory

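The cause turned out to be that the build directory had not been added to PATH (presumably the copies in build/bin can locate their shared libraries, while the installed copies could not). Adding it solved the problem: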

export PATH=/root/llama.cpp/build/bin:$PATH
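
The "error while loading shared libraries" messages mean the dynamic loader could not find the named .so files. As a minimal sketch, the hypothetical helper `find_library_dirs` below searches a directory tree for a library by file name. The search root is the build directory used in this post, and a directory it reports could also be added to LD_LIBRARY_PATH instead of changing PATH.

```python
import os


def find_library_dirs(root: str, lib_name: str) -> list[str]:
    """Return every directory under root that contains a file named lib_name."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if lib_name in filenames:
            hits.append(dirpath)
    return sorted(hits)


if __name__ == "__main__":
    # Library names are taken from the error messages above.
    for name in ("libmtmd.so.0", "libggml-base.so.0"):
        for directory in find_library_dirs("/root/llama.cpp/build", name):
            print(f"{name}: {directory}")
```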

Downloading the model

Install modelscope

pip install modelscope

Download

from modelscope import snapshot_download
snapshot_download('tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ', cache_dir="models")
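
llama.cpp loads models from GGUF files, so after the download it can help to see which file types the snapshot actually contains. A minimal sketch, assuming the snapshot lands in the directory used later in this post (`count_suffixes` is a hypothetical helper):

```python
from collections import Counter
from pathlib import Path


def count_suffixes(model_dir: str) -> Counter:
    """Count files under model_dir by extension ('.safetensors', '.gguf', ...)."""
    return Counter(
        p.suffix.lower() or "(no suffix)"
        for p in Path(model_dir).rglob("*")
        if p.is_file()
    )


model_dir = "models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"
if Path(model_dir).is_dir():
    counts = count_suffixes(model_dir)
    for suffix, n in sorted(counts.items()):
        print(f"{suffix}: {n}")
    if counts[".gguf"] == 0:
        print("no .gguf files found")
```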

Inference

Inference with llama-cli

llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

It fails with an error:

root@crdnotebook-2027598444851879937-denglf-12859:~# llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

Loading model... |gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'
srv load_model: failed to load model, 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'

Failed to load the model

After looking into it, the model should be this one: stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

The problem is that ModelScope doesn't have this model.
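
The "failed to read magic" message in the log refers to the first four bytes of a GGUF file, which are the ASCII characters GGUF. A quick way to test whether a path is something llama.cpp could open is to read those bytes directly. This is a minimal sketch with a hypothetical helper `is_gguf`, not part of llama.cpp:

```python
import os

GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file


def is_gguf(path: str) -> bool:
    """Return True if path is a regular file that starts with the GGUF magic."""
    if not os.path.isfile(path):
        return False  # a directory, such as a model snapshot folder, is not a GGUF file
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC


print(is_gguf("models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"))
```

The path passed to -m above is the snapshot directory rather than a single .gguf file, so this check is expected to return False for it.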

Trying inference with transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/root/models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)

This failed too:

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py:48, in AwqQuantizer.validate_environment(self, **kwargs)
     46 def validate_environment(self, **kwargs):
     47     if not is_gptqmodel_available():
---> 48         raise ImportError(
     49             "Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`"
     50         )
     52     if not is_accelerate_available():
     53         raise ImportError("Loading an AWQ quantized model requires accelerate (`pip install accelerate`)")

ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`
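
The traceback shows transformers checking for two optional packages, gptqmodel and accelerate, before it loads an AWQ model. The same check can be run up front without loading any weights. This is a minimal sketch using only the standard library, and `missing_modules` is a hypothetical helper:

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names taken from the ImportError messages in the traceback above.
missing = missing_modules(["gptqmodel", "accelerate"])
if missing:
    print("missing:", ", ".join(missing))
else:
    print("all required modules are importable")
```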

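Summary

I couldn't get it working, so I'm shelving it for now.

llama.cpp: ModelScope doesn't have that model, so the downloaded model didn't match. The "failed to read magic" error also indicates that the path passed to -m was not a GGUF file, which is the format llama.cpp loads.

transformers: a library problem. gptqmodel requires reinstalling torch and related libraries, so the required library could not be installed and inference failed.

Debugging

The error is ImportError: Loading an AWQ quantized model requires gptqmodel.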

Following the hint in the message, run:

pip install gptqmodel

The installation failed:

Exception: Unable to detect torch version via uv/pip/conda/importlib. Please install torch >= 2.7.1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed to build 'gptqmodel' when getting requirements to build wheel
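
The build error asks for torch >= 2.7.1 and says the version could not be detected. Before retrying, it may be worth checking what the environment reports. The sketch below is an illustration only: it compares the leading numeric part of the version and ignores local suffixes and pre-release tags, which is a simplification, and `parse_version`, `meets_minimum` and `installed_torch_version` are hypothetical helpers.

```python
import re
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Leading numeric part of a version string, e.g. '2.4.1+local' -> (2, 4, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = "2.7.1") -> bool:
    """True if the installed version is at least the minimum from the error message."""
    return parse_version(installed) >= parse_version(minimum)


def installed_torch_version():
    """Return the torch version seen by importlib.metadata, or None if not found."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


found = installed_torch_version()
if found is None:
    print("torch is not visible to importlib.metadata")
else:
    print(found, "ok" if meets_minimum(found) else "older than 2.7.1")
```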

Trying conda instead:

conda install gptqmodel

That failed too.

PackagesNotFoundError: The following packages are not available from current channels:

- gptqmodel
