Running stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ inference with llama.cpp on the DCU BW1000 in the OpenI (Qizhi) community (failed)

Study notes on quality articles

10 Apr 2026 — 4 min read

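The OpenI (Qizhi) community has just launched DCU BW1000 compute cards. They don't consume credits, so you can use them as much as you like!

However, only one image is provided, which makes it rather cumbersome to use...

Checking the model with llmfit: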

llmfit info stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

=== stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ ===

Provider: stelterlab
Parameters: 4.6B
Quantization: Q4_K_M
Best Quant: Q8_0
Context Length: 262144 tokens
Use Case: Code generation and completion
Category: Coding
Released: 2025-07-31
Runtime: llama.cpp (est. ~17.2 tok/s)

Score Breakdown:
Overall Score: 66.7 / 100
Quality: 68 Speed: 43 Fit: 61 Context: 100
Estimated Speed: 17.2 tok/s

Resource Requirements:
Min VRAM: 2.4 GB
Min RAM: 2.6 GB (CPU inference)
Recommended RAM: 4.3 GB

MoE Architecture:
Experts: 8 active / 128 total per token
Active VRAM: 0.5 GB (vs 2.4 GB full model)

Fit Analysis:
Status: 🟡 Good
Run Mode: CPU+GPU
Memory Utilization: 0.6% (2.6 / 405.5 GB)

Notes:
MoE: insufficient VRAM for expert offloading
Spilling entire model to system RAM
Performance will be significantly reduced
Best quantization for hardware: Q8_0 (model default: Q4_K_M)
Estimated speed: 17.2 tok/s
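llmfit's figures are its own estimates: it reports 4.6B parameters, whereas the model name suggests about 30B total parameters with about 3B active per token. As a rough cross-check, weight memory is approximately parameters × bits per weight / 8 bytes. The minimal sketch below assumes about 30.5B total and 3.3B active parameters (values from the Qwen3-Coder-30B-A3B model card, not from llmfit) and ignores the KV cache, activations and quantization overhead such as scales.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 10**9 bytes): params * bits / 8."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumed parameter counts (Qwen3-Coder-30B-A3B model card), not llmfit output.
TOTAL_PARAMS = 30.5e9   # every expert stored
ACTIVE_PARAMS = 3.3e9   # parameters activated per token

if __name__ == "__main__":
    for bits in (4, 8, 16):
        total = weight_memory_gb(TOTAL_PARAMS, bits)
        active = weight_memory_gb(ACTIVE_PARAMS, bits)
        print(f"{bits}-bit: all weights ~{total:.1f} GB, active per token ~{active:.1f} GB")
```

Under these assumptions, storing every expert at 4 bits takes roughly 15 GB, well above llmfit's 2.4 GB minimum, even though only about 1.65 GB of weights are active per token.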

Installing llama.cpp

Download the llama.cpp source code:

git clone https://gitcode.com/GitHub_Trending/ll/llama.cpp

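Build llama.cpp (this default configuration builds only the CPU backend; GPU backends such as HIP stay off unless enabled explicitly, e.g. with -DGGML_HIP=ON):

cd llama.cpp
cmake -B build
cmake --build build --config Release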

Add the build output to PATH:

export PATH=/root/llama.cpp/build/bin:$PATH

Alternatively, you can simply run make install:

cd build
make install

But after installing, the tools failed to start:

root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-cli
llama-cli: error while loading shared libraries: libmtmd.so.0: cannot open shared object file: No such file or directory
root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-gguf
llama-gguf: error while loading shared libraries: libggml-base.so.0: cannot open shared object file: No such file or directory

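It turned out the build directory had not been added to PATH. These are dynamic-linker errors: the llama-cli and llama-gguf binaries the shell picked up could not locate their shared libraries (libmtmd.so.0, libggml-base.so.0). Putting /root/llama.cpp/build/bin first on PATH presumably makes the shell run the binaries in the build tree instead, which do find those libraries; running ldconfig after make install, or adding the library directory to LD_LIBRARY_PATH, are the usual alternatives. After adding the path, the problem was solved: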

export PATH=/root/llama.cpp/build/bin:$PATH
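When "error while loading shared libraries" appears, it helps to check where the named .so files actually live. Below is a minimal sketch; the helper names and the candidate directories (the build tree used in this post plus common install prefixes) are hypothetical. It searches the candidates for each missing library and builds an LD_LIBRARY_PATH value from the directories where they were found.

```python
import os
from pathlib import Path


def locate_libraries(sonames, search_dirs):
    """Map each shared-library file name to the directories (in order) that contain it."""
    found = {name: [] for name in sonames}
    for directory in search_dirs:
        d = Path(directory)
        if not d.is_dir():
            continue  # skip candidate directories that don't exist on this machine
        for name in sonames:
            if (d / name).exists():  # exists() follows symlinks such as libfoo.so.0 -> libfoo.so.0.0.1
                found[name].append(str(d))
    return found


def ld_library_path_for(found, existing=""):
    """Build an LD_LIBRARY_PATH value: directories where libraries were found, then the old value."""
    dirs = []
    for locations in found.values():
        for d in locations:
            if d not in dirs:
                dirs.append(d)
    if existing:
        dirs.append(existing)
    return os.pathsep.join(dirs)


if __name__ == "__main__":
    # Hypothetical candidates: the build tree used in this post and common install prefixes.
    candidates = ["/root/llama.cpp/build/bin", "/usr/local/lib", "/usr/local/lib64"]
    result = locate_libraries(["libmtmd.so.0", "libggml-base.so.0"], candidates)
    for lib, dirs in result.items():
        print(lib, "->", dirs if dirs else "not found")
    print("LD_LIBRARY_PATH=" + ld_library_path_for(result, os.environ.get("LD_LIBRARY_PATH", "")))
```

If the libraries turn up only under build/bin, that is consistent with the PATH change fixing the problem; the printed LD_LIBRARY_PATH value could then be exported before running the installed binaries.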

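Downloading the model

Install modelscope:

pip install modelscope

Download:

from modelscope import snapshot_download

# With cache_dir="models", the files end up under models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
snapshot_download('tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ', cache_dir="models")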

Inference

Running inference with llama-cli:

llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

Error:

root@crdnotebook-2027598444851879937-denglf-12859:~# llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

Loading model...
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'
srv load_model: failed to load model, 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'

Failed to load the model
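"failed to read magic" comes from llama.cpp's GGUF loader, which starts by reading a 4-byte magic (the bytes GGUF) from the file passed to -m. Here it was most likely given a directory, so even those first four bytes could not be read. Below is a minimal sketch for checking a path before handing it to llama-cli -m; the helper names are hypothetical, and it only inspects headers and file extensions, so it cannot tell whether a GGUF file is otherwise valid.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # every GGUF file starts with these four bytes


def has_gguf_magic(header: bytes) -> bool:
    """True if the first four bytes of `header` are the GGUF magic."""
    return header[:4] == GGUF_MAGIC


def is_gguf(path) -> bool:
    """True if `path` is a regular file whose header is the GGUF magic."""
    p = Path(path)
    if not p.is_file():
        return False  # a directory, such as a model snapshot, is never a GGUF file
    with p.open("rb") as f:
        return has_gguf_magic(f.read(4))


def describe_model_path(path) -> str:
    """Summarize whether `path` is, or contains, something llama-cli -m could load."""
    p = Path(path)
    if p.is_file():
        return "GGUF file" if is_gguf(p) else "file without GGUF magic"
    if p.is_dir():
        ggufs = sorted(str(f) for f in p.rglob("*.gguf") if is_gguf(f))
        if ggufs:
            return "directory; pass one of these to -m: " + ", ".join(ggufs)
        n_st = sum(1 for _ in p.rglob("*.safetensors"))
        return f"directory without GGUF files ({n_st} .safetensors files)"
    return "path does not exist"


if __name__ == "__main__":
    print(describe_model_path("models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"))
```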

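On closer inspection, the intended model (the one checked with llmfit above) is stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ, whereas what was downloaded is tclf90's AWQ build.

The problem is that ModelScope doesn't have this model...

Separately, failed to read magic means llama.cpp could not read the 4-byte GGUF header it expects at the start of the model file: -m takes the path of a single .gguf file, whereas here it was given a directory holding an AWQ checkpoint, a format llama.cpp does not load directly.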

Trying inference with transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/root/models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
# keep only the newly generated tokens (drop the prompt)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)

This failed as well:

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py:48, in AwqQuantizer.validate_environment(self, **kwargs)
     46 def validate_environment(self, **kwargs):
     47     if not is_gptqmodel_available():
---> 48         raise ImportError(
     49             "Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`"
     50         )
     52     if not is_accelerate_available():
     53         raise ImportError("Loading an AWQ quantized model requires accelerate (`pip install accelerate`)")

ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`
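The traceback shows transformers checking for optional packages (is_gptqmodel_available(), is_accelerate_available()) before it loads an AWQ checkpoint. A similar pre-flight check can be run without loading the 30B model, using only the standard library. This is a sketch: the module list is taken from the error messages and the script above, and importlib.util.find_spec only tells whether a module can be found, not whether its version is compatible.

```python
import importlib.util


def missing_packages(module_names):
    """Return the modules from `module_names` that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Modules named in the AWQ error messages, plus the libraries the script imports.
    required = ["torch", "transformers", "accelerate", "gptqmodel"]
    missing = missing_packages(required)
    if missing:
        print("Missing before loading the AWQ model:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```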

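Summary

I didn't get it working, so I'm shelving it for now.

llama.cpp: ModelScope doesn't have that model, so the model didn't match; the tclf90 checkpoint downloaded instead is an AWQ checkpoint rather than the GGUF file llama-cli expects.

transformers: a library problem. Loading the AWQ model requires gptqmodel, which needs torch >= 2.7.1; that would mean reinstalling torch and other libraries, so the required package could not be installed and inference failed.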

Debugging

Error: ImportError: Loading an AWQ quantized model requires gptqmodel.


Following the hint, run:

pip install gptqmodel

The installation failed:

Exception: Unable to detect torch version via uv/pip/conda/importlib. Please install torch >= 2.7.1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed to build 'gptqmodel' when getting requirements to build wheel
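The build step for gptqmodel wants torch >= 2.7.1 and could not detect a torch version at all. To see which torch, if any, the current interpreter reports, importlib.metadata is enough. This is a minimal sketch: the version comparison is deliberately simplified (see the docstring, it is not a full PEP 440 parser), and it reflects only the current environment, which may differ from the isolated environment pip uses when building a wheel.

```python
import re
from importlib import metadata


def release_tuple(version: str, width: int = 3) -> tuple:
    """Numeric release part of a version string, e.g. '2.4.1+local' -> (2, 4, 1).

    Simplified: drops the local suffix after '+', stops at the first segment that
    does not start with a digit, and pads with zeros. Not a full PEP 440 parser.
    """
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break  # e.g. a 'dev20250101' segment
        parts.append(int(m.group()))
    parts += [0] * (width - len(parts))
    return tuple(parts)


def torch_meets_minimum(minimum: str = "2.7.1"):
    """Return (installed_version or None, meets_minimum) for the 'torch' distribution."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None, False
    return installed, release_tuple(installed) >= release_tuple(minimum)


if __name__ == "__main__":
    installed, ok = torch_meets_minimum()
    if installed is None:
        print("No 'torch' distribution found in this environment")
    else:
        print(f"torch {installed}: {'meets' if ok else 'is below'} the >= 2.7.1 requirement")
```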

Trying conda instead:

conda install gptqmodel

That failed as well:

PackagesNotFoundError: The following packages are not available from current channels:

- gptqmodel
