LLaMAFactory 与 ModelScope 大模型部署及 GGUF 转换实战 | 极客日志

PythonAI算法

LLaMAFactory 与 ModelScope 大模型部署及 GGUF 转换实战

微调后大模型的部署流程。首先使用 llama.cpp 将 HF 格式模型转换为 GGUF 格式。接着演示了通过 llama.app 进行命令行和服务模式部署，并指出 Ollama 对 Qwen3 的兼容问题。最后展示了如何在 ModelScope 平台上传和下载 GGUF 模型文件，提供了完整的本地轻量化部署方案。

链路追踪发布于 2026/4/6更新于 2026/7/2662 浏览

LLaMAFactory 与 ModelScope 大模型部署及 GGUF 转换实战

一、前言

上次简单介绍了 LLaMAFactory、ModelScope 的微调，今天总结如何部署已经微调好的大模型。

本次演示基于魔搭社区（https://www.modelscope.cn/my/mynotebook）

二、将模型转换为 gguf

2.1 克隆 llama.cpp 并安装环境依赖

# 进入根目录
cd /mnt/workspace
# 需要用 llama.cpp 仓库的 convert_hf_to_gguf.py 脚本转换
git clone https://github.com/ggerganov/llama.cpp.git
# 进入 llama.cpp 文件夹
cd llama.cpp
# 创建虚拟环境
python -m venv .venv
# 进入虚拟环境
source .venv/bin/activate
# 安装依赖
pip install -r requirements.txt

2.2 转换模型为 gguf

python convert_hf_to_gguf.py /mnt/workspace/LLaMA-Factory/saves/qwen3_sft_merged \
  --outtype q8_0 --verbose \
  --outfile /mnt/workspace/LLaMA-Factory/saves/qwen3_sft_merged/Qwen3-4B-Instruct_q8_0.gguf

执行结束后，gguf 文件会保存在 /mnt/workspace/LLaMA-Factory/saves/qwen3_sft_merged/Qwen3-4B-Instruct_q8_0.gguf。

三、部署

3.1 基于 llama.cpp（推荐）

GitHub: https://github.com/ggml-org/llama.cpp

3.1.1 安装 llama.cpp

可参考官方文档：https://github.com/ggml-org/llama.cpp/blob/master/docs/install.md#homebrew-mac-and-linux

brew install llama.cpp

如果提示未安装 brew，执行下面的命令：

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

3.1.2 加载大模型（CLI 模式）

llama-cli -m /mnt/workspace/LLaMA-Factory/saves/qwen3_sft_merged/Qwen3-4B-Instruct_q8_0.gguf

可在命令行跟大模型提问。

相关免费在线工具

加密/解密文本
使用加密算法（如AES、TripleDES、Rabbit或RC4）加密和解密文本明文。在线工具，加密/解密文本在线工具，online
RSA密钥对生成器
生成新的随机RSA私钥和公钥pem证书。在线工具，RSA密钥对生成器在线工具，online
Mermaid 预览与可视化编辑
基于 Mermaid.js 实时预览流程图、时序图等图表，支持源码编辑与即时渲染。在线工具，Mermaid 预览与可视化编辑在线工具，online
随机西班牙地址生成器
随机生成西班牙地址（支持马德里、加泰罗尼亚、安达卢西亚、瓦伦西亚筛选），支持数量快捷选择、显示全部与下载。在线工具，随机西班牙地址生成器在线工具，online
Gemini 图片去水印
基于开源反向 Alpha 混合算法去除 Gemini/Nano Banana 图片水印，支持批量处理与下载。在线工具，Gemini 图片去水印在线工具，online
curl 转代码
解析常见 curl 参数并生成 fetch、axios、PHP curl 或 Python requests 示例代码。在线工具，curl 转代码在线工具，online

llama-server -m /mnt/workspace/LLaMA-Factory/saves/qwen3_sft_merged/Qwen3-4B-Instruct_q8_0.gguf --port 8080

# 进入合并后的模型目录
cd /mnt/workspace/LLaMA-Factory/saves/qwen3_sft_merged
# 创建模型
ollama create my-qwen3-4b-sft-merged -f Modelfile
# 启动模型
ollama run my-qwen3-4b-sft-merged

# 上传 gguf 版本
modelscope upload <你的用户名>/qwen3-4b-sft-merged-gguf /mnt/workspace/LLaMA-Factory/saves/qwen3_sft_merged --token <你的 token>

# 安装 modelscope
pip install modelscope
# 下载模型
modelscope download --model <你的用户名>/qwen3-4b-sft-merged-gguf

LLaMAFactory 与 ModelScope 大模型部署及 GGUF 转换实战

一、前言

二、将模型转换为 gguf

2.1 克隆 llama.cpp 并安装环境依赖

2.2 转换模型为 gguf

三、部署

3.1 基于 llama.cpp（推荐）

3.1.1 安装 llama.cpp

3.1.2 加载大模型（CLI 模式）

更多推荐文章

相关免费在线工具

3.1.3 以服务的模式加载大模型（Server 模式）

3.2 基于 ollama

四、将模型上传至 ModelScope

4.1 获取 Token

4.2 获取用户名

4.3 上传模型

4.4 查看上传结果

4.5 下载上传之后的模型

五、结语

更多推荐文章

相关免费在线工具

LLaMAFactory 与 ModelScope 大模型部署及 GGUF 转换实战

一、前言

二、将模型转换为 gguf

2.1 克隆 llama.cpp 并安装环境依赖

2.2 转换模型为 gguf

三、部署

3.1 基于 llama.cpp（推荐）

3.1.1 安装 llama.cpp

3.1.2 加载大模型（CLI 模式）

微信扫一扫，关注极客日志

更多推荐文章

相关免费在线工具

3.1.3 以服务的模式加载大模型（Server 模式）

3.2 基于 ollama

四、将模型上传至 ModelScope

4.1 获取 Token

4.2 获取用户名

4.3 上传模型

4.4 查看上传结果

4.5 下载上传之后的模型

五、结语

微信扫一扫，关注极客日志

更多推荐文章

相关免费在线工具