Fine-Tuning Qwen3-VL with LLaMA-Factory: A Hands-On End-to-End Guide | 极客日志


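A hands-on guide to fine-tuning the Qwen3-VL multimodal large model with LLaMA-Factory. It covers environment setup, dataset construction, LoRA fine-tuning, model merging, and deployment with vLLM. Everything is done from the command line, closing the loop from data preparation to API calls. The workflow supports customization on private data and high-concurrency inference serving, which makes it suitable for production use.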

Posted by 狂少 on 2026/4/8 · updated 2026/7/2642 views


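This article walks through supervised fine-tuning (SFT) of a multimodal large model (Qwen3-VL in this case) with LLaMA-Factory, covering the full pipeline: environment setup, dataset construction, training and merging, and deployment with vLLM.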

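1. Environment Setup

Clone the Project and Install Dependencies

Cloning the project with Git is recommended, since it avoids the path problems that can come from unpacking a zip archive: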

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git

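Create a virtual environment, then enter the project directory and install the dependencies. Conda is used here as an example, with Python pinned to 3.12: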

conda create -n llama_env python=3.12
conda activate llama_env
cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple/
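A quick sanity check after installation can catch a broken environment before a training run does. The sketch below only inspects the current interpreter and looks for the llamafactory-cli entry point on PATH. The function name check_environment is hypothetical, and the 3.12 default simply mirrors the conda command above rather than any hard LLaMA-Factory requirement.

```python
import shutil
import sys


def check_environment(required=(3, 12)):
    """Report whether the interpreter and CLI entry point look usable.

    `required` mirrors the Python version chosen in the conda command;
    it is an assumption of this sketch, not a LLaMA-Factory constraint.
    """
    return {
        "python_ok": sys.version_info[:2] >= required,
        "cli_on_path": shutil.which("llamafactory-cli") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'needs attention'}")
```

If cli_on_path comes back false, the most likely cause is that the conda environment is not activated in the current shell.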

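Note: users in mainland China may want to use the Tsinghua mirror, as in the command above, to speed up downloads. If the build fails, check whether basic packages such as wheel or setuptools are missing.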

Model Download

Download the model weights from ModelScope or HuggingFace into a local directory. For example, to download Qwen3-VL:

modelscope download --model Qwen/Qwen3-VL-2B-Instruct --local_dir ./qwen3_vl_model
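Interrupted downloads are a common source of confusing load errors, so it can be worth checking the directory before pointing a config at it. The sketch below assumes a typical Hugging Face-style layout (a config.json plus at least one *.safetensors file). That expectation and the helper name missing_model_files are assumptions of this example, not something the download tool guarantees.

```python
from pathlib import Path


def missing_model_files(model_dir):
    """Return a list of problems found in a downloaded checkpoint directory.

    The checks (config.json present, at least one *.safetensors file)
    are assumptions about a typical checkpoint layout.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not (root / "config.json").is_file():
        problems.append("config.json not found")
    if not any(root.glob("*.safetensors")):
        problems.append("no *.safetensors weight files found")
    return problems


if __name__ == "__main__":
    issues = missing_model_files("./qwen3_vl_model")
    print(issues or "checkpoint directory looks complete")
```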

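2. Launch Fine-Tuning (LoRA SFT)

On Linux, the command line is the most efficient way to work. We use the official example config as a starting point.

Configure Training Parameters

Edit examples/train_lora/qwen2_5vl_lora_sft.yaml and adjust the following key settings to match your hardware: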

### model
model_name_or_path: /data/hcb/LLaMA-Factory-main/qwen3_vl_model # replace with your model path
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
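The image_max_pixels and video_max_pixels settings cap the pixel count of each image or video frame, which bounds the number of visual tokens and therefore memory use. 262144 equals 512 × 512 and 16384 equals 128 × 128. The sketch below shows one way such a budget can be applied: scale the image down while keeping its aspect ratio. fit_to_pixel_budget is a hypothetical helper, and LLaMA-Factory's own resizing may differ in rounding and minimum-size handling.

```python
import math


def fit_to_pixel_budget(width, height, max_pixels):
    """Scale (width, height) down so that width * height <= max_pixels.

    Images already within the budget are returned unchanged.
    """
    if width * height <= max_pixels:
        return width, height
    factor = math.sqrt(max_pixels / (width * height))
    return max(1, int(width * factor)), max(1, int(height * factor))


if __name__ == "__main__":
    print(fit_to_pixel_budget(1920, 1080, 262144))  # image budget from the config
    print(fit_to_pixel_budget(1920, 1080, 16384))   # per-frame video budget
    print(fit_to_pixel_budget(400, 300, 262144))    # already within budget
```

Lowering these values is usually the first knob to try when a run goes out of memory on image-heavy data, at the cost of fine detail such as small text in tables.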


Run Training

CUDA_VISIBLE_DEVICES=6 llamafactory-cli train examples/train_lora/qwen2_5vl_lora_sft.yaml

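3. Model Merging

The merge settings go in examples/merge_lora/qwen2_5vl_lora_sft.yaml, which points at the base model and the trained LoRA adapter:

### model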
model_name_or_path: /data/hcb/LLaMA-Factory-main/qwen3_vl_model
adapter_name_or_path: saves/qwen3vl-2b/lora/sft
template: qwen3_vl
trust_remote_code: true

### export
export_dir: output/qwen3vl_lora_sft
export_size: 5
export_device: cpu
export_legacy_format: false

llamafactory-cli export examples/merge_lora/qwen2_5vl_lora_sft.yaml
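Conceptually, merging folds the low-rank update back into the base weights so the exported model no longer needs the adapter at inference time: W_merged = W + (alpha / r) · B · A. The toy example below checks that identity with tiny hand-written matrices in pure Python. The matrix values, the alpha and rank, and the helper names are all made up for illustration, and this is not how llamafactory-cli export is implemented internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


if __name__ == "__main__":
    rank, alpha = 2, 4             # toy values, unrelated to the config above
    weight = [[1.0, 0.0, 2.0],     # base weight, shape 2 x 3
              [0.0, 1.0, 0.0]]
    lora_a = [[1.0, 0.0, 1.0],     # shape rank x 3
              [0.0, 2.0, 0.0]]
    lora_b = [[0.5, 0.0],          # shape 2 x rank
              [0.0, 0.25]]
    x = [[1.0], [2.0], [3.0]]      # column-vector input

    merged = merge_lora(weight, lora_a, lora_b, alpha, rank)
    # Adapter applied on the fly: W x + scale * B (A x)
    base_out = matmul(weight, x)
    update = matmul(lora_b, matmul(lora_a, x))
    on_the_fly = [[b[0] + (alpha / rank) * u[0]] for b, u in zip(base_out, update)]
    print(matmul(merged, x) == on_the_fly)
```

Because the merged weights give the same output as base plus adapter, the exported directory can be served like any ordinary checkpoint.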

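4. Building a Custom Dataset

Data Format Reference

[
  {
    "messages": [
      {"role": "user", "content": [{"type": "image", "image": "images/table_01.jpg"}, {"type": "text", "text": "Please extract the table data from the image"}]},
      {"role": "assistant", "content": "The table contents are as follows..."}
    ]
  }
]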
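LLaMA-Factory also needs to be told about the dataset file, normally through an entry in data/dataset_info.json. Its bundled mllm_demo dataset keeps image paths in a separate images list and marks their position in the text with an <image> placeholder, so records shaped like the one above may need converting first. The sketch below does both steps under assumptions: the dataset name table_extract and the file name table_extract.json are hypothetical, and the entry is modeled on the mllm_demo registration, so check it against the dataset_info.json shipped with your LLaMA-Factory version.

```python
import json


def to_placeholder_format(record):
    """Convert typed content parts into '<image>' text plus an images list."""
    messages, images = [], []
    for message in record["messages"]:
        content = message["content"]
        if isinstance(content, str):
            text = content
        else:
            pieces = []
            for part in content:
                if part["type"] == "image":
                    images.append(part["image"])
                    pieces.append("<image>")
                elif part["type"] == "text":
                    pieces.append(part["text"])
            text = "".join(pieces)
        messages.append({"role": message["role"], "content": text})
    return {"messages": messages, "images": images}


def build_dataset_entry(file_name):
    """Build a dataset_info.json entry modeled on the mllm_demo registration."""
    return {
        "file_name": file_name,
        "formatting": "sharegpt",
        "columns": {"messages": "messages", "images": "images"},
        "tags": {
            "role_tag": "role",
            "content_tag": "content",
            "user_tag": "user",
            "assistant_tag": "assistant",
        },
    }


if __name__ == "__main__":
    sample = {
        "messages": [
            {"role": "user", "content": [
                {"type": "image", "image": "images/table_01.jpg"},
                {"type": "text", "text": "Please extract the table data from the image"},
            ]},
            {"role": "assistant", "content": "The table contents are as follows..."},
        ]
    }
    print(json.dumps(to_placeholder_format(sample), indent=2))
    # "table_extract" is a hypothetical dataset name.
    print(json.dumps({"table_extract": build_dataset_entry("table_extract.json")}, indent=2))
```

Once the entry is in dataset_info.json, the dataset name would go in the dataset: field of the training config, which is not shown in the excerpt in section 2.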

5. Model Deployment and Invocation

Install vLLM

pip install vllm==0.11.0 -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple/

Start the Service

export CUDA_VISIBLE_DEVICES=6
python -m vllm.entrypoints.openai.api_server \
  --host 0.0.0.0 \
  --port 8003 \
  --model /data/hcb/LLaMA-Factory-main/output/qwen3vl_lora_sft \
  --served-model-name qwen3_vl \
  --trust-remote-code \
  --dtype float16 \
  --gpu-memory-utilization 0.8 \
  --tensor-parallel-size 1
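Model loading can take a while, so it helps to confirm the server is ready before sending real requests. vLLM's OpenAI-compatible server exposes the standard /v1/models listing, which should report the name passed to --served-model-name. The sketch below queries it using only the standard library. The helper names are hypothetical, and the localhost URL assumes the check runs on the same machine as the server.

```python
import json
import urllib.request


def extract_model_ids(payload):
    """Pull model names out of an OpenAI-style /v1/models response body."""
    return [item["id"] for item in payload.get("data", [])]


def list_served_models(base_url, timeout=5):
    """Query GET {base_url}/models and return the served model names."""
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_model_ids(json.load(response))


if __name__ == "__main__":
    # Assumes the server started above is reachable on localhost:8003.
    try:
        print(list_served_models("http://127.0.0.1:8003/v1"))
    except OSError as exc:
        print(f"server not reachable yet: {exc}")
```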

Client Request Example

import base64
import mimetypes
import os

from openai import OpenAI


def encode_image(image_path):
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def multimodal_chat(image_path=None, text_prompt="Describe this image"):
    client = OpenAI(
        api_key="Empty",  # placeholder; the server above was started without an API key
        base_url="http://10.10.185.9:8003/v1/",  # port must match --port of the vLLM server
    )
    messages = [{"role": "system", "content": "You are a multimodal assistant that can understand and analyze image content."}]

    if image_path and os.path.exists(image_path):
        base64_image = encode_image(image_path)
        # Derive the MIME type from the file extension instead of hard-coding JPEG.
        mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
        user_content = [
            {"type": "text", "text": text_prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{base64_image}"}}
        ]
    else:
        # No usable image: fall back to a text-only request.
        user_content = text_prompt

    messages.append({"role": "user", "content": user_content})
    payload = {
        "model": "qwen3_vl",  # must match --served-model-name
        "messages": messages,
        "temperature": 0.1,
        "max_tokens": 2000
    }

    try:
        response = client.chat.completions.create(**payload, timeout=30)
        return response
    except Exception as e:
        print(f"Request failed: {e}")
        return None


if __name__ == "__main__":
    image_path = r"./test_image.png"
    prompt = "Describe what is in this image"
    if os.path.exists(image_path):
        res = multimodal_chat(image_path=image_path, text_prompt=prompt)
        if res and res.choices:
            print(res.choices[0].message.content)
    else:
        print(f"Image not found: {image_path}")
