Fine-Tuning the Qwen3-VL Vision Model with LLaMA-Factory and Deploying It in a WEBUI

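Component	Minimum	Recommended
GPU	NVIDIA RTX 3090 (24 GB)	2 × A100/A6000/V100 or better
VRAM	≥24 GB	≥48 GB (leaves room to experiment with full-parameter fine-tuning)
Storage	≥100 GB SSD	≥500 GB NVMe (for caching models and datasets)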
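
Whether to train in bf16 or fp16 follows from the GPU generation in the table above. The following is a minimal sketch, assuming the publicly listed CUDA compute capabilities for these cards and the rule of thumb that native bf16 needs compute capability 8.0 or higher. The helper name `pick_precision` and the lookup table are hypothetical, not part of LLaMA-Factory.

```python
# Compute capability (major, minor) of the GPUs named in the hardware table.
COMPUTE_CAPABILITY = {
    "V100": (7, 0),
    "A100": (8, 0),
    "RTX 3090": (8, 6),
    "A6000": (8, 6),
}


def pick_precision(gpu_name: str) -> str:
    """Return 'bf16' for compute capability 8.0 or higher, otherwise 'fp16'."""
    major, _minor = COMPUTE_CAPABILITY[gpu_name]
    return "bf16" if major >= 8 else "fp16"


if __name__ == "__main__":
    for name in COMPUTE_CAPABILITY:
        print(f"{name}: {pick_precision(name)}")
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns the same (major, minor) pair for the active device, so the lookup table can be replaced by a live query.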

# Create the virtual environment
conda create -n qwen_vl python=3.10
conda activate qwen_vl

# Clone the project
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory

pip install -e ".[torch,metrics]" -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install flash-attn==2.6.3 --no-build-isolation -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install bitsandbytes==0.43.1 deepspeed==0.14.4
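# Quote the specifier so the shell does not treat ">" as a redirect.
# Qwen3-VL support requires transformers 4.57.0 or later.
pip install --upgrade "transformers>=4.57.0"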

git lfs install
git clone https://www.modelscope.cn/qwen/Qwen3-VL-4B-Instruct.git /data/model/qwen3-vl-4b-instruct

/data/model/qwen3-vl-4b-instruct/
├── config.json
├── model.safetensors.index.json
├── preprocessor_config.json
└── tokenizer_config.json
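
Before training, it can help to confirm that the clone produced the files listed above and that `config.json` reports the expected model type. This is a minimal sketch using only the standard library; the function names are hypothetical, and the file list simply mirrors the directory listing (weight shards are not checked).

```python
import json
from pathlib import Path
from typing import List, Optional

# Files shown in the directory listing; weight shards are not checked here.
REQUIRED_FILES = (
    "config.json",
    "model.safetensors.index.json",
    "preprocessor_config.json",
    "tokenizer_config.json",
)


def missing_model_files(model_dir: str) -> List[str]:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def read_model_type(model_dir: str) -> Optional[str]:
    """Read model_type from config.json without importing transformers."""
    text = (Path(model_dir) / "config.json").read_text(encoding="utf-8")
    return json.loads(text).get("model_type")
```

For the checkpoint in this article, `missing_model_files("/data/model/qwen3-vl-4b-instruct")` should return an empty list and `read_model_type` should return `qwen3_vl`.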

[
  {
    "messages": [
      { "role": "user", "content": "<image>What is the name on this ID card?" },
      { "role": "assistant", "content": "Zhang Sanfeng" }
    ],
    "images": ["/path/to/id_card_001.jpg"]
  }
]

[
  {
    "messages": [
      { "role": "user", "content": "<image>What is the person's name in this image?" },
      { "role": "assistant", "content": "Zhang Sanfeng" }
    ],
    "images": ["/data/service/LLaMA-Factory/data/images/1.png"]
  }
]
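
A quick consistency check on the dataset file can catch the most common formatting mistakes before training starts. The sketch below assumes the format shown above: each `<image>` placeholder pairs with one entry in `images`, and turns alternate between `user` and `assistant` with no system turn. The function names are hypothetical.

```python
import json
import os
from typing import List


def validate_samples(samples: list, check_files: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the data looks consistent."""
    problems = []
    for i, sample in enumerate(samples):
        messages = sample.get("messages", [])
        images = sample.get("images", [])
        # Each <image> placeholder should correspond to one entry in "images".
        placeholders = sum(m.get("content", "").count("<image>") for m in messages)
        if placeholders != len(images):
            problems.append(
                f"sample {i}: {placeholders} <image> tag(s) but {len(images)} image path(s)"
            )
        # Turns are expected to alternate user/assistant, starting with user.
        for turn, m in enumerate(messages):
            expected = "user" if turn % 2 == 0 else "assistant"
            if m.get("role") != expected:
                problems.append(
                    f"sample {i}: turn {turn} has role {m.get('role')!r}, expected {expected!r}"
                )
        if check_files:
            for path in images:
                if not os.path.isfile(path):
                    problems.append(f"sample {i}: missing image file {path}")
    return problems


def validate_file(json_path: str, check_files: bool = True) -> List[str]:
    """Load a dataset JSON file and validate every sample in it."""
    with open(json_path, encoding="utf-8") as f:
        return validate_samples(json.load(f), check_files=check_files)
```

Running `validate_file("data/qwen_vl_demo.json")` from the LLaMA-Factory directory prints nothing by itself; inspect the returned list and fix any reported sample before training.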

"qwen_vl_demo": {
  "file_name": "qwen_vl_demo.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "messages",
    "images": "images"
  },
  "tags": {
    "role_tag": "role",
    "content_tag": "content",
    "user_tag": "user",
    "assistant_tag": "assistant"
  }
}
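
The entry above can also be added programmatically instead of editing the registry by hand. This is a sketch under the assumption that the registry is the `data/dataset_info.json` file inside the LLaMA-Factory checkout; the helper name `register_dataset` is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

# Same content as the entry shown above.
DEMO_ENTRY = {
    "file_name": "qwen_vl_demo.json",
    "formatting": "sharegpt",
    "columns": {"messages": "messages", "images": "images"},
    "tags": {
        "role_tag": "role",
        "content_tag": "content",
        "user_tag": "user",
        "assistant_tag": "assistant",
    },
}


def register_dataset(info_path: str, name: str = "qwen_vl_demo",
                     entry: Optional[dict] = None) -> dict:
    """Insert or overwrite one entry in the registry file and return the registry."""
    path = Path(info_path)
    registry = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    registry[name] = dict(DEMO_ENTRY if entry is None else entry)
    path.write_text(
        json.dumps(registry, ensure_ascii=False, indent=2) + "\n", encoding="utf-8"
    )
    return registry
```

A typical call would be `register_dataset("data/dataset_info.json")`. Existing entries are preserved; only the `qwen_vl_demo` key is added or replaced.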

cp examples/train_lora/qwen2vl_lora_sft.yaml examples/train_lora/qwen3vl_lora_sft.yaml
vim examples/train_lora/qwen3vl_lora_sft.yaml

### model
model_name_or_path: /data/model/qwen3-vl-4b-instruct

### method
stage: sft # supervised fine-tuning
do_train: true
finetuning_type: lora # use LoRA
lora_target: all # inject adapters into all linear layers

### dataset
dataset: qwen_vl_demo
template: qwen2_vl # the qwen2_vl template is still used for now
cutoff_len: 2048 # maximum number of tokens per sample; longer samples are truncated
max_samples: 1000
preprocessing_num_workers: 8

### output
output_dir: /data/output/qwen3-vl-lora-ft
logging_steps: 10
save_steps: 100
plot_loss: true

### training
per_device_train_batch_size: 1
gradient_accumulation_steps: 16 # raise this instead of the per-device batch size when GPU memory is tight
learning_rate: 1e-4
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: false # the V100 has no BF16 support, so train in FP16 instead
fp16: true
ddp_timeout: 180000000

### evaluation
val_size: 0.1
eval_strategy: steps
eval_steps: 50
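
The batch-related settings interact, so a little arithmetic helps when tuning them. The sketch below computes the effective batch size and a rough optimizer-step estimate from the values in the config. The function names are hypothetical, and the step count is only an estimate: the figure the trainer reports depends on the number of GPUs and on how the installed version rounds partial accumulation windows.

```python
import math
from typing import Tuple


def effective_batch_size(per_device_batch: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Samples consumed per optimizer update."""
    return per_device_batch * grad_accum * num_gpus


def split_sizes(num_samples: int, val_size: float) -> Tuple[int, int]:
    """Approximate (train, validation) counts for a fractional val_size."""
    num_val = int(round(num_samples * val_size))
    return num_samples - num_val, num_val


def estimate_optimizer_steps(num_train_samples: int, per_device_batch: int,
                             grad_accum: int, epochs: int, num_gpus: int = 1) -> int:
    """Rough estimate: ceil(samples / effective batch) per epoch, times epochs."""
    batch = effective_batch_size(per_device_batch, grad_accum, num_gpus)
    return math.ceil(num_train_samples / batch) * epochs


if __name__ == "__main__":
    # Values from the YAML above, on a single GPU, with max_samples fully used.
    train, val = split_sizes(1000, 0.1)
    print("effective batch:", effective_batch_size(1, 16))
    print("train/val samples:", train, val)
    print("estimated steps:", estimate_optimizer_steps(train, 1, 16, epochs=3))
```

With `save_steps: 100` and `eval_steps: 50`, a run that finishes in fewer steps than those intervals will only save and evaluate at the end, so it is worth comparing the estimate against both values on small datasets.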

llamafactory-cli train examples/train_lora/qwen3vl_lora_sft.yaml

[INFO] loading configuration file /data/model/qwen3-vl-4b-instruct/config.json
[INFO] Model config Qwen3VLConfig { ... "model_type": "qwen3_vl" ... }
[INFO] Fine-tuning method: LoRA
trainable params: 24,576,000 || all params: 4,200,000,000 || trainable%: 0.585%
***** Running training *****
Num examples = 90
Total optimization steps = 27
Epoch: 1.0, Step: 27/27, Loss: 0.214
Saving model checkpoint to /data/output/qwen3-vl-lora-ft

/data/output/qwen3-vl-lora-ft/
├── adapter_model.safetensors # LoRA weights (adapter_model.bin with older PEFT versions)
├── adapter_config.json
├── tokenizer_config.json
└── training_loss.png # loss curve

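from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

base_path = "/data/model/qwen3-vl-4b-instruct"
merged_path = "/data/model/qwen3-vl-4b-instruct-finetuned"

# Qwen3-VL is a vision-language model, so it is loaded with the image-text-to-text auto class.
base_model = AutoModelForImageTextToText.from_pretrained(base_path, dtype="auto")
lora_model = PeftModel.from_pretrained(base_model, "/data/output/qwen3-vl-lora-ft")
merged_model = lora_model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged_model.save_pretrained(merged_path)
# Save the processor (tokenizer + image preprocessor) next to the merged weights.
AutoProcessor.from_pretrained(base_path).save_pretrained(merged_path)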
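
Before handing the merged checkpoint to the WEBUI, a one-image smoke test can confirm that it loads and generates. This is a sketch, not a definitive implementation: it assumes a transformers release with Qwen3-VL support, that `accelerate` is installed (for `device_map="auto"`), and that the merged directory also contains the processor files (copy them from the base model directory if it does not). The function names `build_messages` and `ask` are hypothetical.

```python
def build_messages(image_path: str, question: str) -> list:
    """Build a single-turn chat in the content-list format used by multimodal chat templates."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]


def ask(model_dir: str, image_path: str, question: str, max_new_tokens: int = 64) -> str:
    """Load the merged checkpoint and answer one question about one image."""
    # Imported lazily so build_messages can be used without the heavy dependencies.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_dir)
    model = AutoModelForImageTextToText.from_pretrained(
        model_dir, dtype="auto", device_map="auto"
    )
    inputs = processor.apply_chat_template(
        build_messages(image_path, question),
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated answer is decoded.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```

For example, `ask("/data/model/qwen3-vl-4b-instruct-finetuned", "/data/service/LLaMA-Factory/data/images/1.png", "What is the person's name in this image?")` should return an answer in the style of the training data if fine-tuning took effect.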

docker run -d \
-p 7860:7860 \
-v /data/model/qwen3-vl-4b-instruct-finetuned:/app/models \
--gpus all \
--shm-size="16gb" \
qwen3-vl-webui:latest
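
Loading the model inside the container can take a while, so the page may not respond immediately after `docker run` returns. The sketch below polls the mapped port until it answers. It assumes the WEBUI serves an HTTP page at the root of port 7860 and returns status 200 once ready; the helper name `wait_for_webui` is hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_webui(url: str = "http://localhost:7860",
                   timeout_s: float = 120.0,
                   interval_s: float = 2.0) -> bool:
    """Poll url until it answers with HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError, http.client.HTTPException):
            pass  # Container still starting or model still loading; retry.
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    print("WEBUI ready" if wait_for_webui() else "WEBUI did not respond in time")
```

If the check times out, `docker logs` on the container usually shows whether the model path mounted at `/app/models` was found.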

pip install --upgrade "transformers>=4.57.0"

from transformers import AutoConfig
config = AutoConfig.from_pretrained("/data/model/qwen3-vl-4b-instruct")
print(config.model_type)  # should print 'qwen3_vl'
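
When the `KeyError` appears, it is worth confirming which transformers version is actually installed in the active environment. The following sketch uses only the standard library; the minimum version is passed in as an argument, so use whichever version the upgrade command targets. The helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def version_tuple(version: str) -> tuple:
    """Convert '4.57.0' or '4.57.0.dev0' into a tuple of its leading integers."""
    numbers = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0' or 'rc1'
        numbers.append(int(piece))
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package: str = "transformers") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

A check such as `meets_minimum(installed_version() or "0", "4.57.0")` returning `False` indicates that the upgrade did not land in this environment, for example because a different conda environment is active.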

RuntimeError: CUDA error: too many resources requested for launch

vi /data/model/qwen3-vl-4b-instruct/config.json

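Module	Key outcome
Environment setup	Configured a combined LLaMA-Factory + Qwen3-VL development environment
Data engineering	Learned how to organize and register a multimodal instruction dataset
Fine-tuning	Completed the full LoRA fine-tuning workflow and obtained customized visual recognition
Deployment and validation	Merged the model and tested it visually in the WEBUI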

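Background

Choosing the Stack: Why LLaMA-Factory + Qwen3-VL-WEBUI?

2.1 LLaMA-Factory: A Lightweight, Efficient Fine-Tuning Framework

2.2 Qwen3-VL-WEBUI: A Ready-to-Use Inference Environment

Prerequisites: Hardware, Software, and Data Environment

3.1 Hardware Recommendations

3.2 Software Environment

3.3 Downloading the Base Model

Data Preparation: Building a High-Quality Visual Instruction Dataset

4.1 Data Format (ShareGPT Style)

4.2 Building a Sample Dataset

4.3 Registering the Dataset Metadata

Fine-Tuning Configuration: Fine-Grained Control via YAML

Starting Fine-Tuning: From Command Line to Monitoring

Model Merging and Deployment: Integrating with Qwen3-VL-WEBUI

7.1 Merging the LoRA Weights into the Base Model

7.2 Starting the Qwen3-VL-WEBUI Container

Common Problems and Solutions

❌ Problem 1: KeyError: 'qwen3_vl'

❌ Problem 2: CUDA Error: Too Many Resources Requested

Summary and Next Steps

✅ Key Takeaways

🚀 Suggested Next Steps

🔗 Further Reading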
