环境配置
- Ubuntu 24
- NVIDIA 3090 (24G)
- CUDA 12.9
一、数据集制作
1. Label-studio 制作数据集
这是从零开始制作数据集的方法。安装完 label-studio 后,输入指令启动:
label-studio start
进入浏览器界面创建项目(Create Project),引入图片后选择图像描述数据集制作(Image Captioning)。
2. 利用 Qwen2.5-VL 半自动制作数据集
利用 Qwen 的图像描述能力进行预生成,再人工复核修改,可减少人力成本。脚本示例如下:
import torch
from modelscope import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
import time
import os
from pathlib import Path
import json
def process_single_image(model, processor, image_path, prompt):
    """Generate a caption for a single image with a Qwen2.5-VL model.

    Args:
        model: loaded ``Qwen2_5_VLForConditionalGeneration`` instance; the
            preprocessed inputs are moved to CUDA, so the model is expected
            to live on a CUDA device.
        processor: matching ``AutoProcessor`` (chat template + vision
            preprocessing).
        image_path: path to the image file, as a string.
        prompt: text instruction sent alongside the image.

    Returns:
        str: the decoded caption for this one image.

    NOTE(review): f-string contents, boolean kwarg values and the trailing
    return on this block were lost in extraction; they are reconstructed
    from the standard Qwen2.5-VL inference recipe — confirm against the
    original script if available.
    """
    # Single-turn chat message carrying both the image and the text prompt.
    messages = [{"role": "user", "content": [{"type": "image", "image": image_path}, {"type": "text", "text": prompt}]}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
    inputs = inputs.to("cuda")
    time_start = time.time()
    # do_sample=False -> greedy decoding, so captions are reproducible.
    generated_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    time_end = time.time()
    print(f"Inference time for {image_path}: {time_end - time_start:.2f}s")
    # Drop the prompt tokens so only the newly generated tokens are decoded.
    generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
    output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
    return output_text[0]
def process_images_in_folder(model, processor, image_folder, prompt, output_file=None):
    """Caption every image in *image_folder* and optionally dump results as JSONL.

    Args:
        model: loaded ``Qwen2_5_VLForConditionalGeneration`` instance.
        processor: matching ``AutoProcessor``.
        image_folder: directory to scan (non-recursive) for image files.
        prompt: text instruction passed to :func:`process_single_image`.
        output_file: optional path; when truthy, one JSON object per image
            is written there, one per line (JSONL).

    Returns:
        list[dict]: one record per image. Successful records carry
        ``image_name``/``image_path``/``caption``; failed records
        additionally carry an ``error`` message and an empty caption.

    NOTE(review): the function header, keywords and every string literal in
    this block were lost in extraction; dictionary keys, the extension set
    and the printed messages are reconstructed to fit the surviving
    structure — confirm against the original script.
    """
    image_extensions = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".webp"}
    image_files = []
    for file in Path(image_folder).iterdir():
        if file.suffix.lower() in image_extensions:
            image_files.append(file)
    # Sort so runs are deterministic regardless of filesystem order.
    image_files.sort()
    if not image_files:
        print(f"No images found in {image_folder}")
        return []
    results = []
    for image_file in image_files:
        print(f"Processing {image_file.name}")
        try:
            result = process_single_image(model, processor, str(image_file), prompt)
            print(result)
            results.append({"image_name": image_file.name, "image_path": str(image_file), "caption": result})
        except Exception as e:
            # Best-effort batch: record the failure and keep going so one
            # bad image does not abort the whole folder.
            print(f"Failed on {image_file.name}: {e}")
            results.append({"image_name": image_file.name, "image_path": str(image_file), "caption": "", "error": str(e)})
    if output_file:
        with open(output_file, "w", encoding="utf-8") as f:
            for item in results:
                # Keep the JSONL slim: just the file name and its caption.
                json_line = {"image": item["image_name"], "caption": item["caption"]}
                # ensure_ascii=False keeps Chinese captions human-readable.
                f.write(json.dumps(json_line, ensure_ascii=False) + "\n")
        print(f"Saved {len(results)} results to {output_file}")
    return results
if __name__ == "__main__":
    # NOTE(review): the model id, dtype/device kwargs, pixel budgets and all
    # path/prompt strings below were lost in extraction; the values here are
    # reconstructed placeholders — adjust them before running.
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2.5-VL-7B-Instruct",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    # Bound the visual token count: each visual token covers a 28x28 patch,
    # so these limits translate directly into a min/max token budget.
    min_pixels = 256 * 28 * 28
    max_pixels = 1280 * 28 * 28
    processor = AutoProcessor.from_pretrained(
        "Qwen/Qwen2.5-VL-7B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels
    )
    image_folder = "./images"
    prompt = "Describe this image in detail."
    output_file = "captions.jsonl"
    results = process_images_in_folder(model, processor, image_folder, prompt, output_file)
    print(f"Done: processed {len(results)} images.")

