LLaMA-Factory使用

Ne0inhk

21 Mar 2026 — 25 min read

文章目录

一、LLAMA-Factory简介
二、安装LLaMA-Factory
三、准备训练数据
四、模型训练
五、合并模型权重
- 1.模型合并
- 2.测试

一、LLAMA-Factory简介

LLaMA-Factory是一个简单易用且高效的大模型训练框架，支持上百种大模型的训练，框架特性主要包括：

模型种类：LLaMA、LLaVA、Mistral、Mixtral-MoE、Qwen、Yi、Gemma、Baichuan、ChatGLM、Phi 等等。
训练算法：（增量）预训练、（多模态）指令监督微调、奖励模型训练、PPO 训练、DPO 训练、KTO 训练、ORPO 训练等等。
运算精度：16比特全参数微调、冻结微调、LoRA微调和基于AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ的⅔/⅘/6/8比特QLoRA 微调。
优化算法：GaLore、BAdam、DoRA、LongLoRA、LLaMA Pro、Mixture-of-Depths、LoRA+、LoftQ和PiSSA。
加速算子：FlashAttention-2和Unsloth。
推理引擎：Transformers和vLLM。
实验面板：LlamaBoard、TensorBoard、Wandb、MLflow等等。

本文将介绍如何使用LLAMA-Factory对Qwen2.5系列大模型进行微调（Qwen1.5系列模型也适用），更多特性请参考https://github.com/hiyouga/LlamaFactory

二、安装LLaMA-Factory

LLaMA-Factory的github地址为：https://github.com/hiyouga/LLaMA-Factory 。为防止项目更新带来软件版本不适配，我们下面安装一个历史版本。

在使用AutoDL克隆git仓库时，速度较慢，可以运行如下命令。

source /etc/network_turbo

下载并安装LLaMA-Factory：

cd /root/autodl-tmp
git clone --depth 1 https://github.com/Jiangnanjiezi/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e “.[torch,metrics]” -i https://mirrors.aliyun.com/pypi/simple/

安装完成后，执行 llamafactory-cli version，若出现以下提示，则表明安装成功：

三、准备训练数据

训练数据应保存为json文件，文件为：qwen_dataset.json。需要将其放到 autodl-tmp/LLaMA-Factory/data 下。

其内容示例如下：

[{"instruction": "请提取以下内容中的摘要信息","input": "保持身体健康的五个方法：\n\n1. 每天至少饮用8杯水，促进新陈代谢\n2. 每周进行150分钟中等强度运动，如快走或游泳\n3. 保证7-9小时高质量睡眠，避免熬夜\n4. 饮食中增加蔬菜水果比例，减少油炸食品\n5. 定期体检，监测血压、血糖等指标","output": "多喝水、规律运动、充足睡眠、均衡饮食、定期体检"},{"instruction": "请提取以下内容中的摘要信息","input": "提高学习效率的三个技巧：\n\n1. 使用番茄工作法，每25分钟专注后休息5分钟\n2. 建立思维导图整理知识框架\n3. 睡前复习重点内容加强记忆","output": "番茄工作法、思维导图、睡前复习"},{"instruction": "请提取以下内容中的摘要信息","input": "旅行必备物品清单：\n1. 护照/身份证原件及复印件\n2. 便携充电宝和转换插头\n3. 常用药品（退烧药、创可贴）\n4. 轻便折叠雨伞\n5. 分装洗漱用品","output": "证件、充电设备、药品、雨具、洗漱包"},{"instruction": "请提取以下内容中的摘要信息","input": "职场沟通四大原则：\n① 明确沟通目标\n② 使用金字塔表达结构\n③ 注意非语言信号（眼神/姿态）\n④ 及时确认信息理解度","output": "目标明确、结构化表达、非语言交流、信息确认"},]

在LLaMA-Factory文件夹下的data/dataset_info.json文件中注册自定义的训练数据，在文件中添加如下配置信息：

"qwen_dataset": {"file_name": "qwen_dataset.json"},

四、模型训练

1. 模型下载

安装modelscope

pip install modelscope

下载Qwen2.5

mkdir -p /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B cd /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B # 下载模型# modelscope download --model Qwen/Qwen2.5-7B --local_dir ./# 因为7B模型下载太慢，并且微调所占显存也大，所以用1.8B模型来演示 modelscope download --model Qwen/Qwen2.5-1.5B --local_dir ./

2. 全量微调

在LLaMA-Factory文件夹下，创建 qwen2.5-7b-full-sft.yaml 配置文件，用于设置全量参数训练的配置。

### 模型配置# 预训练模型的本地路径或HuggingFace模型ID model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B # 必须开启以加载包含自定义代码的模型（如Qwen/ChatGLM等） trust_remote_code: true ### 方法配置# 微调阶段：监督式微调 (Supervised Fine-Tuning) stage: sft # 是否执行训练阶段 do_train: true # 微调类型：全参数微调（可选值：full/lora/qlora） finetuning_type: full # DeepSpeed配置文件路径（使用ZeRO Stage 3优化策略） deepspeed: /root/autodl-tmp/LLaMA-Factory/examples/deepspeed/ds_z3_config.json ### 数据集配置# 使用的数据集名称（需与data目录下的数据集名称对应） dataset: qwen_dataset # 使用的模板格式（与模型架构匹配） template: qwen # 输入序列最大长度（单位：token） cutoff_len: 1024 # 是否覆盖已有的缓存文件（建议数据集修改后启用） overwrite_cache: true # 数据预处理的并行进程数（建议设置为CPU核心数的50-70%） preprocessing_num_workers: 16 ### 输出配置# 模型和日志的输出目录 output_dir: saves/qwen2.5-7b/full # 每隔多少训练步记录一次日志 logging_steps: 10 # 每隔多少训练步保存一次模型 save_steps: 100 # 是否生成训练损失曲线图 plot_loss: true # 是否覆盖已有输出目录（建议新训练时启用） overwrite_output_dir: true ### 训练参数# 每个GPU的批次大小（实际batch_size = 此值 * gradient_accumulation_steps * GPU数量） per_device_train_batch_size: 1 # 梯度累积步数（用于模拟更大batch_size） gradient_accumulation_steps: 16 # 初始学习率（适合7B级别模型的典型值） learning_rate: 1.0e-5 # 训练总轮数 num_train_epochs: 1.0 # 学习率调度策略（余弦退火） lr_scheduler_type: cosine # 学习率预热比例（前10%的step用于线性预热） warmup_ratio: 0.1 # 启用BF16混合精度训练（需要Ampere架构以上GPU） bf16: true # 分布式训练超时时间（单位：毫秒） ddp_timeout: 180000000 # 约50小时### 评估配置# 验证集划分比例（从训练集划分） val_size: 0.1 # 评估时每个GPU的批次大小 per_device_eval_batch_size: 1 # 评估策略（按训练步数间隔评估） eval_strategy: steps # 每隔多少训练步执行一次评估 eval_steps: 500

deepspeed的配置：

{// 全局训练批次大小（自动计算为：micro_batch * gpu_num * gradient_accumulation） "train_batch_size": "auto",// 单GPU的微批次大小（根据显存自动调整） "train_micro_batch_size_per_gpu": "auto",// 梯度累积步数（自动匹配micro_batch配置） "gradient_accumulation_steps": "auto",// 梯度裁剪阈值（自动禁用或设置默认1.0） "gradient_clipping": "auto",// 允许未经官方测试的优化器（需谨慎开启） "zero_allow_untested_optimizer": true,// FP16混合精度配置 "fp16": {"enabled": "auto",// 自动根据硬件兼容性启用 "loss_scale": 0,// 动态损失缩放（0表示自动调整） "loss_scale_window": 1000,// 缩放调整窗口大小（1000次迭代） "initial_scale_power": 16,// 初始缩放比例2^16 "hysteresis": 2,// 缩放容差（防止频繁调整） "min_loss_scale": 1 // 最小缩放比例 },// BF16混合精度配置（与FP16二选一） "bf16": {"enabled": "auto"// 在支持BF16的GPU上自动启用 },// ZeRO优化策略（Stage3完整配置） "zero_optimization": {"stage": 3,// 最高优化等级（参数/梯度/优化器状态分片） // 优化器状态卸载到CPU "offload_optimizer": {"device": "cpu",// 卸载到CPU内存 "pin_memory": true // 使用锁页内存加速传输 },// 模型参数卸载到CPU "offload_param": {"device": "cpu",// 参数存储到CPU内存 "pin_memory": true // 使用DMA加速数据传输 },"overlap_comm": false,// 禁用通信计算重叠（提升稳定性） "contiguous_gradients": true,// 保持梯度内存连续（优化显存） // 参数分组配置 "sub_group_size": 1e9,// 单参数组最大尺寸（默认1B防止分组） // 通信缓冲区自动调整 "reduce_bucket_size": "auto",// AllReduce缓冲区大小 "stage3_prefetch_bucket_size": "auto",// 参数预取缓冲区 // 参数持久化阈值 "stage3_param_persistence_threshold": "auto",// 参数驻留GPU的阈值 "stage3_max_live_parameters": 1e9,// 最大驻留参数数量 "stage3_max_reuse_distance": 1e9,// 参数重用距离阈值 // 模型保存时收集16位权重 "stage3_gather_16bit_weights_on_model_save": true }}

开始训练：
切换到qwen2.5-7b-full-sft.yaml所在的路径，执行下面的命令。

# 强制使用torchrun进行分布式训练初始化（适用于多GPU/TPU环境）# 环境变量说明：# - FORCE_TORCHRUN=1 : 强制使用PyTorch的torchrun命令来启动分布式训练# （当自动检测失败或需要显式控制分布式训练时使用）# （需确保已正确安装torch>=1.8.0）# 执行LLaMA Factory训练流程# 命令结构：# llamafactory-cli : 主程序入口（基于Python Fire的CLI工具）# train : 子命令，指定执行训练任务# qwen2.5-7b-full-sft.yaml : 训练配置文件路径（包含模型/数据/训练参数） FORCE_TORCHRUN=1 llamafactory-cli train qwen2.5-7b-full-sft.yaml

训练结果：

[INFO|trainer.py:2519] 2025-11-15 00:36:01,373 >> ***** Running training *****[INFO|trainer.py:2520] 2025-11-15 00:36:01,373 >> Num examples = 54 [INFO|trainer.py:2521] 2025-11-15 00:36:01,373 >> Num Epochs = 1 [INFO|trainer.py:2522] 2025-11-15 00:36:01,373 >> Instantaneous batch size per device = 1 [INFO|trainer.py:2525] 2025-11-15 00:36:01,373 >> Total train batch size (w.parallel, distributed & accumulation) = 32 [INFO|trainer.py:2526] 2025-11-15 00:36:01,373 >> Gradient Accumulation steps = 16 [INFO|trainer.py:2527] 2025-11-15 00:36:01,373 >> Total optimization steps = 2 [INFO|trainer.py:2528] 2025-11-15 00:36:01,374 >> Number of trainable parameters = 1,543,714,304 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:24<00:00, 11.65s/it][INFO|trainer.py:4309] 2025-11-15 00:36:28,898 >> Saving model checkpoint to saves/qwen2.5-7b/full/checkpoint-2 [INFO|configuration_utils.py:491] 2025-11-15 00:36:28,901 >> Configuration saved in saves/qwen2.5-7b/full/checkpoint-2/config.json [INFO|configuration_utils.py:757] 2025-11-15 00:36:28,902 >> Configuration saved in saves/qwen2.5-7b/full/checkpoint-2/generation_config.json [INFO|modeling_utils.py:4181] 2025-11-15 00:36:33,246 >> Model weights saved in saves/qwen2.5-7b/full/checkpoint-2/model.safetensors [INFO|tokenization_utils_base.py:2421] 2025-11-15 00:36:33,247 >> chat template saved in saves/qwen2.5-7b/full/checkpoint-2/chat_template.jinja [INFO|tokenization_utils_base.py:2590] 2025-11-15 00:36:33,248 >> tokenizer config file saved in saves/qwen2.5-7b/full/checkpoint-2/tokenizer_config.json [INFO|tokenization_utils_base.py:2599] 2025-11-15 00:36:33,248 >> Special tokens file saved in saves/qwen2.5-7b/full/checkpoint-2/special_tokens_map.json [2025-11-15 00:36:33,422][INFO][logging.py:107:log_dist][Rank 0][Torch] Checkpoint global_step2 is about to be saved! [2025-11-15 00:36:33,428][INFO][logging.py:107:log_dist][Rank 0] Saving model checkpoint: saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt [2025-11-15 00:36:33,428][INFO][torch_checkpoint_engine.py:21:save][Torch] Saving saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt...[2025-11-15 00:36:33,438][INFO][torch_checkpoint_engine.py:23:save][Torch] Saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt.[2025-11-15 00:36:33,439][INFO][torch_checkpoint_engine.py:21:save][Torch] Saving saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...[2025-11-15 00:36:47,668][INFO][torch_checkpoint_engine.py:23:save][Torch] Saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.[2025-11-15 00:36:47,669][INFO][engine.py:3701:_save_zero_checkpoint] zero checkpoint saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2025-11-15 00:36:47,673][INFO][torch_checkpoint_engine.py:33:commit][Torch] Checkpoint global_step2 is ready now! [INFO|trainer.py:2810] 2025-11-15 00:36:47,675 >> Training completed.Do not forget to share your model on huggingface.co/models =){'train_runtime': 46.3009,'train_samples_per_second': 1.166,'train_steps_per_second': 0.043,'train_loss': 3.873927593231201,'epoch': 1.0} 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:46<00:00, 23.15s/it][INFO|trainer.py:4309] 2025-11-15 00:36:49,886 >> Saving model checkpoint to saves/qwen2.5-7b/full [INFO|configuration_utils.py:491] 2025-11-15 00:36:49,889 >> Configuration saved in saves/qwen2.5-7b/full/config.json [INFO|configuration_utils.py:757] 2025-11-15 00:36:49,889 >> Configuration saved in saves/qwen2.5-7b/full/generation_config.json [INFO|modeling_utils.py:4181] 2025-11-15 00:36:52,910 >> Model weights saved in saves/qwen2.5-7b/full/model.safetensors [INFO|tokenization_utils_base.py:2421] 2025-11-15 00:36:52,910 >> chat template saved in saves/qwen2.5-7b/full/chat_template.jinja [INFO|tokenization_utils_base.py:2590] 2025-11-15 00:36:52,911 >> tokenizer config file saved in saves/qwen2.5-7b/full/tokenizer_config.json [INFO|tokenization_utils_base.py:2599] 2025-11-15 00:36:52,911 >> Special tokens file saved in saves/qwen2.5-7b/full/special_tokens_map.json ***** train metrics ***** epoch = 1.0 total_flos = 4GF train_loss = 3.8739 train_runtime = 0:00:46.30 train_samples_per_second = 1.166 train_steps_per_second = 0.043 [WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric loss to plot.[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.[INFO|trainer.py:4643] 2025-11-15 00:36:53,090 >> ***** Running Evaluation *****[INFO|trainer.py:4645] 2025-11-15 00:36:53,090 >> Num examples = 7 [INFO|trainer.py:4648] 2025-11-15 00:36:53,090 >> Batch size = 1 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 9.30it/s]***** eval metrics ***** epoch = 1.0 eval_loss = 3.5243 eval_runtime = 0:00:00.71 eval_samples_per_second = 9.774 eval_steps_per_second = 5.585 [INFO|modelcard.py:456] 2025-11-15 00:36:53,806 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling','type': 'text-generation'}}

3.lora微调

在LLaMA-Factory文件夹下，创建 qwen2.5-7b-lora-sft.yaml 配置文件，用于设置lora微调的配置。

### 模型配置# 预训练模型的本地路径或HuggingFace模型ID model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B # 必须开启以加载包含自定义代码的模型（如Qwen/ChatGLM等） trust_remote_code: true ### 训练方法# 训练阶段：监督式微调（Supervised Fine-Tuning） stage: sft # 是否启用训练模式 do_train: true # 微调类型：LoRA（低秩适配） finetuning_type: lora # LoRA作用的目标层（all表示所有线性层） lora_target: all # LoRA的秩（矩阵分解维度） lora_rank: 16 # LoRA的α值（缩放因子，通常等于rank） lora_alpha: 16 # LoRA层的dropout率（防止过拟合） lora_dropout: 0.05 ### 数据集配置# 使用的数据集名称（对应data目录下的数据集文件夹） dataset: alpaca_zh_demo # 使用的模板格式（需与模型匹配，如qwen/llama/chatglm） template: qwen # 输入序列最大长度（单位：token） cutoff_len: 1024 # 是否覆盖已有的预处理缓存 overwrite_cache: true # 数据预处理的并行进程数（建议设置为CPU核心数的50-70%） preprocessing_num_workers: 16 ### 输出配置# 模型和日志的输出目录 output_dir: saves/qwen2.5-7b/lora/sft # 每隔100训练步记录一次日志 logging_steps: 100 # 每隔100训练步保存一次模型 save_steps: 100 # 是否生成训练损失曲线图 plot_loss: true # 是否覆盖已有输出目录（新训练时建议开启） overwrite_output_dir: true ### 训练参数# 每个GPU的批次大小（实际总batch_size = 此值 * gradient_accumulation_steps * GPU数） per_device_train_batch_size: 1 # 梯度累积步数（用于模拟更大batch_size，此处等效总batch_size=16*GPU数） gradient_accumulation_steps: 16 # 初始学习率（LoRA微调的典型学习率范围：1e-4 ~ 5e-4） learning_rate: 1.0e-4 # 训练总轮数 num_train_epochs: 1.0 # 学习率调度策略（余弦退火） lr_scheduler_type: cosine # 学习率预热比例（前10%的step用于线性预热） warmup_ratio: 0.1 # 启用BF16混合精度（需Ampere架构以上GPU，如A100/3090） bf16: true # 分布式训练超时时间（单位：毫秒，此处约50小时） ddp_timeout: 180000000 ### 评估配置# 验证集划分比例（从训练集划分10%作为验证集） val_size: 0.1 # 评估时每个GPU的批次大小 per_device_eval_batch_size: 1 # 评估策略：按训练步数间隔评估 eval_strategy: steps # 每隔500训练步执行一次验证 eval_steps: 500

开始训练：

# llamafactory-cli : 主程序入口# train : 子命令，指定执行训练任务# qwen2.5-7b-lora-sft.yaml : YAML格式的配置文件路径（包含完整的训练参数） llamafactory-cli train qwen2.5-7b-lora-sft.yaml

训练结果为：

[INFO|trainer.py:2519] 2025-11-15 00:39:43,504 >> ***** Running training *****[INFO|trainer.py:2520] 2025-11-15 00:39:43,504 >> Num examples = 900 [INFO|trainer.py:2521] 2025-11-15 00:39:43,504 >> Num Epochs = 1 [INFO|trainer.py:2522] 2025-11-15 00:39:43,504 >> Instantaneous batch size per device = 1 [INFO|trainer.py:2525] 2025-11-15 00:39:43,504 >> Total train batch size (w.parallel, distributed & accumulation) = 32 [INFO|trainer.py:2526] 2025-11-15 00:39:43,504 >> Gradient Accumulation steps = 16 [INFO|trainer.py:2527] 2025-11-15 00:39:43,504 >> Total optimization steps = 29 [INFO|trainer.py:2528] 2025-11-15 00:39:43,507 >> Number of trainable parameters = 18,464,768 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:18<00:00, 1.99s/it][INFO|trainer.py:4309] 2025-11-15 00:41:02,102 >> Saving model checkpoint to saves/qwen2.5-7b/lora/sft/checkpoint-29 [INFO|configuration_utils.py:763] 2025-11-15 00:41:02,121 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json [INFO|configuration_utils.py:839] 2025-11-15 00:41:02,122 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"dtype": "bfloat16","eos_token_id": 151643,"hidden_act": "silu","hidden_size": 1536,"initializer_range": 0.02,"intermediate_size": 8960,"layer_types": ["full_attention","full_attention",...],"max_position_embeddings": 131072,"max_window_layers": 28,"model_type": "qwen2","num_attention_heads": 12,"num_hidden_layers": 28,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"transformers_version": "4.57.1","use_cache": true,"use_mrope": false,"use_sliding_window": false,"vocab_size": 151936 }[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:41:02,220 >> chat template saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/chat_template.jinja [INFO|tokenization_utils_base.py:2590] 2025-11-15 00:41:02,220 >> tokenizer config file saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/tokenizer_config.json [INFO|tokenization_utils_base.py:2599] 2025-11-15 00:41:02,221 >> Special tokens file saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/special_tokens_map.json [INFO|trainer.py:2810] 2025-11-15 00:41:02,515 >> Training completed.Do not forget to share your model on huggingface.co/models =){'train_runtime': 79.008,'train_samples_per_second': 11.391,'train_steps_per_second': 0.367,'train_loss': 1.657024120462352,'epoch': 1.0} 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:18<00:00, 2.72s/it][INFO|trainer.py:4309] 2025-11-15 00:41:02,518 >> Saving model checkpoint to saves/qwen2.5-7b/lora/sft [INFO|configuration_utils.py:763] 2025-11-15 00:41:02,537 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json [INFO|configuration_utils.py:839] 2025-11-15 00:41:02,538 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"dtype": "bfloat16","eos_token_id": 151643,"hidden_act": "silu","hidden_size": 1536,"initializer_range": 0.02,"intermediate_size": 8960,"layer_types": ["full_attention","full_attention",...],"max_position_embeddings": 131072,"max_window_layers": 28,"model_type": "qwen2","num_attention_heads": 12,"num_hidden_layers": 28,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"transformers_version": "4.57.1","use_cache": true,"use_mrope": false,"use_sliding_window": false,"vocab_size": 151936 }[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:41:02,611 >> chat template saved in saves/qwen2.5-7b/lora/sft/chat_template.jinja [INFO|tokenization_utils_base.py:2590] 2025-11-15 00:41:02,611 >> tokenizer config file saved in saves/qwen2.5-7b/lora/sft/tokenizer_config.json [INFO|tokenization_utils_base.py:2599] 2025-11-15 00:41:02,611 >> Special tokens file saved in saves/qwen2.5-7b/lora/sft/special_tokens_map.json ***** train metrics ***** epoch = 1.0 total_flos = 1054627GF train_loss = 1.657 train_runtime = 0:01:19.00 train_samples_per_second = 11.391 train_steps_per_second = 0.367 [WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric loss to plot.[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.[INFO|trainer.py:4643] 2025-11-15 00:41:02,752 >> ***** Running Evaluation *****[INFO|trainer.py:4645] 2025-11-15 00:41:02,752 >> Num examples = 100 [INFO|trainer.py:4648] 2025-11-15 00:41:02,752 >> Batch size = 1 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:01<00:00, 31.55it/s]***** eval metrics ***** epoch = 1.0 eval_loss = 1.6728 eval_runtime = 0:00:01.62 eval_samples_per_second = 61.354 eval_steps_per_second = 30.677 [INFO|modelcard.py:456] 2025-11-15 00:41:04,381 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling','type': 'text-generation'}}

4.QLora微调

在LLaMA-Factory文件夹下，创建 qwen2.5-7b-qlora-sft.yaml 配置文件，用于设置qlora微调的配置。

### 模型配置# 预训练模型的本地路径或HuggingFace模型ID（需确保路径正确） model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B # 必须开启以加载包含自定义代码的模型（如Qwen/ChatGLM等） trust_remote_code: true ### 训练方法# 训练阶段：监督式微调（Supervised Fine-Tuning） stage: sft # 是否启用训练模式 do_train: true # 微调类型：QLoRA（量化低秩适配） finetuning_type: lora # QLoRA作用的目标层（all表示所有线性层） lora_target: all # 量化位数（4-bit量化） quantization_bit: 4 # 量化方法（使用bitsandbytes库实现） quantization_method: bitsandbytes # QLoRA的秩（矩阵分解维度） lora_rank: 16 # QLoRA的α值（缩放因子，通常等于rank） lora_alpha: 16 # QLoRA层的dropout率（防止过拟合） lora_dropout: 0.05 ### 数据集配置# 使用的数据集名称（对应data目录下的数据集文件夹） dataset: alpaca_zh_demo # 使用的模板格式（需与模型架构匹配） template: qwen # 输入序列最大长度（单位：token） cutoff_len: 1024 # 是否覆盖已有的预处理缓存（数据集修改后需启用） overwrite_cache: true # 数据预处理的并行进程数（建议设置为CPU核心数的50-70%） preprocessing_num_workers: 16 ### 输出配置# 模型和日志的输出目录（QLoRA检查点保存路径） output_dir: saves/qwen2.5-7b/qlora/sft # 每隔100训练步记录一次日志 logging_steps: 100 # 每隔100训练步保存一次模型 save_steps: 100 # 是否生成训练损失曲线图（保存在output_dir/loss.png） plot_loss: true # 是否覆盖已有输出目录（新训练时建议开启） overwrite_output_dir: true ### 训练参数# 每个GPU的批次大小（实际总batch_size = 此值 * gradient_accumulation_steps * GPU数） per_device_train_batch_size: 1 # 梯度累积步数（用于模拟更大batch_size，此处等效总batch_size=16*GPU数） gradient_accumulation_steps: 16 # 初始学习率（QLoRA典型学习率范围：1e-4 ~ 5e-4） learning_rate: 1.0e-4 # 训练总轮数 num_train_epochs: 1.0 # 学习率调度策略（余弦退火） lr_scheduler_type: cosine # 学习率预热比例（前10%的step用于线性预热） warmup_ratio: 0.1 # 启用BF16混合精度（需Ampere架构以上GPU，如A100/3090） bf16: true # 分布式训练超时时间（单位：毫秒，此处约50小时） ddp_timeout: 180000000 ### 评估配置# 验证集划分比例（从训练集划分10%作为验证集） val_size: 0.1 # 评估时每个GPU的批次大小 per_device_eval_batch_size: 1 # 评估策略：按训练步数间隔评估 eval_strategy: steps # 每隔500训练步执行一次验证 eval_steps: 500

QLoRA训练：

# llamafactory-cli : 主程序入口# train : 子命令，指定执行训练任务# qwen2.5-7b-lora-sft.yaml : YAML格式的配置文件路径（包含完整的训练参数） llamafactory-cli train qwen2.5-7b-qlora-sft.yaml

训练结果如下：

[INFO|trainer.py:2519] 2025-11-15 00:43:46,249 >> ***** Running training *****[INFO|trainer.py:2520] 2025-11-15 00:43:46,249 >> Num examples = 900 [INFO|trainer.py:2521] 2025-11-15 00:43:46,249 >> Num Epochs = 1 [INFO|trainer.py:2522] 2025-11-15 00:43:46,249 >> Instantaneous batch size per device = 1 [INFO|trainer.py:2525] 2025-11-15 00:43:46,249 >> Total train batch size (w.parallel, distributed & accumulation) = 32 [INFO|trainer.py:2526] 2025-11-15 00:43:46,249 >> Gradient Accumulation steps = 16 [INFO|trainer.py:2527] 2025-11-15 00:43:46,249 >> Total optimization steps = 29 [INFO|trainer.py:2528] 2025-11-15 00:43:46,254 >> Number of trainable parameters = 18,464,768 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:20<00:00, 1.98s/it][INFO|trainer.py:4309] 2025-11-15 00:45:06,653 >> Saving model checkpoint to saves/qwen2.5-7b/qlora/sft/checkpoint-29 [INFO|configuration_utils.py:763] 2025-11-15 00:45:06,673 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json [INFO|configuration_utils.py:839] 2025-11-15 00:45:06,674 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"dtype": "bfloat16","eos_token_id": 151643,"hidden_act": "silu","hidden_size": 1536,"initializer_range": 0.02,"intermediate_size": 8960,"layer_types": ["full_attention","full_attention",...],"max_position_embeddings": 131072,"max_window_layers": 28,"model_type": "qwen2","num_attention_heads": 12,"num_hidden_layers": 28,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"transformers_version": "4.57.1","use_cache": true,"use_mrope": false,"use_sliding_window": false,"vocab_size": 151936 }[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:45:06,761 >> chat template saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/chat_template.jinja [INFO|tokenization_utils_base.py:2590] 2025-11-15 00:45:06,761 >> tokenizer config file saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/tokenizer_config.json [INFO|tokenization_utils_base.py:2599] 2025-11-15 00:45:06,761 >> Special tokens file saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/special_tokens_map.json [INFO|trainer.py:2810] 2025-11-15 00:45:07,051 >> Training completed.Do not forget to share your model on huggingface.co/models =){'train_runtime': 80.7972,'train_samples_per_second': 11.139,'train_steps_per_second': 0.359,'train_loss': 1.6571868370319236,'epoch': 1.0} 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:20<00:00, 2.78s/it][INFO|trainer.py:4309] 2025-11-15 00:45:07,054 >> Saving model checkpoint to saves/qwen2.5-7b/qlora/sft [INFO|configuration_utils.py:763] 2025-11-15 00:45:07,073 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json [INFO|configuration_utils.py:839] 2025-11-15 00:45:07,074 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"dtype": "bfloat16","eos_token_id": 151643,"hidden_act": "silu","hidden_size": 1536,"initializer_range": 0.02,"intermediate_size": 8960,"layer_types": ["full_attention","full_attention",...],"max_position_embeddings": 131072,"max_window_layers": 28,"model_type": "qwen2","num_attention_heads": 12,"num_hidden_layers": 28,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"transformers_version": "4.57.1","use_cache": true,"use_mrope": false,"use_sliding_window": false,"vocab_size": 151936 }[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:45:07,136 >> chat template saved in saves/qwen2.5-7b/qlora/sft/chat_template.jinja [INFO|tokenization_utils_base.py:2590] 2025-11-15 00:45:07,136 >> tokenizer config file saved in saves/qwen2.5-7b/qlora/sft/tokenizer_config.json [INFO|tokenization_utils_base.py:2599] 2025-11-15 00:45:07,137 >> Special tokens file saved in saves/qwen2.5-7b/qlora/sft/special_tokens_map.json ***** train metrics ***** epoch = 1.0 total_flos = 1054627GF train_loss = 1.6572 train_runtime = 0:01:20.79 train_samples_per_second = 11.139 train_steps_per_second = 0.359 [WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric loss to plot.[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.[INFO|trainer.py:4643] 2025-11-15 00:45:07,276 >> ***** Running Evaluation *****[INFO|trainer.py:4645] 2025-11-15 00:45:07,277 >> Num examples = 100 [INFO|trainer.py:4648] 2025-11-15 00:45:07,277 >> Batch size = 1 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:01<00:00, 31.85it/s]***** eval metrics ***** epoch = 1.0 eval_loss = 1.6738 eval_runtime = 0:00:01.61 eval_samples_per_second = 61.919 eval_steps_per_second = 30.96 [INFO|modelcard.py:456] 2025-11-15 00:45:08,890 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling','type': 'text-generation'}}

使用上述训练配置，各个方法实测的显存占用如下。训练中的显存占用与训练参数配置息息相关，可根据自身实际需求进行设置。

全量参数训练：42.18GB
LoRA训练：20.17GB
QLoRA训练: 10.97GB

五、合并模型权重

1.模型合并

如果采用LoRA或者QLoRA进行训练，脚本只保存对应的LoRA权重，需要合并权重才能进行推理。全量参数训练无需执行此步骤。下面将LoRA微调的权重和预训练模型进行合并。注意：如果是QLoRA微调的权重需要和使用NF4方式量化后的预训练模型进行合并。

微调的命令如下：

llamafactory-cli export qwen2.5-7b-merge-lora.yaml

其中 qwen2.5-7b-merge-lora.yaml 中配置如下：

### model model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B adapter_name_or_path: /root/autodl-tmp/LLaMA-Factory/saves/qwen2.5-7b/lora/sft template: qwen finetuning_type: lora trust_remote_code: true // 必须开启 ### export export_dir: /root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged export_size: 2 export_device: cpu export_legacy_format: false

权重合并的部分参数说明：

参数	说明
model_name_or_path	预训练模型的名称或路径
template	模型类型
export_dir	导出路径
export_size	最大导出模型文件大小
export_device	导出设备
export_legacy_format	是否使用旧格式导出

注意：

合并Qwen2.5模型权重，务必将template设为qwen；无论LoRA还是QLoRA训练，合并权重时，finetuning_type均为lora。
adapter_name_or_path需要与微调中的适配器输出路径output_dir相对应。

2.测试

inference.py 文件内容如下：

import time from transformers import AutoModelForCausalLM, AutoTokenizer # 加载 tokenizer 和 model tokenizer = AutoTokenizer.from_pretrained("/root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged", trust_remote_code=True ) model = AutoModelForCausalLM.from_pretrained("/root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged", device_map="auto", trust_remote_code=True ).eval() prompt = "你好" inputs = tokenizer(prompt, return_tensors="pt").to(model.device)# 记录生成开始时间 start_time = time.time()# 使用 generate 生成文本 outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.3, top_p=0.4 )# 记录生成结束时间 end_time = time.time()# 解码输出 response = tokenizer.decode(outputs[0], skip_special_tokens=True) print("生成结果：", response)# 统计生成速度 num_generated_tokens = outputs.shape[1]- inputs['input_ids'].shape[1]# 新生成的token数量 elapsed_time = end_time - start_time tokens_per_second = num_generated_tokens / elapsed_time if elapsed_time > 0 else 0 print(f"生成了 {num_generated_tokens} 个 token，用时 {elapsed_time:.2f} 秒，速度约为 {tokens_per_second:.2f} token/s")

结果如下：

生成结果： 你好，我有一个问题想问。 您好，请问有什么问题需要帮助吗？ 我最近感到很焦虑，有什么方法可以缓解吗？ 焦虑是一种常见的心理问题，您可以尝试进行深呼吸、冥想、运动、与朋友聊天等方式来缓解焦虑。同时，也可以考虑寻求专业心理咨询师的帮助。 生成了 64 个 token，用时 2.17 秒，速度约为 29.46 token/s

LLaMA-Factory使用

Ne0inhk

文章目录

一、LLAMA-Factory简介

二、安装LLaMA-Factory

三、准备训练数据

四、模型训练

1. 模型下载

2. 全量微调

3.lora微调

4.QLora微调

五、合并模型权重

1.模型合并

2.测试

Read more

解锁 C++ std::map 的力量

【C++】多态

【探寻C++之旅】C++ 智能指针完全指南：从原理到实战，彻底告别内存泄漏

【C++掌中宝】类和对象（二）：隐藏的this指针

文章目录

一、LLAMA-Factory简介

二、安装LLaMA-Factory

三、准备训练数据

四、模型训练

1. 模型下载

2. 全量微调

3.lora微调

4.QLora微调

五、合并模型权重

1.模型合并

2.测试

Read more

**解锁 C++ std::map 的力量**

【C++】多态

【探寻C++之旅】C++ 智能指针完全指南：从原理到实战，彻底告别内存泄漏

【C++掌中宝】类和对象（二）：隐藏的this指针

解锁 C++ std::map 的力量