Using LLaMA-Factory

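I. Introduction to LLaMA-Factory

LLaMA-Factory is an easy-to-use, efficient framework for training large language models. It supports more than a hundred models, and its main features include:

  • Models: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Yi, Gemma, Baichuan, ChatGLM, Phi, and more.
  • Training methods: (continued) pre-training, (multimodal) supervised fine-tuning, reward model training, PPO, DPO, KTO, ORPO, and more.
  • Precision: 16-bit full-parameter fine-tuning, freeze-tuning, LoRA, and 2/3/4/5/6/8-bit QLoRA based on AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
  • Optimization algorithms: GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ, and PiSSA.
  • Acceleration: FlashAttention-2 and Unsloth.
  • Inference engines: Transformers and vLLM.
  • Experiment monitors: LlamaBoard, TensorBoard, Wandb, MLflow, and more.

This article shows how to fine-tune the Qwen2.5 family of models with LLaMA-Factory (the same steps apply to Qwen1.5 models). For more features, see https://github.com/hiyouga/LlamaFactory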

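II. Installing LLaMA-Factory

The LLaMA-Factory GitHub repository is https://github.com/hiyouga/LLaMA-Factory . To avoid version mismatches caused by upstream updates, we install an older snapshot below (cloned from a fork).

  • Cloning git repositories on AutoDL can be slow; running the following command first speeds it up.
source /etc/network_turbo
  • Download and install LLaMA-Factory:
cd /root/autodl-tmp
git clone --depth 1 https://github.com/Jiangnanjiezi/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]" -i https://mirrors.aliyun.com/pypi/simple/
  • After installation, run llamafactory-cli version; if it prints the LLaMA-Factory version information, the installation succeeded.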

III. Preparing the training data

The training data should be saved as a JSON file named qwen_dataset.json and placed under autodl-tmp/LLaMA-Factory/data.

Example contents (Chinese-language summarization samples; note that valid JSON does not allow a trailing comma after the last record):

[
  {
    "instruction": "请提取以下内容中的摘要信息",
    "input": "保持身体健康的五个方法:\n\n1. 每天至少饮用8杯水,促进新陈代谢\n2. 每周进行150分钟中等强度运动,如快走或游泳\n3. 保证7-9小时高质量睡眠,避免熬夜\n4. 饮食中增加蔬菜水果比例,减少油炸食品\n5. 定期体检,监测血压、血糖等指标",
    "output": "多喝水、规律运动、充足睡眠、均衡饮食、定期体检"
  },
  {
    "instruction": "请提取以下内容中的摘要信息",
    "input": "提高学习效率的三个技巧:\n\n1. 使用番茄工作法,每25分钟专注后休息5分钟\n2. 建立思维导图整理知识框架\n3. 睡前复习重点内容加强记忆",
    "output": "番茄工作法、思维导图、睡前复习"
  },
  {
    "instruction": "请提取以下内容中的摘要信息",
    "input": "旅行必备物品清单:\n1. 护照/身份证原件及复印件\n2. 便携充电宝和转换插头\n3. 常用药品(退烧药、创可贴)\n4. 轻便折叠雨伞\n5. 分装洗漱用品",
    "output": "证件、充电设备、药品、雨具、洗漱包"
  },
  {
    "instruction": "请提取以下内容中的摘要信息",
    "input": "职场沟通四大原则:\n① 明确沟通目标\n② 使用金字塔表达结构\n③ 注意非语言信号(眼神/姿态)\n④ 及时确认信息理解度",
    "output": "目标明确、结构化表达、非语言交流、信息确认"
  }
]
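A malformed file (for example one with a trailing comma) is only rejected when training starts, so it can help to check it first. The sketch below is a minimal validator, not part of LLaMA-Factory. It assumes that `instruction` and `output` are required strings and `input` is optional; that field policy is an assumption based on the sample above.

```python
import json

REQUIRED_KEYS = ("instruction", "output")  # assumed to be mandatory
OPTIONAL_KEYS = ("input",)                 # assumed to be optional


def validate_records(records):
    """Return a list of problems; an empty list means the data looks fine."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list of records"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: expected an object, got {type(rec).__name__}")
            continue
        for key in REQUIRED_KEYS:
            if not isinstance(rec.get(key), str):
                problems.append(f"record {i}: field '{key}' is missing or not a string")
        for key in OPTIONAL_KEYS:
            if key in rec and not isinstance(rec[key], str):
                problems.append(f"record {i}: field '{key}' is not a string")
    return problems


def load_and_validate(path):
    """Parse the file (json.load rejects trailing commas) and validate it."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return validate_records(records)
```

Calling `load_and_validate("data/qwen_dataset.json")` from the LLaMA-Factory folder would return an empty list for a well-formed file.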

Register the custom dataset in data/dataset_info.json under the LLaMA-Factory folder by adding the following entry to that file:

"qwen_dataset": {"file_name": "qwen_dataset.json"},

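Editing dataset_info.json by hand risks breaking its JSON syntax. As an alternative, the registration can be scripted. This is a minimal sketch, and the helper name `register_dataset` is hypothetical; it writes only the `file_name` field shown above.

```python
import json
from pathlib import Path


def register_dataset(info_path, name, file_name):
    """Add (or overwrite) one entry in dataset_info.json and write it back."""
    path = Path(info_path)
    # Start from the existing registry if the file is already there.
    info = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    info[name] = {"file_name": file_name}
    path.write_text(
        json.dumps(info, ensure_ascii=False, indent=2) + "\n", encoding="utf-8"
    )
    return info
```

With the paths used in this article, the call would be `register_dataset("/root/autodl-tmp/LLaMA-Factory/data/dataset_info.json", "qwen_dataset", "qwen_dataset.json")`.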
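IV. Model training

1. Downloading the model

  • Install modelscope
pip install modelscope
  • Download Qwen2.5
mkdir -p /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
cd /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# Download the model
# modelscope download --model Qwen/Qwen2.5-7B --local_dir ./
# The 7B model downloads slowly and needs a lot of GPU memory to fine-tune, so this demo uses the 1.5B model.
# The directory keeps the name Qwen2.5-7B, so the "7B" paths in the configs below actually point at the 1.5B weights.
modelscope download --model Qwen/Qwen2.5-1.5B --local_dir ./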

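2. Full-parameter fine-tuning

Under the LLaMA-Factory folder, create the config file qwen2.5-7b-full-sft.yaml, which holds the settings for full-parameter training.

### Model
# Local path or Hugging Face model ID of the pretrained model
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# Must be enabled to load models that ship custom code (e.g. Qwen, ChatGLM)
trust_remote_code: true

### Method
# Training stage: supervised fine-tuning (SFT)
stage: sft
# Whether to run training
do_train: true
# Fine-tuning type: full-parameter (options: full / freeze / lora)
finetuning_type: full
# Path to the DeepSpeed config file (ZeRO Stage 3)
deepspeed: /root/autodl-tmp/LLaMA-Factory/examples/deepspeed/ds_z3_config.json

### Dataset
# Dataset name (must match the key registered in data/dataset_info.json)
dataset: qwen_dataset
# Chat template (must match the model family)
template: qwen
# Maximum input sequence length, in tokens
cutoff_len: 1024
# Whether to overwrite the preprocessing cache (recommended after changing the dataset)
overwrite_cache: true
# Number of parallel preprocessing workers (suggested: 50-70% of the CPU cores)
preprocessing_num_workers: 16

### Output
# Output directory for the model and logs
output_dir: saves/qwen2.5-7b/full
# Log every N training steps
logging_steps: 10
# Save a checkpoint every N training steps
save_steps: 100
# Whether to plot the training loss curve
plot_loss: true
# Whether to overwrite an existing output directory (recommended for a fresh run)
overwrite_output_dir: true

### Training
# Batch size per GPU (effective batch size = this value * gradient_accumulation_steps * number of GPUs)
per_device_train_batch_size: 1
# Gradient accumulation steps (simulates a larger batch size)
gradient_accumulation_steps: 16
# Initial learning rate (a typical value for 7B-class models)
learning_rate: 1.0e-5
# Number of training epochs
num_train_epochs: 1.0
# Learning-rate schedule (cosine annealing)
lr_scheduler_type: cosine
# Warmup ratio (the first 10% of steps use linear warmup)
warmup_ratio: 0.1
# Enable BF16 mixed precision (requires an Ampere or newer GPU)
bf16: true
# Timeout for distributed operations, in seconds (this value is large enough to effectively disable the timeout)
ddp_timeout: 180000000

### Evaluation
# Fraction of the training set held out for validation
val_size: 0.1
# Batch size per GPU during evaluation
per_device_eval_batch_size: 1
# Evaluation strategy (evaluate at a fixed step interval)
eval_strategy: steps
# Evaluate every N training steps
eval_steps: 500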

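The DeepSpeed configuration, annotated below. The // comments are explanatory only: standard JSON does not allow comments, so leave them out of the actual file.

{
  // Global training batch size (computed as micro_batch * num_gpus * gradient_accumulation)
  "train_batch_size": "auto",
  // Micro-batch size per GPU (taken from the training arguments)
  "train_micro_batch_size_per_gpu": "auto",
  // Gradient accumulation steps (matched to the micro-batch setting)
  "gradient_accumulation_steps": "auto",
  // Gradient clipping threshold (disabled or set to the default 1.0 automatically)
  "gradient_clipping": "auto",
  // Allow optimizers that are not officially tested (enable with care)
  "zero_allow_untested_optimizer": true,
  // FP16 mixed-precision settings
  "fp16": {
    "enabled": "auto",            // enabled automatically based on the training arguments
    "loss_scale": 0,              // 0 means dynamic loss scaling
    "loss_scale_window": 1000,    // window for scale adjustment (1000 iterations)
    "initial_scale_power": 16,    // initial scale is 2^16
    "hysteresis": 2,              // tolerance that prevents overly frequent adjustments
    "min_loss_scale": 1           // minimum loss scale
  },
  // BF16 mixed-precision settings (use either this or FP16)
  "bf16": {
    "enabled": "auto"             // enabled automatically on GPUs that support BF16
  },
  // ZeRO optimization (full Stage 3 configuration)
  "zero_optimization": {
    "stage": 3,                   // highest level: shards parameters, gradients, and optimizer states
    // Offload optimizer states to the CPU
    "offload_optimizer": {
      "device": "cpu",            // offload to CPU memory
      "pin_memory": true          // pinned memory speeds up transfers
    },
    // Offload model parameters to the CPU
    "offload_param": {
      "device": "cpu",            // keep parameters in CPU memory
      "pin_memory": true          // pinned memory speeds up transfers (DMA)
    },
    "overlap_comm": false,        // no overlap of communication and computation (more stable)
    "contiguous_gradients": true, // keep gradient memory contiguous (less fragmentation)
    // Parameter grouping
    "sub_group_size": 1e9,        // maximum size of a parameter sub-group (1B by default, which avoids splitting)
    // Communication buffers, sized automatically
    "reduce_bucket_size": "auto",          // reduce buffer size
    "stage3_prefetch_bucket_size": "auto", // parameter prefetch buffer
    // Parameter persistence thresholds
    "stage3_param_persistence_threshold": "auto", // threshold below which parameters stay on the GPU
    "stage3_max_live_parameters": 1e9,            // maximum number of parameters resident on the GPU
    "stage3_max_reuse_distance": 1e9,             // parameter reuse distance threshold
    // Gather 16-bit weights when saving the model
    "stage3_gather_16bit_weights_on_model_save": true
  }
}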
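If you keep an annotated copy of the DeepSpeed config, the `//` comments have to be removed before a strict JSON parser will accept it. The following is a minimal sketch of such a pre-processing step, not part of DeepSpeed or LLaMA-Factory. It assumes the annotated file is laid out with line breaks, so that each comment runs to the end of its line, and it leaves `//` inside quoted strings untouched.

```python
import json


def strip_line_comments(text):
    """Remove // comments that sit outside double-quoted JSON strings."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


annotated = """{
  // ZeRO settings
  "zero_optimization": {
    "stage": 3  // shard everything
  }
}"""
config = json.loads(strip_line_comments(annotated))
print(config["zero_optimization"]["stage"])
```

The cleaned text can then be written back to the file referenced by the `deepspeed:` key.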

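Start training:
Switch to the directory that contains qwen2.5-7b-full-sft.yaml and run the command below.

# FORCE_TORCHRUN=1 forces the job to be launched through PyTorch's torchrun
# (for multi-GPU runs, when auto-detection fails, or when you want explicit control over distributed training;
# requires a PyTorch installation that provides torchrun).
# Command structure:
#   llamafactory-cli         : main CLI entry point
#   train                    : subcommand that runs a training job
#   qwen2.5-7b-full-sft.yaml : path to the training config (model / data / training parameters)
FORCE_TORCHRUN=1 llamafactory-cli train qwen2.5-7b-full-sft.yaml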

Training results:

[INFO|trainer.py:2519] 2025-11-15 00:36:01,373 >> ***** Running training *****
[INFO|trainer.py:2520] 2025-11-15 00:36:01,373 >> Num examples = 54
[INFO|trainer.py:2521] 2025-11-15 00:36:01,373 >> Num Epochs = 1
[INFO|trainer.py:2522] 2025-11-15 00:36:01,373 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2525] 2025-11-15 00:36:01,373 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2526] 2025-11-15 00:36:01,373 >> Gradient Accumulation steps = 16
[INFO|trainer.py:2527] 2025-11-15 00:36:01,373 >> Total optimization steps = 2
[INFO|trainer.py:2528] 2025-11-15 00:36:01,374 >> Number of trainable parameters = 1,543,714,304
100%|██████████| 2/2 [00:24<00:00, 11.65s/it]
[INFO|trainer.py:4309] 2025-11-15 00:36:28,898 >> Saving model checkpoint to saves/qwen2.5-7b/full/checkpoint-2
[INFO|configuration_utils.py:491] 2025-11-15 00:36:28,901 >> Configuration saved in saves/qwen2.5-7b/full/checkpoint-2/config.json
[INFO|configuration_utils.py:757] 2025-11-15 00:36:28,902 >> Configuration saved in saves/qwen2.5-7b/full/checkpoint-2/generation_config.json
[INFO|modeling_utils.py:4181] 2025-11-15 00:36:33,246 >> Model weights saved in saves/qwen2.5-7b/full/checkpoint-2/model.safetensors
[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:36:33,247 >> chat template saved in saves/qwen2.5-7b/full/checkpoint-2/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:36:33,248 >> tokenizer config file saved in saves/qwen2.5-7b/full/checkpoint-2/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:36:33,248 >> Special tokens file saved in saves/qwen2.5-7b/full/checkpoint-2/special_tokens_map.json
[2025-11-15 00:36:33,422][INFO][logging.py:107:log_dist][Rank 0][Torch] Checkpoint global_step2 is about to be saved!
[2025-11-15 00:36:33,428][INFO][logging.py:107:log_dist][Rank 0] Saving model checkpoint: saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt
[2025-11-15 00:36:33,428][INFO][torch_checkpoint_engine.py:21:save][Torch] Saving saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2025-11-15 00:36:33,438][INFO][torch_checkpoint_engine.py:23:save][Torch] Saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2025-11-15 00:36:33,439][INFO][torch_checkpoint_engine.py:21:save][Torch] Saving saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2025-11-15 00:36:47,668][INFO][torch_checkpoint_engine.py:23:save][Torch] Saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2025-11-15 00:36:47,669][INFO][engine.py:3701:_save_zero_checkpoint] zero checkpoint saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2025-11-15 00:36:47,673][INFO][torch_checkpoint_engine.py:33:commit][Torch] Checkpoint global_step2 is ready now!
[INFO|trainer.py:2810] 2025-11-15 00:36:47,675 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 46.3009, 'train_samples_per_second': 1.166, 'train_steps_per_second': 0.043, 'train_loss': 3.873927593231201, 'epoch': 1.0}
100%|██████████| 2/2 [00:46<00:00, 23.15s/it]
[INFO|trainer.py:4309] 2025-11-15 00:36:49,886 >> Saving model checkpoint to saves/qwen2.5-7b/full
[INFO|configuration_utils.py:491] 2025-11-15 00:36:49,889 >> Configuration saved in saves/qwen2.5-7b/full/config.json
[INFO|configuration_utils.py:757] 2025-11-15 00:36:49,889 >> Configuration saved in saves/qwen2.5-7b/full/generation_config.json
[INFO|modeling_utils.py:4181] 2025-11-15 00:36:52,910 >> Model weights saved in saves/qwen2.5-7b/full/model.safetensors
[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:36:52,910 >> chat template saved in saves/qwen2.5-7b/full/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:36:52,911 >> tokenizer config file saved in saves/qwen2.5-7b/full/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:36:52,911 >> Special tokens file saved in saves/qwen2.5-7b/full/special_tokens_map.json
***** train metrics *****
  epoch = 1.0
  total_flos = 4GF
  train_loss = 3.8739
  train_runtime = 0:00:46.30
  train_samples_per_second = 1.166
  train_steps_per_second = 0.043
[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric loss to plot.
[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.
[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4643] 2025-11-15 00:36:53,090 >> ***** Running Evaluation *****
[INFO|trainer.py:4645] 2025-11-15 00:36:53,090 >> Num examples = 7
[INFO|trainer.py:4648] 2025-11-15 00:36:53,090 >> Batch size = 1
100%|██████████| 4/4 [00:00<00:00, 9.30it/s]
***** eval metrics *****
  epoch = 1.0
  eval_loss = 3.5243
  eval_runtime = 0:00:00.71
  eval_samples_per_second = 9.774
  eval_steps_per_second = 5.585
[INFO|modelcard.py:456] 2025-11-15 00:36:53,806 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling','type': 'text-generation'}}
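The batch-size and step counts in the log follow from the config by simple arithmetic. The sketch below reproduces them under two assumptions that the article does not state: the run used 2 GPUs (inferred from 32 = 1 × 16 × 2), and partial batches and partial accumulation windows are rounded up (chosen because it matches the logged values).

```python
import math


def optimization_steps(num_examples, per_device_batch, grad_accum, num_gpus, epochs=1):
    """Return (effective batch size, total optimizer steps) for one run."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    # Each GPU sees its share of the data, one micro-batch at a time.
    batches_per_device = math.ceil(num_examples / (per_device_batch * num_gpus))
    # One optimizer step is taken per accumulation window (last window may be partial).
    steps_per_epoch = math.ceil(batches_per_device / grad_accum)
    return effective_batch, steps_per_epoch * epochs


# Full fine-tuning run above: 54 training examples.
print(optimization_steps(54, 1, 16, 2))   # (32, 2)
# The LoRA and QLoRA runs later in this article report 900 training examples.
print(optimization_steps(900, 1, 16, 2))  # (32, 29)
```

With only 2 optimizer steps, neither `logging_steps: 10` nor `eval_steps: 500` is ever reached during training, which is consistent with the "No metric loss to plot" warnings.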

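3. LoRA fine-tuning

Under the LLaMA-Factory folder, create the config file qwen2.5-7b-lora-sft.yaml, which holds the settings for LoRA fine-tuning.

### Model
# Local path or Hugging Face model ID of the pretrained model
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# Must be enabled to load models that ship custom code (e.g. Qwen, ChatGLM)
trust_remote_code: true

### Method
# Training stage: supervised fine-tuning (SFT)
stage: sft
# Whether to run training
do_train: true
# Fine-tuning type: LoRA (low-rank adaptation)
finetuning_type: lora
# Target modules for LoRA (all = every linear layer)
lora_target: all
# LoRA rank (dimension of the low-rank decomposition)
lora_rank: 16
# LoRA alpha (scaling factor, often set equal to the rank)
lora_alpha: 16
# Dropout on the LoRA layers (guards against overfitting)
lora_dropout: 0.05

### Dataset
# Dataset name (must match a key registered in data/dataset_info.json)
dataset: alpaca_zh_demo
# Chat template (must match the model, e.g. qwen / llama / chatglm)
template: qwen
# Maximum input sequence length, in tokens
cutoff_len: 1024
# Whether to overwrite the preprocessing cache
overwrite_cache: true
# Number of parallel preprocessing workers (suggested: 50-70% of the CPU cores)
preprocessing_num_workers: 16

### Output
# Output directory for the model and logs
output_dir: saves/qwen2.5-7b/lora/sft
# Log every 100 training steps
logging_steps: 100
# Save a checkpoint every 100 training steps
save_steps: 100
# Whether to plot the training loss curve
plot_loss: true
# Whether to overwrite an existing output directory (recommended for a fresh run)
overwrite_output_dir: true

### Training
# Batch size per GPU (effective batch size = this value * gradient_accumulation_steps * number of GPUs)
per_device_train_batch_size: 1
# Gradient accumulation steps (here the effective batch size is 16 * number of GPUs)
gradient_accumulation_steps: 16
# Initial learning rate (typical range for LoRA: 1e-4 to 5e-4)
learning_rate: 1.0e-4
# Number of training epochs
num_train_epochs: 1.0
# Learning-rate schedule (cosine annealing)
lr_scheduler_type: cosine
# Warmup ratio (the first 10% of steps use linear warmup)
warmup_ratio: 0.1
# Enable BF16 mixed precision (requires an Ampere or newer GPU, e.g. A100 / 3090)
bf16: true
# Timeout for distributed operations, in seconds (this value is large enough to effectively disable the timeout)
ddp_timeout: 180000000

### Evaluation
# Fraction of the training set held out for validation (10%)
val_size: 0.1
# Batch size per GPU during evaluation
per_device_eval_batch_size: 1
# Evaluation strategy: evaluate at a fixed step interval
eval_strategy: steps
# Evaluate every 500 training steps
eval_steps: 500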

Start training:

# llamafactory-cli         : main CLI entry point
# train                    : subcommand that runs a training job
# qwen2.5-7b-lora-sft.yaml : path to the YAML config file (contains all training parameters)
llamafactory-cli train qwen2.5-7b-lora-sft.yaml

Training results (the run has only 29 optimization steps, fewer than logging_steps: 100, so no loss values are logged, which is likely why plot_loss reports nothing to plot):

[INFO|trainer.py:2519] 2025-11-15 00:39:43,504 >> ***** Running training *****
[INFO|trainer.py:2520] 2025-11-15 00:39:43,504 >> Num examples = 900
[INFO|trainer.py:2521] 2025-11-15 00:39:43,504 >> Num Epochs = 1
[INFO|trainer.py:2522] 2025-11-15 00:39:43,504 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2525] 2025-11-15 00:39:43,504 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2526] 2025-11-15 00:39:43,504 >> Gradient Accumulation steps = 16
[INFO|trainer.py:2527] 2025-11-15 00:39:43,504 >> Total optimization steps = 29
[INFO|trainer.py:2528] 2025-11-15 00:39:43,507 >> Number of trainable parameters = 18,464,768
100%|██████████| 29/29 [01:18<00:00, 1.99s/it]
[INFO|trainer.py:4309] 2025-11-15 00:41:02,102 >> Saving model checkpoint to saves/qwen2.5-7b/lora/sft/checkpoint-29
[INFO|configuration_utils.py:763] 2025-11-15 00:41:02,121 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json
[INFO|configuration_utils.py:839] 2025-11-15 00:41:02,122 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"dtype": "bfloat16","eos_token_id": 151643,"hidden_act": "silu","hidden_size": 1536,"initializer_range": 0.02,"intermediate_size": 8960,"layer_types": ["full_attention","full_attention",...],"max_position_embeddings": 131072,"max_window_layers": 28,"model_type": "qwen2","num_attention_heads": 12,"num_hidden_layers": 28,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"transformers_version": "4.57.1","use_cache": true,"use_mrope": false,"use_sliding_window": false,"vocab_size": 151936 }
[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:41:02,220 >> chat template saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:41:02,220 >> tokenizer config file saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:41:02,221 >> Special tokens file saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/special_tokens_map.json
[INFO|trainer.py:2810] 2025-11-15 00:41:02,515 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 79.008, 'train_samples_per_second': 11.391, 'train_steps_per_second': 0.367, 'train_loss': 1.657024120462352, 'epoch': 1.0}
100%|██████████| 29/29 [01:18<00:00, 2.72s/it]
[INFO|trainer.py:4309] 2025-11-15 00:41:02,518 >> Saving model checkpoint to saves/qwen2.5-7b/lora/sft
[INFO|configuration_utils.py:763] 2025-11-15 00:41:02,537 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json
[INFO|configuration_utils.py:839] 2025-11-15 00:41:02,538 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"dtype": "bfloat16","eos_token_id": 151643,"hidden_act": "silu","hidden_size": 1536,"initializer_range": 0.02,"intermediate_size": 8960,"layer_types": ["full_attention","full_attention",...],"max_position_embeddings": 131072,"max_window_layers": 28,"model_type": "qwen2","num_attention_heads": 12,"num_hidden_layers": 28,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"transformers_version": "4.57.1","use_cache": true,"use_mrope": false,"use_sliding_window": false,"vocab_size": 151936 }
[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:41:02,611 >> chat template saved in saves/qwen2.5-7b/lora/sft/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:41:02,611 >> tokenizer config file saved in saves/qwen2.5-7b/lora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:41:02,611 >> Special tokens file saved in saves/qwen2.5-7b/lora/sft/special_tokens_map.json
***** train metrics *****
  epoch = 1.0
  total_flos = 1054627GF
  train_loss = 1.657
  train_runtime = 0:01:19.00
  train_samples_per_second = 11.391
  train_steps_per_second = 0.367
[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric loss to plot.
[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.
[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4643] 2025-11-15 00:41:02,752 >> ***** Running Evaluation *****
[INFO|trainer.py:4645] 2025-11-15 00:41:02,752 >> Num examples = 100
[INFO|trainer.py:4648] 2025-11-15 00:41:02,752 >> Batch size = 1
100%|██████████| 50/50 [00:01<00:00, 31.55it/s]
***** eval metrics *****
  epoch = 1.0
  eval_loss = 1.6728
  eval_runtime = 0:00:01.62
  eval_samples_per_second = 61.354
  eval_steps_per_second = 30.677
[INFO|modelcard.py:456] 2025-11-15 00:41:04,381 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling','type': 'text-generation'}}
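The logged count of 18,464,768 trainable parameters can be checked by hand. A LoRA adapter on a linear layer with `in` inputs and `out` outputs adds `rank × (in + out)` parameters. The sketch below assumes that `lora_target: all` attaches adapters to seven projections per decoder layer (q, k, v, o, gate, up, down); that module list is an assumption about the Qwen2 architecture, not something the article states. The dimensions come from the Qwen2Config dump in the log.

```python
def lora_trainable_params(hidden_size, intermediate_size, num_heads,
                          num_kv_heads, num_layers, rank):
    """Count LoRA parameters when every linear projection gets an adapter."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim  # grouped-query attention: fewer K/V heads
    # (in_features, out_features) of each adapted linear layer (assumed list)
    linear_shapes = {
        "q_proj": (hidden_size, hidden_size),
        "k_proj": (hidden_size, kv_dim),
        "v_proj": (hidden_size, kv_dim),
        "o_proj": (hidden_size, hidden_size),
        "gate_proj": (hidden_size, intermediate_size),
        "up_proj": (hidden_size, intermediate_size),
        "down_proj": (intermediate_size, hidden_size),
    }
    # Each adapter holds A (rank x in) and B (out x rank).
    per_layer = sum(rank * (fan_in + fan_out)
                    for fan_in, fan_out in linear_shapes.values())
    return per_layer * num_layers


# Values from the logged config: hidden_size=1536, intermediate_size=8960,
# 12 attention heads, 2 key/value heads, 28 layers; lora_rank=16 from the YAML.
print(lora_trainable_params(1536, 8960, 12, 2, 28, 16))  # 18464768
```

The result matches the log, and it also confirms that the weights in the Qwen2.5-7B directory are the 1.5B model: about 18.5M trainable parameters is roughly 1.2% of the 1,543,714,304 parameters trained in the full fine-tuning run.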

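4. QLoRA fine-tuning

Under the LLaMA-Factory folder, create the config file qwen2.5-7b-qlora-sft.yaml, which holds the settings for QLoRA fine-tuning.

### Model
# Local path or Hugging Face model ID of the pretrained model (make sure the path is correct)
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# Must be enabled to load models that ship custom code (e.g. Qwen, ChatGLM)
trust_remote_code: true

### Method
# Training stage: supervised fine-tuning (SFT)
stage: sft
# Whether to run training
do_train: true
# Fine-tuning type: QLoRA is configured as lora plus the quantization options below
finetuning_type: lora
# Target modules for the adapters (all = every linear layer)
lora_target: all
# Quantization bit width (4-bit)
quantization_bit: 4
# Quantization method (implemented with the bitsandbytes library)
quantization_method: bitsandbytes
# Adapter rank (dimension of the low-rank decomposition)
lora_rank: 16
# Adapter alpha (scaling factor, often set equal to the rank)
lora_alpha: 16
# Dropout on the adapter layers (guards against overfitting)
lora_dropout: 0.05

### Dataset
# Dataset name (must match a key registered in data/dataset_info.json)
dataset: alpaca_zh_demo
# Chat template (must match the model family)
template: qwen
# Maximum input sequence length, in tokens
cutoff_len: 1024
# Whether to overwrite the preprocessing cache (enable after changing the dataset)
overwrite_cache: true
# Number of parallel preprocessing workers (suggested: 50-70% of the CPU cores)
preprocessing_num_workers: 16

### Output
# Output directory for the model and logs (QLoRA checkpoints are saved here)
output_dir: saves/qwen2.5-7b/qlora/sft
# Log every 100 training steps
logging_steps: 100
# Save a checkpoint every 100 training steps
save_steps: 100
# Whether to plot the training loss curve (saved as output_dir/loss.png)
plot_loss: true
# Whether to overwrite an existing output directory (recommended for a fresh run)
overwrite_output_dir: true

### Training
# Batch size per GPU (effective batch size = this value * gradient_accumulation_steps * number of GPUs)
per_device_train_batch_size: 1
# Gradient accumulation steps (here the effective batch size is 16 * number of GPUs)
gradient_accumulation_steps: 16
# Initial learning rate (typical range for QLoRA: 1e-4 to 5e-4)
learning_rate: 1.0e-4
# Number of training epochs
num_train_epochs: 1.0
# Learning-rate schedule (cosine annealing)
lr_scheduler_type: cosine
# Warmup ratio (the first 10% of steps use linear warmup)
warmup_ratio: 0.1
# Enable BF16 mixed precision (requires an Ampere or newer GPU, e.g. A100 / 3090)
bf16: true
# Timeout for distributed operations, in seconds (this value is large enough to effectively disable the timeout)
ddp_timeout: 180000000

### Evaluation
# Fraction of the training set held out for validation (10%)
val_size: 0.1
# Batch size per GPU during evaluation
per_device_eval_batch_size: 1
# Evaluation strategy: evaluate at a fixed step interval
eval_strategy: steps
# Evaluate every 500 training steps
eval_steps: 500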

QLoRA training:

# llamafactory-cli          : main CLI entry point
# train                     : subcommand that runs a training job
# qwen2.5-7b-qlora-sft.yaml : path to the YAML config file (contains all training parameters)
llamafactory-cli train qwen2.5-7b-qlora-sft.yaml

The training results are as follows:

[INFO|trainer.py:2519] 2025-11-15 00:43:46,249 >> ***** Running training *****
[INFO|trainer.py:2520] 2025-11-15 00:43:46,249 >> Num examples = 900
[INFO|trainer.py:2521] 2025-11-15 00:43:46,249 >> Num Epochs = 1
[INFO|trainer.py:2522] 2025-11-15 00:43:46,249 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2525] 2025-11-15 00:43:46,249 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2526] 2025-11-15 00:43:46,249 >> Gradient Accumulation steps = 16
[INFO|trainer.py:2527] 2025-11-15 00:43:46,249 >> Total optimization steps = 29
[INFO|trainer.py:2528] 2025-11-15 00:43:46,254 >> Number of trainable parameters = 18,464,768
100%|██████████| 29/29 [01:20<00:00, 1.98s/it]
[INFO|trainer.py:4309] 2025-11-15 00:45:06,653 >> Saving model checkpoint to saves/qwen2.5-7b/qlora/sft/checkpoint-29
[INFO|configuration_utils.py:763] 2025-11-15 00:45:06,673 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json
[INFO|configuration_utils.py:839] 2025-11-15 00:45:06,674 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"dtype": "bfloat16","eos_token_id": 151643,"hidden_act": "silu","hidden_size": 1536,"initializer_range": 0.02,"intermediate_size": 8960,"layer_types": ["full_attention","full_attention",...],"max_position_embeddings": 131072,"max_window_layers": 28,"model_type": "qwen2","num_attention_heads": 12,"num_hidden_layers": 28,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"transformers_version": "4.57.1","use_cache": true,"use_mrope": false,"use_sliding_window": false,"vocab_size": 151936 }
[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:45:06,761 >> chat template saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:45:06,761 >> tokenizer config file saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:45:06,761 >> Special tokens file saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/special_tokens_map.json
[INFO|trainer.py:2810] 2025-11-15 00:45:07,051 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 80.7972, 'train_samples_per_second': 11.139, 'train_steps_per_second': 0.359, 'train_loss': 1.6571868370319236, 'epoch': 1.0}
100%|██████████| 29/29 [01:20<00:00, 2.78s/it]
[INFO|trainer.py:4309] 2025-11-15 00:45:07,054 >> Saving model checkpoint to saves/qwen2.5-7b/qlora/sft
[INFO|configuration_utils.py:763] 2025-11-15 00:45:07,073 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json
[INFO|configuration_utils.py:839] 2025-11-15 00:45:07,074 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"dtype": "bfloat16","eos_token_id": 151643,"hidden_act": "silu","hidden_size": 1536,"initializer_range": 0.02,"intermediate_size": 8960,"layer_types": ["full_attention","full_attention",...],"max_position_embeddings": 131072,"max_window_layers": 28,"model_type": "qwen2","num_attention_heads": 12,"num_hidden_layers": 28,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"transformers_version": "4.57.1","use_cache": true,"use_mrope": false,"use_sliding_window": false,"vocab_size": 151936 }
[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:45:07,136 >> chat template saved in saves/qwen2.5-7b/qlora/sft/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:45:07,136 >> tokenizer config file saved in saves/qwen2.5-7b/qlora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:45:07,137 >> Special tokens file saved in saves/qwen2.5-7b/qlora/sft/special_tokens_map.json
***** train metrics *****
  epoch = 1.0
  total_flos = 1054627GF
  train_loss = 1.6572
  train_runtime = 0:01:20.79
  train_samples_per_second = 11.139
  train_steps_per_second = 0.359
[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric loss to plot.
[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.
[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4643] 2025-11-15 00:45:07,276 >> ***** Running Evaluation *****
[INFO|trainer.py:4645] 2025-11-15 00:45:07,277 >> Num examples = 100
[INFO|trainer.py:4648] 2025-11-15 00:45:07,277 >> Batch size = 1
100%|██████████| 50/50 [00:01<00:00, 31.85it/s]
***** eval metrics *****
  epoch = 1.0
  eval_loss = 1.6738
  eval_runtime = 0:00:01.61
  eval_samples_per_second = 61.919
  eval_steps_per_second = 30.96
[INFO|modelcard.py:456] 2025-11-15 00:45:08,890 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling','type': 'text-generation'}}

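With the training configurations above, the measured GPU memory usage of each method is listed below. Memory usage depends heavily on the training parameters, so adjust them to your own needs.

  • Full-parameter training: 42.18GB
  • LoRA training: 20.17GB
  • QLoRA training: 10.97GB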

V. Merging model weights

1. Merging the model

When training with LoRA or QLoRA, the script saves only the LoRA weights, which must be merged into the base model before inference. Full-parameter training does not need this step. Below, the LoRA fine-tuned weights are merged with the pretrained model. Note: weights fine-tuned with QLoRA need to be merged with the pretrained model quantized in NF4.

The merge command is:

llamafactory-cli export qwen2.5-7b-merge-lora.yaml

where qwen2.5-7b-merge-lora.yaml contains:

### model
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
adapter_name_or_path: /root/autodl-tmp/LLaMA-Factory/saves/qwen2.5-7b/lora/sft
template: qwen
finetuning_type: lora
trust_remote_code: true  # must be enabled

### export
export_dir: /root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged
export_size: 2
export_device: cpu
export_legacy_format: false

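Some of the weight-merging parameters:

| Parameter | Description |
| --- | --- |
| model_name_or_path | Name or path of the pretrained model |
| template | Chat template for the model type |
| export_dir | Export path |
| export_size | Maximum size of each exported model file (shard size, in GB) |
| export_device | Device used for the export |
| export_legacy_format | Whether to export in the legacy format |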

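Notes:

  • When merging Qwen2.5 weights, template must be set to qwen. Whether the model was trained with LoRA or QLoRA, finetuning_type is lora when merging.
  • adapter_name_or_path must match the adapter output path (output_dir) used during fine-tuning.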
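Conceptually, merging folds each low-rank update back into the weight it adapts: W' = W + (alpha / rank) · B · A, after which the adapter is no longer needed at inference time. The sketch below shows that standard LoRA formula on a tiny example in pure Python. It illustrates the arithmetic only; the matrices are made up and it does not reflect how the export command is implemented internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A for a single linear layer."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
A = [[1.0, 2.0]]              # rank-1 adapter, shape (r=1, in=2)
B = [[0.5], [0.0]]            # shape (out=2, r=1)
print(merge_lora(W, A, B, alpha=2, rank=1))  # [[2.0, 2.0], [0.0, 1.0]]
```

With lora_alpha equal to lora_rank, as in the configs in this article, the scale factor is 1.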

2. Testing

The contents of inference.py:

import time

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the merged model
tokenizer = AutoTokenizer.from_pretrained(
    "/root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "/root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged",
    device_map="auto",
    trust_remote_code=True,
).eval()

prompt = "你好"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Record the generation start time
start_time = time.time()

# Generate text with generate()
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.3,
    top_p=0.4,
)

# Record the generation end time
end_time = time.time()

# Decode the output (this includes the prompt tokens)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("生成结果:", response)

# Measure generation speed
num_generated_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]  # number of newly generated tokens
elapsed_time = end_time - start_time
tokens_per_second = num_generated_tokens / elapsed_time if elapsed_time > 0 else 0
print(f"生成了 {num_generated_tokens} 个 token,用时 {elapsed_time:.2f} 秒,速度约为 {tokens_per_second:.2f} token/s")

The output is:

生成结果: 你好,我有一个问题想问。 您好,请问有什么问题需要帮助吗? 我最近感到很焦虑,有什么方法可以缓解吗? 焦虑是一种常见的心理问题,您可以尝试进行深呼吸、冥想、运动、与朋友聊天等方式来缓解焦虑。同时,也可以考虑寻求专业心理咨询师的帮助。
生成了 64 个 token,用时 2.17 秒,速度约为 29.46 token/s
