跳到主要内容PythonAI算法
LLaMA-Factory 大模型微调指南
基于 LLaMA-Factory 框架对 Qwen2.5 系列大模型进行微调的完整流程。涵盖环境安装、自定义数据集准备、全量微调、LoRA 微调及 QLoRA 微调的配置命令与参数说明。此外还包括模型权重合并方法及推理测试脚本示例,支持多 GPU 分布式训练及显存优化方案。
咸鱼开飞机30 浏览 一、LLAMA-Factory 简介
LLaMA-Factory是一个简单易用且高效的大模型训练框架,支持上百种大模型的训练,框架特性主要包括:
- 模型种类:LLaMA、LLaVA、Mistral、Mixtral-MoE、Qwen、Yi、Gemma、Baichuan、ChatGLM、Phi 等等。
- 训练算法:(增量)预训练、(多模态)指令监督微调、奖励模型训练、PPO 训练、DPO 训练、KTO 训练、ORPO 训练等等。
- 运算精度:16 比特全参数微调、冻结微调、LoRA 微调和基于 AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ 的⅔/⅘/6/8 比特 QLoRA 微调。
- 优化算法:GaLore、BAdam、DoRA、LongLoRA、LLaMA Pro、Mixture-of-Depths、LoRA+、LoftQ 和 PiSSA。
- 加速算子:FlashAttention-2 和 Unsloth。
- 推理引擎:Transformers 和 vLLM。
- 实验面板:LlamaBoard、TensorBoard、Wandb、MLflow 等等。
本文将介绍如何使用 LLaMA-Factory 对 Qwen2.5 系列大模型进行微调(Qwen1.5 系列模型也适用),更多特性请参考 https://github.com/hiyouga/LlamaFactory
二、安装 LLaMA-Factory
LLaMA-Factory 的 github 地址为:https://github.com/hiyouga/LLaMA-Factory 。为防止项目更新带来软件版本不适配,我们下面安装一个历史版本。
- 在使用 AutoDL 克隆 git 仓库时,速度较慢,可以运行如下命令。
source /etc/network_turbo
cd /root/autodl-tmp
git clone --depth 1 https://github.com/Jiangnanjiezi/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e "[torch,metrics]" -i https://mirrors.aliyun.com/pypi/simple/
- 安装完成后,执行 llamafactory-cli version,若出现以下提示,则表明安装成功:
(Empty line removed)
三、准备训练数据
训练数据应保存为 json 文件,文件为:qwen_dataset.json。需要将其放到 autodl-tmp/LLaMA-Factory/data 下。
其内容示例如下:
[{"instruction": "请提取以下内容中的摘要信息","input": "保持身体健康的五个方法:\n\n1. 每天至少饮用 8 杯水,促进新陈代谢\n2. 每周进行 150 分钟中等强度运动,如快走或游泳\n3. 保证 7-9 小时高质量睡眠,避免熬夜\n4. 饮食中增加蔬菜水果比例,减少油炸食品\n5. 定期体检,监测血压、血糖等指标",
"output"
:
"多喝水、规律运动、充足睡眠、均衡饮食、定期体检"
}
,
{
"instruction"
:
"请提取以下内容中的摘要信息"
,
"input"
:
"提高学习效率的三个技巧:\n\n1. 使用番茄工作法,每 25 分钟专注后休息 5 分钟\n2. 建立思维导图整理知识框架\n3. 睡前复习重点内容加强记忆"
,
"output"
:
"番茄工作法、思维导图、睡前复习"
}
,
{
"instruction"
:
"请提取以下内容中的摘要信息"
,
"input"
:
"旅行必备物品清单:\n1. 护照/身份证原件及复印件\n2. 便携充电宝和转换插头\n3. 常用药品(退烧药、创可贴)\n4. 轻便折叠雨伞\n5. 分装洗漱用品"
,
"output"
:
"证件、充电设备、药品、雨具、洗漱包"
}
,
{
"instruction"
:
"请提取以下内容中的摘要信息"
,
"input"
:
"职场沟通四大原则:\n① 明确沟通目标\n② 使用金字塔表达结构\n③ 注意非语言信号(眼神/姿态)\n④ 及时确认信息理解度"
,
"output"
:
"目标明确、结构化表达、非语言交流、信息确认"
}
,
]
在 LLaMA-Factory 文件夹下的 data/dataset_info.json 文件中注册自定义的训练数据,在文件中添加如下配置信息:
"qwen_dataset": {"file_name": "qwen_dataset.json"},
四、模型训练
1. 模型下载
mkdir -p /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
cd /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
modelscope download --model Qwen/Qwen2.5-1.5B --local_dir ./
2. 全量微调
在 LLaMA-Factory 文件夹下,创建 qwen2.5-7b-full-sft.yaml 配置文件,用于设置全量参数训练的配置。
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
trust_remote_code: true
stage: sft
do_train: true
finetuning_type: full
deespeed: /root/autodl-tmp/LLaMA-Factory/examples/deepspeed/ds_z3_config.json
dataset: qwen_dataset
template: qwen
cutoff_len: 1024
overwrite_cache: true
preprocessing_num_workers: 16
output_dir: saves/qwen2.5-7b/full
logging_steps: 10
save_steps: 100
plot_loss: true
overwrite_output_dir: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
learning_rate: 1.0e-5
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
{
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"zero_allow_untested_optimizer": true,
"fp16": {
"enabled": "auto",
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"bf16": {
"enabled": "auto"
},
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"offload_param": {
"device": "cpu",
"pin_memory": true
},
"overlap_comm": false,
"contiguous_gradients": true,
"sub_group_size": 1e9,
"reduce_bucket_size": "auto",
"stage3_prefetch_bucket_size": "auto",
"stage3_param_persistence_threshold": "auto",
"stage3_max_live_parameters": 1e9,
"stage3_max_reuse_distance": 1e9,
"stage3_gather_16bit_weights_on_model_save": true
}
}
开始训练:
切换到 qwen2.5-7b-full-sft.yaml 所在的路径,执行下面的命令。
FORCE_TORCHRUN=1 llamafactory-cli train qwen2.5-7b-full-sft.yaml
[INFO|trainer.py:2519] 2025-11-15 00:36:01,373 >> ***** Running training *****[INFO|trainer.py:2520] 2025-11-15 00:36:01,373 >> Num examples = 54 [INFO|trainer.py:2521] 2025-11-15 00:36:01,373 >> Num Epochs = 1 [INFO|trainer.py:2522] 2025-11-15 00:36:01,373 >> Instantaneous batch size per device = 1 [INFO|trainer.py:2525] 2025-11-15 00:36:01,373 >> Total train batch size (w.parallel, distributed & accumulation) = 32 [INFO|trainer.py:2526] 2025-11-15 00:36:01,373 >> Gradient Accumulation steps = 16 [INFO|trainer.py:2527] 2025-11-15 00:36:01,373 >> Total optimization steps = 2 [INFO|trainer.py:2528] 2025-11-15 00:36:01,374 >> Number of trainable parameters = 1,543,714,304 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:24<00:00, 11.65s/it][INFO|trainer.py:4309] 2025-11-15 00:36:28,898 >> Saving model checkpoint to saves/qwen2.5-7b/full/checkpoint-2 [INFO|configuration_utils.py:491] 2025-11-15 00:36:28,901 >> Configuration saved in saves/qwen2.5-7b/full/checkpoint-2/config.json [INFO|configuration_utils.py:757] 2025-11-15 00:36:28,902 >> Configuration saved in saves/qwen2.5-7b/full/checkpoint-2/generation_config.json [INFO|modeling_utils.py:4181] 2025-11-15 00:36:33,246 >> Model weights saved in saves/qwen2.5-7b/full/checkpoint-2/model.safetensors [INFO|tokenization_utils_base.py:2421] 2025-11-15 00:36:33,247 >> chat template saved in saves/qwen2.5-7b/full/checkpoint-2/chat_template.jinja [INFO|tokenization_utils_base.py:2590] 2025-11-15 00:36:33,248 >> tokenizer config file saved in saves/qwen2.5-7b/full/checkpoint-2/tokenizer_config.json [INFO|tokenization_utils_base.py:2599] 2025-11-15 00:36:33,248 >> Special tokens file saved in saves/qwen2.5-7b/full/checkpoint-2/special_tokens_map.json [2025-11-15 00:36:33,422][INFO][logging.py:107:log_dist][Rank 0][Torch] Checkpoint global_step2 is about to be saved! [2025-11-15 00:36:33,428][INFO][logging.py:107:log_dist][Rank 0] Saving model checkpoint: saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt [2025-11-15 00:36:33,428][INFO][torch_checkpoint_engine.py:21:save][Torch] Saving saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt...[2025-11-15 00:36:33,438][INFO][torch_checkpoint_engine.py:23:save][Torch] Saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt.[2025-11-15 00:36:33,439][INFO][torch_checkpoint_engine.py:21:save][Torch] Saving saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...[2025-11-15 00:36:47,668][INFO][torch_checkpoint_engine.py:23:save][Torch] Saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.[2025-11-15 00:36:47,669][INFO][engine.py:3701:_save_zero_checkpoint] zero checkpoint saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2025-11-15 00:36:47,673][INFO][torch_checkpoint_engine.py:33:commit][Torch] Checkpoint global_step2 is ready now! [INFO|trainer.py:2810] 2025-11-15 00:36:47,675 >> Training completed.Do not forget to share your model on huggingface.co/models =){'train_runtime': 46.3009,'train_samples_per_second': 1.166,'train_steps_per_second': 0.043,'train_loss': 3.873927593231201,'epoch': 1.0} 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:46<00:00, 23.15s/it][INFO|trainer.py:4309] 2025-11-15 00:36:49,886 >> Saving model checkpoint to saves/qwen2.5-7b/full [INFO|configuration_utils.py:491] 2025-11-15 00:36:49,889 >> Configuration saved in saves/qwen2.5-7b/full/config.json [INFO|configuration_utils.py:757] 2025-11-15 00:36:49,889 >> Configuration saved in saves/qwen2.5-7b/full/generation_config.json [INFO|modeling_utils.py:4181] 2025-11-15 00:36:52,910 >> Model weights saved in saves/qwen2.5-7b/full/model.safetensors [INFO|tokenization_utils_base.py:2421] 2025-11-15 00:36:52,910 >> chat template saved in saves/qwen2.5-7b/full/chat_template.jinja [INFO|tokenization_utils_base.py:2590] 2025-11-15 00:36:52,911 >> tokenizer config file saved in saves/qwen2.5-7b/full/tokenizer_config.json [INFO|tokenization_utils_base.py:2599] 2025-11-15 00:36:52,911 >> Special tokens file saved in saves/qwen2.5-7b/full/special_tokens_map.json ***** train metrics ***** epoch = 1.0 total_flos = 4GF train_loss = 3.8739 train_runtime = 0:00:46.30 train_samples_per_second = 1.166 train_steps_per_second = 0.043 [WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric loss to plot.[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.[INFO|trainer.py:4643] 2025-11-15 00:36:53,090 >> ***** Running Evaluation *****[INFO|trainer.py:4645] 2025-11-15 00:36:53,090 >> Num examples = 7 [INFO|trainer.py:4648] 2025-11-15 00:36:53,090 >> Batch size = 1 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 9.30it/s]***** eval metrics ***** epoch = 1.0 eval_loss = 3.5243 eval_runtime = 0:00:00.71 eval_samples_per_second = 9.774 eval_steps_per_second = 5.585 [INFO|modelcard.py:456] 2025-11-15 00:36:53,806 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling','type': 'text-generation'}}
3. lora 微调
在 LLaMA-Factory 文件夹下,创建 qwen2.5-7b-lora-sft.yaml 配置文件,用于设置 lora 微调的配置。
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
trust_remote_code: true
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 16
lora_alpha: 16
lora_dropout: 0.05
dataset: alpaca_zh_demo
template: qwen
cutoff_len: 1024
overwrite_cache: true
preprocessing_num_workers: 16
output_dir: saves/qwen2.5-7b/lora/sft
logging_steps: 100
save_steps: 100
plot_loss: true
overwrite_output_dir: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
llamafactory-cli train qwen2.5-7b-lora-sft.yaml
[INFO|trainer.py:2519] 2025-11-15 00:39:43,504 >> ***** Running training *****[INFO|trainer.py:2520] 2025-11-15 00:39:43,504 >> Num examples = 900 [INFO|trainer.py:2521] 2025-11-15 00:39:43,504 >> Num Epochs = 1 [INFO|trainer.py:2522] 2025-11-15 00:39:43,504 >> Instantaneous batch size per device = 1 [INFO|trainer.py:2525] 2025-11-15 00:39:43,504 >> Total train batch size (w.parallel, distributed & accumulation) = 32 [INFO|trainer.py:2526] 2025-11-15 00:39:43,504 >> Gradient Accumulation steps = 16 [INFO|trainer.py:2527] 2025-11-15 00:39:43,504 >> Total optimization steps = 29 [INFO|trainer.py:2528] 2025-11-15 00:39:43,507 >> Number of trainable parameters = 18,464,768 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:18<00:00, 1.99s/it][INFO|trainer.py:4309] 2025-11-15 00:41:02,102 >> Saving model checkpoint to saves/qwen2.5-7b/lora/sft/checkpoint-29 [INFO|configuration_utils.py:763] 2025-11-15 00:41:02,121 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json [INFO|configuration_utils.py:839] 2025-11-15 00:41:02,122 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"dtype": "bfloat16","eos_token_id": 151643,"hidden_act": "silu","hidden_size": 1536,"initializer_range": 0.02,"intermediate_size": 8960,"layer_types": ["full_attention","full_attention",...],"max_position_embeddings": 131072,"max_window_layers": 28,"model_type": "qwen2","num_attention_heads": 12,"num_hidden_layers": 28,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"transformers_version": "4.57.1","use_cache": true,"use_mrope": false,"use_sliding_window": false,"vocab_size": 151936 }[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:41:02,220 >> chat template saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/chat_template.jinja [INFO|tokenization_utils_base.py:2590] 2025-11-15 00:41:02,220 >> tokenizer config file saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/tokenizer_config.json [INFO|tokenization_utils_base.py:2599] 2025-11-15 00:41:02,221 >> Special tokens file saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/special_tokens_map.json [INFO|trainer.py:2810] 2025-11-15 00:41:02,515 >> Training completed.Do not forget to share your model on huggingface.co/models =){'train_runtime': 79.008,'train_samples_per_second': 11.391,'train_steps_per_second': 0.367,'train_loss': 1.657024120462352,'epoch': 1.0} 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:18<00:00, 2.72s/it][INFO|trainer.py:4309] 2025-11-15 00:41:02,518 >> Saving model checkpoint to saves/qwen2.5-7b/lora/sft [INFO|configuration_utils.py:763] 2025-11-15 00:41:02,537 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json [INFO|configuration_utils.py:839] 2025-11-15 00:41:02,538 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"dtype": "bfloat16","eos_token_id": 151643,"hidden_act": "silu","hidden_size": 1536,"initializer_range": 0.02,"intermediate_size": 8960,"layer_types": ["full_attention","full_attention",...],"max_position_embeddings": 131072,"max_window_layers": 28,"model_type": "qwen2","num_attention_heads": 12,"num_hidden_layers": 28,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"transformers_version": "4.57.1","use_cache": true,"use_mrope": false,"use_sliding_window": false,"vocab_size": 151936 }[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:41:02,611 >> chat template saved in saves/qwen2.5-7b/lora/sft/chat_template.jinja [INFO|tokenization_utils_base.py:2590] 2025-11-15 00:41:02,611 >> tokenizer config file saved in saves/qwen2.5-7b/lora/sft/tokenizer_config.json [INFO|tokenization_utils_base.py:2599] 2025-11-15 00:41:02,611 >> Special tokens file saved in saves/qwen2.5-7b/lora/sft/special_tokens_map.json ***** train metrics ***** epoch = 1.0 total_flos = 1054627GF train_loss = 1.657 train_runtime = 0:01:19.00 train_samples_per_second = 11.391 train_steps_per_second = 0.367 [WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric loss to plot.[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.[INFO|trainer.py:4643] 2025-11-15 00:41:02,752 >> ***** Running Evaluation *****[INFO|trainer.py:4645] 2025-11-15 00:41:02,752 >> Num examples = 100 [INFO|trainer.py:4648] 2025-11-15 00:41:02,752 >> Batch size = 1 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:01<00:00, 31.55it/s]***** eval metrics ***** epoch = 1.0 eval_loss = 1.6728 eval_runtime = 0:00:01.62 eval_samples_per_second = 61.354 eval_steps_per_second = 30.677 [INFO|modelcard.py:456] 2025-11-15 00:41:04,381 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling','type': 'text-generation'}}
4. QLoRA 微调
在 LLaMA-Factory 文件夹下,创建 qwen2.5-7b-qlora-sft.yaml 配置文件,用于设置 qlora 微调的配置。
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
trust_remote_code: true
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
quantization_bit: 4
quantization_method: bitsandbytes
lora_rank: 16
lora_alpha: 16
lora_dropout: 0.05
dataset: alpaca_zh_demo
template: qwen
cutoff_len: 1024
overwrite_cache: true
preprocessing_num_workers: 16
output_dir: saves/qwen2.5-7b/qlora/sft
logging_steps: 100
save_steps: 100
plot_loss: true
overwrite_output_dir: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
llamafactory-cli train qwen2.5-7b-qlora-sft.yaml
[INFO|trainer.py:2519] 2025-11-15 00:43:46,249 >> ***** Running training *****[INFO|trainer.py:2520] 2025-11-15 00:43:46,249 >> Num examples = 900 [INFO|trainer.py:2521] 2025-11-15 00:43:46,249 >> Num Epochs = 1 [INFO|trainer.py:2522] 2025-11-15 00:43:46,249 >> Instantaneous batch size per device = 1 [INFO|trainer.py:2525] 2025-11-15 00:43:46,249 >> Total train batch size (w.parallel, distributed & accumulation) = 32 [INFO|trainer.py:2526] 2025-11-15 00:43:46,249 >> Gradient Accumulation steps = 16 [INFO|trainer.py:2527] 2025-11-15 00:43:46,249 >> Total optimization steps = 29 [INFO|trainer.py:2528] 2025-11-15 00:43:46,254 >> Number of trainable parameters = 18,464,768 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:20<00:00, 1.98s/it][INFO|trainer.py:4309] 2025-11-15 00:45:06,653 >> Saving model checkpoint to saves/qwen2.5-7b/qlora/sft/checkpoint-29 [INFO|configuration_utils.py:763] 2025-11-15 00:45:06,673 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json [INFO|configuration_utils.py:839] 2025-11-15 00:45:06,674 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"dtype": "bfloat16","eos_token_id": 151643,"hidden_act": "silu","hidden_size": 1536,"initializer_range": 0.02,"intermediate_size": 8960,"layer_types": ["full_attention","full_attention",...],"max_position_embeddings": 131072,"max_window_layers": 28,"model_type": "qwen2","num_attention_heads": 12,"num_hidden_layers": 28,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"transformers_version": "4.57.1","use_cache": true,"use_mrope": false,"use_sliding_window": false,"vocab_size": 151936 }[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:45:06,761 >> chat template saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/chat_template.jinja [INFO|tokenization_utils_base.py:2590] 2025-11-15 00:45:06,761 >> tokenizer config file saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/tokenizer_config.json [INFO|tokenization_utils_base.py:2599] 2025-11-15 00:45:06,761 >> Special tokens file saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/special_tokens_map.json [INFO|trainer.py:2810] 2025-11-15 00:45:07,051 >> Training completed.Do not forget to share your model on huggingface.co/models =){'train_runtime': 80.7972,'train_samples_per_second': 11.139,'train_steps_per_second': 0.359,'train_loss': 1.6571868370319236,'epoch': 1.0} 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:20<00:00, 2.78s/it][INFO|trainer.py:4309] 2025-11-15 00:45:07,054 >> Saving model checkpoint to saves/qwen2.5-7b/qlora/sft [INFO|configuration_utils.py:763] 2025-11-15 00:45:07,073 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json [INFO|configuration_utils.py:839] 2025-11-15 00:45:07,074 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"dtype": "bfloat16","eos_token_id": 151643,"hidden_act": "silu","hidden_size": 1536,"initializer_range": 0.02,"intermediate_size": 8960,"layer_types": ["full_attention","full_attention",...],"max_position_embeddings": 131072,"max_window_layers": 28,"model_type": "qwen2","num_attention_heads": 12,"num_hidden_layers": 28,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"transformers_version": "4.57.1","use_cache": true,"use_mrope": false,"use_sliding_window": false,"vocab_size": 151936 }[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:45:07,136 >> chat template saved in saves/qwen2.5-7b/qlora/sft/chat_template.jinja [INFO|tokenization_utils_base.py:2590] 2025-11-15 00:45:07,136 >> tokenizer config file saved in saves/qwen2.5-7b/qlora/sft/tokenizer_config.json [INFO|tokenization_utils_base.py:2599] 2025-11-15 00:45:07,137 >> Special tokens file saved in saves/qwen2.5-7b/qlora/sft/special_tokens_map.json ***** train metrics ***** epoch = 1.0 total_flos = 1054627GF train_loss = 1.6572 train_runtime = 0:01:20.79 train_samples_per_second = 11.139 train_steps_per_second = 0.359 [WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric loss to plot.[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.[INFO|trainer.py:4643] 2025-11-15 00:45:07,276 >> ***** Running Evaluation *****[INFO|trainer.py:4645] 2025-11-15 00:45:07,277 >> Num examples = 100 [INFO|trainer.py:4648] 2025-11-15 00:45:07,277 >> Batch size = 1 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:01<00:00, 31.85it/s]***** eval metrics ***** epoch = 1.0 eval_loss = 1.6738 eval_runtime = 0:00:01.61 eval_samples_per_second = 61.919 eval_steps_per_second = 30.96 [INFO|modelcard.py:456] 2025-11-15 00:45:08,890 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling','type': 'text-generation'}}
使用上述训练配置,各个方法实测的显存占用如下。训练中的显存占用与训练参数配置息息相关,可根据自身实际需求进行设置。
- 全量参数训练:42.18GB
- LoRA 训练:20.17GB
- QLoRA 训练:10.97GB
五、合并模型权重
1. 模型合并
如果采用 LoRA 或者 QLoRA 进行训练,脚本只保存对应的 LoRA 权重,需要合并权重才能进行推理。全量参数训练无需执行此步骤。下面将 LoRA 微调的权重和预训练模型进行合并。注意:如果是 QLoRA 微调的权重需要和使用 NF4 方式量化后的预训练模型进行合并。
llamafactory-cli export qwen2.5-7b-merge-lora.yaml
其中 qwen2.5-7b-merge-lora.yaml 中配置如下:
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
adapter_name_or_path: /root/autodl-tmp/LLaMA-Factory/saves/qwen2.5-7b/lora/sft
template: qwen
finetuning_type: lora
trust_remote_code: true
// 必须开启
export_dir: /root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged
export_size: 2
export_device: cpu
export_legacy_format: false
| 参数 | 说明 |
|---|
| model_name_or_path | 预训练模型的名称或路径 |
| template | 模型类型 |
| export_dir | 导出路径 |
| export_size | 最大导出模型文件大小 |
| export_device | 导出设备 |
| export_legacy_format | 是否使用旧格式导出 |
- 合并 Qwen2.5 模型权重,务必将 template 设为 qwen;无论 LoRA 还是 QLoRA 训练,合并权重时,finetuning_type 均为 lora。
- adapter_name_or_path 需要与微调中的适配器输出路径 output_dir 相对应。
2. 测试
import time
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("/root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged", trust_remote_code=True )
model = AutoModelForCausalLM.from_pretrained("/root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged", device_map="auto", trust_remote_code=True ).eval()
prompt = "你好"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
start_time = time.time()
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.3, top_p=0.4 )
end_time = time.time()
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("生成结果:", response)
num_generated_tokens = outputs.shape[1]- inputs['input_ids'].shape[1]
elapsed_time = end_time - start_time
tokens_per_second = num_generated_tokens / elapsed_time if elapsed_time > 0 else 0
print(f"生成了 {num_generated_tokens} 个 token,用时 {elapsed_time:.2f} 秒,速度约为 {tokens_per_second:.2f} token/s")
生成结果:你好,我有一个问题想问。您好,请问有什么问题需要帮助吗?我最近感到很焦虑,有什么方法可以缓解吗?焦虑是一种常见的心理问题,您可以尝试进行深呼吸、冥想、运动、与朋友聊天等方式来缓解焦虑。同时,也可以考虑寻求专业心理咨询师的帮助。生成了 64 个 token,用时 2.17 秒,速度约为 29.46 token/s
相关免费在线工具
- 加密/解密文本
使用加密算法(如AES、TripleDES、Rabbit或RC4)加密和解密文本明文。 在线工具,加密/解密文本在线工具,online
- RSA密钥对生成器
生成新的随机RSA私钥和公钥pem证书。 在线工具,RSA密钥对生成器在线工具,online
- Mermaid 预览与可视化编辑
基于 Mermaid.js 实时预览流程图、时序图等图表,支持源码编辑与即时渲染。 在线工具,Mermaid 预览与可视化编辑在线工具,online
- 随机西班牙地址生成器
随机生成西班牙地址(支持马德里、加泰罗尼亚、安达卢西亚、瓦伦西亚筛选),支持数量快捷选择、显示全部与下载。 在线工具,随机西班牙地址生成器在线工具,online
- Gemini 图片去水印
基于开源反向 Alpha 混合算法去除 Gemini/Nano Banana 图片水印,支持批量处理与下载。 在线工具,Gemini 图片去水印在线工具,online
- curl 转代码
解析常见 curl 参数并生成 fetch、axios、PHP curl 或 Python requests 示例代码。 在线工具,curl 转代码在线工具,online