LLaMA Factory Large Model Fine-Tuning Guide | 极客日志


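LLaMA Factory is an efficient platform for training and fine-tuning large language models, supporting a wide range of model architectures, training algorithms, and quantization techniques. This article walks through the full workflow: environment setup, dataset construction, SFT training, LoRA merging, inference, and evaluation. Fine-tuning can be carried out from scratch through either the command line or the WebUI, which makes the platform suitable for local deployment and performance optimization.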

Author: 极光 | Published 2026/4/8 | Updated 2026/7/7 | 35 views

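I. Introduction to LLaMA-Factory

LLaMA Factory is an easy-to-use, efficient platform for training and fine-tuning large language models (LLMs). With it, you can fine-tune hundreds of pre-trained models locally without writing any code. Its features include:

Models: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Yi, Gemma, Baichuan, ChatGLM, Phi, and more.
Training methods: (continued) pre-training, (multimodal) supervised instruction fine-tuning, reward model training, PPO, DPO, KTO, ORPO, and more.
Precision: 16-bit full-parameter fine-tuning, freeze fine-tuning, LoRA fine-tuning, and 2/3/4/5/6/8-bit QLoRA fine-tuning based on AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
Optimization algorithms: GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ, and PiSSA.
Acceleration: FlashAttention-2 and Unsloth.
Inference engines: Transformers and vLLM.
Experiment monitoring: LlamaBoard, TensorBoard, Wandb, MLflow, SwanLab, and more.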

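II. Installation and Deployment

1. CUDA Installation

CUDA is a parallel computing platform and programming model created by NVIDIA that lets developers use NVIDIA GPUs for high-performance parallel computing.

First, check at https://developer.nvidia.com/cuda-gpus/ whether your GPU supports CUDA.

Make sure your Linux distribution supports CUDA. Run uname -m && cat /etc/*release on the command line; the output should look similar to the following:

# Command
uname -m && cat /etc/*release
# Example output
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04

Check that gcc is installed. Run gcc --version; the output should look similar to the following:

# Command
gcc --version
# Example output
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

Download the required CUDA version from the NVIDIA CUDA Toolkit archive at https://developer.nvidia.com/cuda-toolkit-archive; version 12.2 is recommended here. Choose the installer that matches the architecture and distribution reported by the commands above.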

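If you have installed CUDA before (for example, version 12.1), first uninstall it with sudo /usr/local/cuda-12.1/bin/cuda-uninstaller. If that command cannot be run, you can remove the installation directly:

# Uninstall CUDA
sudo rm -r /usr/local/cuda-12.1/
sudo apt clean && sudo apt autoclean
# After uninstalling, run the following commands and follow the prompts to continue the installation
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run
sudo sh cuda_12.2.0_535.54.03_linux.run

Note: Until you have confirmed that the driver bundled with CUDA is compatible with your GPU, it is advisable to deselect the Driver option during installation.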

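2. LLaMA-Factory Installation

# Create a Python environment
conda create -n llama_factory python=3.10 -y
# Switch to the new environment once it has been created
conda activate llama_factory
# Install PyTorch and the related core packages, built against CUDA 12.1
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia
# pip alternative to the conda command above (use one or the other):
# pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Legacy alternative: pip install llmtuner (not needed with the source install below)
# Install LLaMA Factory from source
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
# Install the dependencies (as in the official documentation)
pip install -e ".[torch,metrics]"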

LLaMA-Factory Verification

# Launch the WebUI on GPU 0, port 7860, with a public Gradio share link
CUDA_VISIBLE_DEVICES=0 GRADIO_SHARE=1 GRADIO_SERVER_PORT=7860 llamafactory-cli webui

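Download an offline model; enter its path as the model path when training later

# Download from the ModelScope community (the weight files require Git LFS: git lfs install)
git clone https://www.modelscope.cn/Qwen/Qwen2.5-0.5B-Instruct.git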

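LLaMA-Factory Advanced Options

Windows

QLoRA

# Prebuilt bitsandbytes wheel for Windows, used for quantized LoRA (QLoRA)
pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl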

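Extra Dependency

Name	Description
torch	The open-source deep learning framework PyTorch, widely used in machine learning and AI research.
torch-npu	PyTorch compatibility package for Ascend devices.
metrics	Used to evaluate and monitor machine learning model performance.
deepspeed	Provides the zero-redundancy optimizer needed for distributed training.
bitsandbytes	Used for quantizing large language models.
hqq	Used for quantizing large language models.
eetq	Used for quantizing large language models.
gptq	Used for loading GPTQ-quantized models.
awq	Used for loading AWQ-quantized models.
aqlm	Used for loading AQLM-quantized models.
vllm	Provides high-throughput, concurrent model inference serving.
galore	Provides an efficient full-parameter fine-tuning algorithm.
badam	Provides an efficient full-parameter fine-tuning algorithm.
qwen	Provides the packages required to load Qwen v1 models.
modelscope	The ModelScope community; provides a way to download pre-trained models and datasets.
swanlab	The open-source training tracker SwanLab, used to record and visualize the training process.
dev	Used for LLaMA Factory development and maintenance.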

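III. Data Fine-Tuning

Instruction supervised fine-tuning dataset

{
  "instruction": "Calculate the total cost of these items.",
  "input": "Input: car - $3000, clothes - $100, book - $20.",
  "output": "The total cost of the car, clothes, and book is $3000 + $100 + $20 = $3120."
}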

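[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "system": "system prompt (optional)",
    "history": [
      ["first-round instruction (optional)", "first-round response (optional)"],
      ["second-round instruction (optional)", "second-round response (optional)"]
    ]
  }
]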
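
To make the role of each field concrete, the sketch below flattens one record of this format into an ordered list of chat turns. It assumes that the human prompt is formed by joining instruction and input with a newline and that the history pairs precede the current turn. The helper name alpaca_to_messages and the role labels are hypothetical and are not part of LLaMA Factory's API.

```python
def alpaca_to_messages(record):
    """Flatten one instruction-format record into an ordered list of chat messages."""
    messages = []
    # Optional system prompt comes first.
    if record.get("system"):
        messages.append({"role": "system", "content": record["system"]})
    # Earlier rounds are stored as [instruction, response] pairs.
    for user_turn, assistant_turn in record.get("history", []):
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    # Assumption: the current prompt is instruction + "\n" + input when input is non-empty.
    prompt = record["instruction"]
    if record.get("input"):
        prompt = prompt + "\n" + record["input"]
    messages.append({"role": "user", "content": prompt})
    messages.append({"role": "assistant", "content": record["output"]})
    return messages


example = {
    "instruction": "Is today a good day for a walk?",
    "input": "",
    "output": "Yes, the weather is clear.",
    "history": [["Will it rain today?", "No, it will stay dry."]],
}
for message in alpaca_to_messages(example):
    print(f'{message["role"]}: {message["content"]}')
```

Keeping history as explicit pairs means a multi-turn conversation can be stored in a single record, with only the final response treated as the current answer.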

[
  {
    "instruction": "How is the weather today?",
    "input": "",
    "output": "The weather is nice today; it is sunny.",
    "history": [
      ["Will it rain today?", "No, it will not rain today. It is a fine day."],
      ["Is today a good day to go out?", "Very much so; the air quality is excellent."]
    ]
  }
]

"dataset_name": {
  "file_name": "data.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "system": "system",
    "history": "history"
  }
}
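
A dataset description like the one above has to be added to the project's dataset registry before the dataset can be referenced by name. The sketch below shows one way to do that with the standard library, assuming the registry is a dataset_info.json file sitting next to the data file. The helper register_dataset and the dataset name my_sft_data are hypothetical, and the demo runs in a temporary directory rather than a real LLaMA-Factory checkout.

```python
import json
import tempfile
from pathlib import Path


def register_dataset(data_dir, name, file_name, columns):
    """Add or update one entry in <data_dir>/dataset_info.json and return the full mapping."""
    info_path = Path(data_dir) / "dataset_info.json"
    info = {}
    if info_path.exists():
        info = json.loads(info_path.read_text(encoding="utf-8"))
    info[name] = {"file_name": file_name, "columns": columns}
    info_path.write_text(json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8")
    return info


# Demo in a throwaway directory so that no real project files are touched.
with tempfile.TemporaryDirectory() as tmp:
    records = [{"instruction": "Say hello.", "input": "", "output": "Hello!"}]
    (Path(tmp) / "data.json").write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")
    info = register_dataset(
        tmp,
        name="my_sft_data",  # hypothetical dataset name
        file_name="data.json",
        columns={"prompt": "instruction", "query": "input", "response": "output"},
    )
    print(json.dumps(info, indent=2))
```

Under these assumptions, the registered name is what would go into the dataset field of a training configuration.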

Pre-training dataset

[
  {"text": "document"},
  {"text": "document"}
]

"dataset_name": {
  "file_name": "data.json",
  "columns": {
    "prompt": "text"
  }
}

Preference dataset

[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "chosen": "preferred response (required)",
    "rejected": "rejected response (required)"
  }
]
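
The record layout above is not accompanied by a matching dataset description. The sketch below checks that every record carries both responses and then builds a description entry. The "ranking" flag and the chosen/rejected column mapping follow the convention used in the project's data documentation as understood here; treat both as assumptions to verify against your installed version. The helper names and the dataset name my_pref_data are hypothetical.

```python
import json

REQUIRED_KEYS = ("instruction", "chosen", "rejected")


def find_invalid_preference_records(records):
    """Return the indices of records that lack a non-empty required field."""
    invalid = []
    for index, record in enumerate(records):
        if any(not record.get(key) for key in REQUIRED_KEYS):
            invalid.append(index)
    return invalid


def preference_dataset_entry(file_name):
    """Build a dataset description for a preference dataset (field names are assumptions)."""
    return {
        "file_name": file_name,
        "ranking": True,  # assumption: marks the dataset as chosen/rejected pairs
        "columns": {
            "prompt": "instruction",
            "query": "input",
            "chosen": "chosen",
            "rejected": "rejected",
        },
    }


records = [
    {"instruction": "Summarize: the cat sat on the mat.", "input": "",
     "chosen": "A cat sat on a mat.", "rejected": "Dogs are great."},
    {"instruction": "Translate 'hello' into French.", "input": "",
     "chosen": "bonjour"},  # missing "rejected", so this record is flagged
]
invalid = find_invalid_preference_records(records)
print("invalid record indices:", invalid)
print(json.dumps({"my_pref_data": preference_dataset_entry("data.json")}, indent=2))
```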

KTO dataset

[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "kto_tag": "human feedback [true/false] (required)"
  }
]

"dataset_name": {
  "file_name": "data.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "kto_tag": "kto_tag"
  }
}

Multimodal datasets

Image dataset

[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "images": ["image path (required)"]
  }
]

Video dataset

[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "videos": ["video path (required)"]
  }
]

"dataset_name": {
  "file_name": "data.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "videos": "videos"
  }
}

Audio dataset

[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "audios": ["audio path (required)"]
  }
]

"dataset_name": {
  "file_name": "data.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "audios": "audios"
  }
}

IV. WebUI

llamafactory-cli webui

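V. SFT Training

Command line

# Train with the settings in the YAML file
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

# Same run, overriding individual YAML values from the command line
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
  learning_rate=1e-5 \
  logging_steps=1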
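
The second command passes key=value pairs that take precedence over the values in the YAML file. The sketch below illustrates that precedence rule on a plain dictionary. It is not LLaMA Factory's own argument parser: apply_overrides is a hypothetical helper, and parsing values as JSON scalars is an assumption made for this illustration.

```python
import json


def apply_overrides(config, overrides):
    """Return a copy of config with each 'key=value' override applied on top."""
    merged = dict(config)
    for item in overrides:
        key, separator, raw_value = item.partition("=")
        if not separator or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        try:
            # Numbers, true/false, and null parse as JSON scalars.
            value = json.loads(raw_value)
        except json.JSONDecodeError:
            # Anything else (for example "cosine") is kept as a plain string.
            value = raw_value
        merged[key] = value
    return merged


base = {"learning_rate": 1.0e-4, "logging_steps": 10, "lr_scheduler_type": "cosine"}
final = apply_overrides(base, ["learning_rate=1e-5", "logging_steps=1"])
print(final)
```

Returning a copy leaves the file-derived settings untouched, which makes it easy to log both the original and the effective configuration.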

### examples/train_lora/llama3_lora_sft.yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

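Name	Description
model_name_or_path	Model name or path.
stage	Training stage. Options: rm (reward modeling), pt (pretrain), sft (supervised fine-tuning), PPO, DPO, KTO, ORPO.
do_train	true for training, false for evaluation.
finetuning_type	Fine-tuning method. Options: freeze, lora, full.
lora_target	Target modules for LoRA; the default is `all`.
dataset	Dataset(s) to use; separate multiple datasets with ",".
template	Dataset template; make sure the template matches the model.
output_dir	Output path.
logging_steps	Interval, in steps, between log outputs.
save_steps	Interval, in steps, between checkpoint saves.
overwrite_output_dir	Whether the output directory may be overwritten.
per_device_train_batch_size	Training batch size on each device.
gradient_accumulation_steps	Number of gradient accumulation steps.
max_grad_norm	Gradient clipping threshold.
learning_rate	Learning rate.
lr_scheduler_type	Learning-rate schedule; options include `linear`, `cosine`, `polynomial`, `constant`, and others.
num_train_epochs	Number of training epochs.
bf16	Whether to use the bf16 format.
warmup_ratio	Proportion of training steps used for learning-rate warmup.
warmup_steps	Number of learning-rate warmup steps.
push_to_hub	Whether to push the model to Hugging Face.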
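
Several of these parameters interact: the batch size seen by each optimizer step is the per-device batch size multiplied by the gradient accumulation steps and the number of devices, and warmup_ratio is applied to the total number of optimizer steps. The sketch below works through that arithmetic as an approximation. The sample count of 800 is a made-up figure, training_schedule is a hypothetical helper, and the trainer's own rounding may differ slightly.

```python
import math


def training_schedule(num_samples, per_device_batch, grad_accum, num_devices, epochs, warmup_ratio):
    """Approximate the optimizer-step schedule implied by the training parameters."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = math.ceil(steps_per_epoch * epochs)
    warmup_steps = round(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Values mirror the example configuration, on one GPU with a hypothetical 800 training samples.
plan = training_schedule(
    num_samples=800,
    per_device_batch=1,
    grad_accum=8,
    num_devices=1,
    epochs=3.0,
    warmup_ratio=0.1,
)
print(plan)
```

With intervals such as save_steps set to 500, a run this short would never reach an intermediate checkpoint, so it is worth checking the total step count before choosing those intervals.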

VI. LoRA Merging

Merge

### examples/merge_lora/llama3_lora_sft.yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora
### export
export_dir: models/llama3_lora_sft
export_size: 2
export_device: cpu
export_legacy_format: false
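
The export_size value controls how the merged weights are split across files. The sketch below gives a rough estimate of the resulting number of weight files, assuming export_size is the maximum size of each shard in decimal gigabytes, that weights are stored in 16-bit precision (2 bytes per parameter), and that the model has a nominal 8 billion parameters. estimate_shards is a hypothetical helper, and the real count can differ because individual tensors are not split across shards.

```python
import math


def estimate_shards(num_params, bytes_per_param, export_size_gb):
    """Estimate how many weight files an export produces for a given shard size."""
    total_bytes = num_params * bytes_per_param
    shard_bytes = export_size_gb * 10**9  # assumption: decimal gigabytes
    return math.ceil(total_bytes / shard_bytes)


# Nominal 8B-parameter model in 16-bit precision, for two candidate shard sizes.
for size_gb in (2, 5):
    print(f"export_size={size_gb}: about {estimate_shards(8_000_000_000, 2, size_gb)} shards")
```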

Quantization

### examples/merge_lora/llama3_gptq.yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3
### export
export_dir: models/llama3_gptq
export_quantization_bit: 4
export_quantization_dataset: data/c4_demo.json
export_size: 2
export_device: cpu
export_legacy_format: false

### examples/merge_lora/llama3_q_lora.yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora
### export
export_dir: models/llama3_lora_sft
export_size: 2
export_device: cpu
export_legacy_format: false

VII. Inference

Original model inference configuration

### examples/inference/llama3.yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3
infer_backend: huggingface # choices: [huggingface, vllm]

Fine-tuned model inference configuration

### examples/inference/llama3_lora_sft.yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora
infer_backend: huggingface # choices: [huggingface, vllm]

Multimodal model

llamafactory-cli webchat examples/inference/llava1_5.yaml

model_name_or_path: llava-hf/llava-1.5-7b-hf
template: vicuna
infer_backend: huggingface # choices: [huggingface, vllm]

Batch inference

python scripts/vllm_infer.py --model_name_or_path path_to_merged_model --dataset alpaca_en_demo

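# api_call_example.py
# Assumes an OpenAI-compatible API server is already listening on port 8000.
from openai import OpenAI

# The api_key value is a placeholder; the base_url points at the local server.
client = OpenAI(api_key="0", base_url="http://0.0.0.0:8000/v1")
messages = [{"role": "user", "content": "Who are you?"}]
result = client.chat.completions.create(messages=messages, model="meta-llama/Meta-Llama-3-8B-Instruct")
# Print the assistant message returned for the first choice
print(result.choices[0].message)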

VIII. Evaluation

General capability evaluation

### examples/train_lora/llama3_lora_eval.yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft # optional
### method
finetuning_type: lora
### dataset
task: mmlu_test # mmlu_test, ceval_validation, cmmlu_test
template: fewshot
lang: en
n_shot: 5
### output
save_dir: saves/llama3-8b/lora/eval
### eval
batch_size: 4

NLG evaluation

### examples/extras/nlg_eval/llama3_lora_predict.yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
### method
stage: sft
do_predict: true
finetuning_type: lora
### dataset
eval_dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 50
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/llama3-8b/lora/predict
overwrite_output_dir: true
### eval
per_device_eval_batch_size: 1
predict_with_generate: true
ddp_timeout: 180000000

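Evaluation parameters

Parameter	Type	Description
task	str	Name of the evaluation task; options are mmlu_test, ceval_validation, cmmlu_test.
task_dir	str	Path to the folder containing the evaluation datasets; defaults to evaluation.
batch_size	int	Batch size used on each GPU; defaults to 4.
seed	int	Random seed for the data loader; defaults to 42.
lang	str	Language used for evaluation; options are en and zh. Defaults to en.
n_shot	int	Number of few-shot examples; defaults to 5.
save_dir	str	Path where evaluation results are saved; defaults to None. An error is raised if the path already exists.
download_mode	str	Download mode for the evaluation dataset; defaults to DownloadMode.REUSE_DATASET_IF_EXISTS, which reuses the dataset if it already exists and downloads it otherwise.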
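
The n_shot parameter sets how many solved examples are placed in front of each test question. The sketch below shows the general idea for a multiple-choice benchmark. The prompt layout, the field names question/options/answer, and both helper functions are hypothetical illustrations and do not reproduce the evaluation template that LLaMA Factory actually uses.

```python
CHOICE_LABELS = ("A", "B", "C", "D")


def format_question(question, options, answer=None):
    """Render one multiple-choice question, with the answer filled in when it is given."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICE_LABELS, options)]
    lines.append("Answer:" + (f" {answer}" if answer else ""))
    return "\n".join(lines)


def build_few_shot_prompt(support_examples, target, n_shot=5):
    """Prepend up to n_shot solved examples to the unanswered target question."""
    shots = support_examples[:n_shot]
    parts = [format_question(ex["question"], ex["options"], ex["answer"]) for ex in shots]
    parts.append(format_question(target["question"], target["options"]))
    return "\n\n".join(parts)


support = [
    {"question": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "answer": "B"},
    {"question": "3 * 3 = ?", "options": ["9", "6", "12", "3"], "answer": "A"},
]
target = {"question": "10 - 7 = ?", "options": ["1", "2", "3", "4"]}
prompt = build_few_shot_prompt(support, target, n_shot=1)
print(prompt)
```

A larger n_shot gives the model more context about the expected answer format at the cost of a longer prompt for every test item.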

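LLaMA Factory Large Model Fine-Tuning Guide: outline

I. Introduction to LLaMA-Factory
II. Installation and Deployment
1. CUDA Installation
2. LLaMA-Factory Installation
LLaMA-Factory Verification
Download an offline model; enter its path as the model path when training later
LLaMA-Factory Advanced Options
Windows
QLoRA
FlashAttention-2
Extra Dependency
III. Data Fine-Tuning
Instruction supervised fine-tuning dataset
Pre-training dataset
Preference dataset
KTO dataset
Multimodal datasets
Image dataset
Video dataset
Audio dataset
1. Building the dataset
2. Dataset format
3. Model parameters
4. Starting the run
5. Exporting the model
IV. WebUI
Evaluation, prediction, and chat
Export
V. SFT Training
Command line
VI. LoRA Merging
Merge
Quantization
VII. Inference
Original model inference configuration
Fine-tuned model inference configuration
Multimodal model
Batch inference
VIII. Evaluation
General capability evaluation
NLG evaluation
Evaluation parameters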