LLaMA Factory: A Complete Record from Installation to Inference | 极客日志

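Hands-on notes on fine-tuning large language models with LLaMA Factory, covering CUDA installation, environment setup, dataset preparation, training from the command line and the WebUI, LoRA merging and quantization, switching inference engines, and model evaluation, with the key configurations and common problems recorded along the way.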

Published by 微码行者 on 2026/6/10

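LLaMA Factory is a convenient platform for fine-tuning large models: it can train over a hundred kinds of models with very little code. I have been using it for a while and worked through the whole path from installation to inference, pitfalls included, so here is a record.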

Environment setup

CUDA installation

First make sure your GPU supports CUDA; you can look it up at https://developer.nvidia.com/cuda-gpus.

Check the Linux distribution version and gcc:

uname -m && cat /etc/*release
# Output looks like: x86_64 DISTRIB_ID=Ubuntu DISTRIB_RELEASE=22.04

gcc --version
# gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

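I recommend CUDA 12.2, which has given me fewer compatibility problems. If another version was installed before, remove it completely first:

# If the uninstaller (/usr/local/cuda-12.1/bin/cuda-uninstaller) cannot be found, deleting the directory directly also works
sudo rm -r /usr/local/cuda-12.1/
sudo apt clean && sudo apt autoclean

# Download and install
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run
sudo sh cuda_12.2.0_535.54.03_linux.run

Note: during installation, do not rush to select the Driver component. A bundled driver version that is incompatible with your GPU tends to cause problems. When the installer finishes, verify the toolkit with nvcc -V.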
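
The nvcc -V check can also be scripted, which helps when setting up several machines. The following is a minimal sketch, assuming nvcc prints a line of the form "Cuda compilation tools, release 12.2, V12.2.91"; the function names are my own and not part of the CUDA toolkit.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc -V` output, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_version() -> Optional[Tuple[int, int]]:
    """Run nvcc if it is on PATH and return its release version."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        # Toolkit missing, or its bin directory is not on PATH
        return None
    result = subprocess.run([nvcc, "-V"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit:", installed_cuda_version())
```

If this prints None right after a successful install, the usual cause is that /usr/local/cuda-12.2/bin has not been added to PATH yet.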

LLaMA Factory installation

Base environment: Ubuntu 22.04, CUDA 12.x, Python 3.10, PyTorch 2.x.

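conda create -n llama_factory python=3.10 -y
conda activate llama_factory

# Install PyTorch; pick the build that matches your CUDA version (12.1 builds here)
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia
# Or with pip instead of conda:
# pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121

# llmtuner is the older PyPI package name; skip it when installing from source as below
# pip install llmtuner

# Clone the repository and install from source
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"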

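If dependencies conflict during installation, try pip install --no-deps -e . to skip dependency resolution, or maintain a clean environment yourself.

Verify the installation: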

llamafactory-cli version
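
Besides the CLI version check, it is worth confirming that PyTorch itself can see the GPU, since a CPU-only or mismatched build is a common cause of later failures. A small sketch, assuming nothing beyond the standard torch.cuda.is_available() and torch.version.cuda attributes; the helper name is hypothetical.

```python
import importlib.util


def torch_cuda_report() -> dict:
    """Summarize whether PyTorch is importable and whether it sees a GPU."""
    report = {"torch_installed": False, "cuda_available": False, "cuda_build": None}
    if importlib.util.find_spec("torch") is None:
        return report
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return report
    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    # CUDA version the wheel was built against, e.g. "12.1"; None for CPU-only builds
    report["cuda_build"] = torch.version.cuda
    return report


if __name__ == "__main__":
    print(torch_cuda_report())
```

If cuda_build is None, the installed wheel is a CPU-only build and PyTorch needs to be reinstalled from the CUDA index.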

Start the web interface:

CUDA_VISIBLE_DEVICES=0 GRADIO_SHARE=1 GRADIO_SERVER_PORT=7860 llamafactory-cli webui

Offline model download

git lfs install  # the weight files are stored with Git LFS
git clone https://www.modelscope.cn/Qwen/Qwen2.5-0.5B-Instruct.git

Windows pitfalls

On Windows, QLoRA needs a prebuilt bitsandbytes wheel:

pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl

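How to prepare the data

Supervised instruction fine-tuning (SFT)

[
  {
    "instruction": "Calculate the total cost of these items.",
    "input": "Car - $3000, clothes - $100, book - $20.",
    "output": "The total cost is $3000 + $100 + $20 = $3120."
  }
]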

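[
  {
    "instruction": "What is the weather like today?",
    "input": "",
    "output": "The weather is nice today; it is sunny.",
    "history": [
      ["Will it rain today?", "It will not rain today; it is a fine day."],
      ["Is today a good day to go out?", "Very much so; the air quality is good."]
    ]
  }
]

Register the dataset in dataset_info.json:

"dataset_name": {
  "file_name": "data.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "system": "system",
    "history": "history"
  }
}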
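
Malformed records tend to surface only once training starts, so a quick check before registering a dataset can save time. The sketch below validates records in the format shown above and builds the matching dataset_info.json entry. It assumes instruction and output are required and that each history turn is a two-element list; the function names and the my_dataset name are hypothetical, and this is not LLaMA Factory's own loader.

```python
import json
from typing import Any, Dict, List


def validate_alpaca_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one Alpaca-style SFT record."""
    problems = []
    for key in ("instruction", "output"):
        if not isinstance(record.get(key), str) or not record[key]:
            problems.append(f"'{key}' must be a non-empty string")
    if not isinstance(record.get("input", ""), str):
        problems.append("'input' must be a string when present")
    for turn in record.get("history", []):
        # Each history turn is a [user message, assistant reply] pair
        is_pair = isinstance(turn, list) and len(turn) == 2
        if not (is_pair and all(isinstance(text, str) for text in turn)):
            problems.append(f"bad history turn: {turn!r}")
    return problems


def build_dataset_info_entry(name: str, file_name: str) -> Dict[str, Any]:
    """Build a dataset_info.json entry using the column mapping shown above."""
    return {
        name: {
            "file_name": file_name,
            "columns": {
                "prompt": "instruction",
                "query": "input",
                "response": "output",
                "system": "system",
                "history": "history",
            },
        }
    }


if __name__ == "__main__":
    sample = {
        "instruction": "What is the weather like today?",
        "input": "",
        "output": "The weather is nice today; it is sunny.",
        "history": [["Will it rain today?", "It will not rain today."]],
    }
    print(validate_alpaca_record(sample))
    print(json.dumps(build_dataset_info_entry("my_dataset", "data.json"), indent=2))
```

The generated entry still has to be merged by hand into the existing dataset_info.json in the data directory.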

Pre-training

[
  {"text": "document1"},
  {"text": "document2"}
]

Preference datasets

[
  {
    "instruction": "...",
    "input": "...",
    "chosen": "better answer",
    "rejected": "worse answer"
  }
]

KTO datasets

[
  {
    "instruction": "...",
    "input": "...",
    "output": "...",
    "kto_tag": true
  }
]
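
The two formats are closely related: a preference record carries one good and one bad answer, while a KTO record carries a single answer plus a boolean label. One possible way to reuse preference data for KTO is to split each pair into two labeled records. The sketch below illustrates that idea only; LLaMA Factory does not require it, and the function name is hypothetical.

```python
import json
from typing import Any, Dict, List


def preference_to_kto(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Split each chosen/rejected pair into two records with a boolean kto_tag."""
    kto_records = []
    for record in records:
        shared = {
            "instruction": record["instruction"],
            "input": record.get("input", ""),
        }
        kto_records.append({**shared, "output": record["chosen"], "kto_tag": True})
        kto_records.append({**shared, "output": record["rejected"], "kto_tag": False})
    return kto_records


if __name__ == "__main__":
    pairs = [{
        "instruction": "Summarize the text.",
        "input": "...",
        "chosen": "better answer",
        "rejected": "worse answer",
    }]
    # json.dumps writes Python True/False as JSON true/false, not as strings
    print(json.dumps(preference_to_kto(pairs), indent=2))
```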

Multimodal data

[
  {
    "instruction": "Describe this image",
    "output": "...",
    "images": ["path/to/img.jpg"]
  }
]

Launching training with YAML

model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
  learning_rate=1e-5 \
  logging_steps=1

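Name	Description
model_name_or_path	Model name or path
stage	Training stage: rm, pt, sft, ppo, dpo, kto, orpo (values are lowercase in the YAML)
do_train	true to train, false to evaluate
finetuning_type	freeze, lora, full
lora_target	LoRA target modules, default all
dataset	Datasets to train on; separate multiple datasets with commas
template	Prompt template; must match the model
output_dir	Output path
logging_steps	Logging interval in steps
save_steps	Checkpoint saving interval in steps
overwrite_output_dir	Whether to overwrite the output directory
per_device_train_batch_size	Batch size per device
gradient_accumulation_steps	Gradient accumulation steps
max_grad_norm	Gradient clipping threshold
learning_rate	Learning rate
lr_scheduler_type	Learning rate schedule: linear, cosine, polynomial, constant
num_train_epochs	Number of training epochs
bf16	Whether to use bf16
warmup_ratio	Warmup ratio
warmup_steps	Warmup steps
push_to_hub	Whether to push the model to Hugging Face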
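
Several of these parameters interact: the batch size per device, the accumulation steps, and the number of GPUs together determine how many samples feed each optimizer step, which in turn fixes the total step count and the warmup length. A rough sketch of that arithmetic follows. The 900-sample dataset size is a made-up figure, and the exact rounding inside the trainer may differ slightly.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples consumed per optimizer step across all devices."""
    return per_device * grad_accum * num_devices


def total_optimizer_steps(num_samples: int, batch_size: int, epochs: float) -> int:
    """Approximate number of optimizer steps for the whole run."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return math.ceil(steps_per_epoch * epochs)


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Number of steps spent warming up the learning rate."""
    return math.ceil(total_steps * warmup_ratio)


if __name__ == "__main__":
    # Values from the YAML above on a single GPU; 900 samples is hypothetical
    batch = effective_batch_size(per_device=1, grad_accum=8, num_devices=1)
    steps = total_optimizer_steps(num_samples=900, batch_size=batch, epochs=3.0)
    print(batch, steps, warmup_steps(steps, warmup_ratio=0.1))
```

With these numbers, save_steps: 500 would never trigger a checkpoint before training ends, which is a good reason to run the arithmetic before launching.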

Merging and quantization

model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora
export_dir: models/llama3_lora_sft
export_size: 2
export_device: cpu
export_legacy_format: false

llamafactory-cli export merge_config.yaml
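
Merging folds the low-rank update into the base weights, so the exported model no longer needs the adapter at inference time. In the standard LoRA formulation the merged weight is W' = W + (alpha / r) * B A. The toy sketch below shows this on tiny matrices in pure Python. The real export operates on full model tensors, and the function names here are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, lora_b: Matrix, lora_a: Matrix,
               alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    weight = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
    lora_b = [[1.0], [2.0]]            # 2x1, rank 1
    lora_a = [[3.0, 4.0]]              # 1x2
    print(merge_lora(weight, lora_b, lora_a, alpha=2.0, rank=1))
```

Because the merged matrix has the same shape as the original, the exported model loads like any ordinary checkpoint.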

model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3
export_dir: models/llama3_gptq
export_quantization_bit: 4
export_quantization_dataset: data/c4_demo.json
export_size: 2
export_device: cpu
export_legacy_format: false
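
To get a feel for export_quantization_bit and export_size, a back-of-the-envelope estimate helps. The sketch below assumes export_size is the shard size in GB and treats the model as roughly 8 billion parameters. It counts only the weights themselves, so real files come out somewhat larger because of quantization scales and layers that stay unquantized.

```python
import math


def weight_size_gb(num_params: float, bits: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


def shard_count(size_gb: float, export_size_gb: float) -> int:
    """Number of shard files needed for a given maximum shard size."""
    return max(1, math.ceil(size_gb / export_size_gb))


if __name__ == "__main__":
    for bits in (16, 4):
        size = weight_size_gb(8e9, bits)
        print(f"{bits}-bit: about {size:.1f} GB in {shard_count(size, 2)} shards")
```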

Inference options

Inference with the original model

model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3
infer_backend: huggingface  # or vllm

llamafactory-cli chat inference_config.yaml
# or chat in the browser
llamafactory-cli webchat inference_config.yaml

Inference with the fine-tuned model

model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora
infer_backend: huggingface

Multimodal models

llamafactory-cli webchat examples/inference/llava1_5.yaml

Batch inference

python scripts/vllm_infer.py --model_name_or_path path_to_merged_model --dataset alpaca_en_demo

API_PORT=8000 CUDA_VISIBLE_DEVICES=0 llamafactory-cli api examples/inference/llama3_lora_sft.yaml

from openai import OpenAI

# api_key is a dummy value; 0.0.0.0 is a bind address, so connect via 127.0.0.1
client = OpenAI(api_key="0", base_url="http://127.0.0.1:8000/v1")
messages = [{"role": "user", "content": "Who are you?"}]
result = client.chat.completions.create(messages=messages, model="meta-llama/Meta-Llama-3-8B-Instruct")
print(result.choices[0].message)
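
Because the server exposes an OpenAI-style endpoint, the same call also works without the openai package. A minimal sketch using only the standard library, assuming the server is reachable at 127.0.0.1:8000 and serves /v1/chat/completions; the helper names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str,
                       base_url: str = "http://127.0.0.1:8000/v1") -> urllib.request.Request:
    """Build a POST request for an OpenAI-style chat completion."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer 0"},
        method="POST",
    )


def ask(prompt: str, model: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, model), timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    request = build_chat_request("Who are you?", "meta-llama/Meta-Llama-3-8B-Instruct")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Calling ask("Who are you?", "meta-llama/Meta-Llama-3-8B-Instruct") only works while the API server is running.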

Evaluating the model

llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml

model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
finetuning_type: lora
task: mmlu_test
template: fewshot
lang: en
n_shot: 5
save_dir: saves/llama3-8b/lora/eval
batch_size: 4

model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
stage: sft
do_predict: true
finetuning_type: lora
eval_dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 50
overwrite_cache: true
preprocessing_num_workers: 16
output_dir: saves/llama3-8b/lora/predict
overwrite_output_dir: true
per_device_eval_batch_size: 1
predict_with_generate: true
ddp_timeout: 180000000

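Parameter	Description
task	mmlu_test, ceval_validation, cmmlu_test
task_dir	Evaluation dataset directory, default evaluation
batch_size	Batch size per device, default 4
seed	Random seed, default 42
lang	Evaluation language: en, zh
n_shot	Number of few-shot examples, default 5
save_dir	Path for saving results
download_mode	Dataset download strategy; reuses existing downloads by default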
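
Benchmarks such as MMLU, C-Eval, and CMMLU are multiple choice, so the reported score is an accuracy. A common way to score them is to pick the option letter the model rates highest and compare it with the reference answer. The sketch below illustrates that idea with made-up scores. It is not LLaMA Factory's evaluator, and the function names are hypothetical.

```python
from typing import Dict, List

CHOICES = ("A", "B", "C", "D")


def pick_answer(choice_scores: Dict[str, float]) -> str:
    """Return the option letter with the highest score."""
    return max(CHOICES, key=lambda choice: choice_scores[choice])


def accuracy(predictions: List[str], labels: List[str]) -> float:
    """Fraction of predictions that match the reference answers."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Hypothetical per-question scores for the four options
    scores = [
        {"A": 0.7, "B": 0.1, "C": 0.1, "D": 0.1},
        {"A": 0.2, "B": 0.5, "C": 0.2, "D": 0.1},
        {"A": 0.1, "B": 0.2, "C": 0.6, "D": 0.1},
    ]
    predictions = [pick_answer(item) for item in scores]
    print(predictions, accuracy(predictions, ["A", "B", "D"]))
```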
