A Practical Guide to LoRA Fine-Tuning of Large Models with LLaMA-Factory | 极客日志

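Using the LLaMA-Factory platform, this guide walks through the full workflow of instruction fine-tuning the Qwen3-1.7B base model with LoRA. It covers environment setup (Docker and source), dataset preparation (Alpaca/ShareGPT formats), parameter configuration (learning rate, rank, epochs), training and evaluation (BLEU/ROUGE), and merging and exporting the model for local deployment with Ollama. It focuses on how the key hyperparameters affect training convergence and provides a batch inference test setup, for engineers who want to get started quickly with private fine-tuning of large models.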

Published by Elasticer on 2026/4/8; updated 2026/7/2129 views

LLaMA-Factory

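LLaMA-Factory is an integrated platform for training, fine-tuning, and inference built on the transformers library. It supports pre-training, supervised instruction fine-tuning, reward model training, and PPO, DPO, KTO, and ORPO training, among other paradigms, and it can use Accelerate or DeepSpeed as the training acceleration backend.

Fine-tuning with LLaMA-Factory is straightforward; its main strengths are data processing and training configuration. Once the environment is set up following the official documentation, you run the corresponding script.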

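Installation and Deployment

Container Installation

Docker is the recommended way to build the environment quickly and avoid dependency conflicts: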

git clone https://github.com/hiyouga/LlamaFactory.git
cd LlamaFactory
# Build the image (run from the repository root; the trailing "." is the build context)
docker build -f ./docker/docker-cuda/Dockerfile \
  --build-arg PIP_INDEX=https://pypi.org/simple \
  --build-arg EXTRAS=metrics \
  -t llamafactory:latest .
# Run the container
docker run -dit --ipc=host --gpus=all \
  -p 7860:7860 \
  -p 8000:8000 \
  --name llamafactory \
  llamafactory:latest
# Enter the container
docker exec -it llamafactory bash

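Source Installation

To install from source instead, the steps below start an NVIDIA PyTorch container with the working directory mounted and install LLaMA-Factory inside it: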

cd workspace
git clone https://github.com/hiyouga/LlamaFactory.git
docker run -d --network=host --restart=always --name=llamafactory-dev \
  --gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -v "$PWD":/workspace -w /workspace \
  nvcr.io/nvidia/pytorch:25.08-py3 \
  tail -f /dev/null
docker exec -it -u root llamafactory-dev bash

# Create the pip config directory
mkdir -p ~/.pip
# Create the pip config file
cat > ~/.pip/pip.conf <<EOF
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
trusted-host = pypi.tuna.tsinghua.edu.cn
EOF

pip uninstall -y torch torchvision torchaudio nvidia-cublas nvidia-cudnn-cu12
# -i and --index-url are the same option, so only the PyTorch CUDA 13.0 index is given here
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
pip install --upgrade nvidia-cublas nvidia-cudnn-cu13
cd LlamaFactory
pip install -e '.[torch,metrics]'

Verification

# Confirm the installation works
llamafactory-cli train -h
# Confirm the GPU and CUDA environment works
python - <<'EOF'
import torch
print(torch.cuda.current_device())
print(torch.cuda.get_device_name(0))
print(torch.__version__)
EOF

Inference Test

pip install modelscope
modelscope download --model LLM-Research/Meta-Llama-3-8B-Instruct --local_dir /workspace/Meta-Llama-3-8B-Instruct
pip install -U bitsandbytes -i https://pypi.tuna.tsinghua.edu.cn/simple
vim test-inf.py

import torch
import warnings
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

warnings.filterwarnings('ignore', category=UserWarning, module='torch.cuda')
torch.cuda.set_device(0)
device = "cuda:0" if torch.cuda.is_available() else "cpu"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model_id = "/workspace/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map=device,
    trust_remote_code=True,
    low_cpu_mem_usage=True
)

# Fail fast if the weights did not land on the expected device
assert next(model.parameters()).device == torch.device(device), "Model was not loaded onto the expected device"
print(f"✅ Model loaded on GPU → {torch.cuda.get_device_name(0)}")

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# Llama 3 ends each turn with <|eot_id|>, so stop on it as well as on EOS
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
print("===== 🏴‍☠️ Pirate chatbot reply 🏴‍☠️ =====")
print(response)

WebUI Test

CUDA_VISIBLE_DEVICES=0 llamafactory-cli webchat \
  --model_name_or_path /workspace/Meta-Llama-3-8B-Instruct \
  --template llama3
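
Before training, the `huanhuan` dataset named in the fine-tuning command has to exist under the `data` directory and be registered in `dataset_info.json`. The following is a minimal sketch of that preparation for an Alpaca-format dataset, assuming the default `instruction` / `input` / `output` column names. The helper `write_alpaca_dataset` and the two sample records are hypothetical and only illustrate the file layout; they are not the original dataset.

```python
import json
import tempfile
from pathlib import Path


def write_alpaca_dataset(data_dir, name, records):
    """Write records in Alpaca format and register them in dataset_info.json."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)

    # Alpaca format: one object per sample with instruction / input / output.
    samples = [
        {
            "instruction": r["instruction"],
            "input": r.get("input", ""),  # optional field, empty when absent
            "output": r["output"],
        }
        for r in records
    ]
    data_file = data_dir / f"{name}.json"
    data_file.write_text(
        json.dumps(samples, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    # dataset_info.json maps a dataset name to its file; keep existing entries.
    info_file = data_dir / "dataset_info.json"
    info = {}
    if info_file.exists():
        info = json.loads(info_file.read_text(encoding="utf-8"))
    info[name] = {"file_name": data_file.name}
    info_file.write_text(
        json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return data_file, info_file


# Hypothetical sample records, for illustration only.
demo_records = [
    {"instruction": "Who are you?", "output": "I am a fine-tuned assistant."},
    {"instruction": "Translate to English", "input": "你好", "output": "Hello"},
]

# A temporary directory stands in for the real "data" directory here.
with tempfile.TemporaryDirectory() as tmp:
    data_file, info_file = write_alpaca_dataset(tmp, "huanhuan", demo_records)
    written = json.loads(data_file.read_text(encoding="utf-8"))
    registry = json.loads(info_file.read_text(encoding="utf-8"))
    print(len(written), registry["huanhuan"]["file_name"])
```

In a real run, `data_dir` would be the `data` directory passed as `--dataset_dir`, and the name would match the value passed as `--dataset`.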

Run Fine-Tuning

llamafactory-cli train \
  --stage sft \
  --do_train True \
  --model_name_or_path Qwen/Qwen3-1.7B-Base \
  --preprocessing_num_workers 16 \
  --finetuning_type lora \
  --template qwen3 \
  --flash_attn auto \
  --dataset_dir data \
  --dataset huanhuan \
  --cutoff_len 1024 \
  --learning_rate 5e-05 \
  --num_train_epochs 4.0 \
  --max_samples 100000 \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 4 \
  --lr_scheduler_type cosine \
  --max_grad_norm 1.0 \
  --logging_steps 5 \
  --save_steps 100 \
  --warmup_steps 4 \
  --packing False \
  --enable_thinking True \
  --report_to none \
  --output_dir saves/Qwen3-1.7B-Base/lora/train_2026-01-02-06-40-31 \
  --bf16 True \
  --plot_loss True \
  --trust_remote_code True \
  --ddp_timeout 180000000 \
  --include_num_input_tokens_seen True \
  --optim adamw_torch \
  --adapter_name_or_path saves/Qwen3-1.7B-Base/lora/train_2026-01-02-06-01-20 \
  --lora_rank 8 \
  --lora_alpha 256 \
  --lora_dropout 0 \
  --lora_target all
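
To make the effect of the hyperparameters in this command more concrete, here is a small arithmetic sketch. It assumes a single GPU, the standard LoRA scaling factor of alpha / rank, and a linear-warmup-then-cosine-decay schedule; `total_steps=400` is a hypothetical value chosen only for illustration, since the real number depends on the dataset size.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of samples that contribute to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def lora_scaling(lora_alpha, lora_rank):
    """Standard LoRA multiplies the low-rank update by alpha / rank."""
    return lora_alpha / lora_rank


def cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values taken from the training command above (single GPU assumed).
print(effective_batch_size(2, 4))  # 8
print(lora_scaling(256, 8))        # 32.0

# total_steps=400 is a hypothetical figure, not taken from the original run.
for step in (0, 2, 4, 200, 400):
    print(step, cosine_lr(step, base_lr=5e-5, warmup_steps=4, total_steps=400))
```

With rank 8 and alpha 256 the adapter update is scaled by 32, which is a fairly aggressive setting; if the loss curve oscillates, lowering alpha or the learning rate is usually the first thing to try.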

Batch Inference and Training Evaluation

pip install jieba rouge-chinese nltk

llamafactory-cli train \
  --stage sft \
  --model_name_or_path Qwen/Qwen3-1.7B-Base \
  --preprocessing_num_workers 16 \
  --finetuning_type lora \
  --quantization_method bnb \
  --template qwen3 \
  --flash_attn auto \
  --dataset_dir data \
  --eval_dataset huanhuan \
  --cutoff_len 1024 \
  --max_samples 100000 \
  --per_device_eval_batch_size 4 \
  --predict_with_generate True \
  --report_to none \
  --max_new_tokens 512 \
  --top_p 0.7 \
  --temperature 0.95 \
  --output_dir saves/Qwen3-1.7B-Base/lora/eval_2026-01-02-19-59-30 \
  --trust_remote_code True \
  --ddp_timeout 180000000 \
  --do_predict True \
  --adapter_name_or_path saves/Qwen3-1.7B-Base/lora/train_2026-01-02-06-40-31
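
To spot-check individual predictions by hand, a simplified score can be computed directly from the prediction file. The sketch below assumes the run writes a JSON Lines file whose records contain `predict` and `label` fields, and it implements a character-level ROUGE-L F1 using only the standard library. This is an approximation for quick inspection and will not match the jieba-tokenized BLEU/ROUGE numbers reported by the evaluation run; the two demo records are hypothetical.

```python
import json


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """Character-level ROUGE-L F1 in the range [0, 1]."""
    if not prediction or not reference:
        return 0.0
    lcs = lcs_length(prediction, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(prediction)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_rouge_l(jsonl_lines):
    """Average ROUGE-L F1 over JSON Lines records with predict/label fields."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    if not records:
        return 0.0
    scores = [rouge_l_f1(r["predict"], r["label"]) for r in records]
    return sum(scores) / len(scores)


# Hypothetical records standing in for lines read from the prediction file.
demo_lines = [
    json.dumps({"predict": "abcd", "label": "abcd"}),
    json.dumps({"predict": "abcd", "label": "abxd"}),
]
print(mean_rouge_l(demo_lines))  # 0.875
```

For a real run, `demo_lines` would be replaced by the lines of the prediction file found in the `--output_dir` of the evaluation command.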

Merge and Export the LoRA Model

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp/gguf-py
pip install --editable .
# convert_hf_to_gguf.py lives in the llama.cpp repository root
cd ..
python convert_hf_to_gguf.py /workspace/LlamaFactory/output/Qwen3-1.7B-huanhuan/
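
The GGUF conversion expects `/workspace/LlamaFactory/output/Qwen3-1.7B-huanhuan/` to already hold the merged model, that is, the base weights with the LoRA adapter folded in. As a hedged sketch, the merge command could be assembled as below. The helper `build_export_command` is hypothetical, the flags are assumed to mirror the keys of LLaMA-Factory's merge configuration, and the relative `export_dir` assumes the command is run from `/workspace/LlamaFactory`.

```python
import shlex


def build_export_command(base_model, adapter_dir, template, export_dir):
    """Assemble a llamafactory-cli export call that merges a LoRA adapter."""
    return [
        "llamafactory-cli", "export",
        "--model_name_or_path", base_model,
        "--adapter_name_or_path", adapter_dir,
        "--template", template,
        "--finetuning_type", "lora",
        "--export_dir", export_dir,
    ]


command = build_export_command(
    base_model="Qwen/Qwen3-1.7B-Base",
    adapter_dir="saves/Qwen3-1.7B-Base/lora/train_2026-01-02-06-40-31",
    template="qwen3",
    export_dir="output/Qwen3-1.7B-huanhuan",
)

# Print the shell-quoted command so it can be reviewed before running it.
print(shlex.join(command))
```

To execute the command from Python rather than copying it into a shell, `subprocess.run(command, check=True)` can be used.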

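Deploy and Run the Fine-Tuned Model

curl -fsSL https://ollama.com/install.sh | sh
# ollama serve blocks the terminal, so run it in the background or in a separate shell
ollama serve &
ollama create qwen3-huanhuan -f /workspace/LlamaFactory/output/Qwen3-1.7B-huanhuan/Modelfile
ollama run qwen3-huanhuan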
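
Besides the interactive `ollama run` session, the deployed model can also be queried over Ollama's local HTTP API. The following is a minimal sketch using only the standard library. It assumes the server listens on the default address `http://localhost:11434` and that the model was created under the name `qwen3-huanhuan` as above; the helper names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(prompt, model="qwen3-huanhuan",
                           host="http://localhost:11434"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Building the request does not touch the network, so it is safe to inspect.
request = build_generate_request("Who are you?")
print(request.get_method(), request.full_url)
```

Calling `generate("Who are you?")` sends the request; it only succeeds while `ollama serve` is running and the model has been created.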
