A Guide to Fine-Tuning LLaMA 3.1-8B and Setting Up the Environment with Unsloth

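Summary (AI-generated): This article walks through the complete workflow for fine-tuning the LLaMA 3.1-8B model with the Unsloth framework under WSL. It covers installing Ubuntu, configuring CUDA and PyTorch, installing the Unsloth library, loading and quantizing the model, converting the dataset format, configuring LoRA parameters, and testing training and inference. A worked example covers the whole process from environment preparation to model generation, suitable for beginners who want to get started with LLM fine-tuning quickly.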

1739658202 · Published 2025/2/7 · Updated 2026/6/3

Unsloth Fine-Tuning in Practice: Environment Setup and a LLaMA 3.1-8B Fine-Tuning Guide

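This article explains in detail how to fine-tune the LLaMA 3.1-8B model with the Unsloth framework, helping you set up a fine-tuning environment quickly and understand the basic steps of the fine-tuning workflow. The tutorial is aimed at beginners and is meant to get you to your own fine-tuned model in a short time.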

Setting Up the Unsloth Environment

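I first installed and ran Unsloth on Windows 11. Installation went smoothly, but fine-tuning ran into a series of errors. I worked through most of them one by one, yet some key GPU acceleration libraries still would not run: they need Linux to work properly and reach full performance.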

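Because I am not very familiar with Linux, I tried recompiling these libraries on Windows with a compiler toolchain, but that did not solve the problem either. In the end I installed Ubuntu through WSL (Windows Subsystem for Linux) to get around these compatibility issues. This avoids switching entirely to Linux while still providing a Linux environment for fine-tuning.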

Step 1: Install Ubuntu on Windows via WSL

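If you already run Linux (Ubuntu is recommended), skip this part and go straight to the later steps. Make sure the GPU driver is installed on your Linux system so that the GPU can be used for acceleration.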

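If you install Ubuntu through WSL on Windows, you do not need to install the GPU driver again inside the WSL Ubuntu system. WSL is a virtualization technology: as long as the driver is correctly installed and configured on the Windows host, Ubuntu in WSL uses it automatically and supports GPU acceleration.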

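On Windows 11, open a command prompt and run the following command to install Ubuntu:

wsl --install

To enter Ubuntu from Windows, open a command prompt and run:

wsl -d Ubuntu

On first login you are asked to create a new username and password.

Later logins go straight in, without asking for the username and password each time.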

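Step 2: Update system components

After installing Ubuntu, refresh the package index and install a few basic tools. Run these as root, or prefix each command with sudo:

apt-get update
apt-get install -y curl
apt-get install -y sudo
apt-get install -y gpg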

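Step 3: Install Anaconda

Installing Anaconda is recommended: it makes installing Python packages convenient and keeps Python environments easy to manage.

wget https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Linux-x86_64.sh
bash Anaconda3-2024.06-1-Linux-x86_64.sh

Run the installer as your normal user (not with sudo) so that it installs into your home directory. Recommended installation directory (replace userName with your Linux user name):

/home/userName/anaconda3

After installation, add Anaconda to PATH manually by appending the following line to the end of ~/.bashrc, then run source ~/.bashrc or open a new shell:

export PATH="$HOME/anaconda3/bin:$PATH"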
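Appending to ~/.bashrc by hand makes it easy to add the same line twice, and the CUDA step below does it again. A minimal sketch, assuming you prefer to script it: the hypothetical helper `ensure_line_in_file` appends a line only if an identical line is not already in the file.

```python
from pathlib import Path


def ensure_line_in_file(path: Path, line: str) -> bool:
    """Append `line` to the file at `path` unless an identical line exists.

    Returns True if the file was changed, False if the line was already there.
    """
    content = path.read_text(encoding="utf-8") if path.exists() else ""
    if line in content.splitlines():
        return False  # already present: nothing to do
    with path.open("a", encoding="utf-8") as f:
        if content and not content.endswith("\n"):
            f.write("\n")  # keep the new line separate from the last one
        f.write(line + "\n")
    return True


# Example (this modifies your real ~/.bashrc):
# ensure_line_in_file(Path.home() / ".bashrc", 'export PATH="$HOME/anaconda3/bin:$PATH"')
```

Checking whole lines rather than substrings avoids false matches against longer PATH entries that merely contain the same text.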

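Step 4: Install CUDA

This is the step where mistakes are most likely. Once the GPU driver is installed, run nvidia-smi on the Windows host and read the "CUDA Version" field in its header: it is the newest CUDA version the driver supports. Check it carefully, whether you are on Windows or Linux, so that you do not install the wrong version.

The biggest pitfall for me was choosing a CUDA version that the available PyTorch and TensorFlow builds did not support (such as CUDA 12.6), which left neither framework able to use the RTX 4090. I therefore chose a version below 12.6 and installed the stable 12.1 release.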
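The nvidia-smi check above can be scripted. This sketch (hypothetical helper names) reads the "CUDA Version" field from nvidia-smi's header and checks that a planned toolkit version does not exceed it. It covers only the driver ceiling, not the separate constraint described above, namely whether PyTorch ships builds for that CUDA version.

```python
import re


def driver_cuda_version(nvidia_smi_output: str) -> tuple[int, int]:
    """Parse the 'CUDA Version: X.Y' field from nvidia-smi's header.

    This is the newest CUDA version the installed driver supports.
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def toolkit_fits_driver(toolkit: str, nvidia_smi_output: str) -> bool:
    """True if a toolkit version such as '12.1' is not newer than the driver allows."""
    major, minor = (int(part) for part in toolkit.split(".")[:2])
    return (major, minor) <= driver_cuda_version(nvidia_smi_output)


# Illustrative header line; the numbers are made up, not from this article.
sample = "| NVIDIA-SMI 560.35.02   Driver Version: 560.94   CUDA Version: 12.6 |"
print(driver_cuda_version(sample), toolkit_fits_driver("12.1", sample))
```

On a real machine, pass the text from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` instead of the sample string.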

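The runfile installer below also offers to install a Linux GPU driver. Under WSL, deselect the Driver component, since the driver comes from the Windows host. The installer writes to /usr/local, so it needs root:

wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sudo sh cuda_12.1.0_530.30.02_linux.run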

Then add the CUDA paths to your environment, for example at the end of ~/.bashrc:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
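To confirm that the exports took effect in a new shell, here is a small sketch (hypothetical function name) that reports which expected CUDA directories are missing from PATH and LD_LIBRARY_PATH, assuming the default /usr/local/cuda location used above:

```python
import os
from collections.abc import Mapping


def missing_cuda_paths(env: Mapping[str, str], cuda_home: str = "/usr/local/cuda") -> list[str]:
    """Report expected CUDA directories missing from PATH and LD_LIBRARY_PATH."""
    expected = {
        "PATH": f"{cuda_home}/bin",
        "LD_LIBRARY_PATH": f"{cuda_home}/lib64",
    }
    problems = []
    for var, directory in expected.items():
        # Both variables are colon-separated directory lists on Linux.
        if directory not in env.get(var, "").split(":"):
            problems.append(f"{var} lacks {directory}")
    return problems


print(missing_cuda_paths(os.environ) or "CUDA paths look fine")
```

Taking the environment as a parameter rather than reading `os.environ` inside the function keeps the check easy to test.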

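Step 5: Install PyTorch

Create a dedicated conda environment with Python 3.10, PyTorch built for CUDA 12.1, and xformers, then activate it:

conda create --name unsloth_env python=3.10 pytorch-cuda=12.1 pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers -y
conda activate unsloth_env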

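Check in Python that PyTorch can see the GPU:

import torch
print(torch.__version__)             # PyTorch version
print(torch.cuda.is_available())     # True if CUDA is usable
print(torch.cuda.device_count())     # number of visible GPUs
print(torch.cuda.get_device_name(0)) # name of the first GPU

Output on my machine: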

2.4.0
True
1
NVIDIA GeForce RTX 4090

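Step 6: Install Unsloth

Inside the activated environment, install Unsloth with the extra matching CUDA 12.1 and PyTorch 2.4.0:

pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"

Alternatively, from the root of a local clone of the Unsloth repository:

pip install ".[cu121-torch240]"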
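The extra in brackets encodes the CUDA and PyTorch versions (cu121 for CUDA 12.1, torch240 for PyTorch 2.4.0). A sketch that derives such a tag, assuming the naming pattern seen in this one command generalizes. Not every combination is published, so check Unsloth's installation instructions before using a generated tag.

```python
def unsloth_extra(cuda_version: str, torch_version: str) -> str:
    """Build an extra such as 'cu121-torch240' from '12.1' and '2.4.0'.

    Assumes the 'cu<major><minor>-torch<major><minor><patch>' pattern seen in
    the pip command above; check that the resulting extra actually exists.
    """
    cuda_tag = "".join(cuda_version.split(".")[:2])
    # Drop local build suffixes such as '+cu121' from torch.__version__.
    torch_tag = "".join(torch_version.split("+")[0].split(".")[:3])
    return f"cu{cuda_tag}-torch{torch_tag}"


# Inside the environment: unsloth_extra(torch.version.cuda, torch.__version__)
print(unsloth_extra("12.1", "2.4.0"))
```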

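Fine-Tuning Your First LLaMA Model

Step 1: Download the model to fine-tune

Download one of Unsloth's 4-bit pre-quantized LLaMA 3.1-8B checkpoints (https://huggingface.co/unsloth) to a local directory. In this walkthrough it sits on the Windows D: drive, which WSL exposes as /mnt/d.

Step 2: Load the model

Load the model and tokenizer with FastLanguageModel.from_pretrained, using 4-bit quantization to reduce memory usage:

import json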
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
     "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
     "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
     "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
     "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # We also uploaded 4bit for 405b!
     "unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
     "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
     "unsloth/mistral-7b-v0.3-bnb-4bit",        # Mistral v3 2x faster!
     "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
     "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
     "unsloth/Phi-3-medium-4k-instruct",
     "unsloth/gemma-2-9b-bnb-4bit",
     "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth

# Model to load: this directory holds the files I downloaded to the local file system.
model_path = "/mnt/d/02-LLM/LLM-APP/00-models/unsloth-llama-3.1-8b-bnb-4bit"

# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
     model_name = model_path, # Choose any model from above list!
     max_seq_length = max_seq_length,
     dtype = dtype,
     load_in_4bit = load_in_4bit,
     # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
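Why 4-bit loading matters on a 24 GB RTX 4090: a rough back-of-the-envelope estimate of the weight memory alone, treating the model as exactly 8 billion parameters (an approximation). It ignores quantization metadata, layers that stay in higher precision, activations, the KV cache, and LoRA/optimizer state, so the real footprint is larger.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Memory for the raw weights only, in GiB (2**30 bytes)."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 8e9  # nominal "8B"; treated as exact for this rough estimate
for label, bits in [("16-bit", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

This prints roughly 14.9 GiB for 16-bit weights and 3.7 GiB for 4-bit weights, which is what leaves room on the card for training.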

Step 3: Process the data

The raw dataset is a JSON array of records, each with an instruction, a modern Chinese input, and its Classical Chinese output:

[
   {
     "instruction": "请把现代汉语翻译成古文",
     "input": "世界及其所产生的一切现象，都是来源于物质。",
     "output": "天地与其所产焉，物也。"
   },
   {
     "instruction": "请把现代汉语翻译成古文",
     "input": "以概念来称谓事物而不超过事物的实际范围，只是概念的外延。",
     "output": "物以物其所物而不过焉，实也。"
   }
]

For training, each record is flattened into a single "text" field:

[
{"text":"Instruction: 请把现代汉语翻译成古文\nInput: 世界及其所产生的一切现象，都是来源于物质。\nOutput: 天地与其所产焉，物也。"},
{"text":"Instruction: 请把现代汉语翻译成古文\nInput: 以概念来称谓事物而不超过事物的实际范围，只是概念的外延。\nOutput: 物以物其所物而不过焉，实也。"}
]

import json
from datasets import Dataset

# Path to the local dataset
local_dataset_path = "./data.json"  # change to your dataset path
# Model to load (same path as in the loading step)
model_path = "/mnt/d/02-LLM/LLM-APP/00-models/unsloth-llama-3.1-8b-bnb-4bit"
# Load the local dataset
print('Loading local data: start...')
with open(local_dataset_path, 'r', encoding='utf-8') as f:
    data = json.load(f)
# Convert to a datasets.Dataset, keeping only records that have all three fields
train_dataset = Dataset.from_list([
    {
        "text": f"Instruction: {item['instruction']}\nInput: {item['input']}\nOutput: {item['output']}"
    }
    for item in data if isinstance(item, dict) and 'instruction' in item and 'input' in item and 'output' in item
])
print(f"Processed training dataset size: {len(train_dataset)}")

Loading local data: start...
Processed training dataset size: 457124
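The conversion above silently drops records that lack any of the three fields, so the printed size alone does not show how many were lost. A sketch of the same transformation as a plain function (hypothetical name `to_text_records`) that also counts skipped records:

```python
import json


def to_text_records(records: list) -> tuple[list[dict], int]:
    """Flatten instruction/input/output dicts into {"text": ...} rows.

    Returns the converted rows and the number of records skipped because
    they were not dicts or lacked one of the three fields.
    """
    fields = ("instruction", "input", "output")
    rows, skipped = [], 0
    for item in records:
        if isinstance(item, dict) and all(k in item for k in fields):
            rows.append({"text": f"Instruction: {item['instruction']}\n"
                                 f"Input: {item['input']}\n"
                                 f"Output: {item['output']}"})
        else:
            skipped += 1
    return rows, skipped


sample = [
    {"instruction": "请把现代汉语翻译成古文",
     "input": "世界及其所产生的一切现象，都是来源于物质。",
     "output": "天地与其所产焉，物也。"},
    {"instruction": "请把现代汉语翻译成古文"},  # incomplete: will be skipped
]
rows, skipped = to_text_records(sample)
print(json.dumps(rows, ensure_ascii=False))
print("skipped:", skipped)
```

In the pipeline, `rows, skipped = to_text_records(data)` followed by `Dataset.from_list(rows)` would give the same dataset plus a count worth logging.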

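Step 4: Configure the model parameters

Attach LoRA adapters to the model with FastLanguageModel.get_peft_model, then configure SFTTrainer:

from trl import SFTTrainer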
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

model = FastLanguageModel.get_peft_model(
     model,
     r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
     target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj",],
     lora_alpha = 32,
     lora_dropout = 0, # Supports any, but = 0 is optimized
     bias = "none",    # Supports any, but = "none" is optimized
     # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
     use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
     random_state = 3407,
     use_rslora = False,  # We support rank stabilized LoRA
     loftq_config = None, # And LoftQ
)
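With r = 16 on the seven projection matrices in target_modules, the number of trainable parameters can be worked out by hand: each adapted linear layer of shape in×out gains r·(in + out) parameters. The shapes below are assumed from the published Llama 3.1 8B configuration (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128).

```python
# Assumed Llama 3.1 8B shapes (from its public config): hidden size 4096,
# MLP size 14336, 32 decoder layers, 8 key/value heads of dimension 128.
HIDDEN, MLP, KV_DIM, LAYERS = 4096, 14336, 8 * 128, 32

# (in_features, out_features) of each module in target_modules
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_trainable_params(r: int) -> int:
    """LoRA adds A (r x in) and B (out x r) to every targeted linear layer."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in SHAPES.values())
    return per_layer * LAYERS


r, lora_alpha = 16, 32
print(f"trainable LoRA parameters: {lora_trainable_params(r):,}")
print(f"adapter scale lora_alpha / r = {lora_alpha / r}")
```

That gives 41,943,040 trainable parameters, about 0.5% of the base weights, with the adapter output scaled by lora_alpha / r = 2.0. With use_rslora=True, PEFT would scale by lora_alpha / sqrt(r) instead.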

trainer = SFTTrainer(
     model = model,
     tokenizer = tokenizer,
     train_dataset = train_dataset,
     dataset_text_field = "text",
     max_seq_length = max_seq_length,
     dataset_num_proc = 2,
     packing = False, # Can make training 5x faster for short sequences.
     args = TrainingArguments(
         per_device_train_batch_size = 4,
         gradient_accumulation_steps = 8,
         warmup_steps = 500,
         # num_train_epochs = 1, # Set this for 1 full training run.
         max_steps = 3000,
         learning_rate = 3e-5,
         fp16 = not is_bfloat16_supported(),
         bf16 = is_bfloat16_supported(),
         logging_steps = 100,
         optim = "adamw_8bit",
         weight_decay = 0.01,
         lr_scheduler_type = "linear",
         seed = 3407,
         output_dir = "outputs",
     ),
)
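A quick check of how much data this configuration actually covers, assuming a single GPU: the effective batch is per_device_train_batch_size × gradient_accumulation_steps, and because max_steps is set, training stops after 3000 optimizer steps regardless of dataset size.

```python
import math

per_device_batch = 4
grad_accum = 8
num_gpus = 1            # assumption: the single RTX 4090 shown earlier
max_steps = 3000
dataset_size = 457_124  # from the data-loading output above

effective_batch = per_device_batch * grad_accum * num_gpus
samples_seen = effective_batch * max_steps
epochs = samples_seen / dataset_size
steps_per_epoch = math.ceil(dataset_size / effective_batch)

print(f"effective batch size: {effective_batch}")
print(f"samples seen in {max_steps} steps: {samples_seen:,}")
print(f"fraction of one epoch: {epochs:.2f}")
print(f"steps for one full epoch: {steps_per_epoch:,}")
```

So the run sees 96,000 samples, about 0.21 of one pass over the 457,124 records; a full epoch would need about 14,286 steps.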

Step 5: Train the model

# Start training
trainer_stats = trainer.train()
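With warmup_steps = 500, max_steps = 3000 and lr_scheduler_type = "linear", the learning rate ramps up linearly to 3e-5 over the first 500 steps, then decays linearly to zero at step 3000. A re-implementation of that formula for illustration (not the library code), handy for reading the logged learning rate during training:

```python
def linear_warmup_lr(step: int, base_lr: float = 3e-5,
                     warmup_steps: int = 500, total_steps: int = 3000) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


for step in (0, 250, 500, 1750, 3000):
    print(step, f"{linear_warmup_lr(step):.2e}")
```

The peak at step 500 is the configured learning_rate; halfway through the decay (step 1750) the rate is back to half of it.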

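Step 6: Test inference

Switch the model to inference mode and ask it to translate a new sentence. The prompt is kept in Chinese: it asks the model to turn modern Chinese into Classical Chinese while preserving the original meaning and tone, to avoid repeated sentences, and it includes one worked example. Note that its headers (### Instruction: / ### Input: / ### output:) differ from the Instruction: / Input: / Output: format of the training text; aligning the two may give more consistent results.

alpaca_prompt = """你的任务是将给定的现代汉语文本转换为符合古文。请注意保持原文的核心思想和情感，同时运用适当的古文词汇、语法结构和修辞手法，使转换后的文本读起来如同古代文人的笔触一般。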
翻译要求：避免句子重复，确保语言通顺，符合古文表达习惯。
例如：将'以正确的概念来校正不正确的概念，又以不正确的概念的失误之处，反过来探究正确的概念之所以正确的所在。'翻译为'以其所正，正其所不正；以其所不正，疑其所正。'，确保语言通顺，符合古文表达习惯。
### Instruction:
{}
### Input:
{}
### output:
{}"""

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
     alpaca_prompt.format(
         "请把现代汉语翻译成古文", # instruction
         "这个管道昇在同时代人里也是极具个性和才华的。这时，赵孟頫在京城获得赏识，不再是那个只在吴兴有薄名，却不能靠书画养活自己，不得不去教私塾的教书先生。", # input
         "", # output - leave this blank for generation!
     )
], return_tensors = "pt").to("cuda")
# Generate the output
outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
# Decode the output
decoded_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)
#print(decoded_output)
# Extract the original input text from the decoded output
# Assumes the input sits between "### Input:\n" and "### output:\n"
# print(decoded_output[0])
original_text = decoded_output[0].split("### Input:\n")[1].split("### output:\n")[0].strip()
# Extract the translated ancient Chinese text
# Assuming the output format is consistent and the ancient text starts after "### output:\n"
translated_text = decoded_output[0].split("### output:\n")[1].strip()
print("Original modern Chinese: " + original_text)
print("Translated Classical Chinese: " + translated_text)

Sample output from my run:

Original modern Chinese: 这个管道昇在同时代人里也是极具个性和才华的。这时，赵孟頫在京城获得赏识，不再是那个只在吴兴有薄名，却不能靠书画养活自己，不得不去教私塾的教书先生。
Translated Classical Chinese: 管仲之才亦异于当世，时赵孟頫在京得赏识，乃非吴兴薄名，不能自养，负笈私门者也。
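The split-based parsing above raises an IndexError if a marker is missing, and it keeps anything the model generates after its answer, such as a new ### Instruction: block. A slightly more defensive sketch (hypothetical function name `split_generation`) that uses str.partition and cuts the answer at the next ### header:

```python
def split_generation(decoded: str,
                     input_marker: str = "### Input:\n",
                     output_marker: str = "### output:\n") -> tuple[str, str]:
    """Return (source sentence, model answer) from one decoded prompt + completion."""
    _, found, rest = decoded.partition(input_marker)
    if not found:
        raise ValueError(f"marker {input_marker!r} not found")
    source, found, answer = rest.partition(output_marker)
    if not found:
        raise ValueError(f"marker {output_marker!r} not found")
    # Keep only the first answer: stop at any further "###" section the model invented.
    answer = answer.split("###", 1)[0]
    return source.strip(), answer.strip()


# Usage with the code above:
# original_text, translated_text = split_generation(decoded_output[0])
```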
