YOLOv12 in Practice: Environment Setup, Training, and Inference | 极客日志

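Setting up YOLOv12 involves configuring a Conda virtual environment, matching the PyTorch and CUDA versions, and installing the dependency libraries. This tutorial covers converting a VOC dataset to YOLO format, splitting it into training and validation sets, and writing the dataset configuration file. It also provides inference and training code examples and explains the model parameter settings and how to resume an interrupted training run.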

DevOpsTeam · Published 2026/1/13 · Updated 2026/7/3154 views

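YOLOv12 is an object detection model released in February 2025 by researchers from the State University of New York at Buffalo and the University of Chinese Academy of Sciences. Compared with earlier versions, it refines the network structure by introducing Residual Efficient Layer Aggregation Networks (R-ELAN) and an area attention mechanism (Area-Attention), improving efficiency while maintaining high accuracy.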

I. Getting the Code and Model Structure

Official source code: https://github.com/sunsmarterjie/yolov12

1. Model structure diagram

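The overall structure diagram, drawn from the yolov12.yaml configuration file, shows that YOLOv12 has fewer total layers than YOLOv11, giving a leaner network. The core improvement is the area attention mechanism introduced in the A2C2f module: the feature map is divided into equal areas along the vertical or horizontal direction and self-attention is computed within each area, which keeps a large receptive field at a lower computational cost than global attention.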
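To make the cost argument concrete, here is a rough sketch of why restricting self-attention to sub-regions of the feature map is cheaper than global attention. It only counts query-key pairs. The function name and the 20x20 feature map are illustrative assumptions, and the choice of four areas follows the default reported in the paper rather than anything stated in this article.

```python
def attention_pairs(height, width, areas=1):
    """Count query-key pairs when the tokens are split into `areas` equal groups.

    areas=1 corresponds to global attention over the whole feature map.
    """
    tokens = height * width
    if tokens % areas != 0:
        raise ValueError("token count must be divisible by the number of areas")
    per_area = tokens // areas
    # Each area attends only within itself: areas * (tokens / areas) ** 2 pairs.
    return areas * per_area ** 2


if __name__ == "__main__":
    full = attention_pairs(20, 20)             # global attention
    split = attention_pairs(20, 20, areas=4)   # four equal areas
    print(full, split, full / split)
```

Under these assumptions the pair count drops by a factor equal to the number of areas, while each token still attends to a quarter of the feature map.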

[Figure: YOLOv12 performance]

For the theoretical details, see the paper: https://arxiv.org/pdf/2502.12524

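II. Environment Setup

The YOLOv12 environment is set up the same way as for v11, v10, and similar versions. Creating a dedicated virtual environment is recommended to avoid dependency conflicts. If you hit the error ImportError: cannot import name 'scaled_dot_product_attention', the installed PyTorch is usually too old (this function was added in PyTorch 2.0), and the environment needs to be rebuilt.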
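The version requirement behind that error can be checked before anything else is installed. The sketch below is a hypothetical helper (supports_sdpa is not part of PyTorch) that inspects a version string such as the one printed by torch.__version__ and reports whether it is at least 2.0.

```python
import re


def supports_sdpa(torch_version: str) -> bool:
    """Return True if the version string denotes PyTorch 2.0 or newer."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 0)


if __name__ == "__main__":
    # Example version strings; in practice pass torch.__version__.
    for version in ("1.13.1+cu117", "2.2.2+cu118"):
        print(version, supports_sdpa(version))
```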

1. Creating the virtual environment

Python 3.9 through 3.11 is recommended. This example uses 3.11:

conda create -n yolov12 python=3.11

Enter y to confirm the installation and wait for the downloads to finish.

2. Activating the virtual environment

conda activate yolov12

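Once the environment is active, the prompt shows (yolov12) on the left.

3. Checking the supported CUDA version

Users without an NVIDIA GPU can skip this step. Otherwise, run nvidia-smi in a terminal to see the highest CUDA version the installed driver supports. For example, if it reports CUDA 12.5, choose a PyTorch build for the same or an earlier CUDA version (such as cu121 or cu118).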
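The selection rule can be sketched as a small function: take the CUDA version reported by nvidia-smi and pick the newest build tag that does not exceed it. The function name and the default list of tags are assumptions for illustration; the tags actually offered depend on the PyTorch release.

```python
def pick_cuda_tag(driver_cuda: str, available=("cu118", "cu121")):
    """Pick the newest build tag whose CUDA version does not exceed the driver's.

    driver_cuda is the version shown by nvidia-smi, e.g. "12.5".
    Returns None when no listed build is old enough for the driver.
    """
    def tag_version(tag: str):
        digits = tag[2:]                            # "cu121" -> "121"
        return int(digits[:-1]), int(digits[-1])    # -> (12, 1)

    major, minor = (int(part) for part in driver_cuda.split(".")[:2])
    usable = [tag for tag in available if tag_version(tag) <= (major, minor)]
    return max(usable, key=tag_version) if usable else None


if __name__ == "__main__":
    print(pick_cuda_tag("12.5"))
    print(pick_cuda_tag("11.8"))
```

If the function returns None, the driver is older than every listed build, which is the case the driver update in this section addresses.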

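If an outdated driver limits the CUDA version, update the driver from the NVIDIA website, restart the computer, and check again.

4. Installing PyTorch

Based on the official recommendation and compatibility, PyTorch 2.2.2 is suggested. Users with a GPU should choose a CUDA build; users without one should choose the CPU build.

Online installation: copy the matching command from the PyTorch website, leaving out the -c option and everything after it.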

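Offline installation: if the online installation fails, download the .whl files and install them locally. The file name encodes the key information:

cu118 / cu102: CUDA version
cp311 / cp39: Python version
win: operating system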
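The naming scheme listed above can be read mechanically. The following sketch splits a wheel file name into its parts; describe_wheel is a hypothetical helper, and the example file name is an assumed instance of the five-field pattern (name, version, Python tag, ABI tag, platform) with no build-number field.

```python
def describe_wheel(filename: str) -> dict:
    """Split a five-field wheel file name into its components."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # Raises ValueError if the name does not have exactly five fields.
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    # The part after "+" is the local version label, e.g. "cu118" or "cpu".
    base_version, _, build = version.partition("+")
    return {
        "package": name,
        "version": base_version,
        "build": build or None,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


if __name__ == "__main__":
    print(describe_wheel("torch-2.2.2+cu118-cp311-cp311-win_amd64.whl"))
```

Checking the python and platform fields against the active environment before running pip install can catch a mismatched download early.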

After the download completes, change to the directory containing the files and run:

pip install torch-xxx.whl
pip install torchvision-xxx.whl
pip install torchaudio-xxx.whl

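5. Verifying GPU availability

import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA-capable GPU can be used
print(torch.cuda.device_count())  # number of visible GPUs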

6. Installing other dependencies

requirements.txt:

matplotlib>=3.3.0
numpy==1.24.4
opencv-python>=4.6.0
pillow>=7.1.2
pyyaml>=5.3.1
requests>=2.23.0
scipy>=1.4.1
tqdm>=4.64.0
pandas>=1.1.4
seaborn>=0.11.0
thop>=0.1.1
psutil

pip install -r requirements.txt

pip install huggingface-hub==0.23.2
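After installation, it can be useful to confirm that every listed package is actually present in the active environment. The sketch below uses only the standard library; requirement_name and missing_packages are hypothetical helpers, and the parser handles only simple lines such as those in the list above (no extras, markers, or URLs).

```python
import re
from importlib import metadata


def requirement_name(line: str) -> str:
    """Strip the version specifier from a simple requirement line."""
    return re.split(r"[<>=!~ ]", line.strip(), maxsplit=1)[0]


def missing_packages(lines):
    """Return the names of listed packages that are not installed."""
    missing = []
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        name = requirement_name(line)
        try:
            metadata.version(name)  # looks up the installed distribution
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as f:
        print(missing_packages(f.readlines()))
```

This check only reports whether a package is installed; it does not compare the installed version against the specifier.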

III. Dataset Preparation

1. Annotation tool

pip install labelimg

2. VOC format conversion

# -*- coding: utf-8 -*-
"""Convert Pascal VOC XML annotations to YOLO-format txt labels."""
import os
import xml.etree.ElementTree as ET

import cv2
import numpy as np

classes = []  # class names, in the order they are first encountered


def convert(size, box):
    """Map a pixel box (xmin, xmax, ymin, ymax) to normalized (x_center, y_center, w, h)."""
    dw = 1. / size[0]
    dh = 1. / size[1]
    # The "- 1" shifts VOC's 1-based pixel coordinates to 0-based ones.
    x = (box[0] + box[1]) / 2.0 - 1
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)


def convert_annotation(xmlpath, xmlname):
    """Write one YOLO label file for the given XML annotation."""
    stem = os.path.splitext(xmlname)[0]
    txtfile = os.path.join(txtpath, stem + '.txt')
    root = ET.parse(xmlpath).getroot()
    # np.fromfile + cv2.imdecode also works for paths containing non-ASCII
    # characters, which cv2.imread can fail on under Windows.
    img = cv2.imdecode(np.fromfile('{}/{}.{}'.format(imgpath, stem, postfix), np.uint8), cv2.IMREAD_COLOR)
    if img is None:
        raise ValueError('image could not be decoded')
    h, w = img.shape[:2]
    res = []
    for obj in root.iter('object'):
        cls = obj.find('name').text
        if cls not in classes:
            classes.append(cls)
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        res.append(str(cls_id) + " " + " ".join([str(a) for a in bb]))
    # Annotations without any object produce no label file.
    if res:
        with open(txtfile, 'w') as f:
            f.write('\n'.join(res))


if __name__ == "__main__":
    postfix = 'png'  # image file extension
    imgpath = r'E:\A-毕业设计代做数据\helmet\test\images'
    xmlpath = r'E:\A-毕业设计代做数据\helmet\test\annotations'
    txtpath = r'E:\A-毕业设计代做数据\helmet\test\labels'
    os.makedirs(txtpath, exist_ok=True)
    error_file_list = []
    for name in os.listdir(xmlpath):
        if not name.lower().endswith('.xml'):
            print(f'file {name} is not xml format.')
            continue
        try:
            convert_annotation(os.path.join(xmlpath, name), name)
            print(f'file {name} convert success.')
        except Exception as e:
            print(f'file {name} convert error.')
            print(f'error message:\n{e}')
            error_file_list.append(name)
    print(f'files that failed to convert:\n{error_file_list}')
    print(f'Dataset Classes:{classes}')
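To sanity-check converted labels, the normalization can be inverted and compared with the original pixel box. The sketch below is a standalone illustration: voc_to_yolo and yolo_to_voc are hypothetical helpers, and unlike the script above it omits the "- 1" offset, assuming zero-based pixel coordinates.

```python
def voc_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Pixel corner box -> normalized (x_center, y_center, width, height)."""
    x_c = (xmin + xmax) / 2.0 / img_w
    y_c = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x_c, y_c, w, h


def yolo_to_voc(img_w, img_h, x_c, y_c, w, h):
    """Normalized (x_center, y_center, width, height) -> pixel corner box."""
    xmin = (x_c - w / 2.0) * img_w
    ymin = (y_c - h / 2.0) * img_h
    xmax = (x_c + w / 2.0) * img_w
    ymax = (y_c + h / 2.0) * img_h
    return xmin, ymin, xmax, ymax


if __name__ == "__main__":
    # Example values: a 640x480 image with a box from (100, 120) to (300, 360).
    label = voc_to_yolo(640, 480, 100, 120, 300, 360)
    print(label)
    print(yolo_to_voc(640, 480, *label))
```

All four normalized values should fall between 0 and 1; a value outside that range usually points to swapped coordinates or a wrong image size.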

3. Dataset split

# -*- coding: utf-8 -*-
"""Split images and YOLO labels into training and validation sets."""
import os
import shutil

from sklearn.model_selection import train_test_split  # requires scikit-learn

val_size = 0.2   # fraction of samples used for validation
postfix = 'jpg'  # image file extension
imgpath = r'E:\A-毕业设计代做数据\datasets\images'
txtpath = r'E:\A-毕业设计代做数据\datasets\labels'
output_train_img_folder = r'E:\A-毕业设计代做数据\datasets\dataset_kengwa\images\train'
output_val_img_folder = r'E:\A-毕业设计代做数据\datasets\dataset_kengwa\images\val'
output_train_txt_folder = r'E:\A-毕业设计代做数据\datasets\dataset_kengwa\labels\train'
output_val_txt_folder = r'E:\A-毕业设计代做数据\datasets\dataset_kengwa\labels\val'

for folder in (output_train_img_folder, output_val_img_folder,
               output_train_txt_folder, output_val_txt_folder):
    os.makedirs(folder, exist_ok=True)


def copy_pairs(names, img_folder, txt_folder):
    """Copy each label file and its matching image into the target folders."""
    for name in names:
        img_name = '{}.{}'.format(name[:-4], postfix)
        shutil.copy(os.path.join(imgpath, img_name), os.path.join(img_folder, img_name))
        shutil.copy(os.path.join(txtpath, name), os.path.join(txt_folder, name))


# Split by label file name; a fixed random_state keeps the split reproducible.
listdir = [i for i in os.listdir(txtpath) if i.endswith('.txt')]
train, val = train_test_split(listdir, test_size=val_size, shuffle=True, random_state=0)

copy_pairs(train, output_train_img_folder, output_train_txt_folder)
copy_pairs(val, output_val_img_folder, output_val_txt_folder)
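If scikit-learn is not installed, the same kind of reproducible split can be sketched with the standard library alone. This is an alternative under stated assumptions, not the script's method: split_names is a hypothetical helper, and it will not select the same files as train_test_split even with the same seed.

```python
import math
import random


def split_names(names, val_size=0.2, seed=0):
    """Shuffle file names deterministically and split them into (train, val)."""
    names = sorted(names)                  # make the result independent of listing order
    random.Random(seed).shuffle(names)     # seeded shuffle for reproducibility
    n_val = math.ceil(len(names) * val_size)
    return names[n_val:], names[:n_val]


if __name__ == "__main__":
    files = [f"img_{i:03d}.txt" for i in range(10)]
    train, val = split_names(files)
    print(len(train), len(val))
```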

4. Editing the training configuration file

train: E:\Desktop\new-yolov9\yolotest\images\train  # train images
val: E:\Desktop\new-yolov9\yolotest\images\val  # val images
nc: 2  # number of classes
names: ['dog', 'cat']  # class names
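A common mistake in this file is an nc value that no longer matches the names list after classes are added or removed. The sketch below checks a configuration that has already been loaded into a dictionary (for example with PyYAML's yaml.safe_load); check_dataset_config is a hypothetical helper, not part of Ultralytics.

```python
def check_dataset_config(cfg: dict) -> list:
    """Return a list of human-readable problems found in a dataset config."""
    problems = []
    for key in ("train", "val", "nc", "names"):
        if key not in cfg:
            problems.append(f"missing key: {key}")
    names = cfg.get("names", [])
    nc = cfg.get("nc")
    # nc must equal the number of class names.
    if nc is not None and len(names) != nc:
        problems.append(f"nc is {nc} but names has {len(names)} entries")
    if len(set(names)) != len(names):
        problems.append("names contains duplicates")
    return problems


if __name__ == "__main__":
    cfg = {"train": "images/train", "val": "images/val", "nc": 2, "names": ["dog", "cat"]}
    print(check_dataset_config(cfg))
```

The order of names also matters: index 0 must be the class that received id 0 during label conversion.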

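IV. YOLOv12 Inference

# -*- coding: utf-8 -*-
from ultralytics import YOLO

if __name__ == '__main__':
    # Weights path as given in the original example; replace it with the
    # weights file you downloaded.
    model = YOLO(model=r'D:\2-Python\1-YOLO\YOLOv11\ultralytics-8.3.2\yolo11n-seg.pt')
    # Run inference on one image; save=True writes the annotated result to disk.
    model.predict(
        source=r'D:\2-Python\1-YOLO\YOLOv11\ultralytics-8.3.2\ultralytics\assets\bus.jpg',
        save=True,
        show=False,
    )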

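V. YOLOv12 Training

# -*- coding: utf-8 -*-
import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    # Build the model from the YOLOv12 architecture definition.
    model = YOLO(model=r'D:\2-Python\1-YOLO\YOLOv12\yolov12-main\ultralytics\cfg\models\v12\yolov12.yaml')
    # model.load('yolo11n.pt')  # uncomment to load pretrained weights
    model.train(
        data=r'data.yaml',     # dataset configuration file
        imgsz=640,             # input image size
        epochs=50,
        batch=4,
        workers=0,             # data-loading worker processes
        device='',             # empty string: choose the device automatically
        optimizer='SGD',
        close_mosaic=10,       # disable mosaic augmentation for the last 10 epochs
        resume=False,          # whether to resume an interrupted run
        project='runs/train',  # output root directory
        name='exp',            # run name under the project directory
        single_cls=False,
        cache=False,
    )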
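For resuming an interrupted run, the first step is locating the latest checkpoint. Ultralytics writes checkpoints to a weights/last.pt file inside each run directory, so with the project and name settings above the file would be under runs/train. The sketch below is a hypothetical helper (find_last_checkpoint is not part of Ultralytics) that returns the most recently modified last.pt.

```python
from pathlib import Path


def find_last_checkpoint(project="runs/train"):
    """Return the most recently modified <run>/weights/last.pt under project, or None."""
    candidates = list(Path(project).glob("*/weights/last.pt"))
    if not candidates:
        return None
    # Repeated runs get separate directories, so pick the newest checkpoint.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(find_last_checkpoint())
```

The returned path can then be passed as the model argument to YOLO, with model.train(resume=True) continuing from the saved epoch.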
