Hands-On: A UAV Aerial Object Detection System Based on YOLO12 | 极客日志

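This project builds an object detection system for UAV aerial imagery on top of the YOLO12 algorithm, detecting pedestrians, vehicles, and other targets in the VisDrone dataset. It covers environment setup, model training, metric evaluation, and a PySide6 graphical interface. It also explains YOLO12's area attention mechanism and R-ELAN module, and offers two model modifications, GhostConv and CBAM, to trade off speed against accuracy. In testing, the system handled small targets and cluttered backgrounds well, and it is suited to smart transportation, agricultural monitoring, and security applications.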

利刃 · Published 2026/4/10 · Updated 2026/7/2435 views

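A UAV Aerial Object Detection System Based on YOLO12

Project overview

This project aims to build an object detection system for the UAV aerial viewpoint, using the YOLO12 algorithm to detect and track common targets such as pedestrians and vehicles in real time. Compared with traditional approaches, YOLO12 keeps real-time inference speed while improving feature extraction through attention mechanisms, which suits aerial images where targets are small and backgrounds are complex.

This walkthrough covers the full pipeline: environment setup, model training, metric evaluation, and GUI packaging. The dataset is VisDrone, which includes pedestrians, bicycles, cars, and other categories. Complete code resources and pretrained models are provided (YOLOv5/v8/v11/v12 are supported).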

Data classes

0: pedestrian
1: people (persons)
2: bicycle
3: car
4: van
5: truck
6: tricycle
7: awning-tricycle (tricycle with an awning)
8: bus
9: motor (motorcycle)
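The class indices above are the ones YOLO label files use. As a minimal sketch, assuming you start from the raw VisDrone2019-DET annotation files (comma-separated rows of bbox_left, bbox_top, bbox_width, bbox_height, score, object_category, truncation, occlusion, with categories numbered 1 to 10 in the same order as the list above), one row could be converted to a YOLO label line as follows. The function name `visdrone_to_yolo` is hypothetical, and the project's resource pack may already ship converted labels.

```python
# YOLO class indices as listed in the article.
VISDRONE_NAMES = {
    0: "pedestrian", 1: "people", 2: "bicycle", 3: "car", 4: "van",
    5: "truck", 6: "tricycle", 7: "awning-tricycle", 8: "bus", 9: "motor",
}


def visdrone_to_yolo(line, img_w, img_h):
    """Convert one raw VisDrone annotation row to a YOLO label line.

    Returns None for rows that should be skipped.
    """
    fields = line.strip().rstrip(",").split(",")
    left, top, width, height, score, category = (int(v) for v in fields[:6])
    # Assumption: score 0 marks rows to ignore, and categories outside 1..10
    # (ignored regions, "others") have no counterpart in the 10-class list.
    if score == 0 or not 1 <= category <= 10:
        return None
    cls = category - 1  # shift to the 0-based indices listed above
    x_center = (left + width / 2) / img_w
    y_center = (top + height / 2) / img_h
    return f"{cls} {x_center:.6f} {y_center:.6f} {width / img_w:.6f} {height / img_h:.6f}"


print(visdrone_to_yolo("100,200,50,80,1,4,0,0", 1000, 800))
# prints: 3 0.125000 0.300000 0.050000 0.100000
```

YOLO labels store the box center and size normalized by the image dimensions, which is why the image width and height are required.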

Figure: train_batch0 (training batch preview)

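Environment setup

Before starting, make sure PyTorch and Miniconda are installed locally. If your Python environment is not set up yet, work through a basic installation tutorial first.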
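A quick way to confirm the environment is ready is to print the Python and PyTorch versions and check whether a CUDA device is visible. The helper below is a small sketch; `describe_environment` is a hypothetical name, and it only reports what it finds rather than installing anything.

```python
import importlib.util
import platform


def describe_environment():
    """Report the Python version, the PyTorch version, and CUDA availability."""
    info = {"python": platform.python_version(), "torch": None, "cuda": False}
    # Import torch only if it is installed, so the check also runs
    # in an environment that has not been set up yet.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda"] = torch.cuda.is_available()
    return info


print(describe_environment())
```

If `torch` is reported as `None`, PyTorch is missing from the active environment; if `cuda` is `False`, training will fall back to the CPU.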

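Project resources

Download the project resource pack and extract it to a local directory. The core scripts are:

step1_start_train.py: model training entry point
step2_start_val.py: model validation and testing
step3_start_window_track.py: main program of the graphical interface
web_demo.py: web demo interface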
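After extracting the pack, it can help to confirm that the four scripts listed above are where you expect them. This is a minimal sketch; `missing_scripts` is a hypothetical helper, and it assumes the scripts sit directly in the project root.

```python
from pathlib import Path

# Script names as listed in the article.
CORE_SCRIPTS = (
    "step1_start_train.py",
    "step2_start_val.py",
    "step3_start_window_track.py",
    "web_demo.py",
)


def missing_scripts(project_root):
    """Return the core scripts that are not present under project_root."""
    root = Path(project_root)
    return [name for name in CORE_SCRIPTS if not (root / name).is_file()]


print(missing_scripts("."))
```

An empty list means all four entry points were found.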

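Model training

Dataset configuration

Before training, update the paths in the dataset configuration file. The file is located at ultralytics\cfg\datasets\A_my_data.yaml; set its path field to the dataset root on your machine.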

path: H:/raspi/0000-38-visdrone-detect-yolo12/visdrone  # dataset root dir (change to your local path)
train: VisDrone2019-DET-train/images  # train images (relative to 'path'), 6471 images
val: VisDrone2019-DET-val/images  # val images (relative to 'path'), 548 images
test: VisDrone2019-DET-test-dev/images  # test images (optional), 1610 images
# Classes (same order as the data classes list)
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
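With the dataset file in place, training can be launched through the Ultralytics Python API. The sketch below is not the content of step1_start_train.py, which is not shown here. It assumes the `yolo12s.pt` starting weights, and the epoch, batch size, and device values are placeholders to adjust for your hardware. The `project` and `name` values are chosen so that the best weights land in runs/yolo12s/weights/best.pt, the path the demo code loads.

```python
from pathlib import Path


def build_train_kwargs(data_yaml, epochs=100, imgsz=640, batch=16, device=0):
    """Collect training arguments; the defaults are placeholder values."""
    return {
        "data": str(data_yaml),
        "epochs": epochs,
        "imgsz": imgsz,
        "batch": batch,
        "device": device,
        "project": "runs",   # output root directory
        "name": "yolo12s",   # run directory: runs/yolo12s
    }


def train(data_yaml="ultralytics/cfg/datasets/A_my_data.yaml", weights="yolo12s.pt"):
    """Start training after checking that the dataset config exists."""
    if not Path(data_yaml).is_file():
        raise FileNotFoundError(f"dataset config not found: {data_yaml}")
    # Imported lazily so the argument helper works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**build_train_kwargs(data_yaml))
```

Calling `train()` from the project root starts a run; pass a different `data_yaml` if your configuration file lives elsewhere.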

Web (Gradio)

import gradio as gr
from PIL import Image
from ultralytics import YOLO

# Load the trained weights (replace with your own model path)
model = YOLO("runs/yolo12s/weights/best.pt")

TITLE = "YOLO12 Object Detection for the UAV Viewpoint"


def predict_image(img, conf_threshold, iou_threshold):
    """Run the YOLO12 model on one image and return the annotated result."""
    results = model.predict(
        source=img,
        conf=conf_threshold,
        iou=iou_threshold,
        show_labels=True,
        show_conf=True,
        imgsz=640,
    )
    # A single input image yields a single Results object.
    # plot() returns a BGR array, so reverse the channel axis to get RGB for PIL.
    im_array = results[0].plot()
    return Image.fromarray(im_array[..., ::-1])


iface = gr.Interface(
    fn=predict_image,
    inputs=[
        gr.Image(type="pil", label="Upload Image"),
        gr.Slider(minimum=0, maximum=1, value=0.25, label="Confidence threshold"),
        gr.Slider(minimum=0, maximum=1, value=0.45, label="IoU threshold"),
    ],
    outputs=gr.Image(type="pil", label="Result"),
    title=TITLE,
    description="Upload images for inference.",
)

if __name__ == "__main__":
    iface.launch()
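The two sliders in the demo control how raw predictions are filtered: the confidence threshold drops low-score boxes, and the IoU threshold decides when two overlapping boxes count as the same object during non-maximum suppression. The sketch below illustrates the idea with a simplified, class-agnostic greedy NMS in plain Python. It is not the Ultralytics implementation, and `box_iou` and `filter_detections` are hypothetical names.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, conf_threshold=0.25, iou_threshold=0.45):
    """Keep confident boxes, then greedily suppress heavily overlapping ones.

    dets is a list of (box, score) pairs.
    """
    candidates = sorted(
        (d for d in dets if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a better one.
        if all(box_iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


dets = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),    # overlaps the first box, IoU = 81/119
    ((20, 20, 30, 30), 0.3),
    ((50, 50, 60, 60), 0.1),  # below the confidence threshold
]
print(filter_detections(dets))
# prints: [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.3)]
```

For dense aerial scenes, lowering the IoU threshold suppresses more neighbors, which can remove true detections of objects that sit close together.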

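Core modules

import torch
import torch.nn as nn
# Building blocks from an Ultralytics release that ships YOLO12
from ultralytics.nn.modules.block import ABlock, C3k
from ultralytics.nn.modules.conv import Conv


class A2C2f(nn.Module):
    """Area-Attention C2f module for enhanced feature extraction."""

    def __init__(self, c1, c2, n=1, a2=True, area=1, residual=False, mlp_ratio=2.0, e=0.5, g=1, shortcut=True):
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        # Each attention head covers 32 channels, so c_ must be a multiple of 32
        assert c_ % 32 == 0, "hidden channels must be a multiple of 32"
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv((1 + n) * c_, c2, 1)
        # Learnable scale for the residual branch, initialized small (0.01)
        self.gamma = nn.Parameter(0.01 * torch.ones(c2), requires_grad=True) if a2 and residual else None
        # n stages: two stacked ABlocks when area attention is on, otherwise a C3k block
        self.m = nn.ModuleList(
            nn.Sequential(*(ABlock(c_, c_ // 32, mlp_ratio, area) for _ in range(2))) if a2 else C3k(c_, c_, 2, shortcut, g)
            for _ in range(n)
        )

    def forward(self, x):
        y = [self.cv1(x)]
        # Each stage consumes the previous stage's output; all outputs are concatenated
        y.extend(m(y[-1]) for m in self.m)
        y = self.cv2(torch.cat(y, 1))
        if self.gamma is not None:
            # Scaled residual connection (requires c1 == c2)
            return x + self.gamma.view(-1, len(self.gamma), 1, 1) * y
        return y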
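Two numbers in A2C2f are worth working through by hand: the head count derived from the hidden width, and the effect of the `area` argument on attention cost. The sketch below assumes that area attention splits the H×W tokens into `area` equal groups and lets each token attend only within its own group, so the number of query-key pairs shrinks by a factor of `area`. The helper names are hypothetical, and the count ignores heads, channel width, and the MLP.

```python
def a2c2f_hidden(c2, e=0.5):
    """Return (hidden channels, attention heads) as computed in A2C2f.__init__."""
    c_ = int(c2 * e)
    if c_ % 32 != 0:
        raise ValueError("hidden channels must be a multiple of 32")
    return c_, c_ // 32


def attention_pairs(h, w, area=1):
    """Count query-key pairs when tokens attend only within their own area."""
    tokens = h * w
    if tokens % area != 0:
        raise ValueError("token count must be divisible by the number of areas")
    per_area = tokens // area
    return area * per_area * per_area


print(a2c2f_hidden(256))              # prints: (128, 4)
print(attention_pairs(40, 40))        # prints: 2560000
print(attention_pairs(40, 40, 4))     # prints: 640000
```

With four areas, a 40×40 feature map needs a quarter of the query-key pairs of global attention, which is where the speed advantage over plain self-attention comes from.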

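Speed optimization: GhostConv

import torch
import torch.nn as nn


class GhostConv(nn.Module):
    """Ghost convolution: a regular conv produces the primary features, and a
    cheap depthwise conv derives the remaining "ghost" features from them."""

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, ratio=2, dw_kernel_size=3):
        super().__init__()
        self.out_channels = out_channels
        self.primary_channels = out_channels // ratio
        self.ghost_channels = out_channels - self.primary_channels
        # The grouped conv below needs ghost_channels to be a multiple of primary_channels
        assert self.primary_channels > 0 and self.ghost_channels % self.primary_channels == 0, \
            "out_channels must split evenly for the given ratio"
        # Primary branch: ordinary convolution with fewer output channels
        self.primary_conv = nn.Conv2d(in_channels, self.primary_channels, kernel_size, stride, padding, bias=False)
        self.bn1 = nn.BatchNorm2d(self.primary_channels)
        # Ghost branch: depthwise convolution over the primary features
        self.ghost_conv = nn.Conv2d(self.primary_channels, self.ghost_channels, dw_kernel_size, stride=1, padding=dw_kernel_size // 2, groups=self.primary_channels, bias=False)
        self.bn2 = nn.BatchNorm2d(self.ghost_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        primary_features = self.bn1(self.primary_conv(x))
        ghost_features = self.bn2(self.ghost_conv(primary_features))
        # Concatenate both branches to reach out_channels, then activate
        output = torch.cat([primary_features, ghost_features], dim=1)
        return self.relu(output)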
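The saving from GhostConv can be estimated by counting convolution weights. The sketch below follows the layer layout of the class above (bias-free convolutions, depthwise ghost branch) and leaves out the BatchNorm parameters; the helper names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights of an ordinary k x k convolution without bias."""
    return k * k * c_in * c_out


def ghost_conv_params(c_in, c_out, k=3, ratio=2, dw_k=3):
    """Weights of the GhostConv layout above, BatchNorm excluded."""
    primary = c_out // ratio
    ghost = c_out - primary
    primary_weights = k * k * c_in * primary
    # Depthwise branch: each ghost channel sees one input channel
    ghost_weights = dw_k * dw_k * ghost
    return primary_weights + ghost_weights


print(standard_conv_params(64, 128))  # prints: 73728
print(ghost_conv_params(64, 128))     # prints: 37440
```

With ratio=2 the layer needs roughly half the weights of a standard 3×3 convolution, because the depthwise branch adds only a few hundred parameters.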

Accuracy optimization: CBAM

import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention: weights each channel using pooled global statistics."""

    def __init__(self, in_channels, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Guard against a zero-width bottleneck when in_channels < reduction
        hidden = max(1, in_channels // reduction)
        # Shared bottleneck MLP applied to both pooled descriptors
        self.fc = nn.Sequential(
            nn.Linear(in_channels, hidden, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, in_channels, bias=False)
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        batch, channels, _, _ = x.size()
        avg_out = self.fc(self.avg_pool(x).view(batch, channels))
        max_out = self.fc(self.max_pool(x).view(batch, channels))
        out = avg_out + max_out
        return x * self.sigmoid(out).view(batch, channels, 1, 1)


class SpatialAttention(nn.Module):
    """Spatial attention: weights each location using channel-wise mean and max maps."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        combined = torch.cat([avg_out, max_out], dim=1)
        out = self.sigmoid(self.conv(combined))
        return x * out


class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by spatial attention."""

    def __init__(self, in_channels, reduction=16, kernel_size=7):
        super().__init__()
        self.channel_attention = ChannelAttention(in_channels, reduction)
        self.spatial_attention = SpatialAttention(kernel_size)

    def forward(self, x):
        x = self.channel_attention(x)
        x = self.spatial_attention(x)
        return x
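CBAM is attractive as an accuracy add-on because it adds very few parameters. The sketch below counts the weights implied by the layer layout above (bias-free shared MLP plus one 2-to-1 channel spatial convolution) and assumes the channel count is at least `reduction`. `cbam_params` is a hypothetical helper.

```python
def cbam_params(channels, reduction=16, kernel_size=7):
    """Weights added by one CBAM block, following the layout above."""
    hidden = channels // reduction
    # Two bias-free linear layers, shared by the avg-pool and max-pool branches
    channel_mlp = 2 * channels * hidden
    # One bias-free conv from 2 input maps (mean, max) to 1 output map
    spatial_conv = 2 * kernel_size * kernel_size
    return channel_mlp + spatial_conv


print(cbam_params(256))   # prints: 8290
print(3 * 3 * 256 * 256)  # a single 3x3 conv, 256 -> 256: prints 589824
```

For a 256-channel feature map, one CBAM block costs about 8 thousand weights, compared with roughly 590 thousand for a single 3×3 convolution of the same width.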
