YOLOv12-Based Object Detection System for UAV Aerial Views | 极客日志



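This YOLOv12-based object detection system for UAV aerial views covers the full workflow: environment setup, model training, testing and evaluation, and packaging in a graphical interface. The project uses the VisDrone dataset, which covers common target classes such as pedestrians and vehicles. The article examines the area attention mechanism of YOLOv12 and the network architecture of YOLOv11, and presents improvements such as GhostConv and CBAM. The goal is to address small-object detection in aerial scenes while keeping inference accurate and real-time.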

abccba, published 2026/4/8, updated 2026/4/25


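This project builds an object detection and tracking system for UAV (aerial) views, aimed mainly at common targets such as pedestrians and vehicles. The tutorial covers the full workflow, from environment setup, model training, and testing and evaluation to packaging in a graphical interface. It ships with an annotated dataset and trained YOLOv5, YOLOv8, YOLOv11, and YOLOv12 models.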

Project Overview

The classes in this dataset are defined as follows:

0: pedestrian
1: people
2: bicycle
3: car
4: van
5: truck
6: tricycle
7: awning-tricycle (tricycle with an awning)
8: bus
9: motor (motorcycle)
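These IDs are the values that appear in the first column of a YOLO-format label file, where each line reads `class_id x_center y_center width height` with coordinates normalized to [0, 1]. The sketch below assumes that standard layout and maps one label line back to a class name and a pixel-space box. `parse_label_line` is a hypothetical helper written for illustration, not part of the project.

```python
# Class IDs as defined for this dataset.
CLASS_NAMES = {
    0: "pedestrian", 1: "people", 2: "bicycle", 3: "car", 4: "van",
    5: "truck", 6: "tricycle", 7: "awning-tricycle", 8: "bus", 9: "motor",
}


def parse_label_line(line, img_w, img_h):
    """Convert one YOLO label line to (class_name, x1, y1, x2, y2) in pixels.

    Hypothetical helper; assumes the line holds
    'class_id x_center y_center width height' with normalized coordinates.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    cls_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    # Centre/size form -> corner form, scaled to the image size.
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return CLASS_NAMES[cls_id], x1, y1, x2, y2
```

For example, `parse_label_line("3 0.5 0.5 0.25 0.5", 640, 480)` returns `("car", 240.0, 120.0, 400.0, 360.0)`.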

Some sample data are shown below:

[Figure: train_batch0]

The system supports detection on both videos and images. Some results are shown below:

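Environment Setup

Before starting, make sure PyTorch and Miniconda are installed locally. If they are not, set up a basic Python environment first.

After downloading the project resource package, extract it to a local directory. The dependencies are usually listed in requirements.txt, and a virtual environment is recommended for managing them.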
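A quick way to confirm that the active environment is the right one is to check whether the key packages can be imported. The sketch below uses only the standard library. The package list is an assumption taken from the imports used in this article's code; requirements.txt in the project remains authoritative.

```python
import importlib.util


def check_packages(names):
    """Return {package_name: bool}, True when the package can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed list, based on the imports that appear in this article's code.
REQUIRED = ["torch", "ultralytics", "gradio"]

if __name__ == "__main__":
    for name, ok in check_packages(REQUIRED).items():
        print(f"{name}: {'installed' if ok else 'MISSING'}")
```

Run it inside the activated virtual environment; any package reported as missing should be installed before moving on to training.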

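Local Model Training

The training script is step1_start_train.py. Before training, confirm that the local dataset path is configured correctly. The dataset configuration file is ultralytics\cfg\datasets\A_my_data.yaml; change its root directory to match the location of the dataset on your machine.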
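Since a wrong dataset path is the most common cause of a failed start, it can help to verify the directories before launching training. The following is a minimal sketch using only the standard library. It takes the configuration as a plain dict with the same `path`, `train`, `val`, and `test` keys as the dataset YAML; `missing_dataset_dirs` is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def missing_dataset_dirs(cfg):
    """Return the split directories named in a dataset config that do not exist.

    cfg is a dict with a 'path' root and optional 'train'/'val'/'test'
    entries relative to that root, mirroring the keys of the dataset YAML.
    """
    root = Path(cfg["path"])
    missing = []
    for split in ("train", "val", "test"):
        rel = cfg.get(split)
        if rel and not (root / rel).is_dir():
            missing.append(f"{split}: {root / rel}")
    return missing
```

An empty list means every configured split directory was found; otherwise each entry names the split and the path that could not be located.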

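After updating the path in the configuration file, run the script directly to start training. If it fails on startup, first check that the dataset path points to the correct folder. Training results are saved in the runs directory.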
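The contents of step1_start_train.py are not reproduced in this article. As a rough sketch of what a training script built on the Ultralytics `YOLO.train()` API can look like, the code below collects the training arguments in one place and starts a run. The hyperparameter values and the run name are illustrative assumptions, not the project's actual settings.

```python
def build_train_args(data_yaml, epochs=100, imgsz=640, batch=16, device=0, name="yolo12s"):
    """Collect keyword arguments for YOLO.train(); all defaults here are illustrative."""
    return {
        "data": data_yaml,   # dataset configuration file
        "epochs": epochs,
        "imgsz": imgsz,
        "batch": batch,
        "device": device,    # GPU index; "cpu" also works
        "project": "runs",   # results are written under runs/<name>
        "name": name,
    }


def train(weights, train_args):
    """Start training. ultralytics is imported here so build_train_args works without it."""
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**train_args)
```

A call such as `train("yolo12s.pt", build_train_args("ultralytics/cfg/datasets/A_my_data.yaml"))` would then write weights to `runs/yolo12s/weights/`. The weights file name is an assumption; use whichever pretrained checkpoint or model YAML the project provides.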

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# Gradio web demo: upload an image, adjust the thresholds, and view the detections.
import gradio as gr
import PIL.Image as Image
from ultralytics import YOLO

model = YOLO("runs/yolo11s/weights/best.pt")  # change this to the path of your own weights
TITLE = "YOLOv12-Based Object Detection for UAV Views"

def predict_image(img, conf_threshold, iou_threshold):
    """Run detection on one PIL image and return the annotated image."""
    results = model.predict(
        source=img,
        conf=conf_threshold,
        iou=iou_threshold,
        show_labels=True,
        show_conf=True,
        imgsz=640,
    )
    # A single input image yields a single Results object.
    # plot() returns a BGR array, so reverse the channel axis to get RGB for PIL.
    im_array = results[0].plot()
    return Image.fromarray(im_array[..., ::-1])

iface = gr.Interface(
    fn=predict_image,
    inputs=[
        gr.Image(type="pil", label="Upload Image"),
        gr.Slider(minimum=0, maximum=1, value=0.25, label="Confidence threshold"),
        gr.Slider(minimum=0, maximum=1, value=0.45, label="IoU threshold"),
    ],
    outputs=gr.Image(type="pil", label="Result"),
    title=TITLE,
    description="Upload images for inference.",
)

if __name__ == "__main__":
    iface.launch()

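import torch
import torch.nn as nn

# Conv, ABlock, and C3k are provided by the ultralytics package (ultralytics/nn/modules).

class A2C2f(nn.Module):
    """Area-Attention C2f module for enhanced feature extraction."""
    def __init__(self, c1, c2, n=1, a2=True, area=1, residual=False, mlp_ratio=2.0, e=0.5, g=1, shortcut=True):
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        assert c_ % 32 == 0, "hidden channels must be a multiple of 32 (one attention head per 32 channels)"
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv((1 + n) * c_, c2, 1)
        # Learnable per-channel scale for the residual branch; created only when a2 and residual are both set.
        self.gamma = nn.Parameter(0.01 * torch.ones(c2), requires_grad=True) if a2 and residual else None
        # Each of the n stages is two stacked ABlocks (area attention) or, when a2 is False, a C3k block.
        self.m = nn.ModuleList(
            nn.Sequential(*(ABlock(c_, c_ // 32, mlp_ratio, area) for _ in range(2))) if a2 else C3k(c_, c_, 2, shortcut, g)
            for _ in range(n)
        )

    def forward(self, x):
        y = [self.cv1(x)]
        y.extend(m(y[-1]) for m in self.m)  # chain the stages, keeping every intermediate output
        y = self.cv2(torch.cat(y, 1))
        if self.gamma is not None:
            # Scaled residual connection; this requires c1 == c2.
            return x + self.gamma.view(-1, len(self.gamma), 1, 1) * y
        return y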
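The `area` argument is what separates this block from plain self-attention. In area attention the tokens of the feature map are split into `area` equal groups and attention is computed only inside each group, which cuts the number of query-key pairs by a factor of `area`. The sketch below is pure arithmetic that illustrates this saving together with the head-count rule enforced by the `assert`; it is not the library's implementation, and both function names are hypothetical.

```python
def attention_pairs(h, w, area=1):
    """Number of query-key pairs scored for an h x w feature map.

    With area == 1 every token attends to every token: n * n pairs.
    With area > 1 the n tokens are split into `area` equal groups and
    attention stays inside each group: area * (n / area) ** 2 = n * n / area.
    """
    n = h * w
    if n % area != 0:
        raise ValueError("token count must be divisible by area")
    per_area = n // area
    return area * per_area * per_area


def a2c2f_heads(c2, e=0.5):
    """Hidden width and head count used by A2C2f: one head per 32 hidden channels."""
    c_ = int(c2 * e)
    if c_ % 32 != 0:
        raise ValueError("int(c2 * e) must be a multiple of 32")
    return c_, c_ // 32
```

For a 20 x 20 feature map, which is what a 640-pixel input becomes at stride 32, `attention_pairs(20, 20)` gives 160000 while `attention_pairs(20, 20, area=4)` gives 40000. `a2c2f_heads(256)` returns `(128, 4)`.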

class Concat(nn.Module):
    """Concatenate a list of tensors along the given dimension (channels by default)."""
    def __init__(self, dimension=1):
        super().__init__()
        self.d = dimension
    def forward(self, x):
        return torch.cat(x, self.d)

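# C2f, C3k, and Bottleneck are provided by the ultralytics package.
class C3k2(C2f):
    """C2f variant whose inner blocks are C3k modules (c3k=True) or plain Bottlenecks."""
    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        super().__init__(c1, c2, n, shortcut, g, e)
        # Replace the blocks built by C2f; self.c is the hidden width set in C2f.__init__.
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck(self.c, self.c, shortcut, g)
            for _ in range(n)
        )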

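class Conv(nn.Module):
    """Standard convolution: Conv2d -> BatchNorm2d -> activation (SiLU by default)."""
    default_act = nn.SiLU()
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        super().__init__()
        # autopad (defined alongside Conv in ultralytics) computes 'same' padding when p is None.
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        # act=True -> default SiLU; an nn.Module -> used as given; anything else (e.g. False) -> no activation.
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))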

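# PSABlock is provided by the ultralytics package.
class C2PSA(nn.Module):
    """Split the features in two, apply position-sensitive attention to one half, then merge."""
    def __init__(self, c1, c2, n=1, e=0.5):
        super().__init__()
        assert c1 == c2  # input and output widths must match
        self.c = int(c1 * e)  # hidden channels per branch
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv(2 * self.c, c1, 1)
        # One attention head per 64 hidden channels, so self.c should be at least 64.
        self.m = nn.Sequential(*(PSABlock(self.c, attn_ratio=0.5, num_heads=self.c // 64) for _ in range(n)))
    def forward(self, x):
        a, b = self.cv1(x).split((self.c, self.c), dim=1)
        b = self.m(b)  # only branch b passes through the attention blocks
        return self.cv2(torch.cat((a, b), 1))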

# Dataset root; change this to the location of the dataset on your machine.
path: H:/raspi/0000-38-visdrone-detect-yolo12/visdrone
# Image directories, relative to path.
train: VisDrone2019-DET-train/images
val: VisDrone2019-DET-val/images
test: VisDrone2019-DET-test-dev/images
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor

import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: a regular conv produces part of the output, a cheap depthwise conv the rest."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, ratio=2, dw_kernel_size=3):
        super().__init__()
        self.out_channels = out_channels
        self.primary_channels = out_channels // ratio
        # ghost_channels must be divisible by primary_channels, because the cheap conv uses groups=primary_channels.
        self.ghost_channels = out_channels - self.primary_channels
        # Primary branch: ordinary convolution producing the "intrinsic" feature maps.
        self.primary_conv = nn.Conv2d(in_channels, self.primary_channels, kernel_size, stride, padding, bias=False)
        self.bn1 = nn.BatchNorm2d(self.primary_channels)
        # Ghost branch: depthwise convolution that derives extra maps from the primary ones.
        self.ghost_conv = nn.Conv2d(self.primary_channels, self.ghost_channels, dw_kernel_size, stride=1, padding=dw_kernel_size // 2, groups=self.primary_channels, bias=False)
        self.bn2 = nn.BatchNorm2d(self.ghost_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        primary_features = self.relu(self.bn1(self.primary_conv(x)))
        ghost_features = self.relu(self.bn2(self.ghost_conv(primary_features)))
        # Concatenate both branches along channels to reach out_channels.
        return torch.cat([primary_features, ghost_features], dim=1)
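To see why this layer counts as lightweight, compare its convolution weights with those of an ordinary convolution that has the same input and output widths. The sketch below is plain arithmetic over the layer shapes defined above. BatchNorm parameters are left out, and both function names are hypothetical.

```python
def conv_weights(in_ch, out_ch, k, groups=1):
    """Weight count of a bias-free Conv2d: out_ch * (in_ch / groups) * k * k."""
    return out_ch * (in_ch // groups) * k * k


def ghost_conv_weights(in_ch, out_ch, k=3, ratio=2, dw_k=3):
    """Convolution weights of the GhostConv above (BatchNorm parameters excluded)."""
    primary = out_ch // ratio
    ghost = out_ch - primary
    # Primary branch is a regular conv; ghost branch is depthwise (groups = primary).
    return conv_weights(in_ch, primary, k) + conv_weights(primary, ghost, dw_k, groups=primary)
```

With 64 input and 128 output channels and 3 x 3 kernels, `conv_weights(64, 128, 3)` is 73728, while `ghost_conv_weights(64, 128)` is 37440, roughly half.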

class ChannelAttention(nn.Module):
    """Channel attention: reweight channels using globally pooled statistics."""
    def __init__(self, in_channels, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Shared bottleneck MLP; in_channels must be at least `reduction` so the hidden width is not zero.
        self.fc = nn.Sequential(
            nn.Linear(in_channels, in_channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(in_channels // reduction, in_channels, bias=False)
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Pool to (B, C), run both descriptors through the shared MLP, then sum.
        avg_out = self.fc(self.avg_pool(x).view(x.size(0), -1))
        max_out = self.fc(self.max_pool(x).view(x.size(0), -1))
        out = self.sigmoid(avg_out + max_out).view(x.size(0), -1, 1, 1)
        return x * out

class SpatialAttention(nn.Module):
    """Spatial attention: reweight positions using channel-wise mean and max maps."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        combined = torch.cat([avg_out, max_out], dim=1)  # (B, 2, H, W)
        out = self.sigmoid(self.conv(combined))
        return x * out

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by spatial attention."""
    def __init__(self, in_channels, reduction=16, kernel_size=7):
        super().__init__()
        self.channel_attention = ChannelAttention(in_channels, reduction)
        self.spatial_attention = SpatialAttention(kernel_size)

    def forward(self, x):
        x = self.channel_attention(x)
        x = self.spatial_attention(x)
        return x
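CBAM leaves the tensor shape unchanged, so it can be placed after an existing layer without altering the layers that follow. Its cost in parameters is small, as the arithmetic sketch below suggests for the module defined above. The function name is hypothetical, and the count covers only the learnable weights of the two attention branches.

```python
def cbam_weights(in_channels, reduction=16, kernel_size=7):
    """Learnable weights in the CBAM above: shared channel MLP plus the spatial conv."""
    hidden = in_channels // reduction
    if hidden == 0:
        raise ValueError("in_channels must be at least `reduction`")
    channel = 2 * in_channels * hidden        # two bias-free Linear layers
    spatial = 2 * kernel_size * kernel_size   # Conv2d(2, 1, k), no bias
    return channel + spatial
```

For a 256-channel feature map with the default settings, `cbam_weights(256)` is 8290.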