YOLO12-Based Drone Aerial-View Object Detection System | 极客日志



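This article presents a drone aerial-view object detection system based on YOLO12 and YOLO11. It covers environment setup, dataset preparation (VisDrone), model training, testing, and wrapping the model in a graphical interface (PySide6/Gradio). It also explains YOLO12's area attention mechanism and R-ELAN module, as well as YOLO11's core components such as C2PSA and C3k2. According to the experimental results, the system achieves relatively high accuracy and real-time speed on small objects such as pedestrians and vehicles, making it suitable for smart-city, agriculture, and security applications.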

Posted by 山野诗人 on 2026/4/6, updated 2026/7/22


[Course Project 46] YOLO12-Based Aerial (Drone) View Object Detection and Tracking System

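This tutorial focuses on object detection from a drone's point of view, detecting common targets such as pedestrians and vehicles, and explains the new modules introduced in YOLO12. It comes with an annotated dataset, trained YOLOv5, YOLOv8, YOLO11, and YOLO12 models, and an accompanying graphical interface.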

The dataset contains the following classes:

0: pedestrian
1: people
2: bicycle
3: car
4: van
5: truck
6: tricycle
7: awning-tricycle (tricycle with a sun awning)
8: bus
9: motor (motorcycle)

Some sample images from the dataset:

![image]

Below are some detection results. Both video and image detection are supported.

![image]

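Hands-on Project

Before starting the hands-on part, make sure PyTorch and Miniconda are installed.

Before configuring anything, download the project resource package, which is available from the official channel.

Environment Setup

For environment setup, follow the unified setup guide.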
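The setup guide itself is not reproduced here. As a quick sanity check after following it, the sketch below reports which of the packages used in this article are importable and which version is installed. The helper `check_packages` and the `REQUIRED` mapping are our own illustration, not part of the project resource package.

```python
import importlib.metadata
import importlib.util


def check_packages(names):
    """Report, for each module, its installed version or None if it cannot be imported.

    `names` maps an import name to its distribution name on PyPI; the two can differ
    (for example, the module "PIL" is provided by the "pillow" distribution).
    """
    report = {}
    for module_name, dist_name in names.items():
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = None  # not importable in the current environment
            continue
        try:
            report[module_name] = importlib.metadata.version(dist_name)
        except importlib.metadata.PackageNotFoundError:
            report[module_name] = "installed (version unknown)"
    return report


# Packages used by the scripts in this article (assumed list).
REQUIRED = {"torch": "torch", "ultralytics": "ultralytics", "gradio": "gradio", "PIL": "pillow"}

if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name:12s} {version or 'MISSING'}")
```

If PyTorch is reported as installed, `torch.cuda.is_available()` returning `True` is a further check that the GPU build is usable.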

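Local Model Training

Model training uses the script step1_start_train.py. Before training, configure your local copy of the dataset: the dataset configuration file is ultralytics\cfg\datasets\A_my_data.yaml, and you need to change the dataset root directory in it to the location on your own machine.

After that, update the path to this configuration file in the training script, then right-click the script and run it to start training.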
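The contents of step1_start_train.py are not shown in this article. A minimal sketch of such a script with the Ultralytics Python API might look like the following; the model config `yolo12n.yaml`, the default epochs/batch size and the helpers `build_train_args`/`train` are assumptions for illustration, and YOLO12 configs require an Ultralytics release that includes YOLO12.

```python
from pathlib import Path

# Hypothetical defaults; the project's own script may use different values.
DATA_YAML = Path("ultralytics/cfg/datasets/A_my_data.yaml")
MODEL_CFG = "yolo12n.yaml"  # a .yaml builds from scratch, a .pt starts from pretrained weights


def build_train_args(data_yaml, epochs=100, imgsz=640, batch=16, project="runs"):
    """Collect keyword arguments for Ultralytics' YOLO.train()."""
    return {
        "data": str(data_yaml),  # dataset config with path/train/val/names
        "epochs": epochs,
        "imgsz": imgsz,
        "batch": batch,
        "project": project,  # results are written under this directory
    }


def train(model_cfg=MODEL_CFG, data_yaml=DATA_YAML, **overrides):
    """Train a YOLO model; ultralytics is imported lazily so this file loads without it."""
    from ultralytics import YOLO

    model = YOLO(model_cfg)
    return model.train(**build_train_args(data_yaml, **overrides))
```

On Windows, call `train()` from inside an `if __name__ == "__main__":` block, because PyTorch data-loader workers are started with the spawn method and re-import the script.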

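If an error occurs before training starts, the most likely cause is an incorrectly configured dataset path. Check the path and make sure the dataset configuration is correct. Training results are saved in the runs directory.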
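Since a wrong dataset path is the most common failure, it can help to check the directories before launching training. The helper below is our own sketch, not part of the project: it takes the `path` root and the split directories from the dataset YAML and lists anything missing. It assumes the Ultralytics convention that labels live in a `labels` folder next to each `images` folder.

```python
from pathlib import Path


def find_missing_dirs(root, splits):
    """List image/label directories referenced by a dataset YAML that do not exist.

    root   -- dataset root, i.e. the `path` field of the YAML
    splits -- mapping of split name to image directory relative to root,
              e.g. {"train": "VisDrone2019-DET-train/images"}
    """
    root = Path(root)
    if not root.is_dir():
        return [f"root: {root}"]
    missing = []
    for split, rel in splits.items():
        images = root / rel
        if not images.is_dir():
            missing.append(f"{split} images: {images}")
        # Ultralytics finds label files by swapping the 'images' path component for 'labels'.
        if images.name == "images":
            labels = images.parent / "labels"
            if not labels.is_dir():
                missing.append(f"{split} labels: {labels}")
    return missing
```

For example, `find_missing_dirs("H:/raspi/0000-38-visdrone-detect-yolo12/visdrone", {"train": "VisDrone2019-DET-train/images", "val": "VisDrone2019-DET-val/images"})` should return an empty list when the dataset is in place.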

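GPU Server Training (Optional)

You can also train on a cloud GPU platform; new users receive vouchers when they register.

Model Testing

Model testing computes metrics such as mAP, precision (P), and recall (R) with the script step2_start_val.py. Validation already runs after the final training epoch, so this step can be skipped. If you want to validate on its own, change the weights path in the test script to the weights you trained and run it.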
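To make P, R and mAP concrete, here is a plain-Python sketch of the definitions for a single class at a single IoU threshold. It is a simplified illustration, not the Ultralytics implementation: Ultralytics matches predictions to ground truth per image, computes AP with 101-point interpolation, and reports mAP50-95 as the mean over IoU thresholds 0.50 to 0.95 in steps of 0.05, so its numbers can differ slightly. With the Ultralytics API, `YOLO(weights).val(data=...)` returns a result whose `box.mp`, `box.mr`, `box.map50` and `box.map` hold the corresponding values.

```python
def iou_xyxy(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP) and R = TP / (TP + FN); 0.0 when a denominator is zero."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r


def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP: area under the precision envelope of the PR curve.

    scores -- confidence of each detection
    is_tp  -- whether each detection matched a ground-truth box (IoU above threshold)
    num_gt -- number of ground-truth boxes of this class
    """
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: -scores[i])  # highest confidence first
    tp_cum = fp_cum = 0
    recalls, precisions = [0.0], [1.0]
    for i in order:
        if is_tp[i]:
            tp_cum += 1
        else:
            fp_cum += 1
        recalls.append(tp_cum / num_gt)
        precisions.append(tp_cum / (tp_cum + fp_cum))
    # Make precision non-increasing from right to left (the "envelope").
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((recalls[k] - recalls[k - 1]) * precisions[k] for k in range(1, len(recalls)))
```

Ranking the single true positive first gives AP = 1.0; ranking a false positive above it halves the precision at full recall and gives AP = 0.5.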


#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Project : step3_start_window_track.py
@File    : web_demo.py
@IDE     : PyCharm
@Author  : System
@Description : TODO: add a file description
@Date    : 2024/12/11 20:25
'''
import gradio as gr
import PIL.Image as Image

from ultralytics import ASSETS, YOLO

model = YOLO("runs/yolo11s/weights/best.pt")  # TODO: change this to the path of your own trained weights

TITLE = "Welcome to YOLO12-based drone-view object detection"


def predict_image(img, conf_threshold, iou_threshold):
    """Predicts objects in an image using a YOLO model with adjustable confidence and IoU thresholds."""
    results = model.predict(
        source=img,
        conf=conf_threshold,
        iou=iou_threshold,
        show_labels=True,
        show_conf=True,
        imgsz=640,
    )

    for r in results:
        im_array = r.plot()  # annotated image as a BGR numpy array
        im = Image.fromarray(im_array[..., ::-1])  # BGR -> RGB for PIL
        return im  # a single uploaded image yields a single result


iface = gr.Interface(
    fn=predict_image,
    inputs=[
        gr.Image(type="pil", label="Upload Image"),
        gr.Slider(minimum=0, maximum=1, value=0.25, label="Confidence threshold"),
        gr.Slider(minimum=0, maximum=1, value=0.45, label="IoU threshold"),
    ],
    outputs=gr.Image(type="pil", label="Result"),
    title=TITLE,
    description="Upload images for inference.",
    # examples=[
    #     [ASSETS / "bus.jpg", 0.25, 0.45],
    #     [ASSETS / "zidane.jpg", 0.25, 0.45],
    # ],
)

if __name__ == "__main__":
    # iface.launch(share=True)  # use this instead to get a public share link
    iface.launch()

import torch
import torch.nn as nn

# Excerpt from Ultralytics (ultralytics/nn/modules/block.py); Conv, ABlock and C3k are defined in the same package.


class A2C2f(nn.Module):
    """
    Area-Attention C2f module for enhanced feature extraction with area-based attention mechanisms.

    This module extends the C2f architecture by incorporating area-attention and ABlock layers for improved feature
    processing. It supports both area-attention and standard convolution modes.

    Attributes:
        cv1 (Conv): Initial 1x1 convolution layer that reduces input channels to hidden channels.
        cv2 (Conv): Final 1x1 convolution layer that processes concatenated features.
        gamma (nn.Parameter | None): Learnable parameter for residual scaling when using area attention.
        m (nn.ModuleList): List of either ABlock or C3k modules for feature processing.

    Methods:
        forward: Processes input through area-attention or standard convolution pathway.

    Examples:
        >>> m = A2C2f(512, 512, n=1, a2=True, area=1)
        >>> x = torch.randn(1, 512, 32, 32)
        >>> output = m(x)
        >>> print(output.shape)
        torch.Size([1, 512, 32, 32])
    """

    def __init__(self, c1, c2, n=1, a2=True, area=1, residual=False, mlp_ratio=2.0, e=0.5, g=1, shortcut=True):
        """
        Area-Attention C2f module for enhanced feature extraction with area-based attention mechanisms.

        Args:
            c1 (int): Number of input channels.
            c2 (int): Number of output channels.
            n (int): Number of ABlock or C3k modules to stack.
            a2 (bool): Whether to use area attention blocks. If False, uses C3k blocks instead.
            area (int): Number of areas the feature map is divided into.
            residual (bool): Whether to use residual connections with learnable gamma parameter.
            mlp_ratio (float): Expansion ratio for MLP hidden dimension.
            e (float): Channel expansion ratio for hidden channels.
            g (int): Number of groups for grouped convolutions.
            shortcut (bool): Whether to use shortcut connections in C3k blocks.
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        assert c_ % 32 == 0, "Dimension of ABlock must be a multiple of 32."

        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv((1 + n) * c_, c2, 1)  # fuses cv1 output + n block outputs

        # Learnable residual scale, only created for area-attention blocks with residual=True
        self.gamma = nn.Parameter(0.01 * torch.ones(c2), requires_grad=True) if a2 and residual else None
        self.m = nn.ModuleList(
            nn.Sequential(*(ABlock(c_, c_ // 32, mlp_ratio, area) for _ in range(2)))
            if a2
            else C3k(c_, c_, 2, shortcut, g)
            for _ in range(n)
        )

    def forward(self, x):
        """Forward pass through R-ELAN layer."""
        y = [self.cv1(x)]
        y.extend(m(y[-1]) for m in self.m)  # each block consumes the previous output
        y = self.cv2(torch.cat(y, 1))
        if self.gamma is not None:
            # Scaled residual connection (requires c1 == c2)
            return x + self.gamma.view(-1, len(self.gamma), 1, 1) * y
        return y

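The docstring example (c1 = c2 = 512, e = 0.5) can be checked by hand. The plain-Python sketch below, our own illustration rather than Ultralytics code, traces the channel bookkeeping of the A2C2f constructor with `a2=True` and estimates why area attention is cheaper. It assumes, as the Ultralytics `AAttn` code appears to do, that the H×W tokens are reshaped into `area` equal groups and attention is computed within each group; with `area=1` this falls back to ordinary global attention. Only the size of the attention score matrix is counted, not the projection layers.

```python
def a2c2f_shapes(c1, c2, n=1, e=0.5):
    """Channel bookkeeping of A2C2f with a2=True, following the constructor above."""
    c_hidden = int(c2 * e)
    assert c_hidden % 32 == 0, "hidden width must be a multiple of 32"
    return {
        "cv1": (c1, c_hidden),            # 1x1 conv into the hidden width
        "num_heads": c_hidden // 32,      # ABlock(c_, c_ // 32, ...) -> 32 channels per head
        "cv2": ((1 + n) * c_hidden, c2),  # concat of cv1 output and n block outputs
    }


def attention_cost_ratio(h, w, area):
    """How many times smaller the attention score matrices get with area attention.

    Global attention over N = h*w tokens builds one N x N score matrix. Splitting the
    tokens into `area` equal groups gives `area` matrices of (N/area)^2 entries,
    i.e. N^2 / area entries in total.
    """
    n_tokens = h * w
    global_entries = n_tokens ** 2
    area_entries = area * (n_tokens // area) ** 2
    return global_entries / area_entries
```

For the docstring example, `a2c2f_shapes(512, 512)` gives a 256-channel hidden width with 8 heads, and on a 32×32 map `attention_cost_ratio(32, 32, 4)` returns 4.0: four areas cut the score-matrix size by a factor of four.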
# Ultralytics YOLO 🚀, AGPL-3.0 license
"""Model head modules."""

import copy
import math

import torch
import torch.nn as nn
from torch.nn.init import constant_, xavier_uniform_

from ultralytics.utils.tal import TORCH_1_10, dist2bbox, dist2rbox, make_anchors

from .block import DFL, BNContrastiveHead, ContrastiveHead, Proto
from .conv import Conv, DWConv
from .transformer import MLP, DeformableTransformerDecoder, DeformableTransformerDecoderLayer
from .utils import bias_init_with_prob, linear_init

__all__ = "Detect", "Segment", "Pose", "Classify", "OBB", "RTDETRDecoder", "v10Detect"

import torch
import torch.nn as nn


class Concat(nn.Module):
    """Concatenate a list of tensors along dimension."""

    def __init__(self, dimension=1):
        """Concatenates a list of tensors along a specified dimension."""
        super().__init__()
        self.d = dimension

    def forward(self, x):
        """Concatenate the input list of tensors along dimension self.d."""
        return torch.cat(x, self.d)

torch.nn.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None)
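In the YOLO neck, `nn.Upsample(scale_factor=2, mode="nearest")`, which calls the `interpolate` function above, doubles the spatial size of a deep feature map before `Concat` joins it with a shallower one. For an integer scale factor, nearest mode simply repeats each pixel. The plain-Python sketch below is our illustration of that rule, not PyTorch code.

```python
def upsample_nearest(grid, scale):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor.

    For an integer scale factor, output pixel (i, j) copies input pixel
    (i // scale, j // scale).
    """
    rows, cols = len(grid), len(grid[0])
    return [[grid[i // scale][j // scale] for j in range(cols * scale)]
            for i in range(rows * scale)]
```

With PyTorch installed, `F.interpolate(torch.tensor([[[[1., 2.], [3., 4.]]]]), scale_factor=2, mode="nearest")` should produce the same 4×4 values as `upsample_nearest([[1, 2], [3, 4]], 2)`.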

class C3k(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))

class C3k2(C2f):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck(self.c, self.c, shortcut, g) for _ in range(n)
        )
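C3k2 only swaps the inner blocks of C2f, so its data flow is C2f's: `cv1` produces 2·c channels (c = int(c2·e)) that are split into two halves, each inner block takes the latest branch and appends c more channels, and `cv2` fuses the (2 + n)·c concatenated channels back to c2. With `c3k=True` the inner blocks are C3k modules instead of plain Bottlenecks, but the channel counts stay the same. The sketch below traces these counts in plain Python; it assumes the standard Ultralytics C2f definition, which this article does not reproduce.

```python
def c2f_channel_flow(c1, c2, n=1, e=0.5):
    """Trace channel counts through C2f (and therefore C3k2)."""
    c = int(c2 * e)          # self.c in C2f
    branches = [c, c]        # cv1 outputs 2*c channels, chunked into two halves
    for _ in range(n):
        branches.append(c)   # each inner block (Bottleneck or C3k) keeps c channels
    return {"cv1_out": 2 * c, "concat": sum(branches), "out": c2}
```

For instance, `c2f_channel_flow(256, 256, n=2)` traces 256 → 512 concatenated channels → 256.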

import torch.nn as nn


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # effective kernel size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation without batch normalization (used once BN is fused into the conv)."""
        return self.act(self.conv(x))

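`forward_fuse` skips BatchNorm because, for inference, BN can be folded into the convolution weights (Ultralytics does this when a model is fused, via a helper in its torch utilities). The scalar sketch below, our own illustration, shows the algebra for one output channel; in a real layer `w` is that channel's whole kernel slice, scaled by the same factor. `Conv` uses `bias=False`, so `b` is 0 there.

```python
import math


def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm into the preceding conv (one output channel, scalar form).

    BN(w*x + b) = gamma * (w*x + b - mean) / sqrt(var + eps) + beta
                = (w * s) * x + (beta + (b - mean) * s),   with s = gamma / sqrt(var + eps)
    """
    s = gamma / math.sqrt(var + eps)
    return w * s, beta + (b - mean) * s


if __name__ == "__main__":
    w_f, b_f = fuse_conv_bn(w=2.0, b=0.0, gamma=1.5, beta=0.1, mean=0.3, var=4.0)
    x = 1.7
    unfused = 1.5 * (2.0 * x + 0.0 - 0.3) / math.sqrt(4.0 + 1e-5) + 0.1
    print(abs((w_f * x + b_f) - unfused) < 1e-9)  # the fused layer matches conv followed by BN
```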
import torch
import torch.nn as nn

# Excerpt from Ultralytics (ultralytics/nn/modules/block.py); Conv and PSABlock are defined in the same package.


class C2PSA(nn.Module):
    """
    C2PSA module with attention mechanism for enhanced feature extraction and processing.

    This module implements a convolutional block with attention mechanisms to enhance feature extraction and processing
    capabilities. It includes a series of PSABlock modules for self-attention and feed-forward operations.

    Attributes:
        c (int): Number of hidden channels.
        cv1 (Conv): 1x1 convolution layer that projects the input to 2*c channels.
        cv2 (Conv): 1x1 convolution layer that maps the concatenated 2*c channels back to c1 channels.
        m (nn.Sequential): Sequential container of PSABlock modules for attention and feed-forward operations.

    Methods:
        forward: Performs a forward pass through the C2PSA module, applying attention and feed-forward operations.

    Notes:
        This module essentially is the same as PSA module, but refactored to allow stacking more PSABlock modules.

    Examples:
        >>> c2psa = C2PSA(c1=256, c2=256, n=3, e=0.5)
        >>> input_tensor = torch.randn(1, 256, 64, 64)
        >>> output_tensor = c2psa(input_tensor)
    """

    def __init__(self, c1, c2, n=1, e=0.5):
        """Initializes C2PSA module with specified input/output channels, number of layers, and expansion ratio."""
        super().__init__()
        assert c1 == c2  # input and output widths must match
        self.c = int(c1 * e)
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv(2 * self.c, c1, 1)
        # 64 channels per attention head
        self.m = nn.Sequential(*(PSABlock(self.c, attn_ratio=0.5, num_heads=self.c // 64) for _ in range(n)))

    def forward(self, x):
        """Processes the input tensor 'x' through a series of PSA blocks and returns the transformed tensor."""
        a, b = self.cv1(x).split((self.c, self.c), dim=1)  # a: bypass branch, b: attention branch
        b = self.m(b)
        return self.cv2(torch.cat((a, b), 1))

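At the core of each PSABlock (and of YOLO12's area attention) is scaled dot-product attention, softmax(QKᵀ/√d)·V. The single-head plain-Python sketch below is our illustration of that formula only; the Ultralytics `Attention` module adds details it omits, such as multiple heads, a reduced key dimension controlled by `attn_ratio`, and a depthwise-convolution positional term.

```python
import math


def softmax(xs):
    """Numerically stable softmax of a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """softmax(Q K^T / sqrt(d)) V for lists of vectors (single head).

    q: queries, k: keys (same length d as the queries), v: one value vector per key.
    """
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)  # how much this query attends to each key
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

When all keys look identical to a query, the weights are uniform and the output is the mean of the values: `attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])` averages the two value vectors.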
path: H:/raspi/0000-38-visdrone-detect-yolo12/visdrone  # dataset root dir
train: VisDrone2019-DET-train/images  # train images (relative to 'path') 6471 images
val: VisDrone2019-DET-val/images  # val images (relative to 'path') 548 images
test: VisDrone2019-DET-test-dev/images  # test images (optional) 1610 images

# Classes
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
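The dataset shipped with this project is already annotated in YOLO format. If you start from the raw VisDrone2019-DET release instead, each annotation line has to be converted. The sketch below assumes the raw VisDrone field order `bbox_left, bbox_top, bbox_width, bbox_height, score, object_category, truncation, occlusion`, where category 0 marks "ignored regions" (score 0), categories 1 to 10 match the ten classes above, and 11 is "others". Skipping score-0 rows and shifting categories down by one mirrors the Ultralytics VisDrone conversion; dropping "others" is our own addition so no class ID exceeds 9.

```python
def visdrone_to_yolo(line, img_w, img_h):
    """Convert one raw VisDrone-DET annotation line to a YOLO label line, or None to skip it."""
    fields = line.strip().split(",")
    left, top, bw, bh = map(int, fields[:4])
    score, category = fields[4], int(fields[5])
    if score == "0":          # ignored regions are marked with score 0
        return None
    cls = category - 1        # VisDrone 1..10 -> YOLO 0..9 (pedestrian ... motor)
    if not 0 <= cls <= 9:     # drop category 11 ("others") and anything unexpected
        return None
    # YOLO format: class x_center y_center width height, all normalized to [0, 1]
    xc = (left + bw / 2) / img_w
    yc = (top + bh / 2) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {bw / img_w:.6f} {bh / img_h:.6f}"
```

For example, a 50×40 box at (100, 200) with category 4 in a 1000×800 image becomes a class-3 (car) label centred at (0.125, 0.275).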

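import torch
import torch.nn as nn


class GhostConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, ratio=2, dw_kernel_size=3):
        """
        Ghost convolution.

        Args:
            in_channels (int): Number of input channels.
            out_channels (int): Number of output channels.
            kernel_size (int): Kernel size of the primary convolution.
            stride (int): Stride of the primary convolution.
            padding (int): Padding of the primary convolution (kernel_size // 2 keeps the spatial size).
            ratio (int): Ratio of output channels to primary channels (primary = out_channels // ratio).
            dw_kernel_size (int): Kernel size of the depthwise convolution that generates ghost features.
        """
        super(GhostConv, self).__init__()
        self.out_channels = out_channels
        self.primary_channels = out_channels // ratio  # channels of the primary feature map
        self.ghost_channels = out_channels - self.primary_channels  # channels of the ghost feature map

        # Standard convolution for the primary feature map
        self.primary_conv = nn.Conv2d(
            in_channels, self.primary_channels, kernel_size, stride, padding, bias=False)
        self.bn1 = nn.BatchNorm2d(self.primary_channels)

        # Cheap depthwise convolution for the ghost feature map
        # (groups=primary_channels requires ghost_channels to be a multiple of primary_channels; true for ratio=2)
        self.ghost_conv = nn.Conv2d(
            self.primary_channels, self.ghost_channels, dw_kernel_size, stride=1,
            padding=dw_kernel_size // 2, groups=self.primary_channels, bias=False)
        self.bn2 = nn.BatchNorm2d(self.ghost_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Primary feature map
        primary_features = self.primary_conv(x)
        primary_features = self.bn1(primary_features)
        # Ghost feature map, generated from the primary features
        ghost_features = self.ghost_conv(primary_features)
        ghost_features = self.bn2(ghost_features)
        # Concatenate primary and ghost feature maps along the channel dimension
        output = torch.cat([primary_features, ghost_features], dim=1)
        output = self.relu(output)
        return output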
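The point of GhostConv is to produce part of the output channels with a cheap depthwise convolution. The helpers below, our own illustration, count the convolution weights of the GhostConv above and of an ordinary convolution with the same input and output channels (BatchNorm parameters ignored).

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a bias-free Conv2d with a k x k kernel."""
    return (c_in // groups) * c_out * k * k


def ghost_params(c_in, c_out, k=3, ratio=2, dw_k=3):
    """Weight count of the GhostConv above: primary conv plus depthwise ghost conv."""
    primary = c_out // ratio
    ghost = c_out - primary
    return conv_params(c_in, primary, k) + conv_params(primary, ghost, dw_k, groups=primary)
```

For 64 → 128 channels with 3×3 kernels and `ratio=2`, a standard convolution has 73,728 weights while GhostConv has 36,864 + 576 = 37,440, roughly half.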

import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, in_channels, reduction=16):
        """
        Channel attention module.

        Args:
            in_channels (int): Number of input channels.
            reduction (int): Reduction ratio of the hidden layer (in_channels // reduction must be at least 1).
        """
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.max_pool = nn.AdaptiveMaxPool2d(1)  # global max pooling
        # Shared MLP applied to both pooled descriptors
        self.fc = nn.Sequential(
            nn.Linear(in_channels, in_channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(in_channels // reduction, in_channels, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        batch, channels, _, _ = x.size()
        # Global average pooling branch
        avg_out = self.fc(self.avg_pool(x).view(batch, channels))
        # Global max pooling branch
        max_out = self.fc(self.max_pool(x).view(batch, channels))
        # Sum the two branches and apply a sigmoid
        out = avg_out + max_out
        out = self.sigmoid(out).view(batch, channels, 1, 1)
        # Reweight each channel
        return x * out


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        """
        Spatial attention module.

        Args:
            kernel_size (int): Kernel size of the convolution.
        """
        super(SpatialAttention, self).__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Mean and max along the channel dimension
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        combined = torch.cat([avg_out, max_out], dim=1)  # concatenate into a 2-channel map
        # Convolution
        out = self.conv(combined)
        out = self.sigmoid(out)
        # Reweight each spatial position
        return x * out


class CBAM(nn.Module):
    def __init__(self, in_channels, reduction=16, kernel_size=7):
        """
        CBAM module.

        Args:
            in_channels (int): Number of input channels.
            reduction (int): Reduction ratio of the channel attention MLP.
            kernel_size (int): Kernel size of the spatial attention convolution.
        """
        super(CBAM, self).__init__()
        self.channel_attention = ChannelAttention(in_channels, reduction)
        self.spatial_attention = SpatialAttention(kernel_size)

    def forward(self, x):
        # Channel attention first
        x = self.channel_attention(x)
        # Then spatial attention
        x = self.spatial_attention(x)
        return x
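The CBAM code above implements the two attention maps defined in the CBAM paper. Written out for an input feature map F, with σ the sigmoid, MLP the shared two-layer `fc` (C → C/r → C, r = `reduction`), AvgPool_c and MaxPool_c the mean and max along the channel axis, and f^{k×k} the k×k convolution (k = `kernel_size`, 7 by default):

```latex
\begin{aligned}
M_c(F)  &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), &
F'  &= M_c(F) \otimes F, \\
M_s(F') &= \sigma\big(f^{k \times k}([\mathrm{AvgPool}_c(F');\ \mathrm{MaxPool}_c(F')])\big), &
F'' &= M_s(F') \otimes F'.
\end{aligned}
```

Here ⊗ is element-wise multiplication with broadcasting: M_c has shape C×1×1 and M_s has shape 1×H×W, which is why `forward` applies channel attention before spatial attention.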
