YOLOv13 Model Analysis: Network Structure, Loss Functions, Training, and Inference

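YOLO (You Only Look Once) is an efficient, real-time object detection algorithm and has long been one of the most popular techniques in computer vision. YOLOv13 is an accurate, lightweight, and flexible detector in the YOLO series. It is introduced in the paper "YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception" and, at the time of writing, is the latest version of the series. Its goal is to improve real-time detection performance, particularly in complex scenes.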

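Thanks to their accuracy and computational efficiency, YOLO models hold a leading position in real-time object detection. However, both the convolutional architectures of YOLO11 and earlier versions and the area-based self-attention introduced in YOLOv12 are limited to local information aggregation and pairwise correlation modeling. They cannot capture global many-to-many high-order correlations, which limits detection performance in complex scenes. To address this, the authors propose YOLOv13, an accurate and lightweight object detector. First, they propose a Hypergraph-based Adaptive Correlation Enhancement (HyperACE) mechanism, which adaptively exploits latent high-order correlations and, by means of hypergraph computation, overcomes the restriction of earlier methods to pairwise correlation modeling, achieving efficient global cross-location and cross-scale feature fusion and enhancement. Second, building on HyperACE, they propose a Full-Pipeline Aggregation-and-Distribution (FullPAD) paradigm, which distributes the correlation-enhanced features to the entire network and thereby achieves fine-grained information flow and representation synergy across the whole pipeline. Finally, they replace ordinary large-kernel convolutions with depthwise separable convolutions and design a series of blocks that significantly reduce parameters and computational complexity without sacrificing performance. The experimental results reported in the paper show that YOLOv13 achieves state-of-the-art performance with fewer parameters and fewer FLOPs.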

Reference paper: [2506.17733] YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception

Reference code: GitHub - iMoonLab/yolov13: Implementation of "YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception".

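YOLOv13 has the following advantages:

**High accuracy and low latency:** By introducing efficient hypergraph computation and a feature distribution mechanism, YOLOv13 achieves higher detection accuracy in complex scenes while keeping latency low and inference fast.
**Lightweight and efficient computation:** Thanks to depthwise separable convolutions, YOLOv13 significantly reduces parameter count and computation, improving processing speed and real-time behavior while maintaining accuracy.
**Global correlation modeling:** Through the HyperACE mechanism, YOLOv13 can capture more complex spatial and semantic correlations among multiple objects, which matters most when detecting many objects or occluded objects.

The YOLOv13 network configuration file is as follows: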

nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov13n.yaml' will call yolov13.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # Nano
  s: [0.50, 0.50, 1024] # Small
  l: [1.00, 1.00, 512] # Large
  x: [1.00, 1.50, 512] # Extra Large
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2, 1, 2]] # 1-P2/4
  - [-1, 2, DSC3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2, 1, 4]] # 3-P3/8
  - [-1, 2, DSC3k2, [512, False, 0.25]]
  - [-1, 1, DSConv, [512, 3, 2]] # 5-P4/16
  - [-1, 4, A2C2f, [512, True, 4]]
  - [-1, 1, DSConv, [1024, 3, 2]] # 7-P5/32
  - [-1, 4, A2C2f, [1024, True, 1]] # 8
head:
  - [[4, 6, 8], 2, HyperACE, [512, 8, True, True, 0.5, 1, "both"]]
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [ 9, 1, DownsampleConv, []]
  - [[6, 9], 1, FullPAD_Tunnel, []] #12
  - [[4, 10], 1, FullPAD_Tunnel, []] #13
  - [[8, 11], 1, FullPAD_Tunnel, []] #14
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 12], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, DSC3k2, [512, True]] # 17
  - [[-1, 9], 1, FullPAD_Tunnel, []] #18
  - [17, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 13], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, DSC3k2, [256, True]] # 21
  - [10, 1, Conv, [256, 1, 1]]
  - [[21, 22], 1, FullPAD_Tunnel, []] #23
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 18], 1, Concat, [1]] # cat head P4
  - [-1, 2, DSC3k2, [512, True]] # 26
  - [[-1, 9], 1, FullPAD_Tunnel, []]
  - [26, 1, Conv, [512, 3, 2]]
  - [[-1, 14], 1, Concat, [1]] # cat head P5
  - [-1, 2, DSC3k2, [1024,True]] # 30 (P5/32-large)
  - [[-1, 11], 1, FullPAD_Tunnel, []]
  - [[23, 27, 31], 1, Detect, [nc]] # Detect(P3, P4, P5)
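
To make the `scales` entries concrete, the following is a minimal sketch of how a `[depth, width, max_channels]` triple is typically applied to one layer's `repeats` and output channels in Ultralytics-style model parsing. The helper names `make_divisible` and `scale_layer` are illustrative, and the rounding rules are an assumption based on the usual Ultralytics convention; individual modules may receive extra special handling in the real parser.

```python
import math

def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor

def scale_layer(repeats, channels, scale):
    """Apply a [depth, width, max_channels] triple to one layer (assumed rule)."""
    depth, width, max_channels = scale
    # Depth scales the repeat count, but never below one block.
    n = max(round(repeats * depth), 1) if repeats > 1 else repeats
    # Width scales the channel count after clipping to max_channels.
    c = make_divisible(min(channels, max_channels) * width, 8)
    return n, c

SCALES = {
    "n": [0.50, 0.25, 1024],
    "s": [0.50, 0.50, 1024],
    "l": [1.00, 1.00, 512],
    "x": [1.00, 1.50, 512],
}

# Backbone layer 8 in the config: [-1, 4, A2C2f, [1024, True, 1]]
for name, scale in SCALES.items():
    print(name, scale_layer(4, 1024, scale))
```

Under these assumptions the nano model runs this layer with 2 repeats and 256 channels, while the large model uses 4 repeats and 512 channels because `max_channels` clips 1024 to 512 before the width multiplier is applied.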

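Core innovations of YOLOv13

Hypergraph-based Adaptive Correlation Enhancement (HyperACE)

YOLOv13 introduces the HyperACE mechanism, which uses hypergraphs to model high-order correlations among visual features. Earlier YOLO models rely mainly on correlation modeling over local features, whereas HyperACE can capture many-to-many, high-order global correlations, which helps most in complex scenes.

Compared with conventional graphs or self-attention, HyperACE computes hyperedges adaptively, which makes correlation modeling more flexible and more accurate.

Full-Pipeline Aggregation-and-Distribution (FullPAD)

To improve information flow, YOLOv13 introduces the FullPAD paradigm, which aggregates features and redistributes them across the full pipeline to improve cooperation between the layers of the network. The enhanced features flow between the different stages of the network (backbone, neck, and detection head), which improves gradient propagation and detection performance.

Lightweight design: depthwise separable convolution (DSConv)

YOLOv13 uses depthwise separable convolutions in place of conventional large-kernel convolutions, significantly reducing the parameter count and computational complexity of the model. This design preserves detection capability while speeding up inference, giving a better balance between performance and efficiency.

Efficient hypergraph computation

Unlike conventional hypergraph methods, the hypergraph computation in YOLOv13 generates hyperedges adaptively and dynamically estimates the degree to which each pixel participates in each hyperedge. The computation therefore adapts to the content of each image, which improves the model's ability to handle complex scenes.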

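YOLOv13 network structure analysis

YOLOv13 uses A2C2f, DSConv, and an improved DSC3k2 for feature extraction at the different stages of the backbone. To reduce computational cost while preserving accuracy, it adopts depthwise separable convolution (DSConv) in place of conventional large-kernel convolutions.

The YOLOv13 network consists of three parts: Backbone, Neck, and Head.

The Backbone is responsible for feature extraction. It stacks Conv, DSConv, DSC3k2, and A2C2f modules, using residual connections and bottleneck structures to keep the network small while improving performance.
The Neck sits between the backbone and the head, and its role is feature fusion and enhancement. Its core modules are DSC3k2, HyperACE, FullPAD_Tunnel, nn.Upsample, and Concat. The hypergraph-based adaptive correlation enhancement (HyperACE) mechanism and its efficient hypergraph computation improve the model's ability to adapt to complex scenes.
The Head is the decision-making part of the detector and produces the final detection results.

Figure 1: Overall network structure of YOLOv13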

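A2C2f module

The A2C2f module (Area-Attention Enhanced Cross-Feature module) was proposed in YOLOv12. Its attention-based design is intended to improve on the speed-accuracy trade-off of traditional CNNs. The structure of the module is shown in the figure below, and its main functions are:

**Feature extraction:** It combines convolution layers with a multilayer perceptron (MLP) to extract input features and strengthen the model's representational capacity.
**Attention mechanism:** The Area-Attention block improves the efficiency of feature extraction and reduces computational complexity while keeping a large receptive field.
**Residual connection:** An optional residual connection, scaled by a learnable factor, stabilizes training and helps convergence.

[Figure: A2C2f module structure]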

class A2C2f(nn.Module):
    def __init__(self, c1, c2, n=1, a2=True, area=1, residual=False, mlp_ratio=2.0, e=0.5, g=1, shortcut=True):
        super().__init__()
        c_ = int(c2 * e) # hidden channels
        assert c_ % 32 == 0, "Dimension of ABlock be a multiple of 32." 
        # num_heads = c_ // 64 if c_ // 64 >= 2 else c_ // 32
        num_heads = c_ // 32
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv((1 + n) * c_, c2, 1) # optional act=FReLU(c2)
        init_values = 0.01 # or smaller
        self.gamma = nn.Parameter(init_values * torch.ones((c2)), requires_grad=True) if a2 and residual else None
        self.m = nn.ModuleList(
            nn.Sequential(*(ABlock(c_, num_heads, mlp_ratio, area) for _ in range(2))) if a2 else C3k(c_, c_, 2, shortcut, g) for _ in range(n)
        )
    def forward(self, x):
        """Forward pass through R-ELAN layer."""
        y = [self.cv1(x)]
        y.extend(m(y[-1]) for m in self.m)
        if self.gamma is not None:
            return x + self.gamma.view(1, -1, 1, 1) * self.cv2(torch.cat(y, 1))
        return self.cv2(torch.cat(y, 1))

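DSConv module

DSConv (depthwise separable convolution) factorizes a convolution into a depthwise convolution followed by a pointwise (1x1) convolution, which significantly reduces parameters and computation. Compared with a standard convolution, it retains good feature extraction capability at a much lower cost. This lets YOLOv13 reach higher inference speed and perform accurate detection with low latency and low computational complexity.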
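
The saving from this factorization can be checked with simple arithmetic. The sketch below counts weights only (no bias, no normalization layers) for a standard convolution and for a depthwise plus pointwise pair; the function names and the 256-channel example are illustrative choices, not values taken from the YOLOv13 code.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def ds_conv_params(c_in, c_out, k):
    """Weights of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

c_in, c_out = 256, 256
for k in (3, 7):
    std = conv_params(c_in, c_out, k)
    ds = ds_conv_params(c_in, c_out, k)
    print(f"k={k}: standard={std:,} separable={ds:,} ratio={ds / std:.3f}")
```

The ratio works out to 1/c_out + 1/k^2, so the saving grows with the kernel size, which is why the factorization is particularly attractive for the large kernels used inside the DS blocks.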

[Figure]

DSConv code

import math

import torch
import torch.nn.functional as F
from torch.nn.modules.conv import _ConvNd
from torch.nn.modules.utils import _pair
from ultralytics.nn.modules.conv import Conv, autopad

# Note: this listing follows the DSConv operator of https://arxiv.org/pdf/1901.01928v1.pdf, which
# appears to be a kernel quantization scheme (KDS/CDS distribution shifts) that shares the DSConv
# name with depthwise separable convolution; its forward pass below is a plain convolution.
class DSConv(_ConvNd):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=None, dilation=1, groups=1, padding_mode='zeros', bias=False, block_size=32, KDSBias=False, CDS=False):
        padding = _pair(autopad(kernel_size, padding, dilation))
        kernel_size = _pair(kernel_size)
        stride = _pair(stride)
        dilation = _pair(dilation)
        blck_numb = math.ceil(((in_channels)/(block_size*groups)))
        super(DSConv, self).__init__(
            in_channels, out_channels, kernel_size, stride, padding, dilation, False, _pair(0), groups, bias, padding_mode)
        # KDS weight From Paper
        self.intweight = torch.Tensor(out_channels, in_channels, *kernel_size)
        self.alpha = torch.Tensor(out_channels, blck_numb, *kernel_size)
        # KDS bias From Paper
        self.KDSBias = KDSBias
        self.CDS = CDS
        if KDSBias:
            self.KDSb = torch.Tensor(out_channels, blck_numb, *kernel_size)
        if CDS:
            self.CDSw = torch.Tensor(out_channels)
            self.CDSb = torch.Tensor(out_channels)
        self.reset_parameters()

    def get_weight_res(self):
        # Include expansion of alpha and multiplication with weights to include in the convolution layer here
        alpha_res = torch.zeros(self.weight.shape).to(self.alpha.device)
        # Include KDSBias
        if self.KDSBias:
            KDSBias_res = torch.zeros(self.weight.shape).to(self.alpha.device)
        # Handy definitions:
        nmb_blocks = self.alpha.shape[1]
        total_depth = self.weight.shape[1]
        bs = total_depth//nmb_blocks
        llb = total_depth-(nmb_blocks-1)*bs
        # Casting the Alpha values as same tensor shape as weight
        for i in range(nmb_blocks):
            length_blk = llb if i==nmb_blocks-1 else bs
            shp = self.alpha.shape # Notice this is the same shape for the bias as well
            to_repeat = self.alpha[:, i, ...].view(shp[0], 1, shp[2], shp[3]).clone()
            repeated = to_repeat.expand(shp[0], length_blk, shp[2], shp[3]).clone()
            alpha_res[:, i*bs:(i*bs+length_blk), ...] = repeated.clone()
            if self.KDSBias:
                to_repeat = self.KDSb[:, i, ...].view(shp[0], 1, shp[2], shp[3]).clone()
                repeated = to_repeat.expand(shp[0], length_blk, shp[2], shp[3]).clone()
                KDSBias_res[:, i*bs:(i*bs+length_blk), ...] = repeated.clone()
        if self.CDS:
            to_repeat = self.CDSw.view(-1, 1, 1, 1)
            repeated = to_repeat.expand_as(self.weight)
            print(repeated.shape)
        # Element-wise multiplication of alpha and weight
        weight_res = torch.mul(alpha_res, self.weight)
        if self.KDSBias:
            weight_res = torch.add(weight_res, KDSBias_res)
        return weight_res

    def forward(self, input):
        # Get resulting weight
        # weight_res = self.get_weight_res()
        # Returning convolution
        return F.conv2d(input, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)

class DSConv2D(Conv):
    def __init__(self, inc, ouc, k=1, s=1, p=None, g=1, d=1, act=True):
        super().__init__(inc, ouc, k, s, p, g, d, act)
        # DSConv takes dilation before groups, so pass both by keyword to avoid swapping them
        self.conv = DSConv(inc, ouc, k, s, p, dilation=d, groups=g)

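DS-C3k2 module

As part of its lightweight design, YOLOv13 uses depthwise separable convolutions to optimize the C3k2 module. Compared with the C3k2 module of YOLO11, the DS-C3k2 module (class name DSC3k2) improves feature representation with fewer parameters. It is derived from the C3k2 structure and combines it with the lightweight properties of depthwise separable convolution, further reducing the parameter count and computation of the model. In YOLOv13, DS-C3k2 is used widely in the backbone and the neck as the basic feature extraction block, which helps the overall detection efficiency and performance of the model. The structure of the DS-C3k2 module is shown in the figure above.

DS-C3k2 module implementation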

class DSC3k2(C2f):
    """An improved C3k2 module that uses lightweight depthwise separable convolution blocks.

    This class redesigns the C3k2 module, replacing its internal processing blocks with either DSBottleneck or DSC3k modules.

    Attributes:
        c1 (int): Number of input channels.
        c2 (int): Number of output channels.
        n (int, optional): Number of internal processing blocks to stack. Defaults to 1.
        dsc3k (bool, optional): If True, use DSC3k as the internal block. If False, use DSBottleneck. Defaults to False.
        e (float, optional): Expansion ratio for the C2f module's hidden channels. Defaults to 0.5.
        g (int, optional): Number of groups for grouped convolution (passed to parent C2f). Defaults to 1.
        shortcut (bool, optional): Whether to use shortcut (residual) connections in the internal blocks. Defaults to True.
        k1 (int, optional): Kernel size for the first DSConv in internal blocks. Defaults to 3.
        k2 (int, optional): Kernel size for the second DSConv in internal blocks. Defaults to 7.
        d2 (int, optional): Dilation for the second DSConv in internal blocks. Defaults to 1.
    Methods:
        forward: Performs a forward pass through the DSC3k2 module (inherited from C2f).
    Examples:
        >>> import torch
        >>> # Using DSBottleneck as internal block
        >>> model1 = DSC3k2(c1=64, c2=64, n=2, dsc3k=False)
        >>> x = torch.randn(2, 64, 128, 128)
        >>> output1 = model1(x)
        >>> print(f"With DSBottleneck: {output1.shape}")
        With DSBottleneck: torch.Size([2, 64, 128, 128])
        >>> # Using DSC3k as internal block
        >>> model2 = DSC3k2(c1=64, c2=64, n=1, dsc3k=True)
        >>> output2 = model2(x)
        >>> print(f"With DSC3k: {output2.shape}")
        With DSC3k: torch.Size([2, 64, 128, 128])
    """
    def __init__(
        self,
        c1,
        c2,
        n=1,
        dsc3k=False,
        e=0.5,
        g=1,
        shortcut=True,
        k1=3,
        k2=7,
        d2=1
    ):
        super().__init__(c1, c2, n, shortcut, g, e)
        if dsc3k:
            self.m = nn.ModuleList(
                DSC3k(
                    self.c, self.c, n=2, shortcut=shortcut, g=g, e=1.0, k1=k1, k2=k2, d2=d2
                ) for _ in range(n)
            )
        else:
            self.m = nn.ModuleList(
                DSBottleneck(
                    self.c, self.c, shortcut=shortcut, e=1.0, k1=k1, k2=k2, d2=d2
                ) for _ in range(n)
            )

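Hypergraph-based Adaptive Correlation Enhancement (HyperACE) mechanism

The core innovation of YOLOv13 is the HyperACE mechanism, which replaces conventional correlation modeling with a hypergraph and can capture latent high-order correlations between features. Unlike earlier versions, which rely on local information or simple self-attention, HyperACE learns many-to-many high-order correlations and captures global semantic information across spatial positions and scales. This makes it particularly suitable for object interactions and tightly coupled relationships in complex scenes.

[Figure]

Adaptive hypergraph computation

Unlike conventional hypergraphs built with a fixed threshold, the adaptive hypergraph computation in YOLOv13 learns the degree to which each pixel participates in each hyperedge. This makes high-order correlation modeling more flexible and accurate, and allows relationships between visual features to be modeled across scales and positions.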
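
The idea of soft, learned participation can be illustrated with a small pure-Python sketch. It is a simplified stand-in, not the YOLOv13 implementation: the hyperedge prototypes are given as fixed vectors (in the model they are produced by learned layers), all learned projections are omitted, and the names `adaptive_hypergraph_step`, `participation`, and `prototypes` are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def adaptive_hypergraph_step(features, prototypes):
    """One round of vertex -> hyperedge -> vertex message passing with soft participation."""
    n, m, d = len(features), len(prototypes), len(features[0])
    scale = math.sqrt(d)
    # participation[i][j]: how strongly vertex i belongs to hyperedge j,
    # normalised over the vertices of each hyperedge instead of thresholded.
    columns = [softmax([dot(f, p) / scale for f in features]) for p in prototypes]
    participation = [[columns[j][i] for j in range(m)] for i in range(n)]
    # Vertex -> hyperedge: each hyperedge is a participation-weighted mix of all vertices.
    hyperedges = [
        [sum(participation[i][j] * features[i][k] for i in range(n)) for k in range(d)]
        for j in range(m)
    ]
    # Hyperedge -> vertex: each vertex gathers from every hyperedge, plus a residual.
    updated = [
        [features[i][k] + sum(participation[i][j] * hyperedges[j][k] for j in range(m))
         for k in range(d)]
        for i in range(n)
    ]
    return participation, updated

features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # four "pixels", two channels
prototypes = [[4.0, 0.0], [0.0, 4.0]]                        # two hyperedges
participation, updated = adaptive_hypergraph_step(features, prototypes)
for i, row in enumerate(participation):
    print(i, [round(a, 3) for a in row])
```

Because every vertex contributes to every hyperedge with a continuous weight, a single hyperedge can connect many vertices at once, which is what distinguishes this from pairwise attention between two positions.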

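[Figure]

**Visualization of adaptive hyperedges.** The hyperedges in the first and second columns focus mainly on high-order interactions among foreground objects, while the third column highlights interactions between the background and foreground objects. These visualizations give an intuitive picture of the high-order visual correlations modeled by YOLOv13.

HyperACE code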

class HyperACE(nn.Module):
    def __init__(self, c1, c2, n=1, num_hyperedges=8, dsc3k=True, shortcut=False, e1=0.5, e2=1, context="both", channel_adjust=True):
        super().__init__()
        self.c = int(c2 * e1)
        self.cv1 = Conv(c1, 3 * self.c, 1, 1)
        self.cv2 = Conv((4 + n) * self.c, c2, 1)
        self.m = nn.ModuleList(
            DSC3k(self.c, self.c, 2, shortcut, k1=3, k2=7) if dsc3k else DSBottleneck(self.c, self.c, shortcut=shortcut) for _ in range(n)
        )
        self.fuse = FuseModule(c1, channel_adjust)
        self.branch1 = C3AH(self.c, self.c, e2, num_hyperedges, context)
        self.branch2 = C3AH(self.c, self.c, e2, num_hyperedges, context)
    def forward(self, X):
        x = self.fuse(X)
        y = list(self.cv1(x).chunk(3, 1))
        out1 = self.branch1(y[1])
        out2 = self.branch2(y[1])
        y.extend(m(y[-1]) for m in self.m)
        y[1] = out1
        y.append(out2)
        return self.cv2(torch.cat(y, 1))
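
The `(4 + n) * self.c` input width of `cv2` follows from how `forward` builds the list `y`. The sketch below replays that bookkeeping with strings instead of tensors; `hyperace_concat_plan` is a hypothetical helper written for this illustration, and `c2=512, n=2` are example values before any width scaling.

```python
def hyperace_concat_plan(c2, n, e1=0.5):
    """List the tensors concatenated in HyperACE.forward, each with c channels."""
    c = int(c2 * e1)
    # cv1 outputs 3 * c channels, which chunk(3, 1) splits into three parts.
    chunks = ["cv1 chunk 0", "cv1 chunk 1", "cv1 chunk 2"]
    # Each DS block consumes the most recently appended entry and appends its output.
    for i in range(n):
        chunks.append(f"block {i} applied to previous entry")
    # The middle chunk is replaced by the first hypergraph branch ...
    chunks[1] = "branch1(cv1 chunk 1)"
    # ... and the second hypergraph branch is appended at the end.
    chunks.append("branch2(cv1 chunk 1)")
    return c, chunks

c, chunks = hyperace_concat_plan(c2=512, n=2)
print(c, len(chunks), len(chunks) * c)
for name in chunks:
    print(" ", name)
```

Three chunks from `cv1`, `n` block outputs, and one extra branch output give `4 + n` entries in total, matching the channel count expected by `cv2`.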

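Full-Pipeline Aggregation-and-Distribution (FullPAD) paradigm

The FullPAD paradigm aims to improve information flow through the whole network, ensuring fine-grained cooperation between features in the backbone, neck, and head. By distributing the enhanced features across the full pipeline, it improves gradient propagation and, in turn, detection accuracy.

[Figure]

FullPAD code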

class FullPAD_Tunnel(nn.Module):
    def __init__(self):
        super().__init__()
        self.gate = nn.Parameter(torch.tensor(0.0))
    def forward(self, x):
        out = x[0] + self.gate * x[1]
        return out
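
The tunnel is a gated residual sum: the first input is the feature of the current stage, and the second is the correlation-enhanced feature distributed from HyperACE. The following sketch reproduces the same arithmetic on plain Python lists, which stand in for tensors; `fullpad_tunnel` and its argument names are illustrative.

```python
def fullpad_tunnel(stage_feature, correlation_feature, gate=0.0):
    """Gated residual fusion: out = x[0] + gate * x[1], element by element."""
    return [s + gate * c for s, c in zip(stage_feature, correlation_feature)]

stage = [1.0, 2.0, 3.0]
enhanced = [10.0, 10.0, 10.0]
# With the gate at its initial value of 0, the stage feature passes through unchanged.
print(fullpad_tunnel(stage, enhanced))
# As training moves the gate away from 0, the enhanced feature is blended in.
print(fullpad_tunnel(stage, enhanced, gate=0.5))
```

Initializing the learnable gate to zero means each tunnel starts as an identity mapping on the stage feature, and the contribution of the distributed features is learned gradually during training.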

Loss functions

Object detection task

The detection loss of YOLOv13 follows that of YOLOv8 and consists mainly of a classification loss and box regression losses (CIoU loss and DFL loss).

# Cls loss
loss[1] = self.varifocal_loss(pred_scores, target_scores, target_labels) / target_scores_sum # VFL way

# Bbox loss
if fg_mask.sum():
    target_bboxes /= stride_tensor
    loss[0], loss[2] = self.bbox_loss(
        pred_distri, pred_bboxes, anchor_points, target_bboxes, target_scores, target_scores_sum, fg_mask
    )

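Classification loss

The classification loss described here is VFL (Varifocal Loss). VFL weights positive and negative samples asymmetrically, whereas FL (Focal Loss) and QFL (Quality Focal Loss) treat them symmetrically. In the upstream Ultralytics implementation, as far as can be told, the VFL line is left commented out and binary cross-entropy (BCE) on the same soft targets is the default, so the VFL variant shown above is best read as an option rather than the default.

DFL (Distribution Focal Loss) belongs to the box regression branch but is introduced here because it comes from the same family of losses. It models the box position as a general distribution so that the network quickly focuses on positions close to the target. DFL uses a cross-entropy form to optimize the probabilities of the two integer positions immediately to the left and right of the label y, increasing their probability. The learned distribution therefore concentrates around the true floating-point coordinate, with the weights of the left and right integer coordinates given by linear interpolation.

The VFL formula is:

$$
\mathrm{VFL}(p, q) =
\begin{cases}
-q\,\bigl(q\log p + (1-q)\log(1-p)\bigr), & q > 0 \\
-\alpha\, p^{\gamma}\log(1-p), & q = 0
\end{cases}
$$

Here p is the predicted score and q is the label. For a positive sample, q is the IoU between the predicted box and the ground truth; for a negative sample, q = 0. For positive samples VFL does not use FL at all but ordinary BCE with an additional adaptive IoU weighting that emphasizes high-quality samples. For negative samples it is the standard FL. VFL is noticeably simpler than QFL; its main features are the asymmetric weighting of positive and negative samples and the emphasis on positive samples.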
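
The asymmetric weighting can be seen in a scalar sketch of VFL. It operates on a single probability rather than on logits and tensors, and the defaults `alpha=0.75` and `gamma=2.0` are assumed typical values rather than settings confirmed by the YOLOv13 code.

```python
import math

def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """Scalar Varifocal Loss for predicted score p in (0, 1) and target quality q in [0, 1]."""
    eps = 1e-9
    p = min(max(p, eps), 1 - eps)  # keep the logarithms finite
    if q > 0:
        # Positive sample: BCE against the soft target q, weighted by q itself.
        return -q * (q * math.log(p) + (1 - q) * math.log(1 - p))
    # Negative sample: focal term p**gamma down-weights easy negatives.
    return -alpha * (p ** gamma) * math.log(1 - p)

# A positive sample with IoU 0.9: a score near 0.9 is cheap, a low score is expensive.
print(varifocal_loss(0.9, 0.9), varifocal_loss(0.3, 0.9))
# A negative sample: a confident false positive costs far more than a low score.
print(varifocal_loss(0.9, 0.0), varifocal_loss(0.1, 0.0))
```

Positive samples are never down-weighted by a focal term, only scaled by their own quality q, while negatives are suppressed by `p ** gamma`; this is the asymmetry described above.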

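Bounding box regression loss

In YOLOv13, Complete IoU (CIoU) measures the discrepancy between the predicted and ground-truth boxes, while the DFL loss encourages the predicted distribution of each box side to approach the distribution implied by the ground truth and thus measures the distributional error of the box regression.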
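
DFL can be sketched for a single box side as follows. The side is predicted as logits over integer bins 0, 1, ..., and the continuous target distance lies between two bins; the function names and the 8-bin example are illustrative assumptions, not the YOLOv13 settings.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distribution_focal_loss(logits, target):
    """Cross-entropy on the two integer bins that bracket the continuous target."""
    probs = softmax(logits)
    left = min(int(math.floor(target)), len(logits) - 2)
    right = left + 1
    weight_left = right - target    # linear interpolation weights
    weight_right = target - left
    return -(weight_left * math.log(probs[left]) + weight_right * math.log(probs[right]))

def expected_distance(logits):
    """Decoded distance: the expectation of the bin index under the predicted distribution."""
    return sum(i * p for i, p in enumerate(softmax(logits)))

target = 2.4
good = [0.0, 0.0, 5.0, 4.6, 0.0, 0.0, 0.0, 0.0]  # mass on bins 2 and 3
bad = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0]   # mass on bin 6
print(distribution_focal_loss(good, target), expected_distance(good))
print(distribution_focal_loss(bad, target), expected_distance(bad))
```

The loss is small when the probability mass sits on the two bins adjacent to the target, and the decoded distance is then close to the target value.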

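CIoU is a composite metric that accounts for three properties of the two boxes: the overlap ratio, the distance between their center points, and their aspect ratios. It is defined as:

$$
\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^{2}(b_p, b_g)}{c^{2}} - \alpha v, \qquad
v = \frac{4}{\pi^{2}}\left(\arctan\frac{w_g}{h_g} - \arctan\frac{w_p}{h_p}\right)^{2}, \qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
$$

Here the subscripts p and g denote the prediction and the ground truth, and b, w, and h are the center, width, and height of the corresponding box. The parameter c is the diagonal length of the smallest box enclosing both boxes (the gray dashed rectangle), and ρ is the Euclidean distance. The corresponding code excerpt is: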

cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1) # convex (smallest enclosing box) width
ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1) # convex height
if CIoU or DIoU: # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
    c2 = cw ** 2 + ch ** 2 + eps # convex diagonal squared
    rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4 # center dist ** 2
    if CIoU: # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
        v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)
        with torch.no_grad():
            alpha = v / (v - iou + (1 + eps))
        return iou - (rho2 / c2 + v * alpha) # CIoU
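
Since the excerpt above depends on variables defined earlier in the library function, a self-contained scalar version may be easier to experiment with. The sketch below computes CIoU for two boxes in `(x1, y1, x2, y2)` format using only the standard library; the function name `ciou` and the sample boxes are illustrative.

```python
import math

def ciou(box_p, box_g, eps=1e-7):
    """Complete IoU between a predicted and a ground-truth box, both as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    w1, h1 = px2 - px1, py2 - py1 + eps
    w2, h2 = gx2 - gx1, gy2 - gy1 + eps
    # Overlap ratio
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # Squared diagonal of the smallest enclosing box
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Squared distance between the two box centers
    rho2 = ((gx1 + gx2 - px1 - px2) ** 2 + (gy1 + gy2 - py1 - py2) ** 2) / 4
    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (v - iou + (1 + eps))
    return iou - (rho2 / c2 + v * alpha)

print(ciou((0, 0, 10, 10), (0, 0, 10, 10)))    # identical boxes: close to 1
print(ciou((0, 0, 10, 10), (5, 5, 15, 15)))    # partial overlap
print(ciou((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes: negative
```

The regression loss is then `1 - ciou(...)`. Unlike plain IoU, the value keeps changing for disjoint boxes because of the center-distance term, so the loss still provides a useful signal when the boxes do not overlap.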

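Environment setup

First, create and activate a Python 3.11 environment with conda:

# Create a Python 3.11 virtual environment named yolov13
conda create -n yolov13 python=3.11
# Activate the yolov13 virtual environment
conda activate yolov13

Next, install torch, choosing a build that matches the local CUDA version. This article uses torch 2.7.1+cu128; the install command for a suitable version can be found on the Previous PyTorch Versions page of the PyTorch website:

pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128

Then install flash-attn. Because flash-attn has compatibility issues on Windows, this article installs a prebuilt wheel downloaded from GitHub. The wheel used is:

flash_attn-2.7.4.post1+cu124torch2.4.0cxx11abiFALSE-cp311-cp311-win_amd64.whl

Note that this filename indicates a build for CUDA 12.4 and torch 2.4.0, whereas the torch installed above is 2.7.1+cu128. Prebuilt flash-attn wheels are generally tied to a specific torch and CUDA build, so choose the wheel that matches the installed versions.

Finally, install the dependencies listed in the requirements file of the YOLOv13 package:

pip install -r requirements.txt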

Training and inference

Training code

import os
import multiprocessing
from multiprocessing import freeze_support
from ultralytics import YOLO

# Model versions to test (used only by the optional loop at the end of the script)
yolo_versions = [
    "yolov13n.yaml"
]

# Set environment variables
os.environ['NUMEXPR_MAX_THREADS'] = '4'
os.environ['OMP_NUM_THREADS'] = '1'

def main():
    # Loading a .pt checkpoint starts from its weights; to train from scratch,
    # build the model from the config instead, e.g. YOLO('yolov13n.yaml')
    model = YOLO('yolov13n.pt')
    model.train(
        data='coco.yaml',
        epochs=5,
        imgsz=640,
        batch=32,
        device=0, # use GPU 0
        workers=1,
        cache=True,
        lr0=0.01,
        lrf=0.1,
        optimizer='AdamW',
        close_mosaic=0,
        rect=True,
        save_period=1,
        pretrained=False,
        project='runs/train', # folder in which results are saved
        name='exp', # run name; each run is stored separately
    )

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn', force=True)
    freeze_support()
    main()
# # Optional: loop over each model version and train it
# for version in yolo_versions:
#     print(f"Training model: {version}")
#     # Build the YOLO model from its config, without pretrained weights
#     model = YOLO(version)
#     # Train the model
#     results = model.train(
#         data="coco.yaml",
#         epochs=2,
#         imgsz=640,
#         batch=16,
#         device=[0], # use GPU 0
#         workers=1,
#         cache=True,
#         lr0=0.001,
#         lrf=0.1,
#         optimizer='AdamW',
#         close_mosaic=0,
#         rect=True,
#         save_period=1,
#         pretrained=False, # do not load pretrained weights
#         project='runs/train', # folder in which results are saved
#         name='exp', # run name; each version is stored separately
#     )
#     print(f"Model {version} finished training; results saved.\n\n")

Inference code (still images)

# -*- coding: utf-8 -*-
from ultralytics import YOLO
import os
import glob

def save_coordinates_to_txt(boxes, model_names, output_path, image_width, image_height, normalized=False):
    """Save box coordinates to a txt file.

    Args:
        boxes: detected bounding boxes
        model_names: class names of the model
        output_path: path of the output file
        image_width: image width
        image_height: image height
        normalized: whether to also write normalized coordinates (YOLO format)
    """
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write("Detection coordinates\n")
        f.write("=" * 50 + "\n")
        f.write(f"Image size: {image_width} x {image_height}\n")
        f.write("=" * 50 + "\n")
        for i, box in enumerate(boxes):
            # Get coordinates and class information
            x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
            class_id = int(box.cls[0].cpu().numpy())
            class_name = model_names[class_id]
            # Convert to integer pixel coordinates
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            center_x = (x1 + x2) // 2
            center_y = (y1 + y2) // 2
            width = x2 - x1
            height = y2 - y1
            # Write to file
            f.write(f"Object {i+1}: {class_name}\n")
            f.write(f" Bounding box: ({x1}, {y1}) - ({x2}, {y2})\n")
            f.write(f" Center: ({center_x}, {center_y})\n")
            f.write(f" Size: {width} x {height} pixels\n")
            # Optionally write normalized coordinates (YOLO training format)
            if normalized:
                x_center_norm = format(center_x/image_width, '.6f')
                y_center_norm = format(center_y/image_height, '.6f')
                width_norm = format(width/image_width, '.6f')
                height_norm = format(height/image_height, '.6f')
                f.write(f" Normalized: {class_id} {x_center_norm} {y_center_norm} {width_norm} {height_norm}\n")
            f.write("-" * 40 + "\n")
        f.write(f"\nTotal objects detected: {len(boxes)}\n")

def find_latest_detection_dir(base_dir='runs/detect'):
    """Find the most recent detection output directory."""
    # Collect all directories whose names start with exp
    exp_dirs = glob.glob(os.path.join(base_dir, 'exp*'))
    if not exp_dirs:
        return os.path.join(base_dir, 'exp')
    # Sort by creation time and take the newest
    latest_dir = max(exp_dirs, key=os.path.getctime)
    return latest_dir

# Load a model
model = YOLO(model=r'path/of/your/best/weights/best.pt')
source = r'path/of/your/images/test0'
# Run prediction
results = model.predict(
    source, device=0, save=True, show=False, project='runs/detect', name='exp')

# Output the coordinates once prediction has finished
print("\n" + "="*50)
print("Detection coordinates:")
print("="*50)
# Find the most recent detection output directory
detect_result_dir = find_latest_detection_dir()
print(f"Detection output directory: {detect_result_dir}")
# Process each result
for i, result in enumerate(results):
    if hasattr(result, 'boxes') and result.boxes is not None:
        # Get the image size
        img_height, img_width = result.orig_shape
        # Take the file name (without extension) from this result's own image path,
        # so that each image gets its own coordinate file
        source_filename = os.path.basename(result.path)
        filename_without_ext = os.path.splitext(source_filename)[0]
        # Build the coordinate file path inside the detection output directory
        output_path = os.path.join(detect_result_dir, f"{filename_without_ext}_coordinates.txt")
        # Save the coordinates to a txt file
        save_coordinates_to_txt(result.boxes, model.names, output_path, img_width, img_height)
        print(f"Coordinates saved to: {output_path}")
        # Print the coordinates to the console
        boxes = result.boxes
        for j in range(len(boxes)):
            # Get coordinate information
            x1, y1, x2, y2 = boxes.xyxy[j].cpu().numpy()
            conf = boxes.conf[j].cpu().numpy()
            cls_id = int(boxes.cls[j].cpu().numpy())
            cls_name = model.names[cls_id]
            # Convert to integers
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            print(f"Image {i+1} - object {j+1}: {cls_name}")
            print(f" Position: [{x1}, {y1}, {x2}, {y2}]")
            print(f" Center: [{(x1+x2)//2}, {(y1+y2)//2}]")
            print(f" Confidence: {conf:.3f}")
            print("-" * 30)
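
The normalized output mentioned in the script corresponds to the YOLO label format: class id, then box center and size divided by the image dimensions. The conversion is sketched below as a standalone helper; `xyxy_to_yolo` and `format_yolo_line` are hypothetical names, and the conversion works on the floating-point coordinates rather than on values already truncated to integers.

```python
def xyxy_to_yolo(x1, y1, x2, y2, image_width, image_height):
    """Convert corner coordinates in pixels to normalized (cx, cy, w, h)."""
    cx = (x1 + x2) / 2 / image_width
    cy = (y1 + y2) / 2 / image_height
    w = (x2 - x1) / image_width
    h = (y2 - y1) / image_height
    return cx, cy, w, h

def format_yolo_line(class_id, box, image_width, image_height):
    """Format one detection as a YOLO label line: class cx cy w h."""
    cx, cy, w, h = xyxy_to_yolo(*box, image_width, image_height)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

print(format_yolo_line(0, (100, 50, 300, 250), 640, 480))
```

Keeping the coordinates as floats until the final formatting step avoids the small rounding error that integer truncation would otherwise introduce into the labels.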

Inference code (video)

# -*- coding: utf-8 -*-
from ultralytics import YOLO
import cv2
import os

def save_coordinates_to_txt(boxes, model_names, output_path, frame_number, image_width, image_height):
    """Append the box coordinates of one frame to a txt file."""
    with open(output_path, 'a', encoding='utf-8') as f: # append mode
        f.write(f"Frame: {frame_number}\n")
        f.write(f"Image size: {image_width} x {image_height}\n")
        f.write("-" * 40 + "\n")
        for i, box in enumerate(boxes):
            # Get coordinates and class information
            x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
            class_id = int(box.cls[0].cpu().numpy())
            class_name = model_names[class_id]
            confidence = box.conf[0].cpu().numpy()
            # Convert to integer pixel coordinates
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            center_x = (x1 + x2) // 2
            center_y = (y1 + y2) // 2
            width = x2 - x1
            height = y2 - y1
            # Write to file
            f.write(f"Object {i+1}: {class_name}\n")
            f.write(f" Bounding box: ({x1}, {y1}) - ({x2}, {y2})\n")
            f.write(f" Center: ({center_x}, {center_y})\n")
            f.write(f" Size: {width} x {height} pixels\n")
            f.write(f" Confidence: {confidence:.4f}\n")
            f.write("-" * 30 + "\n")
        f.write(f"Objects detected in this frame: {len(boxes)}\n\n")

# Load the model
model = YOLO(model=r'path/of/your/best/weights/best.pt')
# Video path
video_path = r'path/of/your/videos/1.mp4' # replace with your own video path
# Open the video file
cap = cv2.VideoCapture(video_path)
assert cap.isOpened(), "Error: cannot read the video file"
# Get video properties
fps = int(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print(f"Video info: {width}x{height}, {fps} FPS, total frames: {total_frames}")
# Create the output directory
output_dir = 'runs/detect/video_exp'
os.makedirs(output_dir, exist_ok=True)
# Prepare the output video and the coordinate file
output_video_path = os.path.join(output_dir, 'output_video.mp4')
output_txt_path = os.path.join(output_dir, 'video_coordinates.txt')
video_writer = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))
# Create or truncate the coordinate file
with open(output_txt_path, 'w', encoding='utf-8') as f:
    f.write("Video detection coordinates\n")
    f.write("=" * 50 + "\n\n")
frame_count = 0
# Process the video frame by frame
while cap.isOpened():
    success, frame = cap.read()
    if not success:
        print("Video processing finished or frame is empty")
        break
    frame_count += 1
    print(f"Processing frame {frame_count}/{total_frames}")
    # Run YOLO prediction on the frame
    results = model.predict(frame, save=False, verbose=False)
    result = results[0]
    if result.boxes is not None and len(result.boxes) > 0:
        # Save the coordinates
        save_coordinates_to_txt(result.boxes, model.names, output_txt_path, frame_count, width, height)
        # Draw the detections on the frame
        annotated_frame = result.plot()
        # Write to the output video
        video_writer.write(annotated_frame)
        # Optional: show the result live (press q to quit); an ASCII window title is used
        # because OpenCV windows may not render non-ASCII titles correctly
        cv2.imshow('Video detection', annotated_frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        video_writer.write(frame)
# Release resources
cap.release()
video_writer.release()
cv2.destroyAllWindows()

Real-time inference code (camera)

# -*- coding: utf-8 -*-
import torch
import os
import cv2
import time
from ultralytics import YOLO

# Select the GPU
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
# Check that a GPU is available
if not torch.cuda.is_available():
    raise RuntimeError("CUDA is not available; cannot run detection on the GPU")
# Set the GPU device
torch.cuda.set_device(0)
device = torch.device('cuda:0')
print(f"Using device: {device}")
print(f"GPU name: {torch.cuda.get_device_name(0)}")

def save_frame_coordinates(boxes, model_names, output_path, frame_number, image_width, image_height):
    """Append the coordinates of a single frame to a txt file."""
    with open(output_path, 'a', encoding='utf-8') as f: # append mode
        f.write(f"Frame: {frame_number}\n")
        f.write(f"Timestamp: {time.strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"Image size: {image_width} x {image_height}\n")
        f.write("-" * 50 + "\n")
        if boxes is not None and len(boxes) > 0:
            for i, box in enumerate(boxes):
                x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
                class_id = int(box.cls[0].cpu().numpy())
                class_name = model_names[class_id]
                confidence = box.conf[0].cpu().numpy()
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
                center_x = (x1 + x2) // 2
                center_y = (y1 + y2) // 2
                width = x2 - x1
                height = y2 - y1
                f.write(f"Object {i+1}: {class_name}\n")
                f.write(f" Bounding box: ({x1}, {y1}) - ({x2}, {y2})\n")
                f.write(f" Center: ({center_x}, {center_y})\n")
                f.write(f" Size: {width} x {height} pixels\n")
                f.write(f" Confidence: {confidence:.4f}\n")
                f.write("-" * 30 + "\n")
            f.write(f"Objects detected in this frame: {len(boxes)}\n\n")
        else:
            f.write("No objects detected\n\n")

def camera_detection():
    """Real-time detection from a camera."""
    # Load the model onto the GPU
    print("Loading the YOLO model onto the GPU...")
    model = YOLO(model=r'path/of/your/best/weights/best.pt')
    model.model.to(device)
    # Verify that the model is on the GPU
    model_device = next(model.model.parameters()).device
    print(f"Model device: {model_device}")
    # Open the camera
    cap = cv2.VideoCapture(0) # 0 is the default camera
    if not cap.isOpened():
        print("Cannot open the camera")
        return
    print("Camera opened, starting detection...")
    print("Press 'q' to quit")
    print("Press 's' to save the coordinates of the current frame")
    # Create the output directory
    output_dir = 'runs/detect/camera'
    os.makedirs(output_dir, exist_ok=True)
    # Coordinate file path
    coordinates_file = os.path.join(output_dir, 'camera_coordinates.txt')
    # Create or truncate the coordinate file
    with open(coordinates_file, 'w', encoding='utf-8') as f:
        f.write("Camera detection coordinates\n")
        f.write("=" * 60 + "\n")
        f.write(f"Start time: {time.strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write("=" * 60 + "\n\n")
    frame_count = 0
    save_requested = False
    fps_counter = 0
    start_time = time.time()
    try:
        while True:
            # Read a frame
            ret, frame = cap.read()
            if not ret:
                print("Cannot read a frame from the camera")
                break
            frame_count += 1
            fps_counter += 1
            # Compute FPS roughly once per second
            if time.time() - start_time >= 1.0:
                fps = fps_counter / (time.time() - start_time)
                print(f"Frames processed: {frame_count}, FPS: {fps:.2f}", end='\r')
                fps_counter = 0
                start_time = time.time()
            # Run YOLO detection
            results = model(frame, device=0, # use the GPU
                            half=True, # FP16 acceleration
                            verbose=False, # no progress output
                            conf=0.5, # confidence threshold
                            iou=0.45) # IoU threshold
            result = results[0]
            # If a save was requested, write the coordinates of the current frame
            if save_requested:
                if hasattr(result, 'boxes') and result.boxes is not None and len(result.boxes) > 0:
                    img_height, img_width = frame.shape[:2]
                    save_frame_coordinates(result.boxes, model.names, coordinates_file, frame_count, img_width, img_height)
                    print(f"\nCoordinates of frame {frame_count} saved")
                else:
                    print(f"\nNo objects detected in frame {frame_count}")
                save_requested = False
            # Show the annotated frame; cv2.waitKey only receives key presses while a window is open
            cv2.imshow('Camera detection', result.plot())
            # Check for key presses
            key = cv2.waitKey(1) & 0xFF
            if key == ord('q'): # press q to quit
                break
            elif key == ord('s'): # press s to save the current frame's coordinates
                save_requested = True
    except KeyboardInterrupt:
        print("\nDetection interrupted by the user")
    finally:
        # Release resources
        cap.release()
        cv2.destroyAllWindows()
        # Record the end time
        with open(coordinates_file, 'a', encoding='utf-8') as f:
            f.write("\n" + "=" * 60 + "\n")
            f.write(f"End time: {time.strftime('%Y-%m-%d %H:%M:%S')}\n")
            f.write(f"Total frames processed: {frame_count}\n")
            f.write("=" * 60 + "\n")

if __name__ == "__main__":
    camera_detection()