High-Performance YOLOv8 Inference in C++ with ONNX Runtime

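Summary (AI-generated): This project implements a C++ inference module for YOLOv8 on top of ONNX Runtime and OpenCV. It supports CPU and CUDA GPU acceleration and does not depend on PyTorch. The core pipeline is model loading, image preprocessing (RGB conversion, Letterbox/CenterCrop), tensor construction, inference, and post-processing (NMS for detection, per-class scores for classification). The code covers session creation, warm-up, FP32/FP16 precision support, and memory-management details. The environment needs OpenCV and ONNX Runtime; GPU mode additionally needs CUDA and the required system DLLs.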

CloudNative, published 2026/3/15, updated 2026/4/26

Project Background

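This project uses ONNX Runtime and OpenCV to implement a lightweight, efficient, and extensible C++ inference module for YOLOv8. It does not depend on PyTorch: it loads .onnx models directly, runs on Windows and Linux, and supports both CPU and CUDA acceleration.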

The project consists of three files: inference.h, inference.cpp, and main.cpp. The core logic lives in inference.cpp.


Code Walkthrough

1. Annotated inference.cpp:

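// Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
#define _CRT_SECURE_NO_WARNINGS 1 // Silence MSVC's "unsafe function" warnings for some C runtime functions (e.g. strcpy)
#include "inference.h"
#include <cstring>  // std::strcpy, std::strlen
#include <ctime>    // clock, CLOCKS_PER_SEC
#include <iostream> // std::cout
#include <regex>
#define benchmark // When defined, print simple timings (pre-process / inference / post-process)
#define min(a,b) (((a) < (b)) ? (a) : (b)) // Custom min macro (note: it may clash with std::min; kept as in the original project)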

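// Start without a session so the destructor is safe even if CreateSession was never called
YOLO_V8::YOLO_V8() : session(nullptr) { }
YOLO_V8::~YOLO_V8() {
    // Release the ONNX Runtime Session on destruction
    // (note: the char* buffers allocated with new[] for the input/output node names are never freed, so they leak)
    delete session;
}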

#ifdef USE_CUDA
namespace Ort {
    // With CUDA enabled and half (FP16) inputs, tell ORT that this template type maps to ONNX FLOAT16
    template<>
    struct TypeToTensorType<half> {
        static constexpr ONNXTensorElementDataType type = ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16;
    };
}
#endif
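The specialization above only tells ONNX Runtime which tensor element type `half` corresponds to; the values themselves use the IEEE 754 binary16 layout. As a minimal, dependency-free sketch (the function name `HalfBitsToFloat` is hypothetical and not part of this project, CUDA, or ORT), this is how one FP16 bit pattern decodes to a float, the same numeric conversion the FP16 branches later request from OpenCV with `convertTo(..., CV_32F)`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: decode IEEE 754 binary16 bits (ONNX FLOAT16 / CUDA half) into a float.
// Bit layout: 1 sign bit | 5 exponent bits (bias 15) | 10 mantissa bits.
float HalfBitsToFloat(std::uint16_t bits) {
    const bool negative = ((bits >> 15) & 0x1) != 0;
    const int exponent = (bits >> 10) & 0x1F;
    const int mantissa = bits & 0x3FF;
    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 0x1F) {
        // All-ones exponent: infinity if the mantissa is zero, otherwise NaN
        magnitude = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                                    : std::numeric_limits<float>::quiet_NaN();
    } else {
        // Normal: (1024 + mantissa) * 2^(exponent - 25) == (1 + mantissa / 1024) * 2^(exponent - 15)
        magnitude = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
    }
    return negative ? -magnitude : magnitude;
}
```

For example, `HalfBitsToFloat(0x3C00)` returns `1.0f`, and `0x7BFF` decodes to `65504.0f`, the largest finite FP16 value.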

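// BlobFromImage: convert an 8-bit, 3-channel HWC cv::Mat into a planar CHW buffer,
// scaling every pixel to [0, 1]. T is float* (FP32 models) or half* (FP16 models).
template<typename T>
char* BlobFromImage(cv::Mat& iImg, T& iBlob) {
    int channels = iImg.channels();
    int imgHeight = iImg.rows;
    int imgWidth = iImg.cols;

    for (int c = 0; c < channels; c++) {
        for (int h = 0; h < imgHeight; h++) {
            for (int w = 0; w < imgWidth; w++) {
                // Pixel (h, w) of channel c goes to offset c*H*W + h*W + w, normalized by 255
                iBlob[c * imgWidth * imgHeight + h * imgWidth + w] = typename std::remove_pointer<T>::type(
                    (iImg.at<cv::Vec3b>(h, w)[c]) / 255.0f);
            }
        }
    }
    return RET_OK;
}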
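BlobFromImage reads pixels with `cv::Mat::at`, but its index arithmetic does not depend on OpenCV. The sketch below (the name `HwcToChw` is hypothetical) applies the same `c * H * W + h * W + w` layout to a raw interleaved byte buffer, which makes the reordering easy to check by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert an interleaved HWC 8-bit image (the layout of a CV_8UC3 cv::Mat)
// into a planar CHW float buffer scaled to [0, 1], i.e. the data of an NCHW tensor with N = 1.
std::vector<float> HwcToChw(const std::vector<std::uint8_t>& hwc, int height, int width, int channels) {
    assert(height > 0 && width > 0 && channels > 0);
    assert(hwc.size() == static_cast<std::size_t>(height) * width * channels);
    std::vector<float> chw(hwc.size());
    for (int c = 0; c < channels; ++c) {
        for (int h = 0; h < height; ++h) {
            for (int w = 0; w < width; ++w) {
                // Source: pixel (h, w), channel c in the interleaved buffer
                const std::size_t src = (static_cast<std::size_t>(h) * width + w) * channels + c;
                // Destination: same formula as BlobFromImage, c * H * W + h * W + w
                const std::size_t dst = (static_cast<std::size_t>(c) * height + h) * width + w;
                chw[dst] = hwc[src] / 255.0f;
            }
        }
    }
    return chw;
}
```

OpenCV also provides `cv::dnn::blobFromImage(img, 1.0 / 255.0)`, which performs the scaling and HWC-to-NCHW reordering in one call (its `swapRB` flag should stay `false` when the image is already RGB, as it is after PreProcess). The template loop used in the project, by contrast, can write directly into either a float or a half buffer.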

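// PreProcess: convert to RGB, then Letterbox (detection/pose) or CenterCrop (classification)
char* YOLO_V8::PreProcess(cv::Mat& iImg, std::vector<int> iImgSize, cv::Mat& oImg) {
    // OpenCV loads images as BGR while the model expects RGB; grayscale input is expanded to 3 channels
    if (iImg.channels() == 3) {
        oImg = iImg.clone();
        cv::cvtColor(oImg, oImg, cv::COLOR_BGR2RGB);
    } else {
        cv::cvtColor(iImg, oImg, cv::COLOR_GRAY2RGB);
    }

    switch (modelType) {
        // Detection and pose models: Letterbox
    case YOLO_DETECT_V8:
    case YOLO_POSE:
    case YOLO_DETECT_V8_HALF:
    case YOLO_POSE_V8_HALF: {
        // Scale the longer side to the target size, keeping the aspect ratio
        if (iImg.cols >= iImg.rows) {
            // resizeScales = original pixels per model-input pixel; used later to map boxes back
            resizeScales = iImg.cols / (float)iImgSize.at(0);
            cv::resize(oImg, oImg, cv::Size(iImgSize.at(0), int(iImg.rows / resizeScales)));
        } else {
            resizeScales = iImg.rows / (float)iImgSize.at(0);
            cv::resize(oImg, oImg, cv::Size(int(iImg.cols / resizeScales), iImgSize.at(1)));
        }
        // Paste the resized image into the top-left corner of a zero-filled (black) canvas;
        // the padding is therefore only on the right or bottom
        cv::Mat tempImg = cv::Mat::zeros(iImgSize.at(0), iImgSize.at(1), CV_8UC3);
        oImg.copyTo(tempImg(cv::Rect(0, 0, oImg.cols, oImg.rows)));
        oImg = tempImg;
        break;
    }
    case YOLO_CLS:
    case YOLO_CLS_HALF: { // FP16 classification models need the same crop
        // Classification: CenterCrop the largest centered square, then resize it to the input size
        int h = iImg.rows;
        int w = iImg.cols;
        int m = min(h, w);
        int top = (h - m) / 2;
        int left = (w - m) / 2;
        cv::resize(oImg(cv::Rect(left, top, m, m)), oImg, cv::Size(iImgSize.at(0), iImgSize.at(1)));
        break;
    }
    }
    return RET_OK;
}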
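The letterbox branch of PreProcess reduces to a little arithmetic. This sketch (the `Letterbox` struct and both functions are hypothetical names) reproduces it for a square target size, which is what the article's 640×640 configuration uses, and shows why TensorProcess can map boxes back by multiplying by `resizeScales` alone: the resized image is anchored at the top-left corner, so there is no padding offset to subtract.

```cpp
#include <cassert>

// Hypothetical type: geometry of the top-left-aligned letterbox used by PreProcess.
struct Letterbox {
    float scale;       // original pixels per model-input pixel (resizeScales in the article)
    int resizedWidth;  // size of the resized image before padding
    int resizedHeight;
};

// Assumes a square target (target x target), as in the article's 640x640 setup.
Letterbox ComputeLetterbox(int srcWidth, int srcHeight, int target) {
    assert(srcWidth > 0 && srcHeight > 0 && target > 0);
    Letterbox lb{};
    if (srcWidth >= srcHeight) {
        lb.scale = srcWidth / static_cast<float>(target);
        lb.resizedWidth = target;
        lb.resizedHeight = static_cast<int>(srcHeight / lb.scale);
    } else {
        lb.scale = srcHeight / static_cast<float>(target);
        lb.resizedWidth = static_cast<int>(srcWidth / lb.scale);
        lb.resizedHeight = target;
    }
    return lb;
}

// The image sits at (0, 0) on the canvas, so mapping a model-space coordinate back to the
// original image is a plain multiplication; no padding offset has to be subtracted.
float ToOriginal(float modelCoord, const Letterbox& lb) { return modelCoord * lb.scale; }
```

For a 1280×720 frame and a 640 target, the scale is 2.0 and the resized image is 640×360, so a box edge at x = 100 in model space maps to x = 200 in the original frame.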

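// CreateSession: configure ONNX Runtime, load the model, record the I/O node names, then warm up
char* YOLO_V8::CreateSession(DL_INIT_PARAM& iParams) {
    char* Ret = RET_OK;
    // Intended to reject model paths that contain Chinese characters
    std::regex pattern("[\u4e00-\u9fa5]");
    bool result = std::regex_search(iParams.modelPath, pattern);
    if (result) {
        Ret = const_cast<char*>("[YOLO_V8]: Invalid model path. Please use a path without Chinese characters.");
        std::cout << Ret << std::endl;
        return Ret;
    }
    try {
        // Copy the initialization parameters into the instance
        rectConfidenceThreshold = iParams.rectConfidenceThreshold;
        iouThreshold = iParams.iouThreshold;
        imgSize = iParams.imgSize;
        modelType = iParams.modelType;
        cudaEnable = iParams.cudaEnable;

        // ONNX Runtime environment (log level + log ID)
        env = Ort::Env(ORT_LOGGING_LEVEL_WARNING, "Yolo");

        Ort::SessionOptions sessionOption;
        if (iParams.cudaEnable) {
            // Register the CUDA Execution Provider on GPU 0
            OrtCUDAProviderOptions cudaOption;
            cudaOption.device_id = 0;
            sessionOption.AppendExecutionProvider_CUDA(cudaOption);
        }

        // Enable all graph optimizations
        sessionOption.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);

        // Number of threads used inside a single operator
        sessionOption.SetIntraOpNumThreads(iParams.intraOpNumThreads);

        // Session log severity
        sessionOption.SetLogSeverityLevel(iParams.logSeverityLevel);

#ifdef _WIN32
        // On Windows, ORT expects a wide-character model path: convert the UTF-8 string to UTF-16
        int ModelPathSize = MultiByteToWideChar(CP_UTF8, 0, iParams.modelPath.c_str(), static_cast<int>(iParams.modelPath.length()), nullptr, 0);
        wchar_t* wide_cstr = new wchar_t[ModelPathSize + 1];
        MultiByteToWideChar(CP_UTF8, 0, iParams.modelPath.c_str(), static_cast<int>(iParams.modelPath.length()), wide_cstr, ModelPathSize);
        wide_cstr[ModelPathSize] = L'\0';
        const wchar_t* modelPath = wide_cstr;
#else
        // On Linux/macOS, the char* path is used directly
        const char* modelPath = iParams.modelPath.c_str();
#endif

        // Create the session (loads and optimizes the model)
        session = new Ort::Session(env, modelPath, sessionOption);
#ifdef _WIN32
        delete[] wide_cstr; // the converted path is no longer needed once the session exists
#endif

        // Record input/output node names: ORT returns smart pointers, so copy them into owned buffers
        Ort::AllocatorWithDefaultOptions allocator;
        size_t inputNodesNum = session->GetInputCount();
        for (size_t i = 0; i < inputNodesNum; i++) {
            Ort::AllocatedStringPtr input_node_name = session->GetInputNameAllocated(i, allocator);
            // Size the buffer to the name (a fixed-size buffer could overflow on long names)
            char* temp_buf = new char[std::strlen(input_node_name.get()) + 1];
            std::strcpy(temp_buf, input_node_name.get());
            inputNodeNames.push_back(temp_buf);
        }
        size_t OutputNodesNum = session->GetOutputCount();
        for (size_t i = 0; i < OutputNodesNum; i++) {
            Ort::AllocatedStringPtr output_node_name = session->GetOutputNameAllocated(i, allocator);
            char* temp_buf = new char[std::strlen(output_node_name.get()) + 1];
            std::strcpy(temp_buf, output_node_name.get());
            outputNodeNames.push_back(temp_buf);
        }

        // Default (empty) run options
        options = Ort::RunOptions{ nullptr };

        // Run one inference on a dummy image so the first real call is not slowed by initialization
        WarmUpSession();
        return RET_OK;
    } catch (const std::exception& e) {
        // Print the exception message with a "[YOLO_V8]:" prefix
        std::cout << "[YOLO_V8]:" << e.what() << std::endl;
        return const_cast<char*>("[YOLO_V8]: Create session failed.");
    }
}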
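The Chinese-character check in CreateSession runs `std::regex` on a `std::string`, which compares individual `char` values rather than Unicode code points, and how `\u4e00` and `\u9fa5` are encoded into a narrow string literal depends on the compiler's execution character set. If the intent is to reject exactly the code points U+4E00 to U+9FA5, one option is to decode UTF-8 by hand, as in this sketch. `ContainsCjk` is a hypothetical name, and the path is assumed to be UTF-8, the same assumption the Windows branch makes when it converts with `CP_UTF8`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if the UTF-8 string contains a code point in U+4E00..U+9FA5.
// Decodes UTF-8 explicitly instead of relying on std::regex; continuation bytes are
// not validated, which is acceptable for a simple path check.
bool ContainsCjk(const std::string& utf8) {
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else { ++i; continue; }                // stray continuation or invalid lead byte: skip it
        if (i + len > utf8.size()) break;      // truncated sequence at the end of the string
        for (std::size_t k = 1; k < len; ++k)  // append 6 payload bits per continuation byte
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        if (cp >= 0x4E00 && cp <= 0x9FA5) return true;
        i += len;
    }
    return false;
}
```

Applied to the model path before creating the session, it rejects a path such as `模型/best.onnx` while accepting plain ASCII paths and other non-ASCII text such as accented Latin letters.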

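// RunSession: one complete inference (pre-process -> tensor -> Run -> post-process)
char* YOLO_V8::RunSession(cv::Mat& iImg, std::vector<DL_RESULT>& oResult) {
    // Start time; always taken because TensorProcess receives it by reference
    clock_t starttime_1 = clock();

    char* Ret = RET_OK;
    cv::Mat processedImg;
    PreProcess(iImg, imgSize, processedImg); // RGB + Letterbox/CenterCrop

    if (modelType < 4) {
        // FP32 models (YOLO_DETECT_V8, YOLO_POSE, YOLO_CLS): float input blob
        float* blob = new float[processedImg.total() * 3];
        BlobFromImage(processedImg, blob); // HWC -> CHW, normalized to [0, 1]
        // Input shape NCHW = {1, 3, H, W}
        std::vector<int64_t> inputNodeDims = { 1, 3, imgSize.at(0), imgSize.at(1) };
        TensorProcess(starttime_1, iImg, blob, inputNodeDims, oResult); // Run + decode; frees blob
    } else {
#ifdef USE_CUDA
        // FP16 models: half input blob (only available in CUDA builds)
        half* blob = new half[processedImg.total() * 3];
        BlobFromImage(processedImg, blob);
        std::vector<int64_t> inputNodeDims = { 1, 3, imgSize.at(0), imgSize.at(1) };
        TensorProcess(starttime_1, iImg, blob, inputNodeDims, oResult);
#endif
    }
    return Ret;
}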
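RunSession allocates the blob with `new[]` and relies on TensorProcess to `delete[]` it, and the element count later passed to `CreateTensor` (`3 * H * W`) has to agree with the shape `{1, 3, H, W}`. One possible alternative, sketched below with hypothetical helper names (`ElementCount`, `MakeInputBlob`), derives the buffer size from the shape and lets a `std::vector` own the memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper: number of elements described by a tensor shape such as {1, 3, 640, 640}.
std::size_t ElementCount(const std::vector<std::int64_t>& dims) {
    return std::accumulate(dims.begin(), dims.end(), std::size_t{1},
        [](std::size_t acc, std::int64_t d) -> std::size_t {
            // Dynamic dimensions (-1) must be resolved before allocating a buffer
            if (d < 0) throw std::invalid_argument("dynamic dimension must be resolved first");
            return acc * static_cast<std::size_t>(d);
        });
}

// Hypothetical helper: allocate the input blob sized from its shape, so its lifetime follows
// the caller's scope instead of a new[]/delete[] pair split across two functions.
template <typename T>
std::vector<T> MakeInputBlob(const std::vector<std::int64_t>& nchw) {
    return std::vector<T>(ElementCount(nchw));
}
```

The vector's `data()` and `size()` can then be passed to `Ort::Value::CreateTensor<float>(memoryInfo, blob.data(), blob.size(), dims.data(), dims.size())`. Because `CreateTensor` wraps the buffer without copying it, the vector must stay alive until `session->Run` returns.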

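// TensorProcess: wrap the blob in a tensor, run the session, and decode the output
template<typename N>
char* YOLO_V8::TensorProcess(clock_t& starttime_1, cv::Mat& iImg, N& blob, std::vector<int64_t>& inputNodeDims,
    std::vector<DL_RESULT>& oResult) {
    // Wrap the existing CPU buffer as the input tensor (no copy); element count = 3 * H * W
    Ort::Value inputTensor = Ort::Value::CreateTensor<typename std::remove_pointer<N>::type>(
        Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, 3 * imgSize.at(0) * imgSize.at(1),
        inputNodeDims.data(), inputNodeDims.size());

#ifdef benchmark
    clock_t starttime_2 = clock(); // pre-processing finished
#endif

    // Run inference: one input, all outputs
    auto outputTensor = session->Run(options, inputNodeNames.data(), &inputTensor, 1, outputNodeNames.data(),
        outputNodeNames.size());

#ifdef benchmark
    clock_t starttime_3 = clock(); // inference finished
#endif

    // Read the output shape
    Ort::TypeInfo typeInfo = outputTensor.front().GetTypeInfo();
    auto tensor_info = typeInfo.GetTensorTypeAndShapeInfo();
    std::vector<int64_t> outputNodeDims = tensor_info.GetShape();

    // Pointer to the output data (float or half)
    auto output = outputTensor.front().GetTensorMutableData<typename std::remove_pointer<N>::type>();
    delete[] blob; // the input buffer is no longer needed after Run

    switch (modelType) {
    case YOLO_DETECT_V8:
    case YOLO_DETECT_V8_HALF: {
        // YOLOv8 detection output: [1, 4 + numClasses, numCandidates], e.g. [1, 84, 8400] for 80 COCO classes at 640x640
        int signalResultNum = outputNodeDims[1]; // 4 + numClasses
        int strideNum = outputNodeDims[2];       // number of candidate boxes
        std::vector<int> class_ids;
        std::vector<float> confidences;
        std::vector<cv::Rect> boxes;
        cv::Mat rawData;
        if (modelType == YOLO_DETECT_V8) {
            // FP32: view the output as a [signalResultNum x strideNum] matrix
            rawData = cv::Mat(signalResultNum, strideNum, CV_32F, output);
        } else {
            // FP16: view the output as CV_16F, then convert to float
            rawData = cv::Mat(signalResultNum, strideNum, CV_16F, output);
            rawData.convertTo(rawData, CV_32F);
        }

        // Transpose to [strideNum x signalResultNum] so that each row is one candidate:
        // (cx, cy, w, h, score_0, ..., score_{C-1})
        rawData = rawData.t();

        float* data = (float*)rawData.data;
        for (int i = 0; i < strideNum; ++i) {
            // Class scores start after the 4 box values; classes.size() must equal the model's
            // class count (signalResultNum - 4)
            float* classesScores = data + 4;
            cv::Mat scores(1, (int)this->classes.size(), CV_32FC1, classesScores);
            cv::Point class_id;
            double maxClassScore;
            cv::minMaxLoc(scores, 0, &maxClassScore, 0, &class_id); // best class and its score

            // Keep candidates above the confidence threshold
            if (maxClassScore > rectConfidenceThreshold) {
                confidences.push_back(maxClassScore);
                class_ids.push_back(class_id.x);
                // Box in model-input coordinates: center (x, y), width w, height h
                float x = data[0];
                float y = data[1];
                float w = data[2];
                float h = data[3];
                // Convert to top-left corner + size and map back to the original image.
                // The letterbox pads only on the right/bottom, so scaling alone is enough.
                int left = int((x - 0.5 * w) * resizeScales);
                int top = int((y - 0.5 * h) * resizeScales);
                int width = int(w * resizeScales);
                int height = int(h * resizeScales);
                boxes.push_back(cv::Rect(left, top, width, height));
            }
            // Advance to the next candidate row
            data += signalResultNum;
        }

        // Non-maximum suppression over all boxes together (class-agnostic)
        std::vector<int> nmsResult;
        cv::dnn::NMSBoxes(boxes, confidences, rectConfidenceThreshold, iouThreshold, nmsResult);

        // Collect the surviving detections
        for (size_t i = 0; i < nmsResult.size(); ++i) {
            int idx = nmsResult[i];
            DL_RESULT result;
            result.classId = class_ids[idx];
            result.confidence = confidences[idx];
            result.box = boxes[idx];
            oResult.push_back(result);
        }

#ifdef benchmark
        // Print the stage timings in milliseconds
        clock_t starttime_4 = clock();
        double pre_process_time = (double)(starttime_2 - starttime_1) / CLOCKS_PER_SEC * 1000;
        double process_time = (double)(starttime_3 - starttime_2) / CLOCKS_PER_SEC * 1000;
        double post_process_time = (double)(starttime_4 - starttime_3) / CLOCKS_PER_SEC * 1000;
        if (cudaEnable) {
            std::cout << "[YOLO_V8(CUDA)]: " << pre_process_time << "ms pre-process, " << process_time << "ms inference, " << post_process_time << "ms post-process." << std::endl;
        } else {
            std::cout << "[YOLO_V8(CPU)]: " << pre_process_time << "ms pre-process, " << process_time << "ms inference, " << post_process_time << "ms post-process." << std::endl;
        }
#endif
        break;
    }
    case YOLO_CLS:
    case YOLO_CLS_HALF: {
        // Classification output: one score per class
        cv::Mat rawData;
        if (modelType == YOLO_CLS) {
            // FP32
            rawData = cv::Mat(1, (int)this->classes.size(), CV_32F, output);
        } else {
            // FP16 -> FP32
            rawData = cv::Mat(1, (int)this->classes.size(), CV_16F, output);
            rawData.convertTo(rawData, CV_32F);
        }
        float* data = (float*)rawData.data;

        // Emit one DL_RESULT (classId, confidence) per class
        DL_RESULT result;
        for (size_t i = 0; i < this->classes.size(); i++) {
            result.classId = (int)i;
            result.confidence = data[i];
            oResult.push_back(result);
        }
        break;
    }
    default:
        // Other model types (e.g. pose) are not decoded here
        std::cout << "[YOLO_V8]: " << "Not support model type." << std::endl;
    }
    return RET_OK;
}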
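To make the detection branch testable without ONNX Runtime or OpenCV, here is a standalone sketch of the same post-processing: decode the transposed `[numCandidates x (4 + numClasses)]` rows, keep the best class per row above a threshold, rescale by the letterbox factor, then run a greedy class-agnostic NMS. The types and function names are hypothetical, and `Nms` is a simplified stand-in for `cv::dnn::NMSBoxes`, not a copy of OpenCV's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Box { float left, top, width, height; };
struct Detection { int classId; float score; Box box; };

// Rows are (cx, cy, w, h, score_0, ..., score_{C-1}); `scale` is the letterbox factor.
std::vector<Detection> DecodeRows(const std::vector<float>& rows, int numCandidates, int numClasses,
                                  float scoreThreshold, float scale) {
    std::vector<Detection> out;
    const int stride = 4 + numClasses;
    for (int i = 0; i < numCandidates; ++i) {
        const float* row = rows.data() + static_cast<std::size_t>(i) * stride;
        const float* scores = row + 4;
        const int best = static_cast<int>(std::max_element(scores, scores + numClasses) - scores);
        if (scores[best] <= scoreThreshold) continue;  // keep only scores above the threshold
        const float cx = row[0], cy = row[1], w = row[2], h = row[3];
        // Center format -> top-left corner + size, mapped back to the original image
        out.push_back({best, scores[best],
                       {(cx - 0.5f * w) * scale, (cy - 0.5f * h) * scale, w * scale, h * scale}});
    }
    return out;
}

// Intersection over union of two axis-aligned boxes.
float IoU(const Box& a, const Box& b) {
    const float x1 = std::max(a.left, b.left), y1 = std::max(a.top, b.top);
    const float x2 = std::min(a.left + a.width, b.left + b.width);
    const float y2 = std::min(a.top + a.height, b.top + b.height);
    const float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    const float uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS over all classes: visit boxes by descending score and keep a box only if
// it overlaps every already-kept box by at most iouThreshold.
std::vector<Detection> Nms(const std::vector<Detection>& dets, float iouThreshold) {
    std::vector<std::size_t> order(dets.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return dets[a].score > dets[b].score; });
    std::vector<Detection> kept;
    for (std::size_t idx : order) {
        bool suppressed = false;
        for (const Detection& k : kept) {
            if (IoU(dets[idx].box, k.box) > iouThreshold) { suppressed = true; break; }
        }
        if (!suppressed) kept.push_back(dets[idx]);
    }
    return kept;
}
```

Like the article's call to `NMSBoxes`, this suppresses overlapping boxes regardless of their class; per-class NMS would run the same loop once per `classId`.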

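// WarmUpSession: run one inference on a dummy image so that one-off initialization costs
// (e.g. memory allocation and, with CUDA, device setup) are paid before the first real call
char* YOLO_V8::WarmUpSession() {
    clock_t starttime_1 = clock();
    // Dummy input at the model's input size; its contents do not matter, only its shape
    cv::Mat iImg = cv::Mat::zeros(cv::Size(imgSize.at(0), imgSize.at(1)), CV_8UC3);
    cv::Mat processedImg;
    PreProcess(iImg, imgSize, processedImg);

    if (modelType < 4) {
        // FP32 warm-up run
        float* blob = new float[iImg.total() * 3];
        BlobFromImage(processedImg, blob);
        std::vector<int64_t> YOLO_input_node_dims = { 1, 3, imgSize.at(0), imgSize.at(1) };
        Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
            Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, 3 * imgSize.at(0) * imgSize.at(1), YOLO_input_node_dims.data(), YOLO_input_node_dims.size());
        auto output_tensors = session->Run(options, inputNodeNames.data(), &input_tensor, 1, outputNodeNames.data(), outputNodeNames.size());
        delete[] blob;
        clock_t starttime_4 = clock();
        double post_process_time = (double)(starttime_4 - starttime_1) / CLOCKS_PER_SEC * 1000;
        if (cudaEnable) {
            std::cout << "[YOLO_V8(CUDA)]: " << "Cuda warm-up cost " << post_process_time << " ms. " << std::endl;
        }
    } else {
#ifdef USE_CUDA
        // FP16 warm-up run (CUDA builds only)
        half* blob = new half[iImg.total() * 3];
        BlobFromImage(processedImg, blob);
        std::vector<int64_t> YOLO_input_node_dims = { 1, 3, imgSize.at(0), imgSize.at(1) };
        Ort::Value input_tensor = Ort::Value::CreateTensor<half>(Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, 3 * imgSize.at(0) * imgSize.at(1), YOLO_input_node_dims.data(), YOLO_input_node_dims.size());
        auto output_tensors = session->Run(options, inputNodeNames.data(), &input_tensor, 1, outputNodeNames.data(), outputNodeNames.size());
        delete[] blob;
        clock_t starttime_4 = clock();
        double post_process_time = (double)(starttime_4 - starttime_1) / CLOCKS_PER_SEC * 1000;
        if (cudaEnable) {
            std::cout << "[YOLO_V8(CUDA)]: " << "Cuda warm-up cost " << post_process_time << " ms. " << std::endl;
        }
#endif
    }
    return RET_OK;
}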
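WarmUpSession and the benchmark block both measure time with `clock()`. The C standard defines `clock()` as processor time used by the program: on Linux that is the CPU time of all threads combined, whereas MSVC's implementation returns elapsed wall-clock time, so the numbers are not directly comparable across platforms. A portable alternative is `std::chrono::steady_clock`. The helper below (`AverageMillis` is a hypothetical name) also separates untimed warm-up calls from timed calls.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: call `fn` `warmup` times without timing, then return the average
// wall-clock duration of `iterations` further calls, in milliseconds.
template <typename Fn>
double AverageMillis(Fn&& fn, std::size_t warmup, std::size_t iterations) {
    if (iterations == 0) throw std::invalid_argument("iterations must be > 0");
    for (std::size_t i = 0; i < warmup; ++i) fn();  // pay one-off costs outside the timed region
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) fn();
    const std::chrono::duration<double, std::milli> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

In practice `fn` would wrap a call such as `p->RunSession(img, res)`; since RunSession appends to `res`, the vector should be cleared inside the lambda before each call.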

4. Annotated main.cpp:

// Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
#include <iostream>    // standard I/O for logs and prompts
#include <iomanip>     // floating-point output formatting (setprecision, etc.)
#include "inference.h" // declaration of this project's YOLO_V8 inference class
#include <filesystem>  // C++17 filesystem library, used to iterate over image directories
#include <fstream>     // file reading (coco.yaml)
#include <random>      // random numbers (random colors, etc.)
#include <sstream>     // std::stringstream, used when parsing coco.yaml
#include <cmath>       // std::floor

// -------------------------------
// Detector: object detection demo
// p is a reference to YOLO_V8* (YOLO_V8*&), which keeps the ability to modify the pointer itself inside the function
// Iterates over ./images/ under the working directory, runs RunSession on each image, draws boxes and labels, and shows the result
// -------------------------------
void Detector(YOLO_V8*& p) {
    std::filesystem::path current_path = std::filesystem::current_path(); // current working directory
    std::filesystem::path imgs_path = current_path / "images";           // images are expected in ./images/
    for (auto& i : std::filesystem::directory_iterator(imgs_path)) {      // iterate over every file in the directory
        // Only handle common bitmap formats
        if (i.path().extension() == ".jpg" || i.path().extension() == ".png" || i.path().extension() == ".jpeg") {
            std::string img_path = i.path().string(); // full path as a string
            cv::Mat img = cv::imread(img_path);       // read with OpenCV (BGR)
            if (img.empty()) continue;                // skip files OpenCV cannot decode
            std::vector<DL_RESULT> res;               // inference results (possibly several objects)
            p->RunSession(img, res);                  // core inference (pre-process -> inference -> post-process)

            // Draw every detection of this image
            for (auto& re : res) {
                // Random color so that different objects are easy to tell apart
                cv::RNG rng(cv::getTickCount());
                cv::Scalar color(rng.uniform(0, 256), rng.uniform(0, 256), rng.uniform(0, 256));

                // Draw the box (corners given by re.box; line width 3)
                cv::rectangle(img, re.box, color, 3);

                // Format the confidence with two decimals:
                // floor(100 * x) / 100 truncates to two decimals; the substr below only trims the extra digits
                float confidence = std::floor(100 * re.confidence) / 100;
                std::cout << std::fixed << std::setprecision(2); // show cout floats with two decimals
                // substr(..., size() - 4) drops the extra digits std::to_string prints (e.g. "0.500000" -> "0.50")
                std::string label = p->classes[re.classId] + " " +
                    std::to_string(confidence).substr(0, std::to_string(confidence).size() - 4);

                // Filled rectangle above the box as the text background, so the label stays readable
                cv::rectangle(img, cv::Point(re.box.x, re.box.y - 25),
                    cv::Point(re.box.x + static_cast<int>(label.length()) * 15, re.box.y), color, cv::FILLED);

                // Class name + confidence in black on top of the background
                cv::putText(img, label, cv::Point(re.box.x, re.box.y - 5),
                    cv::FONT_HERSHEY_SIMPLEX, 0.75, cv::Scalar(0, 0, 0), 2);
            }
            // Show this image's result; press any key to continue to the next image
            std::cout << "Press any key to exit" << std::endl;
            cv::imshow("Result of Detection", img);
            cv::waitKey(0);
            cv::destroyAllWindows();
        }
    }
}

// -------------------------------
// Classifier: classification demo
// Iterates over the images in the current directory, runs the classification model, and writes each class score onto the image
// Note: the classification output is one confidence per class; they are written in order here (this could be changed to show only the Top-K)
// -------------------------------
void Classifier(YOLO_V8*& p) {
    std::filesystem::path current_path = std::filesystem::current_path(); // current working directory
    std::filesystem::path imgs_path = current_path; // / "images"  // the demo uses the current directory; ./images also works
    // Uniform [0, 255] random generator so that each score line gets a different color
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> dis(0, 255);
    for (auto& i : std::filesystem::directory_iterator(imgs_path)) {
        if (i.path().extension() == ".jpg" || i.path().extension() == ".png") {
            std::string img_path = i.path().string();
            //std::cout << img_path << std::endl;
            cv::Mat img = cv::imread(img_path);
            if (img.empty()) continue;
            std::vector<DL_RESULT> res; // classification results: one record (classId, confidence) per class
            p->RunSession(img, res);    // run classification (FP32 or FP16, depending on the model type)

            // Write one class score per line, starting at y = 50 with 50 px between lines
            float positionY = 50;
            for (size_t k = 0; k < res.size(); k++) {
                int r = dis(gen);
                int g = dis(gen);
                int b = dis(gen);
                cv::putText(img, std::to_string(k) + ":", cv::Point(10, positionY), cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(b, g, r), 2);
                cv::putText(img, std::to_string(res.at(k).confidence), cv::Point(70, positionY), cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(b, g, r), 2);
                positionY += 50;
            }
            // Show the result; press a key to close the window
            cv::imshow("TEST_CLS", img);
            cv::waitKey(0);
            cv::destroyAllWindows();
            //cv::imwrite("E:\\output\\" + std::to_string(k) + ".png", img); // optional: save the result to disk
        }
    }
}

// -------------------------------
// ReadCocoYaml: read the class names from coco.yaml into p->classes
// Assumes coco.yaml contains a simple key-value list such as:
// names:
//   0: person
//   1: bicycle
//   ...
// It is parsed with a plain line scan plus string splitting.
// -------------------------------
int ReadCocoYaml(YOLO_V8*& p) {
    // Open the YAML file
    std::ifstream file("coco.yaml"); // read coco.yaml from the current working directory
    if (!file.is_open()) {
        std::cerr << "Failed to open file" << std::endl;
        return 1;
    }

    // Read the file line by line
    std::string line;
    std::vector<std::string> lines;
    while (std::getline(file, line)) {
        lines.push_back(line); // keep every line in memory for the scan below
    }

    // Find the start and end of the names section: it starts after the line containing "names:"
    // and ends at the first following line without a colon (simplified)
    std::size_t start = 0;
    std::size_t end = 0;
    for (std::size_t i = 0; i < lines.size(); i++) {
        if (lines[i].find("names:") != std::string::npos) {
            start = i + 1; // data starts on the line after "names:"
        } else if (start > 0 && lines[i].find(':') == std::string::npos) {
            end = i; // a line without a colon ends the names section
            break;
        }
    }
    // If the names section runs to the end of the file, no terminating line was found
    if (start > 0 && end == 0) {
        end = lines.size();
    }

    // Extract the names: split each line at the colon and keep the text after it (whitespace is kept as-is)
    std::vector<std::string> names;
    for (std::size_t i = start; i < end; i++) {
        std::stringstream ss(lines[i]);
        std::string name;
        std::getline(ss, name, ':'); // the index before the colon (discarded)
        std::getline(ss, name);      // the name after the colon (kept)
        names.push_back(name);
    }

    p->classes = names; // store in the YOLO_V8 instance for the label text
    return 0;
}

// -------------------------------
// DetectTest: entry point of the detection demo
// Creates a YOLO_V8 instance -> sets class names and parameters -> CreateSession -> runs Detector -> frees the instance
// -------------------------------
void DetectTest() {
    YOLO_V8* yoloDetector = new YOLO_V8;  // dynamic allocation (a smart pointer would also work; kept in the example's style)
    //ReadCocoYaml(yoloDetector);         // optional: read the 80 COCO classes from coco.yaml
    yoloDetector->classes = { "face" };   // example: a single "face" class, for testing a face model
    DL_INIT_PARAM params;                 // inference parameters (see inference.h)
    params.rectConfidenceThreshold = 0.1; // confidence threshold (low, so that results are easy to see)
    params.iouThreshold = 0.5;            // NMS IoU threshold
    params.modelPath = "best.onnx";       // ONNX model path (resolved relative to the current working directory)
    params.imgSize = { 640, 640 };        // model input resolution (must match the exported model)
#ifdef USE_CUDA
    params.cudaEnable = true;             // enable the CUDA EP (requires an ORT build with CUDA)

    // GPU FP32 inference
    params.modelType = YOLO_DETECT_V8;    // FP32 detection model
    // GPU FP16 inference
    //Note: change fp16 onnx model
    //params.modelType = YOLO_DETECT_V8_HALF; // FP16 (requires the corresponding FP16 ONNX model)
#else
    // CPU inference
    params.modelType = YOLO_DETECT_V8;    // the CPU build still uses the FP32 model
    params.cudaEnable = false;            // disable CUDA
#endif
    yoloDetector->CreateSession(params);  // create the ORT session and warm it up
    Detector(yoloDetector);               // run the detection demo over ./images/
    delete yoloDetector;                  // free the instance (note: the I/O node names leak in the current implementation; the example does not handle this)
}

// -------------------------------
// ClsTest: entry point of the classification demo
// Creates the instance -> reads the class names -> sets the classification parameters -> CreateSession -> runs Classifier
// -------------------------------
void ClsTest() {
    YOLO_V8* yoloDetector = new YOLO_V8;
    std::string model_path = "cls.onnx";  // path of the classification ONNX model
    ReadCocoYaml(yoloDetector);           // read the class names from coco.yaml (could be replaced with custom names)
    DL_INIT_PARAM params{ model_path, YOLO_CLS, {224, 224} }; // aggregate initialization: path, model type, input size
    yoloDetector->CreateSession(params);  // create the session (classification branch)
    Classifier(yoloDetector);             // overlay each class score on the directory's images and show them
    delete yoloDetector;                  // free the instance
}

// -------------------------------
// main: program entry point
// Runs the detection demo by default; to run classification, comment out DetectTest and enable ClsTest
// -------------------------------
int main() {
    DetectTest();
    //ClsTest();
    return 0;
}
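ReadCocoYaml ends the `names:` block at the first line without a colon and keeps the leading space of each name (for example `" person"`). A slightly more defensive sketch, still a line scanner rather than a real YAML parser, trims whitespace and stops at the first non-indented line or at end of input. `Trim` and `ParseNames` are hypothetical names, and the input is any `std::istream`, so the function can be exercised with a string instead of a file.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Trim ASCII whitespace from both ends of a string.
std::string Trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Hypothetical helper: collect the values of an indented "index: name" list under a
// top-level "names:" key. Stops at the first non-indented, non-empty line or at end of input.
std::vector<std::string> ParseNames(std::istream& in) {
    std::vector<std::string> names;
    std::string line;
    bool inNames = false;
    while (std::getline(in, line)) {
        if (!inNames) {
            inNames = (Trim(line) == "names:");
            continue;
        }
        if (Trim(line).empty()) continue;                               // blank line inside the block
        if (!std::isspace(static_cast<unsigned char>(line[0]))) break;  // next top-level key
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) break;                          // not an "index: name" entry
        names.push_back(Trim(line.substr(colon + 1)));
    }
    return names;
}
```

Usage with a file is the same as with a string: `std::ifstream file("coco.yaml"); p->classes = ParseNames(file);`.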