Image-Based Occluded-Object Detection: An Occlusion Detection / Anti-Occlusion Algorithm

1 Introduction

In one of our projects we had to determine whether specific regions of the camera's view were blocked by an obstacle, without misclassifying changes in ambient lighting as an "occluded" state. Let's first look at the final result.

[Video: demo of the occlusion detection algorithm]

The figure below shows the scene in the "not occluded" state. The areas outlined with red rectangles are the monitored regions, labeled Region 1 through Region 4. The right side of the image shows each region's cropped view together with its occlusion status: when a region is clear, the "Occluded?" field shows a green "No"; when it is blocked, it shows a red "Yes".

[Figure: scene with no occlusion]


The next figure shows the "occluded" state: every region reads "Yes".

[Figure: all four regions occluded]


The figure below shows two of the four regions occluded while the other two remain clear. Occluded regions are flagged in red, clear regions in green.

[Figure: only two of the four regions occluded]


In addition, even when the ambient lighting changes, a clear region is not misidentified as occluded.

[Figure: scene under dim ambient light]


[Figure: scene under bright ambient light]

2 Detailed Project Description

2.0 Implementation Workflow

The overall implementation and workflow of the project are shown in the figure below.

[Figure: overall implementation workflow]


To recognize "whether a spot is occluded", the core of this project is a classification step. A 5-class model is enough here: one class for each of Regions 1-4 in the clear state, plus a single "occluded" state. Accordingly, the dataset stage has to capture image data for each of the following states:

{0: "occluded", 1: "region 1 clear", 2: "region 2 clear", 3: "region 3 clear", 4: "region 4 clear"}

Once the classification model is trained, feeding it an image yields a class index, which in turn tells us which state the scene is in (see the sketch after this list).
As the workflow figure shows, the implementation takes six steps, which map onto Sections 3.1-3.6:
Step 1: build the classification dataset, i.e. capture images of the monitored regions in every state (Section 3.1);
Step 2: train the classification model in PT format (Section 3.2);
Step 3: test the PT model (Section 3.3);
Step 4: convert the model to ONNX format (Section 3.4);
Step 5: test the ONNX model (Section 3.5);
Step 6: integrate the ONNX model into the application (Section 3.6).
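As a minimal sketch of that index-to-state lookup (the state strings are simply the English rendering of the label dictionary above; the helper name is hypothetical):

# Hypothetical helper: map the classifier's argmax index to the scene state.
STATES = {
    0: "occluded",
    1: "region 1 clear",
    2: "region 2 clear",
    3: "region 3 clear",
    4: "region 4 clear",
}

def state_of(class_index: int) -> str:
    """Translate a predicted class index into a scene state."""
    return STATES.get(class_index, "unknown")

print(state_of(2))  # -> region 2 clear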

2.1 Real-Time Display Software

The "real-time software" shown in Chapter 1 displays the live camera feed together with the occlusion status of each region. It is implemented with Qt; the contents of the project folder are shown below.

[Figure: contents of the Qt project folder]


The project consists of a main window that calls an instance of the classification-model wrapper, ONNXClassifier, implemented in onnxclassifier.h and onnxclassifier.cpp.
Below is the implementation of the ONNXClassifier::predict function.

std::vector<float> ONNXClassifier::predict(const cv::Mat& image) {
    if (!session_) {
        throw std::runtime_error("Session not initialized. Call initialize() first.");
    }
    // 1. BGR -> RGB conversion
    cv::Mat img_rgb;
    cv::cvtColor(image, img_rgb, cv::COLOR_BGR2RGB);
    // 2. Resize to 224x224
    cv::Mat img_resized;
    cv::resize(img_rgb, img_resized, cv::Size(224, 224));
    // 3. Convert to float32 and scale to [0, 1]
    cv::Mat float_img;
    img_resized.convertTo(float_img, CV_32FC3, 1.0 / 255.0);
    // 4. ImageNet standardization - the key step!
    // ImageNet mean and std values
    const float mean[3] = {0.485f, 0.456f, 0.406f};
    const float std[3]  = {0.229f, 0.224f, 0.225f};
    // Build the standardized image
    cv::Mat standardized_img = cv::Mat::zeros(float_img.size(), float_img.type());
    for (int y = 0; y < float_img.rows; y++) {
        for (int x = 0; x < float_img.cols; x++) {
            cv::Vec3f pixel = float_img.at<cv::Vec3f>(y, x);
            cv::Vec3f standardized_pixel;
            for (int c = 0; c < 3; c++) {
                standardized_pixel[c] = (pixel[c] - mean[c]) / std[c];
            }
            standardized_img.at<cv::Vec3f>(y, x) = standardized_pixel;
        }
    }
    // 5. Convert to CHW layout (1, 3, 224, 224)
    std::vector<cv::Mat> channels;
    cv::split(standardized_img, channels);
    std::vector<float> input_data;
    input_data.reserve(1 * 3 * 224 * 224);
    // Append the data channel by channel
    for (const auto& channel : channels) {
        input_data.insert(input_data.end(),
                          (float*)channel.datastart, (float*)channel.dataend);
    }
    // Prepare the input tensor
    Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(
        OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
        memory_info, input_data.data(), input_data.size(),
        input_shape_.data(), input_shape_.size());
    // Run inference
    auto output_tensors = session_->Run(
        Ort::RunOptions{nullptr},
        input_names_.data(), &input_tensor, input_names_.size(),
        output_names_.data(), output_names_.size());
    // Fetch the output
    float* output_data = output_tensors[0].GetTensorMutableData<float>();
    Ort::TensorTypeAndShapeInfo shape_info = output_tensors[0].GetTensorTypeAndShapeInfo();
    std::vector<int64_t> output_shape = shape_info.GetShape();
    // Total number of output elements
    size_t total_elements = 1;
    for (auto dim : output_shape) {
        total_elements *= dim;
    }
    return std::vector<float>(output_data, output_data + total_elements);
}

The complete code is available via Chapter 4.

2.2 Model Training

The model is trained with YOLOv5's classification (classify) module; the procedure is detailed in Section 3.2, and the complete code is available via Chapter 4.

3 Code in Detail

3.1 Building the Classification Dataset

The classification dataset is captured by the real-time display software itself, inside widget.cpp, with the snippet below.

/* save train pictures */
std::string filename_1 = "1/roi1_image_" + std::to_string(i) + ".jpg";
std::string filename_2 = "2/roi2_image_" + std::to_string(i) + ".jpg";
std::string filename_3 = "3/roi3_image_" + std::to_string(i) + ".jpg";
std::string filename_4 = "4/roi4_image_" + std::to_string(i) + ".jpg";
bool saveSuccess   = cv::imwrite(filename_1, resized_1);
bool saveSuccess_2 = cv::imwrite(filename_2, resized_2);
bool saveSuccess_3 = cv::imwrite(filename_3, resized_3);
bool saveSuccess_4 = cv::imwrite(filename_4, resized_4);

The captured images for each region look like the examples below.

[Figure: examples of the captured ROI images]

The images in folder 0 correspond to the "occluded" state.

[Figure: sample images in folder 0 (occluded)]

The images in folder 1 correspond to the "region 1 clear" state.

[Figure: sample images in folder 1 (region 1 clear)]

The images in folder 2 correspond to the "region 2 clear" state.

[Figure: sample images in folder 2 (region 2 clear)]

The images in folder 3 correspond to the "region 3 clear" state.

[Figure: sample images in folder 3 (region 3 clear)]

The images in folder 4 correspond to the "region 4 clear" state.

[Figure: sample images in folder 4 (region 4 clear)]

Each of these folders should hold at least 100 images, and the images should cover as many operating conditions as possible (for example, different lighting environments).
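Before training, it is worth verifying those counts. The sketch below assumes the class folders 0-4 were gathered under a hypothetical ./datasets/occlusion/train directory; adjust the path to your own layout:

from pathlib import Path

DATASET_ROOT = Path("./datasets/occlusion/train")  # hypothetical path - adjust as needed
MIN_IMAGES = 100  # per-class minimum suggested above

for class_dir in sorted(DATASET_ROOT.iterdir()):
    if not class_dir.is_dir():
        continue
    n = len(list(class_dir.glob("*.jpg")))
    flag = "" if n >= MIN_IMAGES else "  <-- needs more images"
    print(f"class {class_dir.name}: {n} images{flag}")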

3.2 Training the Classification Model (PT Format)

Below is the train function from train.py; the full file is available via Chapter 4.

Adjust the dataset path in the code to point at your classification data folder, as shown below.

[Figure: where the dataset path is set in train.py]
def train(opt, device):
    """Trains a YOLOv5 model, managing datasets, model optimization, logging, and saving checkpoints."""
    init_seeds(opt.seed + 1 + RANK, deterministic=True)
    save_dir, data, bs, epochs, nw, imgsz, pretrained = (
        opt.save_dir,
        Path(opt.data),
        opt.batch_size,
        opt.epochs,
        min(os.cpu_count() - 1, opt.workers),
        opt.imgsz,
        str(opt.pretrained).lower() == "true",
    )
    cuda = device.type != "cpu"

    # Directories
    wdir = save_dir / "weights"
    wdir.mkdir(parents=True, exist_ok=True)  # make dir
    last, best = wdir / "last.pt", wdir / "best.pt"

    # Save run settings
    yaml_save(save_dir / "opt.yaml", vars(opt))

    # Logger
    logger = GenericLogger(opt=opt, console_logger=LOGGER) if RANK in {-1, 0} else None

    # Download Dataset
    with torch_distributed_zero_first(LOCAL_RANK), WorkingDirectory(ROOT):
        data_dir = data if data.is_dir() else (DATASETS_DIR / data)
        if not data_dir.is_dir():
            LOGGER.info(f"\nDataset not found ⚠️, missing path {data_dir}, attempting download...")
            t = time.time()
            if str(data) == "imagenet":
                subprocess.run(["bash", str(ROOT / "data/scripts/get_imagenet.sh")], shell=True, check=True)
            else:
                url = f"https://github.com/ultralytics/assets/releases/download/v0.0.0/{data}.zip"
                download(url, dir=data_dir.parent)
            s = f"Dataset download success ✅ ({time.time() - t:.1f}s), saved to {colorstr('bold', data_dir)}\n"
            LOGGER.info(s)

    # Dataloaders
    nc = len([x for x in (data_dir / "train").glob("*") if x.is_dir()])  # number of classes
    trainloader = create_classification_dataloader(
        path=data_dir / "train",
        imgsz=imgsz,
        batch_size=bs // WORLD_SIZE,
        augment=True,
        cache=opt.cache,
        rank=LOCAL_RANK,
        workers=nw,
    )

    test_dir = data_dir / "test" if (data_dir / "test").exists() else data_dir / "val"  # data/test or data/val
    if RANK in {-1, 0}:
        testloader = create_classification_dataloader(
            path=test_dir,
            imgsz=imgsz,
            batch_size=bs // WORLD_SIZE * 2,
            augment=False,
            cache=opt.cache,
            rank=-1,
            workers=nw,
        )

    # Model
    with torch_distributed_zero_first(LOCAL_RANK), WorkingDirectory(ROOT):
        if Path(opt.model).is_file() or opt.model.endswith(".pt"):
            model = attempt_load(opt.model, device="cpu", fuse=False)
        elif opt.model in torchvision.models.__dict__:  # TorchVision models i.e. resnet50, efficientnet_b0
            model = torchvision.models.__dict__[opt.model](weights="IMAGENET1K_V1" if pretrained else None)
        else:
            m = hub.list("ultralytics/yolov5")  # + hub.list('pytorch/vision')  # models
            raise ModuleNotFoundError(f"--model {opt.model} not found. Available models are: \n" + "\n".join(m))
        if isinstance(model, DetectionModel):
            LOGGER.warning("WARNING ⚠️ pass YOLOv5 classifier model with '-cls' suffix, i.e. '--model yolov5s-cls.pt'")
            model = ClassificationModel(model=model, nc=nc, cutoff=opt.cutoff or 10)  # convert to classification model
    reshape_classifier_output(model, nc)  # update class count
    for m in model.modules():
        if not pretrained and hasattr(m, "reset_parameters"):
            m.reset_parameters()
        if isinstance(m, torch.nn.Dropout) and opt.dropout is not None:
            m.p = opt.dropout  # set dropout
    for p in model.parameters():
        p.requires_grad = True  # for training
    model = model.to(device)

    # Info
    if RANK in {-1, 0}:
        model.names = trainloader.dataset.classes  # attach class names
        model.transforms = testloader.dataset.torch_transforms  # attach inference transforms
        model_info(model)
        if opt.verbose:
            LOGGER.info(model)
        images, labels = next(iter(trainloader))
        file = imshow_cls(images[:25], labels[:25], names=model.names, f=save_dir / "train_images.jpg")
        logger.log_images(file, name="Train Examples")
        logger.log_graph(model, imgsz)  # log model

    # Optimizer
    optimizer = smart_optimizer(model, opt.optimizer, opt.lr0, momentum=0.9, decay=opt.decay)

    # Scheduler
    lrf = 0.01  # final lr (fraction of lr0)
    # lf = lambda x: ((1 + math.cos(x * math.pi / epochs)) / 2) * (1 - lrf) + lrf  # cosine
    def lf(x):
        """Linear learning rate scheduler function, scaling learning rate from initial value to `lrf` over `epochs`."""
        return (1 - x / epochs) * (1 - lrf) + lrf  # linear

    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
    # scheduler = lr_scheduler.OneCycleLR(optimizer, max_lr=lr0, total_steps=epochs, pct_start=0.1,
    #                                     final_div_factor=1 / 25 / lrf)

    # EMA
    ema = ModelEMA(model) if RANK in {-1, 0} else None

    # DDP mode
    if cuda and RANK != -1:
        model = smart_DDP(model)

    # Train
    t0 = time.time()
    criterion = smartCrossEntropyLoss(label_smoothing=opt.label_smoothing)  # loss function
    best_fitness = 0.0
    scaler = amp.GradScaler(enabled=cuda)
    val = test_dir.stem  # 'val' or 'test'
    LOGGER.info(
        f"Image sizes {imgsz} train, {imgsz} test\n"
        f"Using {nw * WORLD_SIZE} dataloader workers\n"
        f"Logging results to {colorstr('bold', save_dir)}\n"
        f"Starting {opt.model} training on {data} dataset with {nc} classes for {epochs} epochs...\n\n"
        f"{'Epoch':>10}{'GPU_mem':>10}{'train_loss':>12}{f'{val}_loss':>12}{'top1_acc':>12}{'top5_acc':>12}"
    )
    for epoch in range(epochs):  # loop over the dataset multiple times
        tloss, vloss, fitness = 0.0, 0.0, 0.0  # train loss, val loss, fitness
        model.train()
        if RANK != -1:
            trainloader.sampler.set_epoch(epoch)
        pbar = enumerate(trainloader)
        if RANK in {-1, 0}:
            pbar = tqdm(enumerate(trainloader), total=len(trainloader), bar_format=TQDM_BAR_FORMAT)
        for i, (images, labels) in pbar:  # progress bar
            images, labels = images.to(device, non_blocking=True), labels.to(device)

            # Forward
            with amp.autocast(enabled=cuda):  # stability issues when enabled
                loss = criterion(model(images), labels)

            # Backward
            scaler.scale(loss).backward()

            # Optimize
            scaler.unscale_(optimizer)  # unscale gradients
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)  # clip gradients
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
            if ema:
                ema.update(model)

            if RANK in {-1, 0}:
                # Print
                tloss = (tloss * i + loss.item()) / (i + 1)  # update mean losses
                mem = "%.3gG" % (torch.cuda.memory_reserved() / 1e9 if torch.cuda.is_available() else 0)  # (GB)
                pbar.desc = f"{f'{epoch + 1}/{epochs}':>10}{mem:>10}{tloss:>12.3g}" + " " * 36

                # Test
                if i == len(pbar) - 1:  # last batch
                    top1, top5, vloss = validate.run(
                        model=ema.ema, dataloader=testloader, criterion=criterion, pbar=pbar
                    )  # test accuracy, loss
                    fitness = top1  # define fitness as top1 accuracy

        # Scheduler
        scheduler.step()

        # Log metrics
        if RANK in {-1, 0}:
            # Best fitness
            if fitness > best_fitness:
                best_fitness = fitness

            # Log
            metrics = {
                "train/loss": tloss,
                f"{val}/loss": vloss,
                "metrics/accuracy_top1": top1,
                "metrics/accuracy_top5": top5,
                "lr/0": optimizer.param_groups[0]["lr"],
            }  # learning rate
            logger.log_metrics(metrics, epoch)

            # Save model
            final_epoch = epoch + 1 == epochs
            if (not opt.nosave) or final_epoch:
                ckpt = {
                    "epoch": epoch,
                    "best_fitness": best_fitness,
                    "model": deepcopy(ema.ema).half(),  # deepcopy(de_parallel(model)).half(),
                    "ema": None,  # deepcopy(ema.ema).half(),
                    "updates": ema.updates,
                    "optimizer": None,  # optimizer.state_dict(),
                    "opt": vars(opt),
                    "git": GIT_INFO,  # {remote, branch, commit} if a git repo
                    "date": datetime.now().isoformat(),
                }

                # Save last, best and delete
                torch.save(ckpt, last)
                if best_fitness == fitness:
                    torch.save(ckpt, best)
                del ckpt

    # Train complete
    if RANK in {-1, 0} and final_epoch:
        LOGGER.info(
            f"\nTraining complete ({(time.time() - t0) / 3600:.3f} hours)"
            f"\nResults saved to {colorstr('bold', save_dir)}"
            f"\nPredict:         python classify/predict.py --weights {best} --source im.jpg"
            f"\nValidate:        python classify/val.py --weights {best} --data {data_dir}"
            f"\nExport:          python export.py --weights {best} --include onnx"
            f"\nPyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', '{best}')"
            f"\nVisualize:       https://netron.app\n"
        )

        # Plot examples
        images, labels = (x[:25] for x in next(iter(testloader)))  # first 25 images and labels
        pred = torch.max(ema.ema(images.to(device)), 1)[1]
        file = imshow_cls(images, labels, pred, de_parallel(model).names, verbose=False, f=save_dir / "test_images.jpg")

        # Log results
        meta = {"epochs": epochs, "top1_acc": best_fitness, "date": datetime.now().isoformat()}
        logger.log_images(file, name="Test Examples (true-predicted)", epoch=epoch)
        logger.log_model(best, epochs, metadata=meta)
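A typical invocation of this script, assuming the dataset layout from Section 3.1 sits in a hypothetical ../datasets/occlusion folder (the flags follow the standard YOLOv5 classify CLI; the epoch count is a placeholder):

python classify/train.py --model yolov5s-cls.pt --data ../datasets/occlusion --epochs 100 --img 224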

3.3 Testing the PT Model

The PT-format model is tested with the predict.py script (classify/predict.py in the YOLOv5 repository).
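Following the hint printed at the end of training, a test run looks like this (the weights path is YOLOv5's default save location and the test image name is a placeholder):

python classify/predict.py --weights runs/train-cls/exp/weights/best.pt --source im.jpg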

3.4 Converting the Model to ONNX

The PT-format model is converted to an ONNX model with the export.py script.
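Again following the command printed at the end of training (the weights path is a placeholder):

python export.py --weights runs/train-cls/exp/weights/best.pt --include onnx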

3.5 Testing the ONNX Model

The ONNX model is tested with onnxpredict.py; its full implementation follows.

import cv2
import numpy as np
import onnxruntime as ort


def softmax(x):
    """Compute softmax."""
    exp_x = np.exp(x - np.max(x))  # subtract the max to avoid numerical overflow
    return exp_x / exp_x.sum()


onnx_path = "best.onnx"
img_path = "2.jpg"

# Read the image - BGR layout
img_bgr = cv2.imread(img_path)
print(f"Original image size: {img_bgr.shape}")

# 1. BGR -> RGB conversion
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
print(f"After BGR->RGB: {img_rgb.shape}")

# 2. Resize to 224x224
img_resized = cv2.resize(img_rgb, (224, 224))
print(f"Resized image size: {img_resized.shape}")

# 3. Preprocessing pipeline
# a) Convert to float32 and scale to [0, 1]
img_normalized = img_resized.astype(np.float32) / 255.0
print(f"Range after scaling: [{img_normalized.min():.3f}, {img_normalized.max():.3f}]")

# b) ImageNet standardization - the key step!
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
img_standardized = (img_normalized - mean) / std
print(f"Range after standardization: [{img_standardized.min():.3f}, {img_standardized.max():.3f}]")

# c) Reorder dimensions: HWC -> CHW
img_chw = img_standardized.transpose(2, 0, 1)
print(f"Shape after HWC->CHW: {img_chw.shape}")

# d) Add a batch dimension: CHW -> NCHW
img_processed = img_chw[np.newaxis, :, :, :]
print(f"Final input shape: {img_processed.shape}")

# Create the inference session
# (pick a provider by index: 1 on a GPU build, 0 otherwise)
provider = ort.get_available_providers()[1 if ort.get_device() == "GPU" else 0]
ort_session = ort.InferenceSession(onnx_path, providers=[provider])

# First inspect the input/output node names
input_name = ort_session.get_inputs()[0].name
output_name = ort_session.get_outputs()[0].name
print(f"Input node name: {input_name}")
print(f"Output node name: {output_name}")

# Run inference with the correct node names
results = ort_session.run(output_names=[output_name], input_feed={input_name: img_processed})
print(f"Output shape: {results[0].shape}")
print(f"Raw output (logits): {results[0][0]}")

# Apply softmax to get probabilities
probabilities = softmax(results[0][0])
print(f"Softmax probabilities: {probabilities}")

# Pick the predicted class
predicted_class = probabilities.argmax()
confidence = probabilities[predicted_class]
print(f"Predicted class index: {predicted_class}")
print(f"Confidence: {confidence:.4f}")

# Map the class index to a readable label
class_names = ["0", "1", "2", "3", "4"]  # adjust to your actual classes
print(f"Prediction: {class_names[predicted_class]} (confidence: {confidence:.4f})")

# Show per-class logits and probabilities
for i, (logit, prob) in enumerate(zip(results[0][0], probabilities)):
    print(f"{class_names[i]}: logit={logit:.4f}, probability={prob:.4f}")
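One fragile spot worth noting: indexing get_available_providers() by position depends on the provider ordering of the installed build. A more robust pattern (standard onnxruntime API, not code from this project) is to filter an explicit preference list:

import onnxruntime as ort

# Keep only the providers this onnxruntime build actually offers,
# preferring CUDA over CPU when both are present.
available = ort.get_available_providers()
preferred = [p for p in ("CUDAExecutionProvider", "CPUExecutionProvider") if p in available]
ort_session = ort.InferenceSession("best.onnx", providers=preferred)
print("Using providers:", ort_session.get_providers())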

3.6 Application-Side Integration

Finally, the ONNX prediction model is integrated into the real-time software. Below is the implementation of the ONNXClassifier class (header and source).

#ifndef ONNXCLASSIFIER_H
#define ONNXCLASSIFIER_H

#include <string>
#include <vector>
#include <memory>
#include <opencv2/opencv.hpp>
#include <onnxruntime_cxx_api.h>

class ONNXClassifier {
public:
    ONNXClassifier(const std::string& model_path,
                   const std::vector<std::string>& class_names = {});
    ~ONNXClassifier();

    bool initialize();
    std::vector<float> predict(const cv::Mat& image);
    std::vector<float> softmax(const std::vector<float>& logits);

    // Convenience wrapper around predict() + softmax()
    struct PredictionResult {
        int class_index;
        float confidence;
        std::string class_name;
        std::vector<float> logits;
        std::vector<float> probabilities;
    };
    PredictionResult predictWithDetails(const cv::Mat& image);
    void printPredictionDetails(const PredictionResult& result) const;

private:
    std::string model_path_;
    std::vector<std::string> class_names_;
    Ort::Env env_;
    std::unique_ptr<Ort::Session> session_;
    Ort::SessionOptions session_options_;
    Ort::AllocatorWithDefaultOptions allocator_;
    std::vector<const char*> input_names_;
    std::vector<const char*> output_names_;
    std::vector<int64_t> input_shape_;
};

#endif // ONNXCLASSIFIER_H
#include"onnxclassifier.h"#include<iostream>#include<algorithm>ONNXClassifier::ONNXClassifier(const std::string& model_path,const std::vector<std::string>& class_names):model_path_(model_path),class_names_(class_names),env_(ORT_LOGGING_LEVEL_WARNING,"ONNXClassifier"),session_options_(){// 如果没有提供类别名称,使用默认的if(class_names_.empty()){ class_names_ ={"0","1","2","3","4"};}}ONNXClassifier::~ONNXClassifier(){// 释放分配的名称内存for(auto name : input_names_){ allocator_.Free(const_cast<void*>(static_cast<constvoid*>(name)));}for(auto name : output_names_){ allocator_.Free(const_cast<void*>(static_cast<constvoid*>(name)));}}boolONNXClassifier::initialize(){try{// 创建会话 session_ = std::make_unique<Ort::Session>(env_, model_path_.c_str(), session_options_);// 获取输入输出信息 size_t num_input_nodes = session_->GetInputCount();for(size_t i =0; i < num_input_nodes; i++){ input_names_.push_back(session_->GetInputNameAllocated(i, allocator_).release()); Ort::TypeInfo type_info = session_->GetInputTypeInfo(i);auto tensor_info = type_info.GetTensorTypeAndShapeInfo(); input_shape_ = tensor_info.GetShape();} size_t num_output_nodes = session_->GetOutputCount();for(size_t i =0; i < num_output_nodes; i++){ output_names_.push_back(session_->GetOutputNameAllocated(i, allocator_).release());}returntrue;}catch(const Ort::Exception& e){ std::cerr <<"ONNX初始化失败: "<< e.what()<< std::endl;returnfalse;}} std::vector<float>ONNXClassifier::predict(const cv::Mat& image){if(!session_){throw std::runtime_error("Session not initialized. Call initialize() first.");}// 1. BGR -> RGB 转换 cv::Mat img_rgb; cv::cvtColor(image, img_rgb, cv::COLOR_BGR2RGB);// 2. 调整图像尺寸到 224x224 cv::Mat img_resized; cv::resize(img_rgb, img_resized, cv::Size(224,224));// 3. 转换为float32并归一化到 [0, 1] cv::Mat float_img; img_resized.convertTo(float_img, CV_32FC3,1.0/255.0);// 4. ImageNet标准化 - 关键步骤!// ImageNet mean and std valuesconstfloat mean[3]={0.485f,0.456f,0.406f};constfloat std[3]={0.229f,0.224f,0.225f};// 创建标准化后的图像 cv::Mat standardized_img = cv::Mat::zeros(float_img.size(), float_img.type());for(int y =0; y < float_img.rows; y++){for(int x =0; x < float_img.cols; x++){ cv::Vec3f pixel = float_img.at<cv::Vec3f>(y, x); cv::Vec3f standardized_pixel;for(int c =0; c <3; c++){ standardized_pixel[c]=(pixel[c]- mean[c])/ std[c];} standardized_img.at<cv::Vec3f>(y, x)= standardized_pixel;}}// 5. 
转换为CHW格式 (1, 3, 224, 224) std::vector<cv::Mat> channels; cv::split(standardized_img, channels); std::vector<float> input_data; input_data.reserve(1*3*224*224);// 按通道顺序添加数据for(constauto& channel : channels){ input_data.insert(input_data.end(),(float*)channel.datastart,(float*)channel.dataend);}// 准备输入 Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu( OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault); Ort::Value input_tensor = Ort::Value::CreateTensor<float>( memory_info, input_data.data(), input_data.size(), input_shape_.data(), input_shape_.size());// 运行推理auto output_tensors = session_->Run( Ort::RunOptions{nullptr}, input_names_.data(),&input_tensor, input_names_.size(), output_names_.data(), output_names_.size());// 获取输出float* output_data = output_tensors[0].GetTensorMutableData<float>(); Ort::TensorTypeAndShapeInfo shape_info = output_tensors[0].GetTensorTypeAndShapeInfo(); std::vector<int64_t> output_shape = shape_info.GetShape();// 计算总元素数量 size_t total_elements =1;for(auto dim : output_shape){ total_elements *= dim;}return std::vector<float>(output_data, output_data + total_elements);} std::vector<float>ONNXClassifier::softmax(const std::vector<float>& x){ std::vector<float>exp_x(x.size());float max_val =*std::max_element(x.begin(), x.end());float sum =0.0f;for(size_t i =0; i < x.size(); i++){ exp_x[i]= std::exp(x[i]- max_val); sum += exp_x[i];}for(size_t i =0; i < exp_x.size(); i++){ exp_x[i]/= sum;}return exp_x;} ONNXClassifier::PredictionResult ONNXClassifier::predictWithDetails(const cv::Mat& image){ PredictionResult result;// 进行预测 result.logits =predict(image); result.probabilities =softmax(result.logits);// 找出预测类别 result.class_index = std::distance(result.probabilities.begin(), std::max_element(result.probabilities.begin(), result.probabilities.end())); result.confidence = result.probabilities[result.class_index];// 设置类别名称if(result.class_index < class_names_.size()){ result.class_name = class_names_[result.class_index];}else{ result.class_name ="未知类别";}return result;}voidONNXClassifier::printPredictionDetails(const PredictionResult& result)const{ std::cout <<"预测类别索引: "<< result.class_index << std::endl; std::cout <<"置信度: "<< result.confidence << std::endl; std::cout <<"预测结果: "<< result.class_name <<" (置信度: "<< result.confidence <<")"<< std::endl;// 显示所有类别的详细概率 size_t display_classes = std::min(class_names_.size(), result.probabilities.size());for(size_t i =0; i < display_classes; i++){ std::cout << class_names_[i]<<": logit="<< result.logits[i]<<", probability="<< result.probabilities[i]<< std::endl;}}

Below is the implementation of the Widget class (header and source).

#ifndef WIDGET_H
#define WIDGET_H

#include <QWidget>
#include <QTimer>
#include <QKeyEvent>
#include <opencv2/opencv.hpp>
#include "onnxclassifier.h"

using namespace cv;

QT_BEGIN_NAMESPACE
namespace Ui { class Widget; }
QT_END_NAMESPACE

class Widget : public QWidget {
    Q_OBJECT

public:
    Widget(QWidget *parent = nullptr);
    ~Widget();

private:
    Ui::Widget *ui;
    QTimer timer;
    VideoCapture cap;

    void initParamter();
    QImage cvMat2QImage(const cv::Mat &image);
    cv::Mat ImageConvert(cv::Mat& mat);
    void drawSquareOnImage(QRect square, QColor color);
    int judgeImageSimilarity(cv::Mat &src, cv::Mat &dst, int area);
    void keyPressEvent(QKeyEvent *event);

    ONNXClassifier* classifier;
    Mat frame;
    QImage img;
    QRect qroiRect_1, qroiRect_2, qroiRect_3, qroiRect_4;
    cv::Rect cvroiRect_1, cvroiRect_2, cvroiRect_3, cvroiRect_4;
    Mat roi_1, roi_2, roi_3, roi_4;
    Mat roiWantToJudge_1, roiWantToJudge_2, roiWantToJudge_3, roiWantToJudge_4;

signals:
    void sendisob(int result1, int result2, int result3, int result4);

public slots:
    void updatePic();
    void setIsobsValue(int result_1, int result_2, int result_3, int result_4);
};

#endif // WIDGET_H
#include"widget.h"#include"ui_widget.h"#include<opencv2/opencv.hpp>#include<iostream>#include<fstream>#include<QDebug>#include<QPainter>#include<vector>#include<QMessageBox>usingnamespace cv;usingnamespace std;staticunsignedint i =0;doublecompareSSIM(const Mat &img1,const Mat &img2, Mat &ssimMap); Scalar getMSSIM(const Mat& i1,const Mat& i2);Widget::Widget(QWidget *parent):QWidget(parent),ui(new Ui::Widget){ ui->setupUi(this); cap.open(0);if(!cap.isOpened()){ std::cerr <<"无法打开摄像头!"<< std::endl;return;}initParamter(); timer.start(33);connect(&timer,&QTimer::timeout,this,&Widget::updatePic);connect(this,&Widget::sendisob,this,&Widget::setIsobsValue);}Widget::~Widget(){delete ui;}voidWidget::initParamter(){ std::string onnx_path ="best.onnx";// 创建分类器(可以传入自定义类别名称) std::vector<std::string> class_names ={"遮擋","區域1","區域2","區域3","區域4"};// 初始化ONNX分类器 classifier =newONNXClassifier(onnx_path, class_names);// 初始化if(!classifier->initialize()){ std::cerr <<"分类器初始化失败"<< std::endl;}//create 4 light area coordinarys qroiRect_1 =QRect(200,800,224,224); qroiRect_2 =QRect(200,400,224,224); qroiRect_3 =QRect(600,200,224,224); qroiRect_4 =QRect(800,100,224,224); cvroiRect_1 = cv::Rect(qroiRect_1.x(), qroiRect_1.y(), qroiRect_1.width(), qroiRect_1.height()); cvroiRect_2 = cv::Rect(qroiRect_2.x(), qroiRect_2.y(), qroiRect_2.width(), qroiRect_2.height()); cvroiRect_3 = cv::Rect(qroiRect_3.x(), qroiRect_3.y(), qroiRect_3.width(), qroiRect_3.height()); cvroiRect_4 = cv::Rect(qroiRect_4.x(), qroiRect_4.y(), qroiRect_4.width(), qroiRect_4.height());} QImage Widget::cvMat2QImage(const cv::Mat &image){if(image.type()== CV_8UC3){returnQImage(image.data, image.cols, image.rows, image.step, QImage::Format_RGB888).rgbSwapped();}elseif(image.type()== CV_8UC1){returnQImage(image.data, image.cols, image.rows, image.step, QImage::Format_Grayscale8);}elseif(image.type()== CV_16U){ cv::Mat scaledImage; cv::normalize(image, scaledImage,0,255, cv::NORM_MINMAX); scaledImage.convertTo(scaledImage, CV_8U);// 强制转换为 8 位图像// // 打印图像信息,调试用// qDebug() << "Scaled image - rows:" << scaledImage.rows << "cols:" << scaledImage.cols// << "depth:" << scaledImage.depth() << "channels:" << scaledImage.channels();// 创建 QImage 对象并复制数据 QImage img(scaledImage.data, scaledImage.cols, scaledImage.rows, scaledImage.step, QImage::Format_Grayscale8);return img.copy();// // 返回 QImage 对象的副本,确保数据在返回时被正确处理}returnQImage();// 如果图像类型不支持,返回一个空的 QImage} cv::Mat Widget::ImageConvert(cv::Mat& mat){ cv::Mat gray;// 将 8 位 RGB 图像转换为 8 位灰度图像 cv::cvtColor(mat, gray, cv::COLOR_RGB2GRAY); cv::Mat gray12;// 将 8 位灰度图像转换为 12 位灰度图像 (0 ~ 4095) gray.convertTo(gray12, CV_16U,16.0);// 按比例将灰度值从 [0, 255] 扩展到 [0, 4095] cv::Mat resized;// 将 12 位灰度图像调整为 1280x1024 cv::resize(gray12, resized, cv::Size(1280,1024));return resized;}voidWidget::updatePic(){ cap.read(frame); Mat convertedframe;if(!frame.empty()){ convertedframe =ImageConvert(frame);if(i==10){ roi_1 =convertedframe(cvroiRect_1); roi_2 =convertedframe(cvroiRect_2); roi_3 =convertedframe(cvroiRect_3); roi_4 =convertedframe(cvroiRect_4);}if(i>10){ roiWantToJudge_1 =convertedframe(cvroiRect_1); roiWantToJudge_2 =convertedframe(cvroiRect_2); roiWantToJudge_3 =convertedframe(cvroiRect_3); roiWantToJudge_4 =convertedframe(cvroiRect_4);/******************遮挡判断***********************/ cv::Mat frame_resized, resized_1, resized_2, resized_3, resized_4; cv::resize(frame, frame_resized, cv::Size(1280,1024)); resized_1 =frame_resized(cvroiRect_1); resized_2 =frame_resized(cvroiRect_2); resized_3 =frame_resized(cvroiRect_3); resized_4 
=frame_resized(cvroiRect_4);/* save train pictures */ std::string filename_1 ="1/roi1_image_"+ std::to_string(i)+".jpg"; std::string filename_2 ="2/roi2_image_"+ std::to_string(i)+".jpg"; std::string filename_3 ="3/roi3_image_"+ std::to_string(i)+".jpg"; std::string filename_4 ="4/roi4_image_"+ std::to_string(i)+".jpg";bool saveSuccess = cv::imwrite(filename_1, resized_1);bool saveSuccess_2 = cv::imwrite(filename_2, resized_2);bool saveSuccess_3 = cv::imwrite(filename_3, resized_3);bool saveSuccess_4 = cv::imwrite(filename_4, resized_4);bool result_1 =false, result_2 =false, result_3 =false, result_4 =false;// 进行预测并获取详细结果 ONNXClassifier::PredictionResult result_11 = classifier->predictWithDetails(resized_1); classifier->printPredictionDetails(result_11);if(result_11.class_index!=1){ result_1 =true;} ONNXClassifier::PredictionResult result_22 = classifier->predictWithDetails(resized_2); classifier->printPredictionDetails(result_22);qDebug()<<"result_22.class_index: "<<result_22.class_index;if(result_22.class_index!=2){ result_2 =true;} ONNXClassifier::PredictionResult result_33 = classifier->predictWithDetails(resized_3); classifier->printPredictionDetails(result_33);qDebug()<<"result_33.class_index: "<<result_33.class_index;if(result_33.class_index!=3){ result_3 =true;} ONNXClassifier::PredictionResult result_44 = classifier->predictWithDetails(resized_4); classifier->printPredictionDetails(result_44);qDebug()<<"result_44.class_index: "<<result_44.class_index;if(result_44.class_index!=4){ result_4 =true;}// 打印预测详情// classifier->printPredictionDetails(result); emit sendisob(result_1, result_2, result_3, result_4);}/******************绘图***********************/ QImage img =cvMat2QImage(convertedframe); ui->label->setPixmap(QPixmap::fromImage(img)); QImage img_roi = img.copy(qroiRect_1); ui->label_2->setPixmap(QPixmap::fromImage(img_roi)); img_roi = img.copy(qroiRect_2); ui->label_7->setPixmap(QPixmap::fromImage(img_roi)); img_roi = img.copy(qroiRect_3); ui->label_11->setPixmap(QPixmap::fromImage(img_roi)); img_roi = img.copy(qroiRect_4); ui->label_15->setPixmap(QPixmap::fromImage(img_roi));drawSquareOnImage(qroiRect_1, Qt::red);drawSquareOnImage(qroiRect_2, Qt::red);drawSquareOnImage(qroiRect_3, Qt::red);drawSquareOnImage(qroiRect_4, Qt::red); i++;}}voidWidget::setIsobsValue(int result_1,int result_2,int result_3,int result_4){if(result_1 ==0){ ui->label_5->setStyleSheet("background-color: green; color: white;"); ui->label_5->setText(QString::fromLocal8Bit("否"));}else{ ui->label_5->setStyleSheet("background-color: red; color: white;"); ui->label_5->setText(QString::fromLocal8Bit("是"));}if(result_2 ==0){ ui->label_9->setStyleSheet("background-color: green; color: white;"); ui->label_9->setText(QString::fromLocal8Bit("否"));}else{ ui->label_9->setStyleSheet("background-color: red; color: white;"); ui->label_9->setText(QString::fromLocal8Bit("是"));}if(result_3 ==0){ ui->label_13->setStyleSheet("background-color: green; color: white;"); ui->label_13->setText(QString::fromLocal8Bit("否"));}else{ ui->label_13->setStyleSheet("background-color: red; color: white;"); ui->label_13->setText(QString::fromLocal8Bit("是"));}if(result_4 ==0){ ui->label_17->setStyleSheet("background-color: green; color: white;"); ui->label_17->setText(QString::fromLocal8Bit("否"));}else{ ui->label_17->setStyleSheet("background-color: red; color: white;"); ui->label_17->setText(QString::fromLocal8Bit("是"));}}voidWidget::keyPressEvent(QKeyEvent *event){if(event->key()== 
Qt::Key_Escape){QApplication::quit();}else{QWidget::keyPressEvent(event);}}voidWidget::drawSquareOnImage(QRect square, QColor color){if(ui->label->pixmap()==nullptr)return;// 确保 label 上有图像 QPixmap pixmap =*ui->label->pixmap();// 复制当前 Pixmap QPainter painter(&pixmap); painter.setPen(QPen(color,2));// 设置红色画笔,线宽 2 painter.drawRect(square);// 在图像上绘制正方形 painter.end(); ui->label->setPixmap(pixmap);// 更新 UI}

4 Resource Downloads

All of the code and model files used in this case study can be downloaded here: https://download.ZEEKLOG.net/download/wang_chao118/91911039?spm=1001.2014.3001.5501
