Artificial Intelligence: Fundamentals and Applications of Computer Vision

Study Notes on Quality Articles

08 Apr 2026 — 16 min read

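Part 12: Fundamentals and Applications of Computer Vision

Learning Objectives

💡 Understand the basic concepts of computer vision and why it matters
💡 Get to know the main image processing techniques, feature extraction methods, and common models and architectures
💡 Learn to use computer vision libraries (OpenCV, PIL, PyTorch, TensorFlow) for image processing, feature extraction, and model training
💡 Understand how tasks such as image classification, object detection, and semantic segmentation are implemented
💡 Build a complete computer vision application through a hands-on project

Key Topics

Basic concepts of computer vision
Image processing techniques (preprocessing, enhancement, filtering)
Feature extraction methods (HOG, SIFT, ORB)
Common models and architectures (LeNet, AlexNet, VGG, ResNet, YOLO)
Hands-on project: building a computer vision application (image classification, object detection, and more)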

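I. Computer Vision Basics

1.1 Basic Concepts of Computer Vision

Computer vision is an important branch of artificial intelligence concerned with how computers process and interpret images. Its goal is to let computers understand and explain image content, giving them capabilities similar to those of the human visual system.

1.1.1 Why Computer Vision Matters

Computer vision matters because it makes the following capabilities possible:

Image understanding: interpreting image content and recognizing objects, scenes, and actions
Object detection: finding objects in an image and locating their positions
Image classification: assigning categories and labels to images
Semantic segmentation: labeling an image at the pixel level
Image generation: producing new image content

1.1.2 Application Scenarios

Computer vision is widely used across many fields, including:

Healthcare: disease diagnosis and medical image analysis
Automotive: autonomous driving and intelligent transportation systems
Security: video surveillance and face recognition
E-commerce: product recommendation and image search
Social media: image classification and content recommendation

1.2 Challenges in Computer Vision

Computer vision faces the following challenges:

Image quality: images may be noisy or blurred
Object diversity: objects vary in size, shape, color, and pose
Scene complexity: scenes vary in lighting, background, and occlusion
Data scarcity: labeled data is very hard to obtain in some domains
Compute resources: image processing demands substantial computation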

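II. Image Processing Techniques

2.1 Image Preprocessing

Image preprocessing is a foundational step in computer vision. It includes the following operations:

2.1.1 Reading and Saving Images

Reading and saving images are the most basic image processing operations. Common image formats include:

JPEG: lossy compression, well suited to photographs
PNG: lossless compression, well suited to icons and charts
BMP: bitmap format that is usually stored uncompressed, so no detail is lost but files are large

2.1.2 Image Adjustment

Image adjustment includes the following operations:

Resizing: changing the dimensions of the image
Brightness and contrast: scaling pixel values (contrast) and adding an offset to them (brightness)
Color balance: changing the relative strength of the color channels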
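
To make the brightness and contrast adjustment concrete, here is a minimal NumPy sketch of the usual linear model, output = alpha * pixel + beta, clipped to the 8-bit range. The function name `scale_and_shift` and the sample values are illustrative assumptions, not part of any library.

```python
import numpy as np


def scale_and_shift(image, alpha=1.0, beta=0.0):
    """Return clip(alpha * image + beta) as uint8.

    alpha scales contrast, beta shifts brightness.
    """
    scaled = alpha * image.astype(np.float64) + beta
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


pixels = np.array([[0, 100, 200]], dtype=np.uint8)
brighter = scale_and_shift(pixels, alpha=1.0, beta=30)        # adds 30 to every pixel
higher_contrast = scale_and_shift(pixels, alpha=1.5, beta=0)  # 200 * 1.5 saturates at 255
```

Working in floating point and clipping at the end avoids the wrap-around that direct uint8 arithmetic would produce.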

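2.1.3 Cropping and Rotation

Cropping and rotation include the following operations:

Cropping: cutting a region of interest out of the image
Rotation: rotating the image by a given angle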
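
Because images are stored as arrays indexed by row and then column, cropping is just slicing with y first and x second. The sketch below shows this on a tiny array, along with a 90-degree rotation; `crop_region` is a hypothetical helper name, and rotation by arbitrary angles needs interpolation that this sketch does not cover.

```python
import numpy as np


def crop_region(image, x, y, width, height):
    # Rows index y and columns index x, so y comes first in the slice
    return image[y:y + height, x:x + width]


image = np.arange(12).reshape(3, 4)   # 3 rows (height) by 4 columns (width)
patch = crop_region(image, x=1, y=0, width=2, height=2)
rotated = np.rot90(image)             # 90 degrees counter-clockwise
```

After the rotation the array has 4 rows and 3 columns, and its first row holds what used to be the last column.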

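2.1.4 Code for Image Preprocessing

The following code implements image preprocessing with OpenCV:

    import cv2


    def read_image(image_path):
        # cv2.imread returns None instead of raising when the file cannot be read
        image = cv2.imread(image_path)
        if image is None:
            raise FileNotFoundError(f"Cannot read image: {image_path}")
        return image


    def save_image(image, output_path):
        cv2.imwrite(output_path, image)


    def resize_image(image, width, height):
        # cv2.resize takes the target size as (width, height)
        return cv2.resize(image, (width, height))


    def adjust_brightness_contrast(image, alpha=1.0, beta=0.0):
        # alpha scales contrast, beta shifts brightness; the result saturates to 0-255
        return cv2.convertScaleAbs(image, alpha=alpha, beta=beta)


    def crop_image(image, x, y, width, height):
        # NumPy images are indexed [row, column], that is [y, x]
        return image[y:y + height, x:x + width]


    def rotate_image(image, angle):
        # Rotate about the image center; positive angles are counter-clockwise
        (h, w) = image.shape[:2]
        center = (w // 2, h // 2)
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        # The output keeps the original size, so rotated corners may be cut off
        return cv2.warpAffine(image, M, (w, h))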

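2.2 Image Enhancement

Image enhancement improves the visual quality of an image. Common methods include:

2.2.1 Histogram Equalization

Histogram equalization increases contrast by remapping pixel intensities so that they spread more evenly over the available range.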
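
The remapping can be sketched from scratch with the cumulative histogram. This is a simplified NumPy illustration of the textbook method for 8-bit grayscale images, not OpenCV's implementation; the function name and sample image are assumptions made for the example.

```python
import numpy as np


def equalize_histogram(gray):
    """Equalize an 8-bit grayscale image using its cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]      # cumulative count at the darkest level present
    total = gray.size
    if total == cdf_min:           # constant image: nothing to spread out
        return gray.copy()
    lut = np.rint((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]               # look up the new value of every pixel


low_contrast = np.array([[100, 100, 110], [110, 120, 120]], dtype=np.uint8)
equalized = equalize_histogram(low_contrast)
```

The input only uses the narrow range 100 to 120, while the output stretches the same three levels across the full 0 to 255 range.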

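2.2.2 Image Smoothing

Image smoothing improves image quality by suppressing noise. Common smoothing methods include:

Mean filter: replaces each pixel with the average of its neighborhood
Gaussian filter: replaces each pixel with a weighted average of its neighborhood, with weights given by a Gaussian function
Median filter: replaces each pixel with the median of its neighborhood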
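
The difference between averaging and taking the median shows up clearly on impulse noise. The following sketch applies a generic 3x3 neighborhood filter written in plain NumPy (slow, for illustration only) to an image with a single bright outlier; `filter3x3` is a hypothetical helper, and the Gaussian filter is left out to keep the example short.

```python
import numpy as np


def filter3x3(image, reducer):
    """Apply reducer (for example np.mean or np.median) to each 3x3 neighborhood."""
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    out = np.empty(image.shape, dtype=np.float64)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = reducer(padded[r:r + 3, c:c + 3])
    return out


noisy = np.zeros((5, 5))
noisy[2, 2] = 255.0                        # a single "salt" pixel
mean_result = filter3x3(noisy, np.mean)    # spreads the outlier over its neighbors
median_result = filter3x3(noisy, np.median)  # discards the outlier entirely
```

The mean filter only dilutes the outlier to 255 / 9 at the center, while the median filter removes it, which is why median filtering is a common choice for salt-and-pepper noise.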

2.2.3 Code for Image Enhancement

The following code implements image enhancement with OpenCV:

    import cv2


    def histogram_equalization(image):
        # cv2.equalizeHist works on single-channel 8-bit images
        gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        return cv2.equalizeHist(gray_image)


    def mean_filter(image, kernel_size=3):
        return cv2.blur(image, (kernel_size, kernel_size))


    def gaussian_filter(image, kernel_size=3, sigma=0):
        # kernel_size must be odd; sigma=0 lets OpenCV derive sigma from the kernel size
        return cv2.GaussianBlur(image, (kernel_size, kernel_size), sigma)


    def median_filter(image, kernel_size=3):
        # kernel_size must be an odd integer greater than 1
        return cv2.medianBlur(image, kernel_size)

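2.3 Image Filtering

Image filtering computes each output pixel from a neighborhood of input pixels, typically by sliding a small kernel over the image. This section uses edge detection as the representative example.

2.3.1 Edge Detection

Edge detection finds the places in an image where intensity changes sharply. Common edge detection methods include:

Sobel operator: estimates the horizontal and vertical intensity gradients with a pair of small kernels
Canny edge detection: a multi-stage algorithm that smooths the image, computes gradients, thins the edges, and keeps them using two thresholds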
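
As a rough illustration of how the Sobel operator responds to an edge, the sketch below slides the two standard 3x3 Sobel kernels over a synthetic image with one vertical edge. It is written in plain NumPy for clarity rather than speed, and `correlate3x3` is a hypothetical helper name.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def correlate3x3(image, kernel):
    """Slide a 3x3 kernel over the image (valid region only, no padding)."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for r in range(h - 2):
        for c in range(w - 2):
            out[r, c] = np.sum(image[r:r + 3, c:c + 3] * kernel)
    return out


# Dark left half, bright right half: one vertical edge
image = np.zeros((5, 6))
image[:, 3:] = 255.0
gx = correlate3x3(image, SOBEL_X)   # horizontal gradient
gy = correlate3x3(image, SOBEL_Y)   # vertical gradient
magnitude = np.sqrt(gx ** 2 + gy ** 2)
```

The horizontal gradient is large only in the columns that straddle the edge, and the vertical gradient is zero everywhere because all rows are identical.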

2.3.2 Code for Image Filtering

The following code implements edge detection with OpenCV:

    import cv2
    import numpy as np


    def sobel_edge_detection(image):
        gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Use a float output type so negative gradients are not clipped
        sobel_x = cv2.Sobel(gray_image, cv2.CV_64F, 1, 0, ksize=3)
        sobel_y = cv2.Sobel(gray_image, cv2.CV_64F, 0, 1, ksize=3)
        magnitude = np.sqrt(sobel_x ** 2 + sobel_y ** 2)
        # Scale to 0-255; guard against a flat image, where the maximum is 0
        max_value = magnitude.max()
        if max_value > 0:
            magnitude = magnitude / max_value * 255
        return np.uint8(magnitude)


    def canny_edge_detection(image, threshold1=100, threshold2=200):
        gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # threshold1 and threshold2 are the lower and upper hysteresis thresholds
        return cv2.Canny(gray_image, threshold1, threshold2)

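III. Feature Extraction Methods

3.1 HOG Features

3.1.1 How HOG Works

HOG (Histogram of Oriented Gradients) is a widely used image feature descriptor. It divides the image into small cells, builds a histogram of gradient orientations for each cell, and concatenates the normalized histograms into a feature vector.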
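
The core step, a magnitude-weighted histogram of gradient orientations for one cell, can be sketched in a few lines of NumPy. This is a deliberately simplified version: it uses hard bin assignment and skips the block normalization and vote interpolation that full HOG implementations apply, and the function name is an assumption for illustration.

```python
import numpy as np


def cell_orientation_histogram(cell, num_bins=9):
    """Histogram of unsigned gradient orientations (0-180 degrees) for one cell,
    weighted by gradient magnitude."""
    cell = cell.astype(np.float64)
    gy, gx = np.gradient(cell)                     # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180   # unsigned orientation
    bins = (angle // (180 / num_bins)).astype(int) % num_bins
    return np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=num_bins)


# An 8x8 cell whose brightness increases from left to right:
# every gradient points horizontally, so all votes land in the first bin.
cell = np.tile(np.arange(8) * 10.0, (8, 1))
hist = cell_orientation_histogram(cell)
```

With nine bins, each bin covers 20 degrees, so the horizontal gradients in this example all fall into bin 0.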

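3.1.2 Code for HOG Features

The following code extracts HOG features with OpenCV:

    import cv2


    def extract_hog_features(image):
        # The default descriptor uses a 64x128 detection window
        hog = cv2.HOGDescriptor()
        gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Resize to the window size so every image yields one fixed-length vector;
        # images smaller than the window would otherwise make compute() fail
        gray_image = cv2.resize(gray_image, (64, 128))
        return hog.compute(gray_image)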

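3.2 SIFT Features

3.2.1 How SIFT Works

SIFT (Scale-Invariant Feature Transform) is a widely used image feature. It detects keypoints across a scale space and describes each one with a histogram of local gradient orientations, which makes the features robust to changes in scale and rotation.

3.2.2 Code for SIFT Features

The following code extracts SIFT features with OpenCV:

    import cv2


    def extract_sift_features(image):
        # cv2.SIFT_create is part of the main opencv-python package since version 4.4
        sift = cv2.SIFT_create()
        gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # descriptors is None when no keypoints are found
        keypoints, descriptors = sift.detectAndCompute(gray_image, None)
        return keypoints, descriptors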

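3.3 ORB Features

3.3.1 How ORB Works

ORB (Oriented FAST and Rotated BRIEF) is a fast binary image feature. It detects keypoints with the FAST corner detector, assigns each keypoint an orientation, and computes a rotation-aware BRIEF binary descriptor.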
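
Because ORB descriptors are binary strings, they are normally compared with the Hamming distance, the number of bits in which two descriptors differ. The sketch below shows that comparison on toy 4-byte descriptors (OpenCV's ORB descriptors are 32 bytes by default); the helper names and sample values are assumptions for illustration.

```python
import numpy as np


def hamming_distance(desc_a, desc_b):
    """Number of differing bits between two uint8 binary descriptors."""
    return int(np.unpackbits(np.bitwise_xor(desc_a, desc_b)).sum())


def match_nearest(query, candidates):
    """Return (index, distance) of the candidate closest to the query."""
    distances = [hamming_distance(query, c) for c in candidates]
    best = int(np.argmin(distances))
    return best, distances[best]


query = np.array([0b11110000, 0, 255, 1], dtype=np.uint8)
candidates = np.array([
    [0b00001111, 255, 0, 0],   # differs in almost every bit
    [0b11110001, 0, 255, 1],   # differs in a single bit
], dtype=np.uint8)
best_index, best_distance = match_nearest(query, candidates)
```

XOR followed by a bit count is cheap, which is one reason binary descriptors are attractive when matching speed matters.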

3.3.2 Code for ORB Features

The following code extracts ORB features with OpenCV:

    import cv2


    def extract_orb_features(image):
        orb = cv2.ORB_create()
        gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # descriptors is None when no keypoints are found
        keypoints, descriptors = orb.detectAndCompute(gray_image, None)
        return keypoints, descriptors

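IV. Common Models and Architectures

4.1 Traditional Machine Learning Models

4.1.1 Support Vector Machines

A support vector machine (SVM) is a classic classifier that is often applied to image features. It separates samples of different classes by finding the hyperplane with the largest margin between them.

4.1.2 Decision Trees

A decision tree classifies an image by applying a sequence of tests to its features, following one branch per test until it reaches a leaf that holds the predicted class.

4.1.3 Random Forests

A random forest is an ensemble learning model. It combines many decision trees to improve on the performance of a single tree.

4.2 Deep Learning Models

4.2.1 LeNet

LeNet is an early convolutional neural network. It extracts image features with convolution and pooling operations.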
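
To show what convolution and pooling do to an image, here is a minimal NumPy sketch of one unpadded convolution followed by 2x2 average pooling. The 32x32 input and 5x5 kernel match the sizes of the first layer in the classic LeNet-5 design; the kernel values themselves are an arbitrary stand-in, since real kernels are learned during training.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Cross-correlate a 2D image with a 2D kernel, no padding, stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


def avg_pool2x2(feature_map):
    """Average each non-overlapping 2x2 block (height and width must be even)."""
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


image = np.ones((32, 32))
kernel = np.full((5, 5), 1 / 25)          # stand-in kernel: a 5x5 box average
features = conv2d_valid(image, kernel)    # 32 - 5 + 1 = 28 pixels per side
pooled = avg_pool2x2(features)            # pooling halves each side to 14
```

Each stage shrinks the spatial size while summarizing a larger part of the input, which is the basic pattern that later architectures repeat at greater depth.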

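4.2.2 AlexNet

AlexNet is a classic deep convolutional network. It improved accuracy by using a deeper architecture and the ReLU activation function.

4.2.3 VGG

VGG is a deep convolutional network. It improves accuracy by stacking many layers of small convolution kernels, which allows a much deeper network.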
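
The appeal of small kernels can be checked with simple arithmetic: stacked 3x3 convolutions cover the same receptive field as one larger kernel while using fewer weights. The sketch below assumes stride-1 convolutions with the same number of input and output channels and ignores biases; the channel count of 64 is only an example value.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of num_layers stride-1 convolutions stacked on each other."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels):
    """Weight count of one conv layer with `channels` inputs and outputs (no bias)."""
    return kernel_size * kernel_size * channels * channels


channels = 64                                  # example channel count
three_small = 3 * conv_weights(3, channels)    # three stacked 3x3 layers
one_large = conv_weights(7, channels)          # one 7x7 layer with the same receptive field
```

Three 3x3 layers see a 7x7 region but need 27 weights per channel pair instead of 49, and they also insert two extra nonlinearities along the way.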

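4.2.4 ResNet

ResNet is a deep convolutional network. It adds residual (skip) connections so that each block learns a correction to its input, which mitigates the vanishing gradient problem and makes very deep networks trainable.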
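
The idea of a residual connection, output = activation(F(x) + x), can be illustrated with a toy block built from two linear layers. Real ResNet blocks use convolutions and batch normalization, so this NumPy sketch only shows the skip connection itself; all names and shapes are assumptions for the example.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is two linear layers with a ReLU in between."""
    fx = relu(x @ w1) @ w2
    return relu(fx + x)


rng = np.random.default_rng(0)
x = rng.random((4, 8))                         # batch of 4 non-negative feature vectors
zero = np.zeros((8, 8))
identity_out = residual_block(x, zero, zero)   # F(x) = 0, so the input passes through
random_out = residual_block(x, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
```

When the learned branch contributes nothing, the block reduces to the identity, so adding more blocks does not have to make the network worse.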

4.2.5 YOLO

YOLO (You Only Look Once) is an object detection model. It detects the objects in an image with a single forward pass through the network.

4.3 Code for Model Training

4.3.1 Training a ResNet Model with PyTorch

The following code trains a ResNet model with PyTorch:

    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms, models


    def train_resnet_model(data_dir, num_classes=2, batch_size=32, num_epochs=10, lr=0.001):
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

        # Data preprocessing (normalization uses the ImageNet mean and std)
        data_transforms = {
            'train': transforms.Compose([
                transforms.RandomResizedCrop(224),
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]),
            'val': transforms.Compose([
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]),
        }

        # Load data: expects data_dir/train and data_dir/val, one subfolder per class
        image_datasets = {x: datasets.ImageFolder(f'{data_dir}/{x}', data_transforms[x])
                          for x in ['train', 'val']}
        # Shuffle only the training set
        dataloaders = {x: DataLoader(image_datasets[x], batch_size=batch_size,
                                     shuffle=(x == 'train'), num_workers=4)
                       for x in ['train', 'val']}
        dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

        # Load a pretrained ResNet-18 and replace its final layer
        # (the weights argument needs torchvision >= 0.13; older versions use pretrained=True)
        model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        num_ftrs = model.fc.in_features
        model.fc = nn.Linear(num_ftrs, num_classes)
        model = model.to(device)

        # Loss function, optimizer, and learning rate schedule
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

        # Train the model
        for epoch in range(num_epochs):
            print(f'Epoch {epoch}/{num_epochs - 1}')
            print('-' * 10)
            for phase in ['train', 'val']:
                if phase == 'train':
                    model.train()
                else:
                    model.eval()
                running_loss = 0.0
                running_corrects = 0
                for inputs, labels in dataloaders[phase]:
                    inputs, labels = inputs.to(device), labels.to(device)
                    optimizer.zero_grad()
                    # Track gradients only during training
                    with torch.set_grad_enabled(phase == 'train'):
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels)
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()
                    running_loss += loss.item() * inputs.size(0)
                    running_corrects += torch.sum(preds == labels)
                if phase == 'train':
                    scheduler.step()
                epoch_loss = running_loss / dataset_sizes[phase]
                epoch_acc = running_corrects.double() / dataset_sizes[phase]
                print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
        print('Training complete')
        return model
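
The training code uses a step learning rate schedule with step_size=7 and gamma=0.1. As a rough guide to what that means, the sketch below reproduces the arithmetic of a step decay in plain Python; `step_lr` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
def step_lr(initial_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect during `epoch` under a step decay schedule."""
    return initial_lr * gamma ** (epoch // step_size)


# Learning rates for a 10-epoch run starting at 0.001
schedule = [step_lr(0.001, epoch) for epoch in range(10)]
```

With these settings the first seven epochs run at 0.001 and the remaining three at roughly 0.0001, so a 10-epoch run sees exactly one decay.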

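V. Hands-On Project: Building a Computer Vision Application

5.1 Requirements Analysis

5.1.1 Application Goal

Build a computer vision application that can perform tasks such as image classification, object detection, and semantic segmentation.

5.1.2 User Requirements

Support image input and processing
Support tasks such as image classification, object detection, and semantic segmentation
Provide a friendly user interface that is simple to use

5.1.3 Functional Scope

Image input and processing
Image classification
Object detection
Semantic segmentation
Result visualization

5.2 System Architecture Design

5.2.1 Application Architecture

The application uses a layered architecture with the following layers:

User interface layer: provides the interaction between the user and the system, including image input, image processing, and result visualization
Application logic layer: handles user requests, business logic, and application control
Image processing layer: processes and analyzes images
Data storage layer: stores image data and processing results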
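
One way to picture the separation between the layers is a small dispatcher in the application logic layer that routes requests from the user interface to functions in the image processing layer. The sketch below is only an illustration of that layering; the class, the task name, and the stand-in handler are all hypothetical and involve no real model.

```python
from typing import Callable, Dict


class ApplicationLogic:
    """Application logic layer: routes a UI request to an image processing function."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], str]] = {}

    def register(self, task: str, handler: Callable[[str], str]) -> None:
        self._handlers[task] = handler

    def handle(self, task: str, image_path: str) -> str:
        if task not in self._handlers:
            raise ValueError(f"Unknown task: {task}")
        return self._handlers[task](image_path)


# Stand-in for the image processing layer (hypothetical, no real model involved)
def fake_classifier(image_path: str) -> str:
    return f"classified {image_path}"


logic = ApplicationLogic()
logic.register("classification", fake_classifier)
result = logic.handle("classification", "cat.jpg")
```

Keeping the routing in one place means the user interface never needs to know which model or library sits behind a task.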

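5.2.2 Data Storage Design

The system stores its data as follows:

Image data: stored on the file system
Processing results: stored on the file system

5.3 System Implementation

5.3.1 Setting Up the Development Environment

First, set up the development environment. The system uses Python as the development language, OpenCV, PIL, PyTorch, and TensorFlow as computer vision libraries, and Tkinter for the graphical user interface.

    # Install OpenCV
    pip install opencv-python
    # Install PIL (provided by the Pillow package)
    pip install pillow
    # Install PyTorch
    pip install torch torchvision
    # Install TensorFlow
    pip install tensorflow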

5.3.2 Image Input and Processing

Image input and processing is the basic feature of the system. The following code implements it:

    import tkinter as tk
    from tkinter import filedialog
    from PIL import Image, ImageTk


    class ImageInputFrame(tk.Frame):
        def __init__(self, parent, on_image_selected):
            tk.Frame.__init__(self, parent)
            self.parent = parent
            self.on_image_selected = on_image_selected
            # Create widgets
            self.create_widgets()

        def create_widgets(self):
            # Image display area
            self.image_label = tk.Label(self)
            self.image_label.pack(pady=10, padx=10, fill="both", expand=True)
            # "Select Image" button
            tk.Button(self, text="Select Image", command=self.select_image).pack(pady=10, padx=10)

        def select_image(self):
            # Choose an image file
            file_path = filedialog.askopenfilename(
                filetypes=[("Image Files", "*.png *.jpg *.jpeg *.bmp")])
            if file_path:
                # Open the image
                image = Image.open(file_path)
                # Resize for display (Image.ANTIALIAS was removed in Pillow 10; use LANCZOS)
                image = image.resize((400, 300), Image.LANCZOS)
                # Show the image; keep a reference so it is not garbage-collected
                photo = ImageTk.PhotoImage(image)
                self.image_label.configure(image=photo)
                self.image_label.image = photo
                # Invoke the callback
                self.on_image_selected(file_path)

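5.3.3 Image Classification

Image classification is one of the core features of the system. The following code implements it:

    import torch
    from torchvision import transforms, models
    from PIL import Image


    def classify_image(image_path, model_path, class_names):
        # Preprocessing (matches the validation transforms used in training)
        data_transforms = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ])
        # Load the image; convert to RGB so grayscale or RGBA files also work
        image = Image.open(image_path).convert('RGB')
        image = data_transforms(image)
        image = image.unsqueeze(0)  # add the batch dimension
        # Load the model; model_path must hold a state_dict saved with
        # torch.save(model.state_dict(), model_path)
        model = models.resnet18()
        num_ftrs = model.fc.in_features
        model.fc = torch.nn.Linear(num_ftrs, len(class_names))
        model.load_state_dict(torch.load(model_path, map_location='cpu'))
        model.eval()
        # Classify the image
        with torch.no_grad():
            outputs = model(image)
            _, preds = torch.max(outputs, 1)
        return class_names[preds[0].item()]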
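
The preprocessing pipeline above ends with two numeric steps: scaling pixels to the range 0 to 1 and normalizing each channel with a fixed mean and standard deviation. The NumPy sketch below reproduces that arithmetic on a synthetic white image so the resulting values and shapes can be inspected; it is an illustration of the math, not a replacement for the torchvision transforms, and the function name is an assumption.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def to_normalized_tensor(image_hwc_uint8):
    """Scale to [0, 1], normalize each RGB channel, and reorder HWC to CHW."""
    scaled = image_hwc_uint8.astype(np.float64) / 255.0
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD   # broadcasts over channels
    return normalized.transpose(2, 0, 1)


white = np.full((224, 224, 3), 255, dtype=np.uint8)
tensor = to_normalized_tensor(white)
batch = tensor[np.newaxis, ...]    # add the batch dimension, like unsqueeze(0)
```

A pure white pixel ends up at (1 - mean) / std in each channel, which is why normalized inputs are no longer confined to the range 0 to 1.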

5.3.4 Object Detection

Object detection is one of the core features of the system. The following code implements it with the Faster R-CNN model from torchvision:

    import cv2
    import torch
    from torchvision import transforms, models
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
    from PIL import Image


    def detect_objects(image_path, model_path, class_names):
        # class_names[0] must be the background class: torchvision detection labels start at 1
        # Load the image
        image = cv2.imread(image_path)
        if image is None:
            raise FileNotFoundError(f"Cannot read image: {image_path}")
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image_pil = Image.fromarray(image_rgb)
        # Preprocessing: Faster R-CNN resizes and normalizes internally, so only convert
        # to a [0, 1] tensor; this keeps the boxes in original image coordinates
        image_tensor = transforms.ToTensor()(image_pil)
        # Load the model (the weights arguments need torchvision >= 0.13)
        model = models.detection.fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None)
        in_features = model.roi_heads.box_predictor.cls_score.in_features
        model.roi_heads.box_predictor = FastRCNNPredictor(in_features, len(class_names))
        model.load_state_dict(torch.load(model_path, map_location='cpu'))
        model.eval()
        # Detect objects; the model takes a list of image tensors
        with torch.no_grad():
            outputs = model([image_tensor])
        # Draw the detections
        boxes = outputs[0]['boxes'].cpu().numpy()
        scores = outputs[0]['scores'].cpu().numpy()
        labels = outputs[0]['labels'].cpu().numpy()
        for i in range(len(boxes)):
            if scores[i] > 0.5:
                x1, y1, x2, y2 = (int(v) for v in boxes[i])
                label = class_names[labels[i]]
                score = scores[i]
                cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(image, f"{label}: {score:.2f}", (x1, y1 - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
        return image
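
The post-processing step in the detection code, keeping only boxes whose score exceeds 0.5 and mapping label indices to names, can be separated from the model and tested on its own. The sketch below does this in plain Python with made-up detections; the function name and data are hypothetical, and the class list assumes the convention that index 0 is the background class.

```python
def filter_detections(boxes, scores, labels, class_names, threshold=0.5):
    """Keep detections whose score exceeds the threshold and attach class names."""
    kept = []
    for box, score, label in zip(boxes, scores, labels):
        if score > threshold:
            kept.append({"box": box, "score": score, "label": class_names[label]})
    return kept


# Made-up raw outputs for one image; index 0 is reserved for background
class_names = ["background", "cat", "dog"]
boxes = [(10, 20, 110, 220), (30, 40, 60, 80), (5, 5, 50, 50)]
scores = [0.92, 0.31, 0.77]
labels = [1, 2, 2]
detections = filter_detections(boxes, scores, labels, class_names)
```

Raising the threshold trades missed objects for fewer false positives, so the value 0.5 is a starting point rather than a fixed rule.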

5.3.5 Result Visualization

Result visualization is an important feature of the system. The following code implements it:

    import tkinter as tk
    from tkinter import scrolledtext


    class ResultFrame(tk.Frame):
        def __init__(self, parent):
            tk.Frame.__init__(self, parent)
            self.parent = parent
            # Create widgets
            self.create_widgets()

        def create_widgets(self):
            # Result display area
            self.result_text = scrolledtext.ScrolledText(self, width=60, height=5)
            self.result_text.pack(pady=10, padx=10, fill="both", expand=True)

        def display_result(self, result):
            # Clear the previous result
            self.result_text.delete("1.0", tk.END)
            # Show the new result
            self.result_text.insert(tk.END, result)

5.3.6 User Interface

The user interface is the interactive part of the system. The following code implements it:

    import tkinter as tk
    from tkinter import messagebox

    import cv2
    from PIL import Image, ImageTk

    from image_input_frame import ImageInputFrame
    from result_frame import ResultFrame
    from cv_functions import classify_image, detect_objects


    class CVApp:
        def __init__(self, root):
            self.root = root
            self.root.title("Computer Vision Application")
            # Example class names
            self.class_names = ['cat', 'dog']
            # Detection labels reserve index 0 for the background class
            self.detector_class_names = ['background'] + self.class_names
            # Model paths: the classifier and the detector need separate weight files
            self.model_path = 'model.pth'
            self.detector_model_path = 'detector.pth'  # placeholder file name
            # Create widgets
            self.create_widgets()

        def create_widgets(self):
            # Image input and processing area
            self.image_input_frame = ImageInputFrame(self.root, self.process_image)
            self.image_input_frame.pack(pady=10, padx=10, fill="both", expand=True)
            # Function selection area
            function_frame = tk.LabelFrame(self.root, text="Function")
            function_frame.pack(pady=10, padx=10, fill="x")
            self.function_var = tk.StringVar()
            self.function_var.set("Image Classification")
            tk.Radiobutton(function_frame, text="Image Classification",
                           variable=self.function_var,
                           value="Image Classification").grid(row=0, column=0, padx=5, pady=5)
            tk.Radiobutton(function_frame, text="Object Detection",
                           variable=self.function_var,
                           value="Object Detection").grid(row=0, column=1, padx=5, pady=5)
            # Result display area
            self.result_frame = ResultFrame(self.root)
            self.result_frame.pack(pady=10, padx=10, fill="both", expand=True)
            # Output image area
            self.output_image_label = tk.Label(self.root)
            self.output_image_label.pack(pady=10, padx=10, fill="both", expand=True)

        def process_image(self, image_path):
            function = self.function_var.get()
            try:
                if function == "Image Classification":
                    result = classify_image(image_path, self.model_path, self.class_names)
                    self.result_frame.display_result(result)
                elif function == "Object Detection":
                    result_image = detect_objects(image_path, self.detector_model_path,
                                                  self.detector_class_names)
                    # Convert BGR to RGB and resize for display
                    result_image = cv2.cvtColor(result_image, cv2.COLOR_BGR2RGB)
                    result_image_pil = Image.fromarray(result_image)
                    result_image_pil = result_image_pil.resize((400, 300), Image.LANCZOS)
                    # Show the image; keep a reference so it is not garbage-collected
                    photo = ImageTk.PhotoImage(result_image_pil)
                    self.output_image_label.configure(image=photo)
                    self.output_image_label.image = photo
                else:
                    raise ValueError("Unknown function")
            except Exception as e:
                messagebox.showerror("Error", f"Processing failed: {str(e)}")


    if __name__ == "__main__":
        root = tk.Tk()
        app = CVApp(root)
        root.mainloop()

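5.4 Running and Testing the System

5.4.1 Running the System

To run the system, follow these steps:

Install the OpenCV, PIL, PyTorch, and TensorFlow libraries
Run the cv_app.py file
Choose a function (image classification or object detection)
Select an image; processing starts as soon as the image is selected
Review the result

5.4.2 Testing the System

Testing the system requires some test images. A simple test case looks like this:

Test image: an image that contains a cat and a dog
Test steps:
- Choose a function (image classification or object detection)
- Select the image
- Review the result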

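VI. Summary

This chapter introduced the basic concepts, importance, and application scenarios of computer vision, and showed how to implement image processing techniques (preprocessing, enhancement, filtering). It also covered feature extraction methods (HOG, SIFT, ORB) and common models and architectures (LeNet, AlexNet, VGG, ResNet, YOLO). Finally, a hands-on project showed how to build a complete computer vision application.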

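After working through this chapter, readers should be familiar with the basic methods and techniques of computer vision and be able to start building computer vision applications. The hands-on project also gives readers a way to apply what they have learned to a real project and further improve their skills.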