Computer Vision Techniques and Applications in Practice
This chapter introduces the foundations and practice of computer vision, covering core tasks such as image classification, object detection, and image segmentation. It implements concrete functionality with models such as convolutional neural networks (CNNs), YOLOv5, and U-Net, and analyzes applications in scenarios including medical imaging, autonomous driving, and face recognition. It also covers transfer learning with pretrained models and frontier techniques such as generative adversarial networks (GANs) and Vision Transformers (ViT), and closes with a complete code example for a medical image classification system plus engineering advice.


Computer vision is the discipline of teaching computers to 'see': it enables machines to understand and analyze visual data such as images and videos.
The main tasks of computer vision include image classification, object detection, and image segmentation.
In a computer, an image is typically represented as a matrix of numbers, where each pixel's brightness or color is encoded as a numeric value.
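This representation can be inspected directly with NumPy; a minimal sketch using a synthetic 2×3 grayscale image:

```python
import numpy as np

# A tiny 2x3 grayscale "image": each entry is a pixel brightness in [0, 255]
image = np.array([[0, 128, 255],
                  [64, 192, 32]], dtype=np.uint8)

print(image.shape)   # (2, 3): height x width
print(image[0, 2])   # 255: the brightest pixel in the first row

# A color image adds a channel axis, e.g. (height, width, 3) for RGB
color = np.zeros((2, 3, 3), dtype=np.uint8)
print(color.shape)   # (2, 3, 3)
```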
Reading and preprocessing images
import cv2
import numpy as np
from PIL import Image
# Read an image with OpenCV (returns a numpy array in BGR channel order)
image_cv = cv2.imread('image.jpg')
# Read the same image with PIL (returns an RGB Image object)
image = Image.open('image.jpg')
# Convert the PIL image to a numpy array
image_np = np.array(image)
# Preprocess: resize to the model's input size, scale to [0, 1], add a batch axis
image_resized = cv2.resize(image_np, (224, 224))
image_normalized = image_resized / 255.0
image_reshaped = np.expand_dims(image_normalized, axis=0)
💡 Image preprocessing is a key step in computer vision tasks; it includes resizing, normalization, augmentation, and related operations.
Convolutional neural networks (CNNs) are the most widely used deep learning models in computer vision. A CNN is built from convolutional layers, pooling layers, and fully connected layers.
The basic structure of a CNN
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
✅ In a CNN, convolutional layers extract local image features, pooling layers shrink the feature maps, and fully connected layers perform the final classification.
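What convolution and pooling do to an image's dimensions can be sketched without a framework. The naive pure-NumPy implementation below (illustrative only; not how TensorFlow actually computes convolutions) shows a 3×3 'valid' convolution trimming the borders and a 2×2 max-pool halving the spatial size:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution: slide the kernel over the image."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2d(image, size=2):
    """2x2 max pooling: keep the largest value in each block."""
    h, w = image.shape
    out = image[:h - h % size, :w - w % size]
    out = out.reshape(h // size, size, w // size, size)
    return out.max(axis=(1, 3))

img = np.random.rand(28, 28)
feat = conv2d_valid(img, np.ones((3, 3)))  # local feature extraction
print(feat.shape)             # (26, 26): a 3x3 valid conv shrinks each side by 2
print(maxpool2d(feat).shape)  # (13, 13): pooling halves the spatial size
```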
Image classification is the foundational computer vision task: its goal is to assign each image to one of a set of categories.
The image classification workflow
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Load the data
(x_train, y_train), (x_val, y_val) = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_val = x_val.astype('float32') / 255.0
# Data augmentation
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)
train_generator = datagen.flow(x_train, y_train, batch_size=32)
# Build the model
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# Train the model; early stopping halts training once val_loss stops improving
history = model.fit(
    train_generator,
    validation_data=(x_val, y_val),
    epochs=100,
    steps_per_epoch=len(x_train) // 32,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)]
)
# Evaluate the model
test_loss, test_acc = model.evaluate(x_val, y_val)
print(f"Test accuracy: {test_acc}")
✅ This model applies data augmentation, batch normalization, and dropout, which together improve classification performance.
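The last step of the workflow is mapping the model's softmax output back to a class name. A small sketch with a hypothetical probability vector (the class-name order matches CIFAR-10's label encoding):

```python
import numpy as np

# CIFAR-10 class names in label order
CIFAR10_CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']

# A hypothetical softmax output for one image (entries sum to 1)
probs = np.array([[0.01, 0.02, 0.05, 0.70, 0.03,
                   0.10, 0.04, 0.02, 0.02, 0.01]])

pred = int(np.argmax(probs, axis=1)[0])     # index of the highest probability
print(CIFAR10_CLASSES[pred])                # cat
print(f"confidence: {probs[0, pred]:.2f}")  # confidence: 0.70
```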
A pretrained model is a model already trained on a large dataset; through transfer learning it can be adapted to new tasks.
Image classification with a pretrained model
import tensorflow as tf
from tensorflow.keras.applications import VGG16
# Load the pretrained model
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Freeze the base model
base_model.trainable = False
# Build the classifier on top
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# Train the classifier head; note that train_generator must yield 224x224 images
# here (the 32x32 CIFAR-10 generator above does not match this input shape)
history = model.fit(
    train_generator,
    validation_data=(x_val, y_val),
    epochs=10,
    steps_per_epoch=len(x_train) // 32
)
# Unfreeze the top of the base model for fine-tuning
base_model.trainable = True
fine_tune_at = 15
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
# Recompile with a lower learning rate so fine-tuning does not destroy the pretrained weights
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
history_fine = model.fit(
    train_generator,
    validation_data=(x_val, y_val),
    epochs=10,
    initial_epoch=history.epoch[-1]
)
💡 Through transfer learning, pretrained models let you build a high-performing classifier quickly while reducing both training time and data requirements.
Object detection is a key computer vision task: it locates the objects in an image and identifies what they are.
Common object detection algorithms include two-stage detectors such as the R-CNN family and one-stage detectors such as SSD and YOLO.
YOLOv5 is a fast, accurate object detector suited to a wide range of applications.
Object detection with YOLOv5
import torch
from PIL import Image
from pathlib import Path
import cv2
import numpy as np
# Load the YOLOv5 model (downloads pretrained weights on first use)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
# Run detection on an image
image_path = 'image.jpg'
results = model(image_path)
# Display the detection results
results.print()
results.save()
results.show()
# Post-process the detections: read the image once, then draw every box
detections = results.pandas().xyxy[0]
image = cv2.imread(image_path)
for index, row in detections.iterrows():
    x1, y1, x2, y2 = int(row['xmin']), int(row['ymin']), int(row['xmax']), int(row['ymax'])
    label = row['name']
    confidence = row['confidence']
    # Draw the bounding box and label
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(image, f"{label} {confidence:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
cv2.imwrite('detection_result.jpg', image)
✅ YOLOv5 is implemented in PyTorch and downloads pretrained weights automatically, which makes it simple to use.
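Detection quality is commonly measured by intersection over union (IoU) between a predicted box and a ground-truth box, both in the same (xmin, ymin, xmax, ymax) format as the results above; a self-contained sketch:

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```

A detection typically counts as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.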
Object detection requires annotated object locations; commonly used annotation tools include LabelImg, CVAT, and Labelme.
Image segmentation divides an image into regions, each corresponding to an object or to the background.
Common types of segmentation are semantic segmentation (label every pixel by class), instance segmentation (separate individual objects), and panoptic segmentation (both combined).
U-Net is a widely used segmentation model built from an encoder and a decoder.
Image segmentation with U-Net
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, Conv2DTranspose, MaxPooling2D, concatenate

def unet(input_shape=(256, 256, 1)):
    inputs = Input(input_shape)
    # Encoder
    c1 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(inputs)
    c1 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c1)
    p1 = MaxPooling2D((2, 2))(c1)
    c2 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(p1)
    c2 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c2)
    p2 = MaxPooling2D((2, 2))(c2)
    c3 = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(p2)
    c3 = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c3)
    p3 = MaxPooling2D((2, 2))(c3)
    c4 = Conv2D(512, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(p3)
    c4 = Conv2D(512, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c4)
    p4 = MaxPooling2D(pool_size=(2, 2))(c4)
    c5 = Conv2D(1024, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(p4)
    c5 = Conv2D(1024, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c5)
    # Decoder with skip connections to the matching encoder level
    u6 = Conv2DTranspose(512, (2, 2), strides=(2, 2), padding='same')(c5)
    u6 = concatenate([u6, c4])
    c6 = Conv2D(512, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(u6)
    c6 = Conv2D(512, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c6)
    u7 = Conv2DTranspose(256, (2, 2), strides=(2, 2), padding='same')(c6)
    u7 = concatenate([u7, c3])
    c7 = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(u7)
    c7 = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c7)
    u8 = Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(c7)
    u8 = concatenate([u8, c2])
    c8 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(u8)
    c8 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c8)
    u9 = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(c8)
    u9 = concatenate([u9, c1])
    c9 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(u9)
    c9 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c9)
    outputs = Conv2D(1, (1, 1), activation='sigmoid')(c9)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model

# Build and compile the model
model = unet(input_shape=(256, 256, 1))
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
# Train the model (x_train/y_train are assumed to be image and mask arrays)
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=50,
    batch_size=16
)
# Evaluate the model
test_loss, test_acc = model.evaluate(x_val, y_val)
print(f"Test accuracy: {test_acc}")
✅ U-Net's encoder extracts features while its decoder restores spatial detail via skip connections, which is what makes it effective for segmentation.
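Plain accuracy can be misleading for segmentation when most pixels are background; the Dice coefficient is a standard alternative. A NumPy sketch of the metric (an equivalent tf implementation could be passed to model.compile as a custom metric):

```python
import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1e-6):
    """Dice = 2*|A∩B| / (|A| + |B|) for binary masks; 1.0 means a perfect match."""
    y_true = y_true.astype(bool)
    y_pred = y_pred.astype(bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    return (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)

true_mask = np.array([[1, 1, 0], [0, 1, 0]])
pred_mask = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_coefficient(true_mask, pred_mask), 3))  # 2*2/(3+3) ≈ 0.667
```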
In medicine, computer vision supports medical image classification, object detection, and segmentation, helping clinicians with diagnosis and treatment.
A medical image analysis example
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
# Load the pretrained model
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Build the medical image classifier (two classes, e.g. normal vs. abnormal)
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# Train the model (x_train/y_train are assumed to be preprocessed 224x224 images and labels)
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=10
)
# Evaluate the model
test_loss, test_acc = model.evaluate(x_val, y_val)
print(f"Test accuracy: {test_acc}")
💡 Medical image analysis demands large amounts of annotated data and domain expertise, but it can significantly improve diagnostic efficiency.
Autonomous driving uses computer vision to recognize objects on the road, such as vehicles, pedestrians, and traffic signs.
Object detection for autonomous driving
import torch
import cv2
from PIL import Image
from pathlib import Path
# Load the YOLOv5 model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
# Detect traffic signs
image_path = 'traffic_sign.jpg'
results = model(image_path)
# Post-process the detections: read the image once, then draw every box
detections = results.pandas().xyxy[0]
image = cv2.imread(image_path)
for index, row in detections.iterrows():
    x1, y1, x2, y2 = int(row['xmin']), int(row['ymin']), int(row['xmax']), int(row['ymax'])
    label = row['name']
    confidence = row['confidence']
    # Draw the bounding box and label
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(image, f"{label} {confidence:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
cv2.imwrite('traffic_sign_detection.jpg', image)
✅ Autonomous driving demands high reliability and real-time performance, and computer vision is one of its key components.
Face detection and recognition are common computer vision applications, used for identity verification, surveillance, and similar scenarios.
A face detection example
import cv2
# Load the Haar cascade face detector shipped with OpenCV
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
# Read the image
image = cv2.imread('faces.jpg')
# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Detect faces
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
# Draw bounding boxes
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
# Save the result
cv2.imwrite('face_detection.jpg', image)
💡 Face detection and recognition are mature technologies, but performance under varied lighting and pose still needs improvement.
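One common mitigation for uneven lighting is to equalize the image histogram before running the detector. The sketch below implements histogram equalization in NumPy on a synthetic low-contrast image (OpenCV's cv2.equalizeHist does the same job; the equalized image would be passed to detectMultiScale instead of the raw grayscale):

```python
import numpy as np

def equalize_hist(gray):
    """Histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    if cdf[-1] == cdf_min:          # constant image: nothing to equalize
        return gray.copy()
    # Map each intensity so the cumulative distribution becomes uniform
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]

# Low-contrast image: all values squeezed into [100, 139]
rng = np.random.default_rng(0)
gray = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)

out = equalize_hist(gray)
print(gray.min(), gray.max())  # narrow input range
print(out.min(), out.max())    # 0 255: full contrast after equalization
```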
Practical project: build a medical image classification system that helps clinicians diagnose lung disease.
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Data preprocessing and augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)
val_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory('train_dir', target_size=(224, 224), batch_size=32, class_mode='binary')
val_generator = val_datagen.flow_from_directory('val_dir', target_size=(224, 224), batch_size=32, class_mode='binary')
# Load the pretrained model
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Freeze the base model
base_model.trainable = False
# Build the classifier
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)
# Train the classifier head
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=10
)
# Unfreeze the top of the base model for fine-tuning
base_model.trainable = True
fine_tune_at = 100
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
# Recompile with a lower learning rate for fine-tuning
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)
history_fine = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=10,
    initial_epoch=history.epoch[-1]
)
# Evaluate the model
test_loss, test_acc = model.evaluate(val_generator)
print(f"Test accuracy: {test_acc}")
# Save the model
model.save('lung_disease_classifier.h5')
✅ This project uses a pretrained ResNet50 with transfer learning to classify lung disease.
Generative adversarial networks (GANs) are deep learning models that generate new data such as images and videos.
Generating images with a GAN
import tensorflow as tf
from tensorflow.keras import layers
BATCH_SIZE = 32
noise_dim = 100

def make_generator_model():
    # Maps a noise vector to a 28x28 grayscale image
    model = tf.keras.Sequential()
    model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(noise_dim,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256)
    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 28, 28, 1)
    return model

def make_discriminator_model():
    # Scores an image; higher logits mean "more likely real"
    model = tf.keras.Sequential()
    model.add(layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same', input_shape=(28, 28, 1)))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))
    model.add(layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))
    model.add(layers.Flatten())
    model.add(layers.Dense(1))
    return model

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    # Real images should score 1, generated images should score 0
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss

def generator_loss(fake_output):
    # The generator wants its fakes to be scored as real
    return cross_entropy(tf.ones_like(fake_output), fake_output)

generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)
generator = make_generator_model()
discriminator = make_discriminator_model()

@tf.function
def train_step(images):
    # One adversarial update: both networks are updated on the same batch
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)
        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)
        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)
    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
    return gen_loss, disc_loss
💡 A GAN trains a generator and a discriminator against each other to produce realistic images, but the training process can be unstable.
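The adversarial objective can be sanity-checked numerically: the discriminator loss is small when real images get high logits and fakes get low ones, and the generator loss is small only when fakes fool the discriminator. A NumPy re-implementation of the same from-logits cross-entropy used above:

```python
import numpy as np

def bce_from_logits(labels, logits):
    """Mean binary cross-entropy on raw logits (like BinaryCrossentropy(from_logits=True))."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return float(np.mean(-(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))))

def discriminator_loss(real_logits, fake_logits):
    # Real images labeled 1, fakes labeled 0
    return (bce_from_logits(np.ones_like(real_logits), real_logits)
            + bce_from_logits(np.zeros_like(fake_logits), fake_logits))

def generator_loss(fake_logits):
    # The generator wants fakes to be labeled 1
    return bce_from_logits(np.ones_like(fake_logits), fake_logits)

confident_real = np.array([4.0, 5.0])    # discriminator logits for real images
confident_fake = np.array([-4.0, -5.0])  # discriminator logits for fakes

print(discriminator_loss(confident_real, confident_fake))  # small: D is winning
print(generator_loss(confident_fake))                      # large: G is losing
print(generator_loss(-confident_fake))                     # small: G fools D
```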
The Vision Transformer (ViT) applies the Transformer architecture to computer vision and achieves excellent performance in image classification.
Image classification with a Vision Transformer
import tensorflow as tf
from tensorflow.keras import layers

class PatchEncoder(layers.Layer):
    # Projects flattened patches and adds a learned position embedding
    def __init__(self, num_patches, projection_dim):
        super(PatchEncoder, self).__init__()
        self.num_patches = num_patches
        self.projection = layers.Dense(units=projection_dim)
        self.position_embedding = layers.Embedding(input_dim=num_patches, output_dim=projection_dim)

    def call(self, patch):
        positions = tf.range(start=0, limit=self.num_patches, delta=1)
        encoded = self.projection(patch) + self.position_embedding(positions)
        return encoded

def create_vit_model(image_size=224, patch_size=16, num_classes=10, projection_dim=64, num_heads=4, transformer_units=[128, 64], mlp_head_units=[2048, 1024]):
    num_patches = (image_size // patch_size) ** 2
    patch_dim = 3 * patch_size ** 2
    # Input layer
    inputs = layers.Input(shape=(image_size, image_size, 3))
    # Split the image into non-overlapping patches
    patches = tf.image.extract_patches(images=inputs, sizes=[1, patch_size, patch_size, 1], strides=[1, patch_size, patch_size, 1], rates=[1, 1, 1, 1], padding='VALID')
    patches = layers.Reshape((num_patches, patch_dim))(patches)
    # Encode the patches
    encoded_patches = PatchEncoder(num_patches, projection_dim)(patches)
    # Transformer encoder blocks (one per entry in transformer_units)
    for units in transformer_units:
        encoded_patches = layers.LayerNormalization(epsilon=1e-6)(encoded_patches)
        attention_output = layers.MultiHeadAttention(num_heads=num_heads, key_dim=projection_dim, dropout=0.1)(encoded_patches, encoded_patches)
        x1 = layers.Add()([attention_output, encoded_patches])
        x1 = layers.LayerNormalization(epsilon=1e-6)(x1)
        x2 = layers.Dense(units, activation='relu')(x1)
        x2 = layers.Dense(projection_dim)(x2)
        encoded_patches = layers.Add()([x1, x2])
    # Global average pooling over the patch sequence
    representation = layers.LayerNormalization(epsilon=1e-6)(encoded_patches)
    representation = layers.GlobalAveragePooling1D()(representation)
    representation = layers.Dropout(0.5)(representation)
    # MLP head
    for units in mlp_head_units:
        representation = layers.Dense(units, activation='relu')(representation)
        representation = layers.Dropout(0.5)(representation)
    # Output layer
    outputs = layers.Dense(num_classes, activation='softmax')(representation)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model

# Build the Vision Transformer model
model = create_vit_model()
# Compile the model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# Assumes train_generator and val_generator have been prepared
# Train the model
# history = model.fit(train_generator, validation_data=val_generator, epochs=10)
✅ Vision Transformers achieve excellent image classification performance, but at a higher computational cost.
In this chapter we studied computer vision techniques in practice: how to implement image classification, object detection, and image segmentation, and how computer vision is applied in medical image analysis, autonomous driving, and face detection and recognition. We also introduced frontier directions such as generative adversarial networks and Vision Transformers, and walked through a practical project building a medical image classification system. Computer vision is being adopted ever more widely across domains, bringing substantial convenience to everyday life.
