Core Face Recognition Algorithms: FaceNet and ArcFace, Principles and Practice
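Summary (AI-generated): This article compares the two core algorithms of face recognition, FaceNet and ArcFace. FaceNet optimizes relative distances in the feature space with Triplet Loss and suits very large numbers of classes. ArcFace introduces an angular margin loss that enforces intra-class compactness and inter-class separation in angle space, and reaches higher accuracy. The article covers the mathematical principles, complete PyTorch code (loss functions, network architecture, training pipeline), and an in-depth comparison of convergence speed, implementation complexity, and other aspects, as a reference for choosing an approach for a modern face recognition system.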
Author: 魔法巫师
1. Introduction: The Essential Problem of Face Recognition
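1.1 Face Recognition ≠ Image Classification
A common beginner misconception is to treat face recognition as a classification problem.
❌ Wrong approach (classification): input face → CNN → Softmax → output "this is person #1532"
Problems:
1. The number of classes is enormous (up to billions of identities).
2. Newly enrolled people cannot be handled without retraining.
3. Each person has very few samples, which makes a good classifier hard to train.
✅ Right approach (metric learning): input face → CNN → feature vector (embedding) → compare against a database
Advantages:
1. The model only learns "what makes two faces similar"; no predefined set of classes is needed.
2. Enrolling a new person only requires extracting their embedding; no retraining.
3. Train once, then handle identities that were never seen during training.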
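To make the enrollment advantage concrete, here is a minimal sketch built around a hypothetical `Gallery` class. Random NumPy vectors stand in for CNN embeddings, which is an assumption made for illustration only. Adding a new person appends one vector, and nothing is retrained.

```python
import numpy as np


def l2_normalize(v):
    """Project a vector (or the rows of a matrix) onto the unit hypersphere."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


class Gallery:
    """Hypothetical embedding database: enrolling means storing a vector."""

    def __init__(self):
        self.names = []
        self.features = []

    def enroll(self, name, embedding):
        self.names.append(name)
        self.features.append(l2_normalize(embedding))

    def match(self, query, threshold=0.5):
        feats = np.stack(self.features)            # (num_people, dim)
        sims = feats @ l2_normalize(query)         # cosine similarity to every person
        best = int(np.argmax(sims))
        name = self.names[best] if sims[best] >= threshold else None
        return name, float(sims[best])


rng = np.random.default_rng(0)
gallery = Gallery()
gallery.enroll("Alice", rng.normal(size=128))      # stand-ins for CNN embeddings
gallery.enroll("Bob", rng.normal(size=128))

# A new person arrives later: one more vector, the model itself is untouched
carol = rng.normal(size=128)
gallery.enroll("Carol", carol)
name, sim = gallery.match(carol + 0.1 * rng.normal(size=128))
print(name, round(sim, 3))
```

In a real system the vectors would come from the trained network, and the threshold would be calibrated on validation pairs.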
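1.2 The Core Goal of Metric Learning
The ideal state of the feature space:
```text
   ●●●                        ▲▲▲
  ● A ●                      ▲ B ▲
   ●●●                        ▲▲▲
  features of the same person cluster together

   ■■■                        ◆◆◆
  ■ C ■                      ◆ D ◆
   ■■■                        ◆◆◆
  features of different people stay far apart
```
Mathematical goals:
- Minimize the intra-class distance: d(A₁, A₂) → 0
- Maximize the inter-class distance d(A, B); for L2-normalized embeddings this distance is bounded (at most 2), so the goal is "as large as possible", not infinite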
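Both goals can be measured directly. The sketch below uses a hypothetical helper `class_distance_stats` and toy 2-D unit vectors in place of real embeddings (an assumption for illustration). It computes the mean intra-class and inter-class Euclidean distances, and a good embedding keeps the first small and the second large.

```python
import numpy as np


def class_distance_stats(embeddings, labels):
    """Mean intra-class and inter-class Euclidean distance over all pairs."""
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)             # (N, N) pairwise distances
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)       # ignore each point's distance to itself
    intra = dist[same & off_diag].mean()
    inter = dist[~same].mean()
    return intra, inter


# Two tight clusters (persons A and B) on the unit circle, 90 degrees apart
angles = np.array([0.0, 0.1, -0.1, np.pi / 2, np.pi / 2 + 0.1, np.pi / 2 - 0.1])
emb = np.stack([np.cos(angles), np.sin(angles)], axis=1)
labels = np.array([0, 0, 0, 1, 1, 1])

intra, inter = class_distance_stats(emb, labels)
print(f"intra={intra:.3f}  inter={inter:.3f}")
```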
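2. FaceNet: The Pioneering Triplet Loss
2.1 FaceNet Overview
FaceNet is a pioneering work published by Google in 2015. It was the first to push face recognition accuracy to 99.63% on the LFW dataset.
Core contributions:
- Learning an embedding directly in Euclidean space
- Training end-to-end with the Triplet Loss
- Showing that a compact 128-dimensional embedding is enough to represent a face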
FaceNet architecture:
```text
Input Image (160×160×3)
        │
        ▼
┌────────────────────┐
│    CNN Backbone    │  ← Inception / ResNet
│(feature extraction)│
└────────────────────┘
        │
        ▼
┌────────────────────┐
│  L2 Normalization  │  ← projects onto the unit hypersphere
└────────────────────┘
        │
        ▼
128-dim embedding f(x) ∈ R^128, ||f(x)||₂ = 1
```
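Because the output lies on the unit hypersphere, squared Euclidean distance and cosine similarity carry the same information: ||a − b||² = ||a||² + ||b||² − 2a·b = 2 − 2·cos(a, b). This is why later sections can switch between distances (Triplet Loss) and cosine similarity (ArcFace, inference). The quick numeric check below uses random vectors in place of real embeddings, which is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.normal(size=128)
b = rng.normal(size=128)

# L2-normalize, as in the last step of the architecture above
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cos_sim = float(a @ b)
sq_dist = float(np.sum((a - b) ** 2))
# On the unit hypersphere: ||a - b||^2 = 2 - 2 cos(a, b)
print(f"||a-b||^2 = {sq_dist:.6f}, 2 - 2cos = {2 - 2 * cos_sim:.6f}")
```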
2.2 How Triplet Loss Works
Building a triplet
Each training sample is a "triplet":
```text
  Anchor (A)          Positive (P)          Negative (N)
   ┌─────┐              ┌─────┐               ┌─────┐
   │ 😀  │              │ 😄  │               │ 😐  │
   └─────┘              └─────┘               └─────┘
   Person A             Person A              Person B
   (anchor)             (positive)            (negative)
```
A and P are different photos of the same person; N is a photo of a different person.
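Mathematical definition of the loss
Triplet Loss: L = Σ max(0, ||f(A) − f(P)||² − ||f(A) − f(N)||² + α), summed over all triplets (A, P, N)
where:
- f(·): the CNN feature extractor (embedding function)
- ||·||²: squared Euclidean distance
- α: margin, typically 0.2
Intuition: the loss is zero only when d(A,P) + α < d(A,N), where d is the squared distance.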
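Why is a margin needed?
Without a margin, the loss would only require d(A,P) < d(A,N). That condition is met even when the gap is tiny, which makes the embedding fragile to noise. The margin α enforces a minimum "safety distance" between positives and negatives.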
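Here is a small worked example of the formula. It uses hand-picked 2-D unit vectors (an illustration, not real face embeddings) and a hypothetical helper `triplet_loss`. The positive is only slightly closer to the anchor than the negative is. Because the ordering is correct, the loss is 0 without a margin, but with α = 0.2 the triplet still produces a loss and therefore a gradient.

```python
import numpy as np


def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Single-triplet loss: max(0, ||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha)."""
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    return max(0.0, d_ap - d_an + alpha)


f_a = np.array([1.0, 0.0])
f_p = np.array([np.cos(0.6), np.sin(0.6)])   # 0.6 rad away from the anchor
f_n = np.array([np.cos(0.7), np.sin(0.7)])   # 0.7 rad away: only slightly farther

print(triplet_loss(f_a, f_p, f_n, alpha=0.0))   # correct order, so no loss without a margin
print(triplet_loss(f_a, f_p, f_n, alpha=0.2))   # margin not met, so the loss is positive
```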
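2.3 Triplet Mining Strategies
The effectiveness of Triplet Loss depends heavily on which triplets are selected, because many randomly chosen triplets already satisfy the margin and contribute no gradient.
Easy, hard, and semi-hard triplets (d_pos = d(A,P), d_neg = d(A,N)):
- Easy negative: d_neg > d_pos + α (loss = 0, no learning signal)
- Hard negative: d_neg < d_pos (can destabilize training; FaceNet reports that picking the hardest negatives early on can lead to bad local minima or a collapsed model)
- Semi-hard negative: d_pos < d_neg < d_pos + α (recommended: still provides a useful learning signal)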
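The three categories follow directly from the two distances. Below is a minimal sketch with a hypothetical helper `classify_negative`. It uses squared distances as in the loss above, and treats the boundary case d_neg = d_pos + α as easy because the loss is already zero there.

```python
def classify_negative(d_pos, d_neg, alpha=0.2):
    """Label a negative given the (squared) distances d(A,P) and d(A,N)."""
    if d_neg >= d_pos + alpha:
        return "easy"        # margin already satisfied: loss = 0, no gradient
    if d_neg > d_pos:
        return "semi-hard"   # correctly ordered, but still inside the margin
    return "hard"            # negative is closer than (or as close as) the positive


for d_neg in (0.9, 0.6, 0.3):
    print(d_neg, classify_negative(d_pos=0.5, d_neg=d_neg))
```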
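Online Triplet Mining
```python
import torch
import torch.nn.functional as F


def online_triplet_mining(embeddings, labels, margin=0.2):
    """
    Batch Hard strategy:
    for each anchor, pick the hardest positive (farthest) and the hardest negative (closest).
    """
    # Squared Euclidean distances, matching the Triplet Loss definition in 2.2.
    # Built from explicit differences so the gradient stays finite on the diagonal.
    diffs = embeddings.unsqueeze(1) - embeddings.unsqueeze(0)
    pairwise_dist = diffs.pow(2).sum(dim=-1)
    triplet_loss = embeddings.new_zeros(())
    num_valid_triplets = 0
    for i in range(len(embeddings)):
        anchor_label = labels[i]
        positive_mask = labels == anchor_label
        positive_mask[i] = False                # the anchor is not its own positive
        negative_mask = labels != anchor_label
        if not positive_mask.any() or not negative_mask.any():
            continue                            # no valid triplet for this anchor
        hardest_positive_dist = pairwise_dist[i][positive_mask].max()
        hardest_negative_dist = pairwise_dist[i][negative_mask].min()
        loss = F.relu(hardest_positive_dist - hardest_negative_dist + margin)
        triplet_loss = triplet_loss + loss
        if loss > 0:
            num_valid_triplets += 1
    # Average over the anchors whose triplet still violates the margin
    return triplet_loss / max(num_valid_triplets, 1)
```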
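2.4 Limitations of FaceNet
- Combinatorial explosion: the number of possible triplets grows as O(N³) in the number of training images N
- Slow convergence
- Sensitivity to the sampling (mining) strategy
- No explicit class centers: the loss only sees relative distances inside each triplet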
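To put numbers on this combinatorial growth, consider a batch built from P identities with K images each. This P×K sampling is commonly used with online mining and is assumed here. Every image can be an anchor, and each anchor has K − 1 positives and (P − 1)·K negatives. The sketch checks this closed form against brute-force enumeration.

```python
from itertools import product


def count_valid_triplets_bruteforce(labels):
    """Count (anchor, positive, negative) index triples that form a valid triplet."""
    n = len(labels)
    return sum(
        1
        for a, p, q in product(range(n), repeat=3)
        if a != p and labels[a] == labels[p] and labels[a] != labels[q]
    )


def count_valid_triplets(P, K):
    """Closed form for a batch of P identities with K images each."""
    # anchors: P*K, positives per anchor: K-1, negatives per anchor: (P-1)*K
    return P * K * (K - 1) * (P - 1) * K


labels = [pid for pid in range(4) for _ in range(3)]   # P=4 identities, K=3 images each
print(count_valid_triplets_bruteforce(labels), count_valid_triplets(4, 3))
print(count_valid_triplets(32, 4))                     # a batch of 128 images
```

Even a modest batch of 128 images (P = 32, K = 4) already contains 47,616 valid triplets. The count over a whole dataset is far larger, which is why mining a few informative triplets matters.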
3. ArcFace: A Revolutionary Improvement Based on Angular Margins
3.1 From Softmax to ArcFace
Softmax Loss → L-Softmax → SphereFace/A-Softmax → CosFace/AM-Softmax → ArcFace (the strongest method in this line when it was published)
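The steps in this lineage differ mainly in how the margin modifies the target-class logit, while the non-target logits stay s·cos θ_j. The forms below are written side by side using the combined-margin notation of the ArcFace paper, where m₁ is a multiplicative angular margin, m₂ an additive angular margin, and m₃ an additive cosine margin.

```latex
\begin{aligned}
\text{Normalized Softmax:} &\quad s\cos\theta_y \\
\text{SphereFace (multiplicative angular margin):} &\quad \lVert x\rVert\cos(m_1\theta_y) \\
\text{CosFace (additive cosine margin):} &\quad s\,(\cos\theta_y - m_3) \\
\text{ArcFace (additive angular margin):} &\quad s\cos(\theta_y + m_2) \\
\text{Combined margin:} &\quad s\,\bigl(\cos(m_1\theta_y + m_2) - m_3\bigr)
\end{aligned}
```

SphereFace keeps the feature norm instead of a fixed scale s. L-Softmax and SphereFace also replace cos(m₁θ) with a piecewise ψ(θ) to keep the target logit monotonic, a detail omitted here.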
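3.2 Recap: Softmax Loss
The Softmax loss used in ordinary classification only requires the correct class to get the highest score. It does not explicitly push classes apart, so the learned features are separable but not necessarily discriminative enough for comparing unseen faces.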
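3.3 Reinterpreting Softmax in Terms of Angles
Key insight: an inner product equals the product of the norms times the cosine of the angle, W_jᵀx = ||W_j|| · ||x|| · cos θ_j. If both W and x are L2-normalized, each logit reduces to cos θ_j, and Softmax becomes a classifier based purely on angles.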
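Here is a short PyTorch check of this insight. Random tensors stand in for the features `x` and class weights `W`, an assumption made for illustration. After both are normalized, the linear layer outputs exactly the cosine similarities, bounded in [−1, 1]. Because these values are so small, a scale factor s is reintroduced in the next section.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 512)      # a batch of 4 feature vectors
W = torch.randn(10, 512)     # 10 class weight vectors, one per row

# Plain Softmax logits: W_j . x = ||W_j|| ||x|| cos(theta_j)
raw_logits = F.linear(x, W)

# After L2-normalizing both, only the angle remains: logits = cos(theta_j)
cos_logits = F.linear(F.normalize(x, dim=1), F.normalize(W, dim=1))
reference = F.cosine_similarity(x.unsqueeze(1), W.unsqueeze(0), dim=2)   # (4, 10)
print(torch.allclose(cos_logits, reference, atol=1e-6))

# The raw logits are the cosines rescaled by both norms
rescaled = x.norm(dim=1, keepdim=True) * W.norm(dim=1) * cos_logits
print(torch.allclose(raw_logits, rescaled, rtol=1e-4, atol=1e-3))

# cos(theta) lies in [-1, 1], so a scale s (e.g. 64) is needed for a peaked softmax
s = 64.0
probs = F.softmax(s * cos_logits, dim=1)
```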
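3.4 The ArcFace Loss
Mathematical definition
ArcFace Loss: L = −log( exp(s · cos(θ_y + m)) / ( exp(s · cos(θ_y + m)) + Σ_{j≠y} exp(s · cos θ_j) ) ), averaged over the batch
where:
- θ_y: angle between the feature and the weight vector of the true class y
- θ_j: angle between the feature and the weight vector of class j
- m: angular margin, typically 0.5 rad (about 28.6°)
- s: scale factor, typically 64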
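Intuition
Consider two classes A and B. Plain (normalized) Softmax assigns x to class A as soon as θ_A < θ_B. During training, ArcFace instead requires θ_A + m < θ_B. The margin m is this extra requirement, and it forces features to be compact within a class and separated between classes.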
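Here is a worked two-class example that assumes θ_A = 50°, θ_B = 70°, and m = 0.5 rad (≈ 28.6°). Plain Softmax already picks class A. Under ArcFace, however, the sample keeps incurring loss until θ_A < θ_B − m ≈ 41.4°.

```python
import math

m = 0.5                                   # angular margin in radians (about 28.6 degrees)
theta_a, theta_b = math.radians(50), math.radians(70)

# Normalized Softmax: x goes to class A as soon as theta_A < theta_B
plain_ok = math.cos(theta_a) > math.cos(theta_b)
# ArcFace training target: cos(theta_A + m) must beat cos(theta_B)
arcface_ok = math.cos(theta_a + m) > math.cos(theta_b)

print(plain_ok, arcface_ok)
print(f"theta_A + m = {math.degrees(theta_a + m):.1f} deg vs theta_B = 70 deg")
```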
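3.5 Why Is ArcFace Better?
Compared with the other margin methods, ArcFace uses a margin that is constant in angle space. This margin corresponds directly to a geodesic distance on the hypersphere, which gives it the most intuitive geometric meaning.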
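One way to see the "constant margin" claim is to take a sample on the decision boundary between two classes and compute the angular gap θ₂ − θ₁ that each method enforces. The sketch assumes the common values m = 0.5 for ArcFace and m = 0.35 for CosFace.

```python
import numpy as np

m_arc, m_cos = 0.5, 0.35                   # typical ArcFace / CosFace margins
theta_1 = np.array([0.2, 0.6, 1.0, 1.4])   # angle to the true class (radians)

# Decision boundary between class 1 and class 2:
#   ArcFace:  cos(theta_1 + m) = cos(theta_2)
#   CosFace:  cos(theta_1) - m = cos(theta_2)
theta_2_arc = np.arccos(np.cos(theta_1 + m_arc))
theta_2_cos = np.arccos(np.cos(theta_1) - m_cos)

# Angular gap that the sample must keep from the other class
gap_arc = theta_2_arc - theta_1
gap_cos = theta_2_cos - theta_1
print("ArcFace gap:", np.round(gap_arc, 3))
print("CosFace gap:", np.round(gap_cos, 3))
```

The ArcFace gap stays at 0.5 rad for every θ₁. The CosFace gap shrinks as θ₁ grows, from about 0.69 rad to about 0.35 rad over this range, because a fixed cosine offset corresponds to different angles at different points on the cosine curve.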
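3.6 ArcFace Training Details
Numerical stability
```python
import torch
import torch.nn.functional as F


def arcface_loss(logits, labels, s=64.0, m=0.5):
    """
    Numerically stable ArcFace loss.
    logits = cos(θ) in [-1, 1] (normalized features times normalized class weights).
    Compute in float32: in fp16, 1.0 - 1e-7 rounds to 1.0 and the clamp has no effect.
    """
    num_classes = logits.size(1)
    # Keep cos(θ) strictly inside (-1, 1): the derivative of acos is infinite at ±1
    cos_theta = torch.clamp(logits, -1.0 + 1e-7, 1.0 - 1e-7)
    theta = torch.acos(cos_theta)
    # Add the angular margin m to the target-class angle only
    target_logits = torch.cos(theta + m)
    one_hot = F.one_hot(labels, num_classes).to(logits.dtype)
    output = logits * (1 - one_hot) + target_logits * one_hot
    # For θ > π - m, cos(θ + m) is no longer decreasing; the module in 4.2 adds a fallback
    output = output * s
    return F.cross_entropy(output, labels)
```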
4. Complete PyTorch Implementation
4.1 Triplet Loss Implementation
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TripletLoss(nn.Module):
    """Triplet Loss with online triplet mining.

    Supported mining strategies:
    - batch_all: use every valid triplet in the batch
    - batch_hard: for each anchor, use the hardest positive and the hardest negative
    - batch_semi_hard: use only semi-hard triplets
    """

    def __init__(self, margin=0.2, mining='batch_hard'):
        super().__init__()
        self.margin = margin
        self.mining = mining

    def forward(self, embeddings, labels):
        dist_mat = self._pairwise_distances(embeddings)
        if self.mining == 'batch_all':
            return self._batch_all_triplet_loss(dist_mat, labels)
        elif self.mining == 'batch_hard':
            return self._batch_hard_triplet_loss(dist_mat, labels)
        elif self.mining == 'batch_semi_hard':
            return self._batch_semi_hard_triplet_loss(dist_mat, labels)
        else:
            raise ValueError(f"Unknown mining strategy: {self.mining}")

    def _pairwise_distances(self, embeddings):
        # Squared Euclidean distances: ||a||^2 - 2 a.b + ||b||^2
        dot_product = torch.matmul(embeddings, embeddings.t())
        square_norm = torch.diag(dot_product)
        distances = square_norm.unsqueeze(0) - 2.0 * dot_product + square_norm.unsqueeze(1)
        # Clamp small negative values caused by floating-point error
        distances = F.relu(distances)
        return distances

    def _get_anchor_positive_mask(self, labels):
        # mask[a, p] is True when a and p share a label and a != p
        labels_equal = labels.unsqueeze(0) == labels.unsqueeze(1)
        indices_not_equal = ~torch.eye(labels.size(0), dtype=torch.bool, device=labels.device)
        return labels_equal & indices_not_equal

    def _get_anchor_negative_mask(self, labels):
        # mask[a, n] is True when a and n have different labels
        return labels.unsqueeze(0) != labels.unsqueeze(1)

    def _batch_all_triplet_loss(self, dist_mat, labels):
        anchor_positive_mask = self._get_anchor_positive_mask(labels)
        anchor_negative_mask = self._get_anchor_negative_mask(labels)
        # Broadcast to a (B, B, B) tensor indexed [anchor, positive, negative]
        anchor_positive_dist = dist_mat.unsqueeze(2)
        anchor_negative_dist = dist_mat.unsqueeze(1)
        triplet_loss = anchor_positive_dist - anchor_negative_dist + self.margin
        mask = anchor_positive_mask.unsqueeze(2) & anchor_negative_mask.unsqueeze(1)
        triplet_loss = F.relu(triplet_loss * mask.float())
        # Average only over triplets with a positive loss (easy triplets are ignored)
        num_positive_triplets = (triplet_loss > 1e-16).float().sum()
        return triplet_loss.sum() / (num_positive_triplets + 1e-16)

    def _batch_hard_triplet_loss(self, dist_mat, labels):
        anchor_positive_mask = self._get_anchor_positive_mask(labels)
        anchor_negative_mask = self._get_anchor_negative_mask(labels)
        # Hardest positive: the largest distance among same-identity pairs
        anchor_positive_dist = dist_mat * anchor_positive_mask.float()
        hardest_positive_dist, _ = anchor_positive_dist.max(dim=1, keepdim=True)
        # Hardest negative: the smallest distance once non-negatives are pushed out of reach
        max_dist = dist_mat.max()
        anchor_negative_dist = dist_mat + max_dist * (~anchor_negative_mask).float()
        hardest_negative_dist, _ = anchor_negative_dist.min(dim=1, keepdim=True)
        triplet_loss = F.relu(hardest_positive_dist - hardest_negative_dist + self.margin)
        return triplet_loss.mean()

    def _batch_semi_hard_triplet_loss(self, dist_mat, labels):
        anchor_positive_mask = self._get_anchor_positive_mask(labels)
        anchor_negative_mask = self._get_anchor_negative_mask(labels)
        anchor_positive_dist = dist_mat.unsqueeze(2)
        anchor_negative_dist = dist_mat.unsqueeze(1)
        # Semi-hard: d(a,p) < d(a,n) < d(a,p) + margin
        semi_hard_mask = (anchor_negative_dist > anchor_positive_dist) & \
                         (anchor_negative_dist < anchor_positive_dist + self.margin)
        mask = anchor_positive_mask.unsqueeze(2) & \
               anchor_negative_mask.unsqueeze(1) & \
               semi_hard_mask
        triplet_loss = anchor_positive_dist - anchor_negative_dist + self.margin
        triplet_loss = F.relu(triplet_loss * mask.float())
        num_positive_triplets = (triplet_loss > 1e-16).float().sum()
        return triplet_loss.sum() / (num_positive_triplets + 1e-16)
```
4.2 ArcFace Loss Implementation
```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcFaceLoss(nn.Module):
    """ArcFace Loss (Additive Angular Margin Loss)."""

    def __init__(self, in_features, out_features, s=64.0, m=0.50, easy_margin=False):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.s = s
        self.m = m
        self.easy_margin = easy_margin
        # One learnable weight vector (class center) per identity
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)
        # Constants for cos(θ + m) = cos(θ)cos(m) - sin(θ)sin(m)
        self.cos_m = math.cos(m)
        self.sin_m = math.sin(m)
        # Threshold and offset for the fallback used when θ + m > π
        self.th = math.cos(math.pi - m)
        self.mm = math.sin(math.pi - m) * m

    def get_logits(self, embeddings):
        """Margin-free cos(θ) for every class; also used to monitor training accuracy."""
        weight_norm = F.normalize(self.weight, p=2, dim=1)
        embeddings_norm = F.normalize(embeddings, p=2, dim=1)
        # Cast to float32: under AMP, F.linear returns fp16, where 1 - 1e-7 rounds to 1
        # and sqrt(1 - cos^2) below would get an infinite gradient
        return F.linear(embeddings_norm, weight_norm).float()

    def forward(self, embeddings, labels):
        cos_theta = self.get_logits(embeddings)
        cos_theta = cos_theta.clamp(-1.0 + 1e-7, 1.0 - 1e-7)
        sin_theta = torch.sqrt(1.0 - cos_theta.pow(2))
        cos_theta_m = cos_theta * self.cos_m - sin_theta * self.sin_m
        if self.easy_margin:
            # Apply the margin only where θ < 90°
            cos_theta_m = torch.where(cos_theta > 0, cos_theta_m, cos_theta)
        else:
            # For θ > π - m, cos(θ + m) would increase again; use cos(θ) - m·sin(m) instead
            cos_theta_m = torch.where(cos_theta > self.th, cos_theta_m, cos_theta - self.mm)
        # Apply the margin to the target class only
        one_hot = torch.zeros_like(cos_theta)
        one_hot.scatter_(1, labels.view(-1, 1), 1.0)
        output = one_hot * cos_theta_m + (1.0 - one_hot) * cos_theta
        output = output * self.s
        return F.cross_entropy(output, labels)
```
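The constants `th` and `mm` handle the case θ > π − m. There, θ + m exceeds π and cos(θ + m) starts increasing again, so a larger angle would wrongly receive a higher target logit. The module above then switches to cos θ − m·sin m, a CosFace-style linear penalty. The NumPy sketch below is independent of the module and only mirrors its two constants. It checks monotonicity on a grid of angles.

```python
import math
import numpy as np

m = 0.5
th = math.cos(math.pi - m)          # same threshold as self.th
mm = math.sin(math.pi - m) * m      # same offset as self.mm

theta = np.linspace(0.01, math.pi - 0.01, 500)
cos_theta = np.cos(theta)

raw = np.cos(theta + m)                                  # plain cos(theta + m)
safe = np.where(cos_theta > th, raw, cos_theta - mm)     # with the fallback

print("raw decreasing: ", bool(np.all(np.diff(raw) < 0)))
print("safe decreasing:", bool(np.all(np.diff(safe) < 0)))
```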
4.3 Face Recognition Network
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

# Assumes TripletLoss (4.1) and ArcFaceLoss (4.2) are defined in the same module.


class FaceRecognitionNet(nn.Module):
    """Face recognition network: Backbone + Embedding Layer + Loss Head."""

    def __init__(self, backbone='resnet50', embedding_dim=512, num_classes=10000,
                 loss_type='arcface', pretrained=True):
        super().__init__()
        self.backbone = self._build_backbone(backbone, pretrained)
        # Infer the backbone output size from a dummy 112x112 face crop. eval() avoids the
        # BatchNorm1d error on a batch of 1 and leaves the running statistics untouched.
        self.backbone.eval()
        with torch.no_grad():
            dummy = torch.zeros(1, 3, 112, 112)
            backbone_out_dim = self.backbone(dummy).flatten(1).shape[1]
        self.backbone.train()
        self.embedding = nn.Sequential(
            nn.Linear(backbone_out_dim, embedding_dim),
            nn.BatchNorm1d(embedding_dim))
        self.loss_type = loss_type
        if loss_type == 'arcface':
            self.loss_head = ArcFaceLoss(embedding_dim, num_classes, s=64.0, m=0.5)
        elif loss_type == 'cosface':
            # Would need CombinedMarginLoss(embedding_dim, num_classes, s=64.0,
            # m1=1.0, m2=0.0, m3=0.35), which this article does not define
            raise NotImplementedError("CombinedMarginLoss is not defined in this article")
        elif loss_type == 'triplet':
            self.loss_head = TripletLoss(margin=0.2, mining='batch_hard')
        else:
            raise ValueError(f"Unknown loss type: {loss_type}")
        self.embedding_dim = embedding_dim
        self.num_classes = num_classes

    def _build_backbone(self, backbone_name, pretrained):
        # torchvision >= 0.13 uses `weights=` instead of the deprecated `pretrained=`
        weights = 'DEFAULT' if pretrained else None
        if backbone_name == 'resnet50':
            backbone = models.resnet50(weights=weights)
            backbone = nn.Sequential(*list(backbone.children())[:-1])   # drop the ImageNet fc
        elif backbone_name == 'resnet34':
            backbone = models.resnet34(weights=weights)
            backbone = nn.Sequential(*list(backbone.children())[:-1])
        elif backbone_name == 'mobilenet_v2':
            backbone = models.mobilenet_v2(weights=weights)
            backbone.classifier = nn.Identity()
        elif backbone_name == 'iresnet50':
            backbone = IResNet50()                   # always trained from scratch
        else:
            raise ValueError(f"Unknown backbone: {backbone_name}")
        return backbone

    def extract_embedding(self, x):
        features = self.backbone(x)
        features = features.flatten(1)
        embedding = self.embedding(features)
        # L2-normalize onto the unit hypersphere
        return F.normalize(embedding, p=2, dim=1)

    def forward(self, x, labels=None):
        embedding = self.extract_embedding(x)
        if labels is None:
            return embedding
        # Both the ArcFace and the Triplet head take (embedding, labels)
        loss = self.loss_head(embedding, labels)
        return loss, embedding


class IResNet50(nn.Module):
    """Simplified IResNet50-style backbone for 112x112 inputs.

    An approximation assembled from torchvision ResNet-50 stages; the official InsightFace
    IResNet uses its own residual block design, which is not reproduced here.
    """

    def __init__(self, num_features=512, dropout=0.0):
        super().__init__()
        resnet = models.resnet50(weights=None)
        # 3x3 stride-1 stem instead of the 7x7 stride-2 ImageNet stem
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = resnet.bn1
        self.prelu = nn.PReLU(64)
        # 112 -> 56 here; layer2..layer4 halve it three more times, giving 7x7
        self.maxpool = resnet.maxpool
        self.layer1 = resnet.layer1
        self.layer2 = resnet.layer2
        self.layer3 = resnet.layer3
        self.layer4 = resnet.layer4
        self.bn2 = nn.BatchNorm2d(2048)
        self.dropout = nn.Dropout(p=dropout)
        self.fc = nn.Linear(2048 * 7 * 7, num_features)
        self.bn3 = nn.BatchNorm1d(num_features)

    def forward(self, x):
        x = self.prelu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.bn2(x)
        x = self.dropout(x)
        x = x.flatten(1)
        x = self.fc(x)
        x = self.bn3(x)
        return x
```
4.4 Training Pipeline
```python
import os
import logging

import torch
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.cuda.amp import autocast, GradScaler
from torchvision import datasets, transforms
from tqdm import tqdm

# Assumes FaceRecognitionNet (4.3) is defined or imported in this module.

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class FaceRecognitionTrainer:
    """Face recognition trainer."""

    def __init__(self, config):
        self.config = config
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        # Mixed precision is only enabled on GPU
        self.use_amp = self.device.type == 'cuda'
        self.model = FaceRecognitionNet(
            backbone=config['backbone'],
            embedding_dim=config['embedding_dim'],
            num_classes=config['num_classes'],
            loss_type=config['loss_type'],
            pretrained=config['pretrained']).to(self.device)
        # The ArcFace class-weight matrix lives in model.loss_head, so it is optimized too
        self.optimizer = optim.SGD(
            self.model.parameters(), lr=config['lr'], momentum=0.9,
            weight_decay=config['weight_decay'])
        self.scheduler = optim.lr_scheduler.MultiStepLR(
            self.optimizer, milestones=config['lr_milestones'], gamma=0.1)
        self.scaler = GradScaler(enabled=self.use_amp)
        self.best_acc = 0.0

    def train_epoch(self, train_loader, epoch):
        self.model.train()
        total_loss, correct, total = 0.0, 0, 0
        pbar = tqdm(train_loader, desc=f'Epoch {epoch}')
        for images, labels in pbar:
            images = images.to(self.device)
            labels = labels.to(self.device)
            with autocast(enabled=self.use_amp):
                loss, embeddings = self.model(images, labels)
            self.optimizer.zero_grad()
            self.scaler.scale(loss).backward()
            self.scaler.step(self.optimizer)
            self.scaler.update()
            total_loss += loss.item()
            # Training accuracy only exists for classification-style heads (ArcFace)
            if hasattr(self.model.loss_head, 'get_logits'):
                with torch.no_grad():
                    logits = self.model.loss_head.get_logits(embeddings)
                    _, predicted = logits.max(1)
                    total += labels.size(0)
                    correct += predicted.eq(labels).sum().item()
            pbar.set_postfix({'loss': f'{loss.item():.4f}',
                              'acc': f'{100. * correct / max(total, 1):.2f}%'})
        return total_loss / len(train_loader), 100. * correct / max(total, 1)

    @torch.no_grad()
    def validate(self, val_loader):
        self.model.eval()
        all_embeddings, all_labels = [], []
        for images, labels in tqdm(val_loader, desc='Validating'):
            images = images.to(self.device)
            embeddings = self.model.extract_embedding(images)
            all_embeddings.append(embeddings.float().cpu())
            all_labels.append(labels)
        all_embeddings = torch.cat(all_embeddings, dim=0)
        all_labels = torch.cat(all_labels, dim=0)
        return self.compute_verification_accuracy(all_embeddings, all_labels)

    def compute_verification_accuracy(self, embeddings, labels):
        # Rough proxy, not a standard benchmark protocol: uses every pair in the validation set
        # and puts the threshold midway between the mean same/different similarities.
        # Builds an N x N matrix, so keep the validation set small.
        embeddings = F.normalize(embeddings, p=2, dim=1)
        similarity_matrix = torch.mm(embeddings, embeddings.t())
        same_mask = labels.unsqueeze(0) == labels.unsqueeze(1)
        same_mask.fill_diagonal_(False)                      # ignore self-pairs
        same_sim = similarity_matrix[same_mask].mean().item() if same_mask.any() else 0.0
        diff_mask = ~same_mask
        diff_mask.fill_diagonal_(False)
        diff_sim = similarity_matrix[diff_mask].mean().item() if diff_mask.any() else 0.0
        threshold = (same_sim + diff_sim) / 2
        correct_same = (similarity_matrix[same_mask] > threshold).float().mean().item()
        correct_diff = (similarity_matrix[diff_mask] < threshold).float().mean().item()
        accuracy = (correct_same + correct_diff) / 2 * 100
        logger.info(f"Same similarity: {same_sim:.4f}, Diff similarity: {diff_sim:.4f}")
        logger.info(f"Threshold: {threshold:.4f}, Accuracy: {accuracy:.2f}%")
        return accuracy

    def train(self, train_loader, val_loader, num_epochs):
        for epoch in range(1, num_epochs + 1):
            train_loss, train_acc = self.train_epoch(train_loader, epoch)
            logger.info(f"Epoch {epoch}: Loss={train_loss:.4f}, Train Acc={train_acc:.2f}%")
            self.scheduler.step()
            logger.info(f"Learning rate: {self.scheduler.get_last_lr()[0]:.6f}")
            if epoch % self.config['val_interval'] == 0:
                val_acc = self.validate(val_loader)
                if val_acc > self.best_acc:
                    self.best_acc = val_acc
                    self.save_checkpoint('best_model.pth')
                    logger.info(f"New best model! Accuracy: {val_acc:.2f}%")
            if epoch % self.config['save_interval'] == 0:
                self.save_checkpoint(f'checkpoint_epoch_{epoch}.pth')

    def save_checkpoint(self, filename):
        os.makedirs(self.config['save_dir'], exist_ok=True)
        torch.save({
            'model_state_dict': self.model.state_dict(),
            'optimizer_state_dict': self.optimizer.state_dict(),
            'scheduler_state_dict': self.scheduler.state_dict(),
            'best_acc': self.best_acc
        }, os.path.join(self.config['save_dir'], filename))


def main():
    config = {
        'backbone': 'resnet50',
        'embedding_dim': 512,
        'num_classes': 85742,          # must equal the number of identities in the training set
        'loss_type': 'arcface',
        'pretrained': True,
        'lr': 0.1,
        'weight_decay': 5e-4,
        'lr_milestones': [10, 18, 22],
        'batch_size': 64,
        'num_epochs': 25,
        'val_interval': 1,
        'save_interval': 5,
        'save_dir': './checkpoints',
        'num_workers': 8
    }
    # Data loading is not part of the original. Hypothetical setup: aligned 112x112 face crops
    # in torchvision ImageFolder layout (one sub-directory per identity) at placeholder paths.
    normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    train_tfm = transforms.Compose([transforms.Resize((112, 112)), transforms.RandomHorizontalFlip(),
                                    transforms.ToTensor(), normalize])
    val_tfm = transforms.Compose([transforms.Resize((112, 112)), transforms.ToTensor(), normalize])
    train_set = datasets.ImageFolder('data/train', transform=train_tfm)
    val_set = datasets.ImageFolder('data/val', transform=val_tfm)
    # drop_last avoids a final batch of size 1, which BatchNorm1d rejects in training mode
    train_loader = DataLoader(train_set, batch_size=config['batch_size'], shuffle=True,
                              num_workers=config['num_workers'], drop_last=True)
    val_loader = DataLoader(val_set, batch_size=config['batch_size'], shuffle=False,
                            num_workers=config['num_workers'])
    trainer = FaceRecognitionTrainer(config)
    trainer.train(train_loader, val_loader, config['num_epochs'])
    print("Training completed!")


if __name__ == '__main__':
    main()
```
4.5 Inference and Feature Matching
```python
import cv2
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Assumes FaceRecognitionNet (4.3) is defined or imported in this module.


class FaceRecognizer:
    """Face recognition inference wrapper."""

    def __init__(self, model_path, backbone='resnet50', embedding_dim=512, device='cuda'):
        self.device = torch.device(device if torch.cuda.is_available() else 'cpu')
        # num_classes does not matter here: the loss head is not loaded
        self.model = FaceRecognitionNet(
            backbone=backbone, embedding_dim=embedding_dim, num_classes=1,
            loss_type='arcface', pretrained=False)
        # weights_only=False because the checkpoint also stores optimizer/scheduler state;
        # only load checkpoints you trust
        checkpoint = torch.load(model_path, map_location=self.device, weights_only=False)
        # Skip the class-weight matrix; only the backbone and embedding layers are needed
        state_dict = {k: v for k, v in checkpoint['model_state_dict'].items()
                      if not k.startswith('loss_head')}
        missing, _ = self.model.load_state_dict(state_dict, strict=False)
        assert all(k.startswith('loss_head') for k in missing), f"Missing weights: {missing}"
        self.model.to(self.device)
        self.model.eval()
        # Must match the preprocessing used during training
        self.transform = transforms.Compose([
            transforms.Resize((112, 112)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
        ])

    def preprocess(self, image):
        # OpenCV images are BGR numpy arrays; convert them to RGB PIL images
        if isinstance(image, np.ndarray):
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            image = Image.fromarray(image)
        return self.transform(image).unsqueeze(0)

    @torch.no_grad()
    def extract_feature(self, image):
        img_tensor = self.preprocess(image).to(self.device)
        embedding = self.model.extract_embedding(img_tensor)
        return embedding.cpu().numpy().flatten()

    @torch.no_grad()
    def extract_features_batch(self, images):
        tensors = torch.stack([self.preprocess(img).squeeze(0) for img in images])
        tensors = tensors.to(self.device)
        embeddings = self.model.extract_embedding(tensors)
        return embeddings.cpu().numpy()

    @staticmethod
    def cosine_similarity(feat1, feat2):
        # A plain dot product suffices because the embeddings are already L2-normalized
        return np.dot(feat1, feat2)

    @staticmethod
    def euclidean_distance(feat1, feat2):
        return np.linalg.norm(feat1 - feat2)

    def verify(self, image1, image2, threshold=0.5):
        """1:1 verification: are the two images the same person?"""
        feat1 = self.extract_feature(image1)
        feat2 = self.extract_feature(image2)
        similarity = self.cosine_similarity(feat1, feat2)
        return similarity >= threshold, similarity

    def identify(self, query_image, gallery_features, gallery_labels, threshold=0.5):
        """1:N identification against a gallery of precomputed embeddings."""
        query_feat = self.extract_feature(query_image)
        similarities = np.dot(gallery_features, query_feat)
        max_idx = np.argmax(similarities)
        max_similarity = similarities[max_idx]
        if max_similarity >= threshold:
            return gallery_labels[max_idx], max_similarity
        return None, max_similarity


def demo():
    recognizer = FaceRecognizer(
        model_path='checkpoints/best_model.pth',
        backbone='resnet50',
        embedding_dim=512)
    # cv2.imread returns None if a file is missing
    img1 = cv2.imread('person1_a.jpg')
    img2 = cv2.imread('person1_b.jpg')
    is_same, similarity = recognizer.verify(img1, img2)
    print(f"Same person: {is_same}, Similarity: {similarity:.4f}")
    gallery_images = [cv2.imread(f'gallery/{i}.jpg') for i in range(10)]
    gallery_labels = ['Alice', 'Bob', 'Charlie', ...]   # one label per gallery image
    gallery_features = recognizer.extract_features_batch(gallery_images)
    query_img = cv2.imread('query.jpg')
    identity, similarity = recognizer.identify(
        query_img, gallery_features, gallery_labels, threshold=0.5)
    print(f"Identity: {identity}, Similarity: {similarity:.4f}")


if __name__ == '__main__':
    demo()
```
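Both `verify` and `identify` use a fixed `threshold=0.5`, which is only a placeholder because the right value depends on the model and the data. The minimal sketch below calibrates it on labeled validation pairs. It uses a hypothetical helper `best_threshold` and made-up similarity scores, and picks the cut that maximizes pair accuracy.

```python
import numpy as np


def best_threshold(similarities, is_same):
    """Pick the cosine threshold that maximizes verification accuracy on labeled pairs."""
    similarities = np.asarray(similarities, dtype=float)
    is_same = np.asarray(is_same, dtype=bool)
    best_t, best_acc = 0.0, -1.0
    for t in np.unique(similarities):           # every observed score is a candidate cut
        acc = np.mean((similarities >= t) == is_same)
        if acc > best_acc:
            best_t, best_acc = float(t), float(acc)
    return best_t, best_acc


# Hypothetical validation pairs: cosine similarity, and whether the pair is the same person
sims = [0.82, 0.75, 0.66, 0.41, 0.38, 0.30, 0.12, 0.55]
same = [True, True, True, False, False, False, False, True]
t, acc = best_threshold(sims, same)
print(f"threshold={t:.2f}, accuracy={acc:.2%}")
```

On a real benchmark, choose the threshold on pairs that are disjoint from the ones used to report accuracy, for example with cross-validation.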
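5. FaceNet vs. ArcFace: In-Depth Comparison
5.1 Key Differences
| Aspect | FaceNet (Triplet) | ArcFace (Angular Margin) |
|---|---|---|
| Loss function | Triplet Loss | Softmax + additive angular margin |
| Optimization target | Relative distance constraint | Absolute angular margin |
| Training signal | One triplet per loss term | All classes contribute to each sample's loss |
| Convergence speed | Slow | Fast |
| Implementation complexity | Requires triplet mining | Simple and direct |
| Performance | LFW ~99.6% | LFW ~99.8% |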
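5.2 A Mathematical Comparison
Triplet Loss constrains a relative relation: the positive must be closer to the anchor than the negative is. ArcFace constrains an absolute position: the angle to the correct class must be smaller than the angle to every other class by at least m.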
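5.3 Feature Space Visualization
With Triplet Loss, intra-class variance tends to be larger. With ArcFace, each class is very compact, and classes are separated by a clear angular margin.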
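5.4 Practical Recommendations
Choose ArcFace (angular margin) when you:
- want the highest accuracy
- have a manageable number of identities (< 100K)
- need fast convergence
- are building for industrial deployment
Choose FaceNet/Triplet Loss when the number of identities is too large to keep a per-class weight matrix and slower convergence is acceptable.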
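6. Summary
6.1 Key Takeaways
FaceNet (Triplet Loss): directly optimizes relative distances in the embedding space. It scales well to very large numbers of identities but converges slowly.
ArcFace (Angular Margin): adds an additive margin in angle space. It is simple to train, converges fast, and gives the higher accuracy of the two, but it must store a class-weight matrix with one vector per identity.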
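6.2 Modern Best Practices
- Backbone: IResNet100 / EfficientNet
- Loss: ArcFace (s=64, m=0.5) or AdaFace
- Data augmentation: random cropping, color jitter, MixUp
- Training strategy: large batches (≥512), cosine learning-rate decay, mixed-precision training
- Post-processing: feature normalization, PCA whitening (optional)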
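For the cosine learning-rate decay in this list, PyTorch's built-in `CosineAnnealingLR` is one option. The minimal sketch below assumes a stand-in `nn.Linear` model and a hypothetical total of 100 steps. Its SGD settings mirror the training config in 4.4.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(512, 10)   # stand-in for the face recognition network
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

total_steps = 100            # hypothetical: usually epochs * iterations per epoch
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps, eta_min=1e-5)

lrs = []
for step in range(total_steps):
    optimizer.step()         # forward/backward omitted in this sketch
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])
print(f"start={lrs[0]:.4f}  middle={lrs[total_steps // 2]:.4f}  end={lrs[-1]:.6f}")
```

Replacing the `MultiStepLR` in 4.4 with this only changes the scheduler line. Warmup for large batches would need an additional scheduler and is not shown.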
6.3 In One Sentence
FaceNet tells us "what to learn" (similarity); ArcFace tells us "how to learn it better" (an angular margin constraint).
References
- Schroff F, et al. 'FaceNet: A Unified Embedding for Face Recognition and Clustering.' CVPR 2015.
- Deng J, et al. 'ArcFace: Additive Angular Margin Loss for Deep Face Recognition.' CVPR 2019.
- Wang H, et al. 'CosFace: Large Margin Cosine Loss for Deep Face Recognition.' CVPR 2018.
- Liu W, et al. 'SphereFace: Deep Hypersphere Embedding for Face Recognition.' CVPR 2017.