Kubernetes and Edge AI Best Practices
1. Edge AI Core Concepts
1.1 What Is Edge AI
Edge AI means running AI models on edge devices rather than in cloud data centers. It reduces latency, saves bandwidth, protects privacy, and keeps services available when network connectivity is unreliable.
This article covers best practices for running AI models on edge devices, focusing on how to manage edge AI clusters with Kubernetes. It walks through the core concepts and advantages of edge AI, the steps to build an edge Kubernetes cluster (node setup, initialization, network plugin), and the deployment workflow for AI applications (model preparation, Deployment, Service). It also covers edge node management (labels, taints, resource quotas), network optimization (CNI, gateway), storage configuration, monitoring and observability (Prometheus, Grafana, Fluentd), and security practices (RBAC, model encryption). Finally, it presents practical scenarios such as intelligent video analytics and sensor data processing, along with troubleshooting tips, to help you build high-performance, reliable edge computing systems.
Edge Node Requirements
Install Docker and kubeadm
# Install Docker
apt-get update && apt-get install -y docker.io
# Install kubeadm, kubelet, and kubectl
apt-get update && apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | tee /etc/apt/sources.list.d/kubernetes.list
apt-get update && apt-get install -y kubelet kubeadm kubectl
Initialize the Control-Plane Node
# Initialize the control-plane node
kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=<control-plane IP>
# Configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Install the network plugin (the flannel repository has moved to the flannel-io organization)
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
Add Edge Nodes
# Run on each edge node
kubeadm join <control-plane IP>:6443 --token <token> --discovery-token-ca-cert-hash <hash>
Prepare the Model
# Download the model
mkdir -p models/yolo/1
wget -O models/yolo/1/model.onnx https://github.com/onnx/models/raw/main/vision/object_detection_segmentation/yolov4/model/yolov4.onnx
# Create model storage
kubectl create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-ai-service
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: edge-ai-service
  template:
    metadata:
      labels:
        app: edge-ai-service
    spec:
      nodeSelector:
        node-role.kubernetes.io/edge: "true"
      containers:
      - name: edge-ai-service
        image: edge-ai-service:latest
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 1
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 512Mi
        volumeMounts:
        - name: model-volume
          mountPath: /models
      volumes:
      - name: model-volume
        persistentVolumeClaim:
          claimName: model-pvc
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: edge-ai-service
  namespace: default
  labels:
    app: edge-ai-service
spec:
  selector:
    app: edge-ai-service
  ports:
  - port: 8080
    targetPort: 8080
  type: NodePort
# Deploy the service
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
# Test the service
NODE_PORT=$(kubectl get svc edge-ai-service -o jsonpath='{.spec.ports[0].nodePort}')
EDGE_NODE_IP=$(kubectl get nodes -l node-role.kubernetes.io/edge=true -o jsonpath='{.items[0].status.addresses[0].address}')
curl -X POST http://$EDGE_NODE_IP:$NODE_PORT/predict -H "Content-Type: application/json" -d '{"image": "base64_encoded_image"}'
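The same request can be issued from a short Python client. The `{"image": "<base64>"}` payload shape mirrors the curl example above; the endpoint URL and the actual request schema of the unspecified edge-ai-service image are assumptions.

```python
import base64
import json
import urllib.request

def build_predict_payload(image_bytes: bytes) -> str:
    # The {"image": "<base64>"} shape mirrors the curl example above.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded})

def predict(endpoint: str, image_bytes: bytes) -> bytes:
    # endpoint would be e.g. http://<edge-node-ip>:<node-port>/predict
    req = urllib.request.Request(
        endpoint,
        data=build_predict_payload(image_bytes).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

This sketch uses only the standard library; in practice you would read real image bytes from disk or a camera before calling `predict`.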
# Label the edge node
kubectl label nodes <edge-node> node-role.kubernetes.io/edge=true
# Taint the edge node
kubectl taint nodes <edge-node> node-role.kubernetes.io/edge:NoSchedule
# Add a matching toleration to the application
kubectl patch deployment edge-ai-service -p '{"spec":{"template":{"spec":{"tolerations":[{"key":"node-role.kubernetes.io/edge","operator":"Exists","effect":"NoSchedule"}]}}}}'
Resource Quotas
apiVersion: v1
kind: ResourceQuota
metadata:
  name: edge-node-quota
  namespace: default
spec:
  hard:
    requests.cpu: "2"
    requests.memory: "4Gi"
    limits.cpu: "4"
    limits.memory: "8Gi"
    pods: "10"
Configure the CNI Plugin
# Install the Calico CNI plugin (run a single CNI per cluster: if flannel was installed during initialization, remove it before switching to Calico)
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# Configure a network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: edge-ai-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: edge-ai-service
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: edge-gateway
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: edge-storage
    ports:
    - protocol: TCP
      port: 9000
Configure the Edge Gateway
apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-gateway
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: edge-gateway
  template:
    metadata:
      labels:
        app: edge-gateway
    spec:
      nodeSelector:
        node-role.kubernetes.io/edge: "true"
      containers:
      - name: edge-gateway
        image: nginx:latest
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-config
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
      volumes:
      - name: nginx-config
        configMap:
          name: edge-gateway-config
configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: edge-gateway-config
  namespace: default
data:
  nginx.conf: |
    events {}
    http {
      server {
        listen 80;
        location / {
          proxy_pass http://edge-ai-service:8080;
          proxy_set_header Host $host;
          proxy_set_header X-Real-IP $remote_addr;
        }
      }
    }
Configure Local Storage
apiVersion: v1
kind: PersistentVolume
metadata:
  name: edge-local-storage
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /mnt/edge-storage
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-role.kubernetes.io/edge
          operator: In
          values:
          - "true"
PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: edge-local-pvc
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: ""
  selector:
    matchLabels:
      type: local
Deploy Prometheus and Grafana
# Install the Prometheus Operator
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
# Configure edge service monitoring (the ServiceMonitor lives in the monitoring namespace, so it needs a namespaceSelector to find the Service in default)
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: edge-ai-service-monitor
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: edge-ai-service
  endpoints:
  - targetPort: 8080
    path: /metrics
    interval: 15s
EOF
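The ServiceMonitor above assumes the service exposes Prometheus metrics at /metrics. The article does not show that endpoint, so here is a minimal stdlib-only sketch of what it could look like; the metric name `inference_requests_total` is illustrative, not part of the original service.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_counter(name: str, help_text: str, value: float) -> str:
    # Prometheus text exposition format: HELP line, TYPE line, then a sample.
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name} {value}\n"
    )

class MetricsHandler(BaseHTTPRequestHandler):
    inference_count = 0  # bump this from the inference code path

    def do_GET(self):
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        body = render_counter(
            "inference_requests_total",
            "Total inference requests served.",
            MetricsHandler.inference_count,
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

# To serve on the container port the ServiceMonitor scrapes:
# HTTPServer(("", 8080), MetricsHandler).serve_forever()
```

In a real service you would use an established client library instead of hand-rolling the format, but the wire format shown here is what Prometheus actually scrapes.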
Configure Fluentd
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-logging
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
    spec:
      tolerations:
      - key: node-role.kubernetes.io/edge
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.14.6
        env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
RBAC Configuration
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: edge-ai-role
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods", "services"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: edge-ai-rolebinding
  namespace: default
subjects:
- kind: ServiceAccount
  name: edge-ai-service-account
  namespace: default
roleRef:
  kind: Role
  name: edge-ai-role
  apiGroup: rbac.authorization.k8s.io
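The RoleBinding above references a ServiceAccount named edge-ai-service-account that is not created elsewhere in the article. A minimal manifest for it would be:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: edge-ai-service-account
  namespace: default
```

Any Deployment that should use this identity also needs `serviceAccountName: edge-ai-service-account` in its pod spec.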
Deploy the Video Analytics Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: video-analytics
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: video-analytics
  template:
    metadata:
      labels:
        app: video-analytics
    spec:
      nodeSelector:
        node-role.kubernetes.io/edge: "true"
      containers:
      - name: video-analytics
        image: video-analytics:latest
        ports:
        - containerPort: 8080
        env:
        - name: MODEL_PATH
          value: /models/yolo
        - name: CAMERA_URL
          value: rtsp://camera:554/stream
        volumeMounts:
        - name: model-volume
          mountPath: /models
      volumes:
      - name: model-volume
        persistentVolumeClaim:
          claimName: model-pvc
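Inside the container, MODEL_PATH and CAMERA_URL arrive as ordinary environment variables. A minimal sketch, assuming nothing about the unspecified video-analytics image beyond those two settings:

```python
import os
from dataclasses import dataclass

@dataclass
class AnalyticsConfig:
    model_path: str
    camera_url: str

def load_config(env) -> AnalyticsConfig:
    # MODEL_PATH and CAMERA_URL are injected via the Deployment's env block;
    # fall back to the same defaults the manifest declares.
    return AnalyticsConfig(
        model_path=env.get("MODEL_PATH", "/models/yolo"),
        camera_url=env.get("CAMERA_URL", "rtsp://camera:554/stream"),
    )

cfg = load_config(os.environ)
```

Passing the environment mapping as a parameter keeps the loader testable without mutating the real process environment.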
Deploy the Sensor Data Processing Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sensor-processing
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sensor-processing
  template:
    metadata:
      labels:
        app: sensor-processing
    spec:
      nodeSelector:
        node-role.kubernetes.io/edge: "true"
      containers:
      - name: sensor-processing
        image: sensor-processing:latest
        ports:
        - containerPort: 8080
        env:
        - name: SENSOR_ENDPOINT
          value: http://sensor:8000
        - name: MODEL_PATH
          value: /models/anomaly
        volumeMounts:
        - name: model-volume
          mountPath: /models
      volumes:
      - name: model-volume
        persistentVolumeClaim:
          claimName: model-pvc
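The MODEL_PATH of /models/anomaly suggests the service runs anomaly detection over sensor readings. The article does not specify the model, so here is a model-free baseline sketch: a rolling z-score detector that flags readings far from the recent mean.

```python
from collections import deque
from statistics import mean, pstdev

class RollingAnomalyDetector:
    """Flag readings that deviate more than `threshold` standard
    deviations from the mean of a sliding window of recent values."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def is_anomaly(self, reading: float) -> bool:
        anomalous = False
        if len(self.values) >= 2:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            # Only score once the window has spread; sigma == 0 means
            # every recent reading was identical, so skip the z-score.
            if sigma > 0 and abs(reading - mu) / sigma > self.threshold:
                anomalous = True
        self.values.append(reading)
        return anomalous
```

A real deployment would load the trained model from MODEL_PATH instead, but a detector like this is a common fallback when no model is available yet.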
Troubleshooting
# Check edge node status
kubectl get nodes
# Check edge application status
kubectl get pods -l app=edge-ai-service
# View application logs
kubectl logs -l app=edge-ai-service
# Check edge node resource usage
kubectl top node <edge-node>
# Check network connectivity
kubectl exec -it <pod-name> -- ping <target-host>
Kubernetes provides powerful deployment and management capabilities for edge AI. By configuring edge nodes appropriately, optimizing the network and storage, and following security best practices, you can build high-performance, reliable edge AI systems.
By applying the practices above, you can take full advantage of edge AI and build more efficient, reliable edge computing systems.
