Deploying a Quantized DeepSeek-R1 Model Locally with OpenVINO: Front-End Interface and Back-End Service

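This article covers the front-end interface and back-end service for a quantized DeepSeek-R1 model deployed locally with OpenVINO: building the HTML front-end page, setting up the Flask back-end endpoints, and integrating the OpenVINO GenAI inference engine. Example code shows the core functions (health check, chat generation, and model-info retrieval), and the article looks at CPU and memory usage, offering a reference approach for running large models locally.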

黑客帝国 · Published 2026/4/6 · Updated 2026/7/2453 views

1. Preface

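Building on the environment setup and model conversion from the previous chapter, this chapter focuses on deploying the back-end server and launching the front-end page.

The back-end server depends on OpenVINO, so in principle it runs on any device that OpenVINO supports. If you have an Intel discrete GPU, you only need to change the device argument in the code to the matching device name (for example "GPU") to run inference on the GPU: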

self.pipeline = ov_genai.LLMPipeline(self.model_path, device)
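Instead of hard-coding the device string, it can be chosen from whatever the OpenVINO runtime reports. The following is a minimal sketch, not part of the article's code: pick_device and available_devices are hypothetical helper names, the preference order is an assumption, and available_devices assumes the openvino package is installed (it reads the available_devices property of openvino.Core).

```python
from typing import Iterable, List, Sequence


def pick_device(available: Iterable[str],
                preferred: Sequence[str] = ("GPU", "CPU")) -> str:
    """Return the first device in `available` that matches a preferred family.

    OpenVINO reports a single GPU as "GPU" and several GPUs as
    "GPU.0", "GPU.1", ..., so both spellings are matched.
    Falls back to "CPU" when nothing matches.
    """
    names = list(available)
    for family in preferred:
        for name in names:
            if name == family or name.startswith(family + "."):
                return name
    return "CPU"


def available_devices() -> List[str]:
    """Ask the OpenVINO runtime which devices it can see."""
    import openvino as ov  # imported lazily so pick_device works without it

    return list(ov.Core().available_devices)


# Typical use (requires openvino):
#     device = pick_device(available_devices())
#     pipeline = ov_genai.LLMPipeline(model_path, device)
```

The resulting string can be passed wherever the server code currently passes "CPU".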

2. Front-End Chat Interface

chat_interface.html

<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>DeepSeek-R1 Smart Chat</title>
<link rel="icon" type="image/svg+xml" href="logo.svg">
<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
  /* background: linear-gradient(135deg, #667eea, …); further declarations not recoverable */
}
/* … remaining selectors and declarations are not recoverable from the source … */
</style>
</head>
<body>
<!--
  Element markup and the page script are not recoverable from the source.
  Visible text, in page order:
    Header:        🤖 DeepSeek-R1 Smart Chat / Deployed locally with OpenVINO
    Status bar:    Service status: checking... / Model loading
    First message: The system is initializing, please wait...
    Typing hint:   DeepSeek is thinking...
    Button:        Send
-->
</body>
</html>
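The status line on the page ("Service status: checking...") suggests that the front end polls the back end until the model has finished loading. The page's own script is not shown here, so the following is only a Python sketch of that polling logic against the /health route, which reports "ready" or "loading". The base URL, retry count, delay, and the function names fetch_status and wait_until_ready are all assumptions for illustration.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: the Flask server runs on this machine on port 5000.
BASE_URL = "http://127.0.0.1:5000"


def fetch_status(base_url: str = BASE_URL, timeout: float = 3.0) -> str:
    """Return the "status" field of GET /health, or "unreachable" on failure."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp).get("status", "unknown")
    except (OSError, ValueError):
        # OSError covers connection errors and timeouts; ValueError covers bad JSON.
        return "unreachable"


def wait_until_ready(check: Callable[[], str] = fetch_status,
                     attempts: int = 60,
                     delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `check` until it returns "ready"; give up after `attempts` tries."""
    for attempt in range(attempts):
        if check() == "ready":
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next poll
    return False
```

Passing check and sleep as parameters keeps the loop easy to exercise without a running server; a browser implementation would do the same thing with fetch and a timer.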


3. Back-End Model Server

openvino_server.py

import logging
import threading
from pathlib import Path
from typing import Any, Dict

import openvino_genai as ov_genai
from flask import Flask, jsonify, request
from flask_cors import CORS

# Module-level logger, so the class below can log regardless of how main() is called.
logger = logging.getLogger(__name__)


class OpenVINOBackend:
    """Wraps an OpenVINO GenAI LLMPipeline that is loaded in the background."""

    def __init__(self, model_path: str):
        self.model_path = model_path
        self.device = "CPU"  # updated when the model is initialized
        self.pipeline = None
        self.is_ready = False
        self.init_lock = threading.Lock()
        # Chat state lives on the shared pipeline, so run one generation at a time.
        self.generate_lock = threading.Lock()

    def initialize_model(self, device: str):
        with self.init_lock:
            if self.is_ready:
                return
            self.device = device
            logger.info("Loading OpenVINO model...")
            try:
                self.pipeline = ov_genai.LLMPipeline(self.model_path, device)
                self.is_ready = True
                logger.info("OpenVINO model loaded.")
            except Exception as e:
                logger.error(f"Model loading failed: {e}")
                self.is_ready = False

    def generate_response(self, message: str, max_tokens: int = 512,
                          temperature: float = 0.7,
                          do_sample: bool = True) -> Dict[str, Any]:
        if not self.is_ready or self.pipeline is None:
            return {"status": "error", "error": "Model not ready"}
        try:
            config = ov_genai.GenerationConfig()
            config.max_new_tokens = max_tokens
            config.temperature = temperature
            config.do_sample = do_sample
            with self.generate_lock:
                # Each request is handled as an independent single-turn chat.
                self.pipeline.start_chat()
                try:
                    response = str(self.pipeline.generate(message, config))
                finally:
                    # Always clear the chat history, even if generation fails.
                    self.pipeline.finish_chat()
            return {
                "status": "success",
                "response": response,
                # Whitespace-separated word count: only a rough proxy for tokens.
                "tokens_generated": len(response.split()),
            }
        except Exception as e:
            return {"status": "error", "error": str(e)}


def create_app(backend: OpenVINOBackend) -> Flask:
    app = Flask(__name__)
    CORS(app)  # allow the HTML page to call the API from another origin

    @app.route('/health', methods=['GET'])
    def health_check():
        return jsonify({
            "status": "ready" if backend.is_ready else "loading",
            "backend": "openvino_genai",
            "device": backend.device,
        })

    @app.route('/chat', methods=['POST'])
    def chat_endpoint():
        data = request.get_json(silent=True)
        if not isinstance(data, dict) or 'message' not in data:
            return jsonify({"status": "error", "error": "Missing 'message' parameter"}), 400
        message = data['message']
        try:
            # Clamp max_tokens to the range [1, 2048].
            max_tokens = min(max(int(data.get('max_tokens', 512)), 1), 2048)
        except (TypeError, ValueError):
            return jsonify({"status": "error", "error": "max_tokens must be an integer"}), 400
        result = backend.generate_response(message, max_tokens)
        return jsonify(result)

    @app.route('/model_info', methods=['GET'])
    def model_info():
        return jsonify({
            "backend": "openvino_genai",
            "status": "ready" if backend.is_ready else "loading",
            "device": backend.device,
            "optimization": "int4_quantization",
        })

    return app


def initialize_backend(backend: OpenVINOBackend, device: str = 'CPU'):
    # Load the model in a daemon thread so the server can answer /health meanwhile.
    def init_task():
        backend.initialize_model(device)

    thread = threading.Thread(target=init_task, daemon=True)
    thread.start()


def main(model_path: str = "./converted_ov_model"):
    logging.basicConfig(
        level=logging.INFO,
        format='-- %(levelname)s - %(message)s',
        force=True,
    )
    logger.info("Starting OpenVINO GenAI server...")

    p = Path(model_path)
    if not p.exists():
        logger.error(f"Path does not exist: {model_path}")
        return
    if not p.is_dir():
        logger.error(f"Please pass the model directory: {model_path}")
        return

    backend = OpenVINOBackend(model_path)
    app = create_app(backend)
    initialize_backend(backend, "CPU")
    logger.info(f"Starting OpenVINO GenAI server, model path: {model_path}...")
    app.run(host='0.0.0.0', port=5000, debug=False)


if __name__ == '__main__':
    path = "./DeepSeek-R1-0528-Qwen3-8B_openvino_int4"
    main(path)
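To exercise the server without the HTML page, a small client is enough. The sketch below uses only the Python standard library and assumes the server is reachable at http://127.0.0.1:5000; the names build_chat_request and chat, and the timeout value, are illustrative choices rather than part of the article's code. The request body mirrors what the /chat route reads: a message field and an optional max_tokens field.

```python
import json
import urllib.error
import urllib.request

# Assumption: the Flask server runs on this machine on port 5000.
BASE_URL = "http://127.0.0.1:5000"


def build_chat_request(message: str, max_tokens: int = 512,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /chat request with a JSON body."""
    payload = json.dumps({"message": message, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(message: str, max_tokens: int = 512,
         base_url: str = BASE_URL, timeout: float = 600.0) -> dict:
    """Send one message and return the decoded JSON reply.

    The timeout is generous because generation on a CPU can be slow.
    """
    request = build_chat_request(message, max_tokens, base_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # The server answered with a non-2xx status code.
        return {"status": "error", "error": f"HTTP {err.code}"}


# Typical use, once /health reports "ready":
#     reply = chat("Hello", max_tokens=128)
#     print(reply.get("response") or reply.get("error"))
```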

4. Resource Usage
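To collect CPU and memory figures for the running server on your own machine, one option is to sample the server process directly. The following is a minimal sketch that assumes the third-party psutil package is installed and that you know the server's process ID; sample_process and to_gib are illustrative names, and the one-second sampling interval is an arbitrary choice.

```python
def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / (1024 ** 3)


def sample_process(pid: int, interval: float = 1.0) -> dict:
    """Sample CPU and resident memory of one process over `interval` seconds."""
    import psutil  # third-party: pip install psutil

    proc = psutil.Process(pid)
    # Percentage relative to a single core, so a multi-threaded
    # inference process can report more than 100.
    cpu = proc.cpu_percent(interval=interval)
    rss = proc.memory_info().rss  # resident set size in bytes
    return {"cpu_percent": cpu, "rss_gib": round(to_gib(rss), 2)}


# Typical use: find the PID of openvino_server.py in your task manager, then
#     print(sample_process(12345))   # 12345 is a placeholder PID
```

Sampling once while the model is idle and once during generation gives a rough picture of the baseline memory footprint and the CPU load under inference.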
