Wrapping Qwen3.5-35B-A3B-AWQ-4bit as a Multimodal API: FastAPI Endpoints, Swagger Docs, and Authentication


1. Introduction: From Web UI to Enterprise-Grade API

If you have already tried the Qwen3.5-35B-A3B-AWQ-4bit web UI and know it can understand images and answer image-text questions, you may be wondering: can this capability be integrated into my own business systems? For example, an e-commerce platform could automatically analyze product photos, an education app could grade homework photos, or a content platform could auto-generate image captions.

Calling the web UI directly clearly won't work: there is no standard API, no documentation, and no access control. That is the problem we solve today: wrapping a powerful multimodal model as an enterprise-grade API service.

This article walks you through the full implementation:

  • Building a RESTful API with FastAPI
  • Integrating Swagger UI for auto-generated interactive docs
  • Adding API-key authentication to protect the service
  • Handling image uploads and streaming responses

Whether you want to offer AI capabilities to internal teams or build a customer-facing AI product, this setup gets you to production quickly.

2. Environment Setup and Project Structure

2.1 Base Environment Requirements

Before starting, make sure your environment meets the following requirements:

  • Python 3.8+: Python 3.9 or 3.10 recommended
  • CUDA 11.8+: required for GPU inference
  • At least 24 GB of GPU memory: a dual-GPU setup with 12 GB+ per card
  • A running Qwen3.5-35B-A3B-AWQ-4bit service: the backend listens on port 8000

2.2 Create the Project Layout

Start with a clean project structure:

```
qwen-api-service/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI application entry point
│   ├── api/
│   │   ├── __init__.py
│   │   ├── endpoints.py     # API route definitions
│   │   └── dependencies.py  # dependencies (e.g. authentication)
│   ├── core/
│   │   ├── __init__.py
│   │   ├── config.py        # configuration management
│   │   └── security.py      # security helpers
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py       # Pydantic data models
│   └── services/
│       ├── __init__.py
│       └── qwen_client.py   # Qwen backend client
├── requirements.txt         # dependency list
├── .env.example             # example environment variables
└── README.md                # project docs
```

2.3 Install Dependencies

Create requirements.txt:

```
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
python-jose[cryptography]==3.3.0
passlib[bcrypt]==1.7.4
pydantic==2.5.0
pydantic-settings==2.1.0
httpx==0.25.1
pillow==10.1.0
python-dotenv==1.0.0
```

Install them:

```bash
pip install -r requirements.txt
```

3. Building the FastAPI Skeleton

3.1 Create the FastAPI Application

Create the main application in app/main.py:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.api.endpoints import router as api_router
from app.core.config import settings

# Create the FastAPI application instance
app = FastAPI(
    title="Qwen3.5 Multimodal API Service",
    description="Multimodal API built on Qwen3.5-35B-A3B-AWQ-4bit, "
                "supporting image understanding and image-text Q&A",
    version="1.0.0",
    docs_url="/docs",    # Swagger UI
    redoc_url="/redoc",  # ReDoc
)

# Configure CORS (cross-origin resource sharing)
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.ALLOWED_ORIGINS,  # allowed origins
    allow_credentials=True,
    allow_methods=["*"],  # allow all HTTP methods
    allow_headers=["*"],  # allow all request headers
)

# Mount the API routes
app.include_router(api_router, prefix="/api/v1")


@app.get("/health")
async def health_check():
    """Service health check."""
    return {
        "status": "healthy",
        "service": "qwen-multimodal-api",
        "version": "1.0.0",
    }


@app.get("/")
async def root():
    """Redirect the root path to the API docs."""
    from fastapi.responses import RedirectResponse
    return RedirectResponse(url="/docs")
```

3.2 Configuration Management

Manage settings in app/core/config.py:

```python
from typing import List

from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """Application settings."""

    # API
    API_V1_STR: str = "/api/v1"
    PROJECT_NAME: str = "Qwen3.5 Multimodal API Service"

    # CORS
    ALLOWED_ORIGINS: List[str] = [
        "http://localhost:3000",
        "http://127.0.0.1:3000",
        # add your frontend domains here
    ]

    # Qwen backend
    QWEN_BASE_URL: str = "http://127.0.0.1:8000"
    QWEN_TIMEOUT: int = 300  # 5-minute timeout

    # Security
    API_KEY_HEADER: str = "X-API-Key"
    API_KEYS: List[str] = []  # list of valid API keys

    # File upload
    MAX_UPLOAD_SIZE: int = 10 * 1024 * 1024  # 10 MB
    ALLOWED_IMAGE_TYPES: List[str] = ["image/jpeg", "image/png", "image/webp"]

    class Config:
        env_file = ".env"
        case_sensitive = True


settings = Settings()
```

Create a .env file for environment variables:

```bash
# Qwen backend address
QWEN_BASE_URL=http://127.0.0.1:8000

# API keys (comma-separated)
API_KEYS=your_api_key_1,your_api_key_2,test_key_123

# Allowed frontend origins
ALLOWED_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
```
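One caveat worth knowing: pydantic-settings v2 decodes complex field types such as `List[str]` from environment variables as JSON, so comma-separated values like the `API_KEYS` line above may fail to parse out of the box. Below is a minimal stdlib sketch of the splitting logic you would need; the `parse_csv_env` helper is hypothetical, not part of the project code.

```python
import os
from typing import List


def parse_csv_env(name: str, default: str = "") -> List[str]:
    """Split a comma-separated env var into a list of trimmed, non-empty items."""
    raw = os.environ.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


os.environ["API_KEYS"] = "your_api_key_1, your_api_key_2,test_key_123"
print(parse_csv_env("API_KEYS"))
# → ['your_api_key_1', 'your_api_key_2', 'test_key_123']
```

In `Settings`, the same split could be wired in with a `@field_validator("API_KEYS", mode="before")` that transforms the raw string before validation.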

4. Implementing API-Key Authentication

4.1 Create the Authentication Dependency

Implement API-key verification in app/api/dependencies.py:

```python
from typing import Optional

from fastapi import Header, HTTPException, status

from app.core.config import settings


async def verify_api_key(
    x_api_key: Optional[str] = Header(None, alias=settings.API_KEY_HEADER)
) -> str:
    """Validate the API key from the request header.

    Args:
        x_api_key: API key taken from the request header.

    Returns:
        The validated API key.

    Raises:
        HTTPException: If the key is missing or invalid.
    """
    if not settings.API_KEYS:
        # No keys configured: allow all requests (for testing only)
        return "no_auth_required"

    if not x_api_key:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="API key missing",
            headers={"WWW-Authenticate": "API-Key"},
        )

    if x_api_key not in settings.API_KEYS:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="Invalid API key",
        )

    return x_api_key
```
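A side note on the membership check: `x_api_key in settings.API_KEYS` compares strings with early exit, which in principle leaks timing information about how much of a key matched. If that matters for your threat model, each comparison can be done in constant time with the standard library's `secrets.compare_digest`. A minimal sketch, where the `key_is_valid` helper is illustrative rather than part of the article's code:

```python
import secrets
from typing import List


def key_is_valid(candidate: str, valid_keys: List[str]) -> bool:
    """Check the candidate against every configured key without short-circuiting,
    so timing does not reveal which key (if any) matched."""
    matched = False
    for key in valid_keys:
        # compare_digest runs in time independent of where the strings differ
        if secrets.compare_digest(candidate.encode(), key.encode()):
            matched = True
    return matched


print(key_is_valid("test_key_123", ["your_api_key_1", "test_key_123"]))  # True
print(key_is_valid("wrong_key", ["your_api_key_1", "test_key_123"]))     # False
```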

4.2 Security Helpers

Add security-related helpers in app/core/security.py:

```python
import secrets
from typing import List


def generate_api_key(length: int = 32) -> str:
    """Generate a secure API key.

    Args:
        length: Number of random bytes to draw.

    Returns:
        The generated URL-safe API key.
    """
    return secrets.token_urlsafe(length)


def validate_api_key(api_key: str, valid_keys: List[str]) -> bool:
    """Check whether an API key is in the list of valid keys.

    Args:
        api_key: Key to validate.
        valid_keys: List of valid keys.

    Returns:
        Whether the key is valid.
    """
    return api_key in valid_keys


def mask_api_key(api_key: str) -> str:
    """Mask an API key for display in logs.

    Args:
        api_key: The raw API key.

    Returns:
        The masked key.
    """
    if len(api_key) <= 8:
        return "***"
    return f"{api_key[:4]}...{api_key[-4:]}"
```
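To mint a key for the `API_KEYS` list in `.env`, you can run the same logic `generate_api_key` uses directly in a REPL. This sketch also shows the masked form that `mask_api_key` would write to a log:

```python
import secrets

# Same as generate_api_key(32): 32 random bytes, base64url-encoded
key = secrets.token_urlsafe(32)

# Same as mask_api_key(key): keep only the first and last 4 characters
masked = f"{key[:4]}...{key[-4:]}"

print(len(key))  # 43 (44 base64 characters with the '=' padding stripped)
print(masked)    # e.g. "xK9f...p2Qa" -- safe to log
```

Copy the printed key into the `API_KEYS` value in `.env`; never log or commit the unmasked form.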

5. Defining the Data Models

5.1 Request and Response Models

Define the data models in app/models/schemas.py:

```python
from enum import Enum
from typing import Any, List, Optional

from pydantic import BaseModel, Field


class ImageInputType(str, Enum):
    """How the image is supplied."""
    URL = "url"
    BASE64 = "base64"
    UPLOAD = "upload"


class ChatMessage(BaseModel):
    """A single chat message."""
    role: str = Field(..., description="Message role: user or assistant")
    content: str = Field(..., description="Message content")

    class Config:
        json_schema_extra = {
            "example": {"role": "user", "content": "What is in this image?"}
        }


class ImageInfo(BaseModel):
    """Image payload."""
    type: ImageInputType = Field(..., description="Image type: url/base64/upload")
    data: str = Field(..., description="Image data: URL, base64 string, or uploaded file ID")

    class Config:
        json_schema_extra = {
            "example": {"type": "url", "data": "https://example.com/image.jpg"}
        }


class ChatRequest(BaseModel):
    """Chat completion request."""
    messages: List[ChatMessage] = Field(..., description="Message history")
    image: Optional[ImageInfo] = Field(None, description="Optional image")
    max_tokens: int = Field(1024, description="Maximum number of generated tokens")
    temperature: float = Field(0.7, description="Sampling temperature (randomness)")
    stream: bool = Field(False, description="Whether to stream the response")

    class Config:
        json_schema_extra = {
            "example": {
                "messages": [{"role": "user", "content": "Describe this image"}],
                "image": {"type": "url", "data": "https://example.com/image.jpg"},
                "max_tokens": 1024,
                "temperature": 0.7,
                "stream": False,
            }
        }


class ChatResponse(BaseModel):
    """Chat completion response."""
    id: str = Field(..., description="Response ID")
    object: str = Field("chat.completion", description="Object type")
    created: int = Field(..., description="Creation timestamp")
    model: str = Field(..., description="Model name")
    choices: List[Any] = Field(..., description="List of completion choices")
    usage: Optional[Any] = Field(None, description="Token usage statistics")

    class Config:
        json_schema_extra = {
            "example": {
                "id": "chatcmpl-123",
                "object": "chat.completion",
                "created": 1677652288,
                "model": "qwen3.5-35b-awq",
                "choices": [{
                    "index": 0,
                    "message": {"role": "assistant", "content": "This is a ..."},
                    "finish_reason": "stop",
                }],
                "usage": {
                    "prompt_tokens": 56,
                    "completion_tokens": 31,
                    "total_tokens": 87,
                },
            }
        }


class ErrorResponse(BaseModel):
    """Error response."""
    error: str = Field(..., description="Error message")
    code: int = Field(..., description="Error code")
    detail: Optional[str] = Field(None, description="Error details")

    class Config:
        json_schema_extra = {
            "example": {
                "error": "Invalid API key",
                "code": 403,
                "detail": "The provided API key is invalid",
            }
        }
```

6. Implementing the Qwen Backend Client

6.1 Create the HTTP Client

Implement communication with the Qwen backend in app/services/qwen_client.py:

```python
import base64
import json
import logging
from typing import Any, AsyncGenerator, Dict, Optional

import httpx

from app.core.config import settings

logger = logging.getLogger(__name__)


class QwenClient:
    """Client for the Qwen backend service."""

    def __init__(self):
        self.base_url = settings.QWEN_BASE_URL
        self.timeout = settings.QWEN_TIMEOUT
        self.client = httpx.AsyncClient(
            timeout=httpx.Timeout(self.timeout),
            limits=httpx.Limits(max_connections=100, max_keepalive_connections=20),
        )

    async def close(self):
        """Close the client connection."""
        await self.client.aclose()

    async def _prepare_image_data(self, image_info: Dict[str, Any]) -> Optional[str]:
        """Normalize the image input to a base64 data URL.

        Args:
            image_info: Image information (type + data).

        Returns:
            A base64-encoded data URL, or None if unavailable.
        """
        image_type = image_info.get("type")
        image_data = image_info.get("data")
        if not image_type or not image_data:
            return None

        if image_type == "url":
            # Download the image and convert it to base64
            try:
                async with httpx.AsyncClient() as client:
                    response = await client.get(image_data)
                    response.raise_for_status()
                    # Read the image MIME type from the response
                    content_type = response.headers.get("content-type", "image/jpeg")
                    base64_data = base64.b64encode(response.content).decode("utf-8")
                    return f"data:{content_type};base64,{base64_data}"
            except Exception as e:
                logger.error(f"Failed to download image: {e}")
                return None

        elif image_type == "base64":
            # Already base64: return as-is, adding a default MIME type if needed
            if image_data.startswith("data:"):
                return image_data
            return f"data:image/jpeg;base64,{image_data}"

        elif image_type == "upload":
            # Uploaded files would need to be fetched from storage first;
            # simplified here -- implement according to your storage backend
            return None

        return None

    def _merge_image_into_messages(self, messages: list, image_data: str) -> None:
        """Attach the image to the last user message (in place)."""
        image_message = {
            "role": "user",
            "content": [
                {"type": "text", "text": messages[-1]["content"] if messages else ""},
                {"type": "image_url", "image_url": {"url": image_data}},
            ],
        }
        # Replace the last user message if there is one, else append
        if messages and messages[-1]["role"] == "user":
            messages[-1] = image_message
        else:
            messages.append(image_message)

    async def chat_completion(
        self,
        messages: list,
        image: Optional[Dict[str, Any]] = None,
        max_tokens: int = 1024,
        temperature: float = 0.7,
        stream: bool = False,
    ) -> Dict[str, Any]:
        """Call the Qwen chat completion endpoint.

        Args:
            messages: Message list.
            image: Optional image information.
            max_tokens: Maximum number of tokens to generate.
            temperature: Sampling temperature.
            stream: Whether to request a streaming response.

        Returns:
            The response payload.
        """
        # Attach the image (if any) before building the request
        if image:
            image_data = await self._prepare_image_data(image)
            if image_data:
                self._merge_image_into_messages(messages, image_data)

        request_data = {
            "messages": messages,
            "max_tokens": max_tokens,
            "temperature": temperature,
            "stream": stream,
        }

        try:
            response = await self.client.post(
                f"{self.base_url}/v1/chat/completions",
                json=request_data,
                headers={"Content-Type": "application/json"},
            )
            response.raise_for_status()
            return response.json()
        except httpx.HTTPStatusError as e:
            logger.error(f"HTTP error: {e.response.status_code} - {e.response.text}")
            raise
        except Exception as e:
            logger.error(f"Failed to call Qwen service: {e}")
            raise

    async def chat_completion_stream(
        self,
        messages: list,
        image: Optional[Dict[str, Any]] = None,
        max_tokens: int = 1024,
        temperature: float = 0.7,
    ) -> AsyncGenerator[str, None]:
        """Stream a chat completion from the Qwen endpoint.

        Args:
            messages: Message list.
            image: Optional image information.
            max_tokens: Maximum number of tokens to generate.
            temperature: Sampling temperature.

        Yields:
            SSE-formatted data chunks.
        """
        # Image handling is the same as in the non-streaming call
        if image:
            image_data = await self._prepare_image_data(image)
            if image_data:
                self._merge_image_into_messages(messages, image_data)

        request_data = {
            "messages": messages,
            "max_tokens": max_tokens,
            "temperature": temperature,
            "stream": True,
        }

        try:
            async with self.client.stream(
                "POST",
                f"{self.base_url}/v1/chat/completions",
                json=request_data,
                headers={"Content-Type": "application/json"},
            ) as response:
                response.raise_for_status()
                # Relay the response chunk by chunk
                async for chunk in response.aiter_bytes():
                    if chunk:
                        yield chunk.decode("utf-8")
        except Exception as e:
            logger.error(f"Streaming call failed: {e}")
            yield f"data: {json.dumps({'error': str(e)})}\n\n"


# Global client instance
qwen_client = QwenClient()
```
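At the heart of `_prepare_image_data` is the `data:` URL format that the backend ultimately receives. A small stdlib sketch of building and unpacking such a URL (the helper names are illustrative, not part of the project code):

```python
import base64


def to_data_url(raw: bytes, content_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data: URL, as _prepare_image_data does."""
    b64 = base64.b64encode(raw).decode("utf-8")
    return f"data:{content_type};base64,{b64}"


def from_data_url(url: str) -> bytes:
    """Recover the raw bytes from a data: URL (the backend's view of it)."""
    header, b64 = url.split(",", 1)
    assert header.startswith("data:") and header.endswith(";base64")
    return base64.b64decode(b64)


payload = b"\x89PNG fake image bytes"
url = to_data_url(payload, "image/png")
print(url.startswith("data:image/png;base64,"))  # True
print(from_data_url(url) == payload)             # True
```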

7. Implementing the API Endpoints

7.1 Define the API Routes

Implement all API endpoints in app/api/endpoints.py:

```python
import json
import time
import uuid

from fastapi import APIRouter, Depends, File, HTTPException, UploadFile, status
from fastapi.responses import StreamingResponse

from app.api.dependencies import verify_api_key
from app.core.config import settings
from app.models.schemas import ChatRequest, ChatResponse, ErrorResponse
from app.services.qwen_client import qwen_client

router = APIRouter()


@router.post(
    "/chat/completions",
    response_model=ChatResponse,
    responses={
        400: {"model": ErrorResponse},
        401: {"model": ErrorResponse},
        500: {"model": ErrorResponse},
    },
    summary="Chat completion",
    description="Run an image-text conversation with the Qwen3.5 model",
)
async def chat_completion(
    request: ChatRequest,
    api_key: str = Depends(verify_api_key),
):
    """Chat completion endpoint.

    - **messages**: message history
    - **image**: optional image information
    - **max_tokens**: maximum number of generated tokens
    - **temperature**: sampling temperature
    - **stream**: whether to stream the response

    Returns the model's answer.
    """
    try:
        response_data = await qwen_client.chat_completion(
            messages=[msg.model_dump() for msg in request.messages],
            image=request.image.model_dump() if request.image else None,
            max_tokens=request.max_tokens,
            temperature=request.temperature,
            stream=request.stream,
        )
        # Attach a response ID and timestamp
        response_data["id"] = f"chatcmpl-{uuid.uuid4().hex[:16]}"
        response_data["created"] = int(time.time())
        response_data["model"] = "qwen3.5-35b-awq-4bit"
        return response_data
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Service call failed: {e}",
        )


@router.post(
    "/chat/completions/stream",
    summary="Streaming chat completion",
    description="Stream an image-text conversation with the Qwen3.5 model",
)
async def chat_completion_stream(
    request: ChatRequest,
    api_key: str = Depends(verify_api_key),
):
    """Streaming chat completion endpoint.

    Returns a Server-Sent Events (SSE) stream.
    """
    async def event_generator():
        try:
            async for chunk in qwen_client.chat_completion_stream(
                messages=[msg.model_dump() for msg in request.messages],
                image=request.image.model_dump() if request.image else None,
                max_tokens=request.max_tokens,
                temperature=request.temperature,
            ):
                yield chunk
        except Exception as e:
            error_data = json.dumps({"error": str(e), "code": 500})
            yield f"data: {error_data}\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",  # disable Nginx buffering
        },
    )


@router.post(
    "/images/upload",
    summary="Upload an image",
    description="Upload an image and get back a file ID for later image-text chat",
)
async def upload_image(
    file: UploadFile = File(..., description="Image file"),
    api_key: str = Depends(verify_api_key),
):
    """Image upload endpoint.

    - **file**: image file (jpg, png, or webp)

    Returns a file ID and URL.
    """
    # Check the content type
    if file.content_type not in settings.ALLOWED_IMAGE_TYPES:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"Unsupported file type. Supported types: "
                   f"{', '.join(settings.ALLOWED_IMAGE_TYPES)}",
        )

    # Check the file size
    content = await file.read()
    if len(content) > settings.MAX_UPLOAD_SIZE:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"File too large (max {settings.MAX_UPLOAD_SIZE // 1024 // 1024} MB)",
        )

    # In a real deployment, save the file to a storage service (S3, MinIO, ...)
    # and return its access URL. Here we simplify and return base64 data
    # directly for testing purposes.
    import base64
    base64_data = base64.b64encode(content).decode("utf-8")
    file_id = str(uuid.uuid4())

    return {
        "file_id": file_id,
        "url": f"data:{file.content_type};base64,{base64_data}",
        "filename": file.filename,
        "content_type": file.content_type,
        "size": len(content),
    }


@router.get(
    "/models",
    summary="List models",
    description="List the available models",
)
async def list_models(api_key: str = Depends(verify_api_key)):
    """Return the currently available models."""
    return {
        "object": "list",
        "data": [
            {
                "id": "qwen3.5-35b-awq-4bit",
                "object": "model",
                "created": 1677610602,
                "owned_by": "qwen",
                "permission": [],
                "root": "qwen3.5-35b-awq-4bit",
                "parent": None,
            }
        ],
    }


@router.get(
    "/models/{model_id}",
    summary="Get model info",
    description="Get details about a specific model",
)
async def retrieve_model(
    model_id: str,
    api_key: str = Depends(verify_api_key),
):
    """Return details for the given model ID.

    - **model_id**: model ID
    """
    if model_id != "qwen3.5-35b-awq-4bit":
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=f"Model '{model_id}' does not exist",
        )
    return {
        "id": "qwen3.5-35b-awq-4bit",
        "object": "model",
        "created": 1677610602,
        "owned_by": "qwen",
        "permission": [],
        "root": "qwen3.5-35b-awq-4bit",
        "parent": None,
        "description": "Qwen3.5-35B-A3B-AWQ-4bit multimodal model with "
                       "image understanding and image-text chat",
    }
```

8. Running and Testing the API Service

8.1 Create a Launch Script

Create run.py as the launch script:

```python
import logging

import uvicorn

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler("api.log"),
    ],
)

if __name__ == "__main__":
    uvicorn.run(
        "app.main:app",
        host="0.0.0.0",
        port=8080,        # API service port
        reload=True,      # hot reload for development
        log_level="info",
    )
```

8.2 Testing with Swagger UI

After starting the service, open http://localhost:8080/docs to see the auto-generated Swagger docs:

  1. Browse all API endpoints: every available API is listed on the page
  2. Test an endpoint: click any endpoint, then click "Try it out"
  3. Fill in the parameters: enter JSON in the request body
  4. Execute: click "Execute" to send the request
  5. Inspect the response: the server's result appears below

8.3 Testing with curl

Test the chat endpoint:

```bash
# Plain text chat (no image)
curl -X POST "http://localhost:8080/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key_1" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello, please introduce yourself"}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'

# Image upload
curl -X POST "http://localhost:8080/api/v1/images/upload" \
  -H "X-API-Key: your_api_key_1" \
  -F "file=@/path/to/your/image.jpg"

# Image-text chat (image by URL)
curl -X POST "http://localhost:8080/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key_1" \
  -d '{
    "messages": [
      {"role": "user", "content": "Describe what is in this image"}
    ],
    "image": {
      "type": "url",
      "data": "https://example.com/image.jpg"
    },
    "max_tokens": 200,
    "temperature": 0.7
  }'
```

8.4 Testing with a Python Client

Create a test script, test_client.py:

```python
import asyncio
from pathlib import Path

import aiohttp  # not in requirements.txt: pip install aiohttp

API_KEY = "your_api_key_1"


async def test_chat_completion():
    """Test the chat completion endpoint."""
    url = "http://localhost:8080/api/v1/chat/completions"
    headers = {"Content-Type": "application/json", "X-API-Key": API_KEY}

    # Text-only conversation
    data = {
        "messages": [{"role": "user", "content": "Hello, please answer in Chinese"}],
        "max_tokens": 100,
        "temperature": 0.7,
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=data, headers=headers) as response:
            result = await response.json()
            print("Text chat result:", result)


async def test_image_upload():
    """Test the image upload endpoint."""
    url = "http://localhost:8080/api/v1/images/upload"
    headers = {"X-API-Key": API_KEY}

    # Read a local image
    image_path = Path("test_image.jpg")
    if not image_path.exists():
        print("Test image not found")
        return

    data = aiohttp.FormData()
    data.add_field(
        "file",
        image_path.open("rb"),
        filename=image_path.name,
        content_type="image/jpeg",
    )
    async with aiohttp.ClientSession() as session:
        async with session.post(url, data=data, headers=headers) as response:
            result = await response.json()
            print("Image upload result:", result)
            # Chat about the uploaded image
            if "url" in result:
                await test_image_chat(result["url"])


async def test_image_chat(image_url: str):
    """Test image-text chat."""
    url = "http://localhost:8080/api/v1/chat/completions"
    headers = {"Content-Type": "application/json", "X-API-Key": API_KEY}
    data = {
        "messages": [{"role": "user", "content": "Describe what is in this image"}],
        "image": {"type": "url", "data": image_url},
        "max_tokens": 200,
        "temperature": 0.7,
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=data, headers=headers) as response:
            result = await response.json()
            print("Image chat result:", result)


async def main():
    """Run all tests."""
    print("Starting API tests...")

    print("\n1. Testing text chat...")
    await test_chat_completion()

    print("\n2. Testing image upload and image-text chat...")
    await test_image_upload()

    print("\nDone!")


if __name__ == "__main__":
    asyncio.run(main())
```

9. Production Deployment Notes

9.1 Deploying with Gunicorn

For production, Gunicorn with Uvicorn workers is recommended:

```bash
# Install Gunicorn
pip install gunicorn

# Start with Gunicorn (4 worker processes)
gunicorn app.main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8080 \
  --timeout 300 \
  --access-logfile access.log \
  --error-logfile error.log \
  --log-level info
```

9.2 Containerizing with Docker

Create a Dockerfile:

```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Copy the dependency list
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Create a non-root user
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Expose the port
EXPOSE 8080

# Launch command
CMD ["gunicorn", "app.main:app", \
     "--workers", "4", \
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--bind", "0.0.0.0:8080", \
     "--timeout", "300", \
     "--access-logfile", "-", \
     "--error-logfile", "-", \
     "--log-level", "info"]
```

Create docker-compose.yml:

```yaml
version: '3.8'

services:
  qwen-api:
    build: .
    ports:
      - "8080:8080"
    environment:
      - QWEN_BASE_URL=http://qwen-backend:8000
      - API_KEYS=${API_KEYS}
      - ALLOWED_ORIGINS=${ALLOWED_ORIGINS}
    volumes:
      - ./logs:/app/logs
    restart: unless-stopped
    networks:
      - qwen-network

  qwen-backend:
    image: qwen35awq-backend:latest
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    restart: unless-stopped
    networks:
      - qwen-network

networks:
  qwen-network:
    driver: bridge
```

9.3 Configuring an Nginx Reverse Proxy

Create the Nginx config /etc/nginx/sites-available/qwen-api:

```nginx
server {
    listen 80;
    server_name api.yourdomain.com;

    # Redirect to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name api.yourdomain.com;

    # SSL certificates
    ssl_certificate /etc/letsencrypt/live/api.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.yourdomain.com/privkey.pem;

    # SSL settings
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512;
    ssl_prefer_server_ciphers off;

    # Client timeouts
    client_max_body_size 10M;
    client_body_timeout 300s;
    send_timeout 300s;

    # API endpoints
    location /api/ {
        proxy_pass http://localhost:8080;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;

        # Extended timeouts for long generations
        proxy_connect_timeout 300s;
        proxy_send_timeout 300s;
        proxy_read_timeout 300s;
    }

    # Swagger docs
    location /docs {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Health check
    location /health {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Static files
    location /static/ {
        alias /app/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
```

9.4 Monitoring and Logging

Configure log rotation in /etc/logrotate.d/qwen-api:

```
/app/logs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 644 appuser appuser
    sharedscripts
    postrotate
        kill -USR1 $(cat /tmp/gunicorn.pid 2>/dev/null) 2>/dev/null || true
    endscript
}
```

10. Summary

With the steps in this article, we have wrapped the Qwen3.5-35B-A3B-AWQ-4bit multimodal model as an enterprise-grade API service. You now have:

  1. A complete RESTful API: image-text chat, image upload, model queries, and more
  2. Auto-generated Swagger docs: making the API simple and intuitive to use and test
  3. Secure API-key authentication: protecting the service from abuse
  4. Streaming response support: for applications that need real-time feedback
  5. A production-grade deployment plan: Docker, Nginx configuration, monitoring, and logging

The strengths of this setup:

  • Standardized interface: an OpenAI-compatible API design, easy to integrate
  • Ready to use: complete code and configuration for fast deployment
  • Easy to extend: modular design makes adding features straightforward
  • Safe and reliable: full authentication and error-handling mechanisms

Whether you are adding AI capabilities to internal business systems or building a customer-facing AI product, this API wrapper gives you a solid foundation: you can focus on business logic instead of the details of deploying and invoking the model.


