Qwen3.5-35B-A3B-AWQ-4bit Multimodal API Wrapping: FastAPI Endpoints + Swagger Docs + Authentication
1. Introduction: From Web UI to Enterprise-Grade API
If you have already tried the Qwen3.5-35B-A3B-AWQ-4bit web UI and seen it understand images and answer visual questions, you have probably wondered: can this capability be integrated into my own business systems? For example, an e-commerce platform that automatically analyzes product photos, an education app that grades homework images, or a content platform that auto-generates image descriptions.
Driving the web UI directly is clearly not an option: there is no standard API, no documentation, and no access control. That is the problem this article solves: wrapping a powerful multimodal model as an enterprise-grade API service.
This article walks through each step:
- Building a RESTful API with FastAPI
- Generating interactive documentation with the built-in Swagger UI
- Protecting the service with API key authentication
- Handling image uploads and streaming responses
Whether you are exposing AI capabilities to an internal team or building a customer-facing AI product, this setup gets you to production quickly.
2. Environment Setup and Project Structure
2.1 Prerequisites
Before starting, make sure your environment meets the following requirements:
- Python 3.8+: Python 3.9 or 3.10 recommended
- CUDA 11.8+: required for GPU inference
- At least 24 GB of GPU memory: e.g. a dual-GPU setup with 12 GB+ per card
- A deployed Qwen3.5-35B-A3B-AWQ-4bit service: the backend listening on port 8000
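The checks above can be sketched as a quick script (torch is optional and only meaningful on the inference host; the API layer itself runs on CPU):

```python
import sys

# Verify the Python version required by the stack
assert sys.version_info >= (3, 8), "Python 3.8+ is required"
print(f"Python {sys.version_info.major}.{sys.version_info.minor} OK")

# GPU check (only works where torch is installed, e.g. the inference host)
try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch not installed here; skipping GPU check")
```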
2.2 Creating the Project Layout
Start with a clean project structure:
```
qwen-api-service/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI application entry point
│   ├── api/
│   │   ├── __init__.py
│   │   ├── endpoints.py     # API route definitions
│   │   └── dependencies.py  # Dependencies (e.g. authentication)
│   ├── core/
│   │   ├── __init__.py
│   │   ├── config.py        # Configuration management
│   │   └── security.py      # Security helpers
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py       # Pydantic data models
│   └── services/
│       ├── __init__.py
│       └── qwen_client.py   # Qwen service client
├── requirements.txt         # Dependency list
├── .env.example             # Environment variable template
└── README.md                # Project overview
```
2.3 Installing Dependencies
Create a requirements.txt file:
```
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
python-jose[cryptography]==3.3.0
passlib[bcrypt]==1.7.4
pydantic==2.5.0
pydantic-settings==2.1.0
httpx==0.25.1
pillow==10.1.0
python-dotenv==1.0.0
```

Install the dependencies:
```shell
pip install -r requirements.txt
```
3. Building the FastAPI Skeleton
3.1 Creating the FastAPI Application
Create the main application in app/main.py:
```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import RedirectResponse

from app.api.endpoints import router as api_router
from app.core.config import settings

# Create the FastAPI application instance
app = FastAPI(
    title="Qwen3.5 Multimodal API Service",
    description="Multimodal API built on Qwen3.5-35B-A3B-AWQ-4bit, "
                "supporting image understanding and visual Q&A",
    version="1.0.0",
    docs_url="/docs",    # Swagger UI
    redoc_url="/redoc",  # ReDoc
)

# Configure CORS (Cross-Origin Resource Sharing)
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.ALLOWED_ORIGINS,  # allowed origins
    allow_credentials=True,
    allow_methods=["*"],  # allow all HTTP methods
    allow_headers=["*"],  # allow all request headers
)

# Mount the API routes
app.include_router(api_router, prefix="/api/v1")


@app.get("/health")
async def health_check():
    """Service health check."""
    return {
        "status": "healthy",
        "service": "qwen-multimodal-api",
        "version": "1.0.0",
    }


@app.get("/")
async def root():
    """Redirect the root path to the API docs."""
    return RedirectResponse(url="/docs")
```
3.2 Configuration Management
Manage configuration in app/core/config.py:
```python
from typing import List

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Application settings."""

    model_config = SettingsConfigDict(env_file=".env", case_sensitive=True)

    # API settings
    API_V1_STR: str = "/api/v1"
    PROJECT_NAME: str = "Qwen3.5 Multimodal API Service"

    # CORS settings
    ALLOWED_ORIGINS: List[str] = [
        "http://localhost:3000",
        "http://127.0.0.1:3000",
        # add your frontend domains here
    ]

    # Qwen backend settings
    QWEN_BASE_URL: str = "http://127.0.0.1:8000"
    QWEN_TIMEOUT: int = 300  # 5-minute timeout

    # Security settings
    API_KEY_HEADER: str = "X-API-Key"
    API_KEYS: List[str] = []  # list of valid API keys

    # Upload settings
    MAX_UPLOAD_SIZE: int = 10 * 1024 * 1024  # 10 MB
    ALLOWED_IMAGE_TYPES: List[str] = ["image/jpeg", "image/png", "image/webp"]


settings = Settings()
```

Create a .env file for environment-specific values:
```
# Qwen backend service address
QWEN_BASE_URL=http://127.0.0.1:8000

# API keys (comma-separated for multiple keys)
API_KEYS=your_api_key_1,your_api_key_2,test_key_123

# Allowed frontend origins
ALLOWED_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
```
4. Implementing API Key Authentication
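One gotcha with loading keys from .env: pydantic-settings v2 parses `List[str]` fields from the environment as JSON by default, so a bare comma-separated `API_KEYS` value may fail to load. A common workaround is to split the value yourself (e.g. in a `mode="before"` field validator); the splitting logic itself is just:

```python
def parse_api_keys(raw):
    """Split a comma-separated env value into a clean list of keys.

    Passes an already-parsed list through unchanged, so it is safe to
    use as a 'before'-mode validator on the API_KEYS field.
    """
    if isinstance(raw, list):
        return raw
    return [key.strip() for key in str(raw).split(",") if key.strip()]

print(parse_api_keys("your_api_key_1, your_api_key_2,test_key_123"))
# → ['your_api_key_1', 'your_api_key_2', 'test_key_123']
```

The function name and the validator wiring are illustrative; check the pydantic-settings docs for the exact behavior of your installed version.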
4.1 Creating the Authentication Dependency
Implement API key validation in app/api/dependencies.py:
```python
from typing import Optional

from fastapi import Header, HTTPException, status

from app.core.config import settings


async def verify_api_key(
    x_api_key: Optional[str] = Header(None, alias=settings.API_KEY_HEADER)
) -> str:
    """Validate the API key supplied in the request header.

    Args:
        x_api_key: the API key from the request header.

    Returns:
        The validated API key.

    Raises:
        HTTPException: if the key is missing or invalid.
    """
    if not settings.API_KEYS:
        # No keys configured: allow all requests (for testing only)
        return "no_auth_required"

    if not x_api_key:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Missing API key",
            headers={"WWW-Authenticate": "API-Key"},
        )

    if x_api_key not in settings.API_KEYS:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="Invalid API key",
        )

    return x_api_key
```
4.2 Security Helpers
Add security helpers in app/core/security.py:
```python
import secrets
from typing import List


def generate_api_key(length: int = 32) -> str:
    """Generate a cryptographically secure API key of the given length."""
    return secrets.token_urlsafe(length)


def validate_api_key(api_key: str, valid_keys: List[str]) -> bool:
    """Check whether an API key is in the list of valid keys."""
    return api_key in valid_keys


def mask_api_key(api_key: str) -> str:
    """Mask an API key for safe logging (e.g. "abcd...wxyz")."""
    if len(api_key) <= 8:
        return "***"
    return f"{api_key[:4]}...{api_key[-4:]}"
```
5. Defining the Data Models
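Before looking at the Pydantic schemas, the request shape they enforce can be checked with plain Python. The helper below is hypothetical (it is not part of the service code), and the 0–2 temperature range is our assumption:

```python
def check_chat_request(payload):
    """Minimal shape check mirroring the ChatRequest schema in this section."""
    errors = []
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("messages must be a non-empty list")
    for msg in messages or []:
        if msg.get("role") not in ("user", "assistant"):
            errors.append(f"invalid role: {msg.get('role')!r}")
    temperature = payload.get("temperature", 0.7)
    if not 0.0 <= temperature <= 2.0:
        errors.append("temperature should be between 0 and 2")
    return errors

payload = {"messages": [{"role": "user", "content": "Describe this image"}],
           "temperature": 0.7}
print(check_chat_request(payload))  # → []
```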
5.1 Request and Response Models
Define the data models in app/models/schemas.py:
```python
from enum import Enum
from typing import Any, List, Optional

from pydantic import BaseModel, Field


class ImageInputType(str, Enum):
    """How the image is supplied."""
    URL = "url"
    BASE64 = "base64"
    UPLOAD = "upload"


class ChatMessage(BaseModel):
    """A single chat message."""
    role: str = Field(..., description="Message role: user or assistant")
    content: str = Field(..., description="Message content")

    class Config:
        json_schema_extra = {
            "example": {"role": "user", "content": "What is in this image?"}
        }


class ImageInfo(BaseModel):
    """Image payload."""
    type: ImageInputType = Field(..., description="Image source type: url/base64/upload")
    data: str = Field(..., description="Image data: a URL, base64 string, or uploaded file ID")

    class Config:
        json_schema_extra = {
            "example": {"type": "url", "data": "https://example.com/image.jpg"}
        }


class ChatRequest(BaseModel):
    """Chat completion request."""
    messages: List[ChatMessage] = Field(..., description="Message history")
    image: Optional[ImageInfo] = Field(None, description="Optional image payload")
    max_tokens: int = Field(1024, description="Maximum number of tokens to generate")
    temperature: float = Field(0.7, description="Sampling temperature (controls randomness)")
    stream: bool = Field(False, description="Whether to stream the response")

    class Config:
        json_schema_extra = {
            "example": {
                "messages": [{"role": "user", "content": "Describe this image"}],
                "image": {"type": "url", "data": "https://example.com/image.jpg"},
                "max_tokens": 1024,
                "temperature": 0.7,
                "stream": False,
            }
        }


class ChatResponse(BaseModel):
    """Chat completion response (OpenAI-compatible shape)."""
    id: str = Field(..., description="Response ID")
    object: str = Field("chat.completion", description="Object type")
    created: int = Field(..., description="Creation timestamp")
    model: str = Field(..., description="Model name")
    choices: List[Any] = Field(..., description="List of completion choices")
    usage: Optional[Any] = Field(None, description="Token usage statistics")

    class Config:
        json_schema_extra = {
            "example": {
                "id": "chatcmpl-123",
                "object": "chat.completion",
                "created": 1677652288,
                "model": "qwen3.5-35b-awq",
                "choices": [{
                    "index": 0,
                    "message": {"role": "assistant", "content": "This image shows..."},
                    "finish_reason": "stop",
                }],
                "usage": {
                    "prompt_tokens": 56,
                    "completion_tokens": 31,
                    "total_tokens": 87,
                },
            }
        }


class ErrorResponse(BaseModel):
    """Error response."""
    error: str = Field(..., description="Error message")
    code: int = Field(..., description="Error code")
    detail: Optional[str] = Field(None, description="Error details")

    class Config:
        json_schema_extra = {
            "example": {
                "error": "Invalid API key",
                "code": 403,
                "detail": "The provided API key is invalid",
            }
        }
```
6. Implementing the Qwen Service Client
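At the heart of the client in this section is one small transformation: raw image bytes become a `data:` URL that can be embedded in a chat message. Isolated as a standalone sketch (the helper name is ours, not part of the client):

```python
import base64

def to_data_url(image_bytes: bytes, content_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data: URL for embedding in a request."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{content_type};base64,{b64}"

# First four bytes of a PNG header, as a tiny demonstration input
print(to_data_url(b"\x89PNG", "image/png"))  # → data:image/png;base64,iVBORw==
```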
6.1 Creating the HTTP Client
Implement the communication with the Qwen backend in app/services/qwen_client.py:
```python
import base64
import json
import logging
from typing import Any, AsyncGenerator, Dict, Optional

import httpx

from app.core.config import settings

logger = logging.getLogger(__name__)


class QwenClient:
    """HTTP client for the Qwen backend service."""

    def __init__(self):
        self.base_url = settings.QWEN_BASE_URL
        self.timeout = settings.QWEN_TIMEOUT
        self.client = httpx.AsyncClient(
            timeout=httpx.Timeout(self.timeout),
            limits=httpx.Limits(max_connections=100, max_keepalive_connections=20),
        )

    async def close(self):
        """Close the underlying connection pool."""
        await self.client.aclose()

    async def _prepare_image_data(self, image_info: Dict[str, Any]) -> Optional[str]:
        """Normalize the image payload into a base64 data: URL."""
        image_type = image_info.get("type")
        image_data = image_info.get("data")
        if not image_type or not image_data:
            return None

        if image_type == "url":
            # Download the image and convert it to base64
            try:
                async with httpx.AsyncClient() as client:
                    response = await client.get(image_data)
                    response.raise_for_status()
                    content_type = response.headers.get("content-type", "image/jpeg")
                    base64_data = base64.b64encode(response.content).decode("utf-8")
                    return f"data:{content_type};base64,{base64_data}"
            except Exception as e:
                logger.error(f"Failed to download image: {e}")
                return None

        if image_type == "base64":
            # Already base64: pass through, adding a default MIME type if needed
            if image_data.startswith("data:"):
                return image_data
            return f"data:image/jpeg;base64,{image_data}"

        if image_type == "upload":
            # Uploaded files must be resolved from your file store first;
            # simplified here -- implement according to your storage backend
            return None

        return None

    def _inject_image(self, messages: list, image_data: str) -> None:
        """Attach the image to the last user message (mutates the list)."""
        image_message = {
            "role": "user",
            "content": [
                {"type": "text", "text": messages[-1]["content"] if messages else ""},
                {"type": "image_url", "image_url": {"url": image_data}},
            ],
        }
        # Replace the last user message if there is one, else append
        if messages and messages[-1]["role"] == "user":
            messages[-1] = image_message
        else:
            messages.append(image_message)

    async def chat_completion(
        self,
        messages: list,
        image: Optional[Dict[str, Any]] = None,
        max_tokens: int = 1024,
        temperature: float = 0.7,
        stream: bool = False,
    ) -> Dict[str, Any]:
        """Call the Qwen chat-completions endpoint."""
        if image:
            image_data = await self._prepare_image_data(image)
            if image_data:
                self._inject_image(messages, image_data)

        request_data = {
            "messages": messages,
            "max_tokens": max_tokens,
            "temperature": temperature,
            "stream": stream,
        }

        try:
            response = await self.client.post(
                f"{self.base_url}/v1/chat/completions",
                json=request_data,
                headers={"Content-Type": "application/json"},
            )
            response.raise_for_status()
            return response.json()
        except httpx.HTTPStatusError as e:
            logger.error(f"HTTP error: {e.response.status_code} - {e.response.text}")
            raise
        except Exception as e:
            logger.error(f"Failed to call the Qwen service: {e}")
            raise

    async def chat_completion_stream(
        self,
        messages: list,
        image: Optional[Dict[str, Any]] = None,
        max_tokens: int = 1024,
        temperature: float = 0.7,
    ) -> AsyncGenerator[str, None]:
        """Stream the chat-completions endpoint, yielding SSE chunks."""
        if image:
            image_data = await self._prepare_image_data(image)
            if image_data:
                self._inject_image(messages, image_data)

        request_data = {
            "messages": messages,
            "max_tokens": max_tokens,
            "temperature": temperature,
            "stream": True,
        }

        try:
            async with self.client.stream(
                "POST",
                f"{self.base_url}/v1/chat/completions",
                json=request_data,
                headers={"Content-Type": "application/json"},
            ) as response:
                response.raise_for_status()
                async for chunk in response.aiter_bytes():
                    if chunk:
                        yield chunk.decode("utf-8")
        except Exception as e:
            logger.error(f"Streaming call failed: {e}")
            yield f"data: {json.dumps({'error': str(e)})}\n\n"


# Global client instance
qwen_client = QwenClient()
```
7. Implementing the API Endpoints
7.1 Defining the API Routes
Implement all API endpoints in app/api/endpoints.py:
```python
import base64
import json
import time
import uuid

from fastapi import APIRouter, Depends, File, HTTPException, UploadFile, status
from fastapi.responses import StreamingResponse

from app.api.dependencies import verify_api_key
from app.core.config import settings
from app.models.schemas import ChatRequest, ChatResponse, ErrorResponse
from app.services.qwen_client import qwen_client

router = APIRouter()


@router.post(
    "/chat/completions",
    response_model=ChatResponse,
    responses={
        400: {"model": ErrorResponse},
        401: {"model": ErrorResponse},
        500: {"model": ErrorResponse},
    },
    summary="Chat completion",
    description="Run a multimodal chat completion against Qwen3.5",
)
async def chat_completion(
    request: ChatRequest,
    api_key: str = Depends(verify_api_key),
):
    """Chat completion endpoint.

    - **messages**: message history
    - **image**: optional image payload
    - **max_tokens**: maximum tokens to generate
    - **temperature**: sampling temperature
    - **stream**: whether to stream the response

    Returns the model's answer.
    """
    try:
        response_data = await qwen_client.chat_completion(
            messages=[msg.model_dump() for msg in request.messages],
            image=request.image.model_dump() if request.image else None,
            max_tokens=request.max_tokens,
            temperature=request.temperature,
            stream=request.stream,
        )
        # Attach a response ID, timestamp, and model name
        response_data["id"] = f"chatcmpl-{uuid.uuid4().hex[:16]}"
        response_data["created"] = int(time.time())
        response_data["model"] = "qwen3.5-35b-awq-4bit"
        return response_data
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Backend call failed: {str(e)}",
        )


@router.post(
    "/chat/completions/stream",
    summary="Streaming chat completion",
    description="Stream a multimodal chat completion as Server-Sent Events",
)
async def chat_completion_stream(
    request: ChatRequest,
    api_key: str = Depends(verify_api_key),
):
    """Streaming chat completion endpoint.

    Returns a Server-Sent Events (SSE) stream.
    """

    async def event_generator():
        try:
            async for chunk in qwen_client.chat_completion_stream(
                messages=[msg.model_dump() for msg in request.messages],
                image=request.image.model_dump() if request.image else None,
                max_tokens=request.max_tokens,
                temperature=request.temperature,
            ):
                yield chunk
        except Exception as e:
            error_data = json.dumps({"error": str(e), "code": 500})
            yield f"data: {error_data}\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",  # disable Nginx buffering
        },
    )


@router.post(
    "/images/upload",
    summary="Upload an image",
    description="Upload an image and receive a file ID for later chat requests",
)
async def upload_image(
    file: UploadFile = File(..., description="Image file"),
    api_key: str = Depends(verify_api_key),
):
    """Image upload endpoint (jpg, png, and webp are supported)."""
    # Validate the content type
    if file.content_type not in settings.ALLOWED_IMAGE_TYPES:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"Unsupported file type. Supported types: {', '.join(settings.ALLOWED_IMAGE_TYPES)}",
        )

    # Validate the file size
    content = await file.read()
    if len(content) > settings.MAX_UPLOAD_SIZE:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"File exceeds the size limit ({settings.MAX_UPLOAD_SIZE // 1024 // 1024} MB max)",
        )

    # In production, persist the file to object storage (S3, MinIO, etc.)
    # and return its access URL. Simplified here: return a base64 data URL
    # so the result can be used directly in testing.
    base64_data = base64.b64encode(content).decode("utf-8")
    file_id = str(uuid.uuid4())

    return {
        "file_id": file_id,
        "url": f"data:{file.content_type};base64,{base64_data}",
        "filename": file.filename,
        "content_type": file.content_type,
        "size": len(content),
    }


@router.get(
    "/models",
    summary="List models",
    description="List the models available on this service",
)
async def list_models(api_key: str = Depends(verify_api_key)):
    """Model listing endpoint."""
    return {
        "object": "list",
        "data": [
            {
                "id": "qwen3.5-35b-awq-4bit",
                "object": "model",
                "created": 1677610602,
                "owned_by": "qwen",
                "permission": [],
                "root": "qwen3.5-35b-awq-4bit",
                "parent": None,
            }
        ],
    }


@router.get(
    "/models/{model_id}",
    summary="Retrieve a model",
    description="Retrieve details for a specific model",
)
async def retrieve_model(
    model_id: str,
    api_key: str = Depends(verify_api_key),
):
    """Model detail endpoint.

    - **model_id**: the model ID
    """
    if model_id != "qwen3.5-35b-awq-4bit":
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=f"Model '{model_id}' does not exist",
        )
    return {
        "id": "qwen3.5-35b-awq-4bit",
        "object": "model",
        "created": 1677610602,
        "owned_by": "qwen",
        "permission": [],
        "root": "qwen3.5-35b-awq-4bit",
        "parent": None,
        "description": "Qwen3.5-35B-A3B-AWQ-4bit multimodal model with image understanding and visual chat",
    }
```
8. Running and Testing the API Service
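Before testing, it helps to know what the streaming endpoint emits: SSE lines of the form `data: {...}`, typically terminated by an OpenAI-style `data: [DONE]` marker (whether your backend emits that marker depends on the backend). A minimal client-side parser sketch, with an illustrative function name:

```python
import json

def parse_sse_lines(lines):
    """Collect the JSON payloads from a sequence of SSE lines."""
    events = []
    for line in lines:
        if line.startswith("data: "):
            payload = line[len("data: "):].strip()
            if payload and payload != "[DONE]":  # skip the end marker
                events.append(json.loads(payload))
    return events

sample = ['data: {"delta": "Hello"}', "", 'data: {"delta": " world"}', "data: [DONE]"]
print(parse_sse_lines(sample))  # → [{'delta': 'Hello'}, {'delta': ' world'}]
```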
8.1 Creating a Startup Script
Create run.py as the startup script:
```python
import logging

import uvicorn

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler("api.log"),
    ],
)

if __name__ == "__main__":
    uvicorn.run(
        "app.main:app",
        host="0.0.0.0",
        port=8080,        # API service port
        reload=True,      # hot reload for development
        log_level="info",
    )
```
8.2 Testing with Swagger UI
After starting the service, open http://localhost:8080/docs to see the auto-generated Swagger documentation:
- Browse endpoints: all available APIs are listed on the page
- Try an endpoint: click any endpoint, then click the "Try it out" button
- Fill in parameters: enter JSON data in the request body
- Execute: click "Execute" to send the request
- Inspect the response: the server's result appears below
8.3 Testing the API with curl
Test the chat endpoint:
```shell
# Plain text chat (no image)
curl -X POST "http://localhost:8080/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key_1" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello, please introduce yourself"}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'

# Image upload
curl -X POST "http://localhost:8080/api/v1/images/upload" \
  -H "X-API-Key: your_api_key_1" \
  -F "file=@/path/to/your/image.jpg"

# Visual chat using an image URL
curl -X POST "http://localhost:8080/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key_1" \
  -d '{
    "messages": [
      {"role": "user", "content": "Describe the contents of this image"}
    ],
    "image": {
      "type": "url",
      "data": "https://example.com/image.jpg"
    },
    "max_tokens": 200,
    "temperature": 0.7
  }'
```
8.4 Testing with a Python Client
Create a test script, test_client.py:
```python
# Note: this script uses aiohttp, which is not in requirements.txt;
# install it separately with `pip install aiohttp`.
import asyncio
from pathlib import Path

import aiohttp

API_KEY = "your_api_key_1"
BASE_URL = "http://localhost:8080/api/v1"


async def test_chat_completion():
    """Test the chat completion endpoint with plain text."""
    headers = {"Content-Type": "application/json", "X-API-Key": API_KEY}
    data = {
        "messages": [{"role": "user", "content": "Hello, please answer in Chinese"}],
        "max_tokens": 100,
        "temperature": 0.7,
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(f"{BASE_URL}/chat/completions",
                                json=data, headers=headers) as response:
            result = await response.json()
            print("Text chat result:", result)


async def test_image_upload():
    """Test the image upload endpoint."""
    headers = {"X-API-Key": API_KEY}

    # Read a local test image
    image_path = Path("test_image.jpg")
    if not image_path.exists():
        print("Test image not found")
        return

    data = aiohttp.FormData()
    data.add_field(
        "file",
        image_path.open("rb"),
        filename=image_path.name,
        content_type="image/jpeg",
    )
    async with aiohttp.ClientSession() as session:
        async with session.post(f"{BASE_URL}/images/upload",
                                data=data, headers=headers) as response:
            result = await response.json()
            print("Image upload result:", result)
            # Use the uploaded image in a chat request
            if "url" in result:
                await test_image_chat(result["url"])


async def test_image_chat(image_url: str):
    """Test a visual chat request."""
    headers = {"Content-Type": "application/json", "X-API-Key": API_KEY}
    data = {
        "messages": [{"role": "user", "content": "Describe the contents of this image"}],
        "image": {"type": "url", "data": image_url},
        "max_tokens": 200,
        "temperature": 0.7,
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(f"{BASE_URL}/chat/completions",
                                json=data, headers=headers) as response:
            result = await response.json()
            print("Visual chat result:", result)


async def main():
    """Run all tests."""
    print("Starting API tests...")

    print("\n1. Testing text chat...")
    await test_chat_completion()

    print("\n2. Testing image upload and visual chat...")
    await test_image_upload()

    print("\nTests complete!")


if __name__ == "__main__":
    asyncio.run(main())
```
9. Production Deployment Notes
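How many workers should a production server run? A common Gunicorn rule of thumb (a heuristic, not a hard requirement, and worth reducing for memory-heavy workloads) is 2 × CPU cores + 1:

```python
import multiprocessing
from typing import Optional

def suggested_workers(cores: Optional[int] = None) -> int:
    """Classic Gunicorn heuristic: 2 * cores + 1."""
    if cores is None:
        cores = multiprocessing.cpu_count()
    return 2 * cores + 1

print(suggested_workers(2))  # → 5
```

The `--workers 4` used in the commands below simply assumes a small host; tune it for yours.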
9.1 Deploying with Gunicorn
For production, use Gunicorn with Uvicorn workers:
```shell
# Install Gunicorn
pip install gunicorn

# Start with Gunicorn (4 worker processes)
gunicorn app.main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8080 \
  --timeout 300 \
  --access-logfile access.log \
  --error-logfile error.log \
  --log-level info
```
9.2 Containerizing with Docker
Create a Dockerfile:
```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Copy the dependency list
COPY requirements.txt .

# Install Python dependencies (gunicorn is needed for the CMD below
# but is not in requirements.txt, so install it explicitly)
RUN pip install --no-cache-dir -r requirements.txt gunicorn

# Copy the application code
COPY . .

# Create a non-root user
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Expose the API port
EXPOSE 8080

# Start command
CMD ["gunicorn", "app.main:app", \
     "--workers", "4", \
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--bind", "0.0.0.0:8080", \
     "--timeout", "300", \
     "--access-logfile", "-", \
     "--error-logfile", "-", \
     "--log-level", "info"]
```

Create docker-compose.yml:
```yaml
version: '3.8'

services:
  qwen-api:
    build: .
    ports:
      - "8080:8080"
    environment:
      - QWEN_BASE_URL=http://qwen-backend:8000
      - API_KEYS=${API_KEYS}
      - ALLOWED_ORIGINS=${ALLOWED_ORIGINS}
    volumes:
      - ./logs:/app/logs
    restart: unless-stopped
    networks:
      - qwen-network

  qwen-backend:
    image: qwen35awq-backend:latest
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    restart: unless-stopped
    networks:
      - qwen-network

networks:
  qwen-network:
    driver: bridge
```
9.3 Configuring an Nginx Reverse Proxy
Create the Nginx configuration at /etc/nginx/sites-available/qwen-api:
```nginx
server {
    listen 80;
    server_name api.yourdomain.com;

    # Redirect to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name api.yourdomain.com;

    # SSL certificates
    ssl_certificate /etc/letsencrypt/live/api.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.yourdomain.com/privkey.pem;

    # SSL settings
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512;
    ssl_prefer_server_ciphers off;

    # Client timeouts and upload limits
    client_max_body_size 10M;
    client_body_timeout 300s;
    send_timeout 300s;

    # API endpoints
    location /api/ {
        proxy_pass http://localhost:8080;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;

        # Extended timeouts for long generations
        proxy_connect_timeout 300s;
        proxy_send_timeout 300s;
        proxy_read_timeout 300s;
    }

    # Swagger docs
    location /docs {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Health check
    location /health {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Static files
    location /static/ {
        alias /app/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
```
9.4 Monitoring and Logging
Configure log rotation in /etc/logrotate.d/qwen-api:
```
/app/logs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 644 appuser appuser
    sharedscripts
    postrotate
        kill -USR1 $(cat /tmp/gunicorn.pid 2>/dev/null) 2>/dev/null || true
    endscript
}
```
10. Conclusion
With this in place, the Qwen3.5-35B-A3B-AWQ-4bit multimodal model is wrapped as an enterprise-grade API service. You now have:
- A complete RESTful API: visual chat, image upload, and model query endpoints
- Auto-generated Swagger docs: simple, interactive API exploration and testing
- API key authentication: protection against unauthorized use
- Streaming responses: suited to applications that need real-time feedback
- A production deployment path: Docker containerization, Nginx configuration, monitoring, and log rotation
The strengths of this approach:
- Standardized interface: an OpenAI-compatible API design for easy integration
- Ready to use: complete code and configuration for fast deployment
- Easy to extend: a modular layout that makes adding features straightforward
- Safe and reliable: full authentication and error handling
Whether you are adding AI capabilities to internal systems or building customer-facing AI products, this wrapper gives you a solid foundation: you can focus on business logic instead of model deployment and invocation details.
Get More AI Images
Want to explore more AI images and use cases? Visit the ZEEKLOG星图镜像广场 image marketplace, which offers a rich set of prebuilt images covering LLM inference, image generation, video generation, model fine-tuning, and more, with one-click deployment.