Practical Guide: Calling Large Language Models (LLMs) with Python

    # Request body structure
    data = {
        "model": "/models/Qwen2___5-32B-Instruct-AWQ",
        "messages": [
            {"role": "user", "content": "Analyze the current global economic situation and share your views"}
        ],
        "max_tokens": 2048,
        "temperature": 0.7,
        "top_k": 1,
        "top_p": 0.75,
    }

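1.3 Advantages and use cases

**Advantages:**

- Maximum flexibility: every request parameter can be customized

- No dependency on a specific SDK, so fewer dependencies overall

- Well suited to cases that need precise control over request details

**Use cases:**

- Calling a custom LLM service

- Services that need special request headers or authentication schemes

- Production environments with high performance requirements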
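The "special request headers or authentication" case above can be sketched as a small helper. This is only an illustration: `build_headers` and the `X-Request-Source` header are hypothetical names, and the `Authorization: Bearer ...` form mirrors the commented-out line in the full code of section 1.4. Whether your service expects it is an assumption.

```python
from typing import Optional


def build_headers(api_key: Optional[str] = None,
                  extra: Optional[dict] = None) -> dict:
    """Build request headers for an OpenAI-compatible HTTP endpoint.

    api_key: sent as a Bearer token only when provided (assumed auth scheme).
    extra:   any additional custom headers the service may require.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    if extra:
        # Custom headers are applied last so they can override the defaults.
        headers.update(extra)
    return headers
```

For example, `build_headers("your_api_key", {"X-Request-Source": "demo"})` returns a dict that can be passed directly as the `headers` argument of `requests.post`.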

1.4 Full code

    import json

    import requests

    # API endpoint of the local service
    url = "http://127.0.0.1:6790/v1/chat/completions"

    # Request headers
    headers = {
        "Content-Type": "application/json",
        # "Authorization": "Bearer your_api_key"  # if required
    }

    # Request body
    data = {
        "model": "/models/Qwen2___5-32B-Instruct-AWQ",
        "messages": [
            {"role": "user", "content": "Analyze the current global economic situation and share your views"}
        ],
        "max_tokens": 2048,
        "temperature": 0.7,
        "top_k": 1,
        "top_p": 0.75,
    }

    # Send the POST request
    response = requests.post(url, headers=headers, data=json.dumps(data))

    # Parse and print the reply
    if response.status_code == 200:
        response_data = response.json()
        print(response_data['choices'][0]['message']['content'])
    else:
        print(f"Request failed, status code: {response.status_code}")
        print(response.text)
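The status-code check in the full code above can be complemented by validating the JSON body before indexing into it. The following is a minimal sketch; `extract_reply` is a hypothetical helper, and the assumption that failures are reported under an `"error"` key follows the OpenAI-style convention, which your server may or may not use.

```python
def extract_reply(response_data: dict) -> str:
    """Return the assistant text from a chat-completions style response.

    Raises RuntimeError instead of a bare KeyError/IndexError when the
    body does not have the expected shape.
    """
    # Assumption: OpenAI-style servers report failures under an "error" key.
    if "error" in response_data:
        raise RuntimeError(f"API returned an error: {response_data['error']}")

    choices = response_data.get("choices") or []
    if not choices:
        raise RuntimeError("response contains no choices")

    return choices[0]["message"]["content"]
```

In the script above it would be called as `extract_reply(response.json())`. Passing `timeout=60` to `requests.post` is also worth considering, since `requests` waits indefinitely by default.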

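2. Wrapped API calls

2.1 Core features

- The API call logic is wrapped in a function

- Supports choosing between several models (qwen2.5_32b_awq, qwen2.5_7b_awq)

- Uses the completion API rather than the chat completion API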

2.2 Key code analysis

    def llm_inference(prompt_list: list, model_name: str):
        # Choose the server configuration based on the model name
        if model_name == "qwen2.5_32b_awq":
            llm_server = {
                "server_url": "http://127.0.0.1:6790/v1/completions",
                "path": "/psd/models/Qwen2___5-32B-Instruct-AWQ",
            }
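As the number of models grows, the `if`/`elif` chain can be replaced by a lookup table. This is a sketch of one possible design, not the article's implementation: `MODEL_SERVERS`, `DEFAULT_MODEL` and `get_llm_server` are hypothetical names, while the URLs and paths are copied from the full code in section 2.4.

```python
# Registry of known models; values copied from the full code in section 2.4.
MODEL_SERVERS = {
    "qwen2.5_32b_awq": {
        "server_url": "http://127.0.0.1:6790/v1/completions",
        "path": "/models/Qwen2___5-32B-Instruct-AWQ",
    },
    "qwen2.5_7b_awq": {
        "server_url": "http://127.0.0.1:6791/v1/completions",
        "path": "/models/Qwen2___5-7B-Instruct-AWQ",
    },
}

# Same fallback as the else branch of the original function.
DEFAULT_MODEL = "qwen2.5_32b_awq"


def get_llm_server(model_name: str) -> dict:
    """Return the server config for model_name, or the default if unknown."""
    return MODEL_SERVERS.get(model_name, MODEL_SERVERS[DEFAULT_MODEL])
```

Adding a model then means adding one dictionary entry, and the fallback behaviour stays in a single place.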

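2.3 Advantages and use cases

**Advantages:**

- Good code reuse and easy maintenance

- Supports switching between multiple models

- A single place to add unified error handling

**Use cases:**

- Scenarios that switch between models frequently

- Batch processing of multiple requests

- Serving as a dependency module for other projects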
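For the batch-processing case, it helps to check that the response really contains one result per prompt before indexing into it. The sketch below is a hypothetical helper, `texts_in_prompt_order`. It assumes an OpenAI-style completions response in which each choice carries an `index` field matching the position of its prompt, which holds when one completion is requested per prompt.

```python
def texts_in_prompt_order(response_json: dict, prompt_count: int) -> list:
    """Return generated texts ordered to match the submitted prompts."""
    choices = response_json.get("choices")
    if not choices:
        raise RuntimeError(f"no choices in response: {response_json}")
    if len(choices) != prompt_count:
        raise RuntimeError(
            f"expected {prompt_count} choices, got {len(choices)}"
        )
    # Assumption: "index" identifies which prompt a choice belongs to.
    ordered = sorted(choices, key=lambda choice: choice.get("index", 0))
    return [choice["text"] for choice in ordered]
```

With the wrapper function from this section, usage would look like `texts_in_prompt_order(llm_inference(prompt_list, model_name), len(prompt_list))`.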

2.4 Full code

    # -*- coding: utf-8 -*-
    # @Author : yuan
    # @Time : 2025/7/22 5:20
    import requests
    import json


    def llm_inference(prompt_list: list, model_name: str):
        if model_name == "qwen2.5_32b_awq":
            llm_server = {"server_url": "http://127.0.0.1:6790/v1/completions",
                          "path": "/models/Qwen2___5-32B-Instruct-AWQ"}
        elif model_name == "qwen2.5_7b_awq":
            llm_server = {"server_url": "http://127.0.0.1:6791/v1/completions",
                          "path": "/models/Qwen2___5-7B-Instruct-AWQ"}
        else:
            # Unknown model names fall back to the 32B service
            llm_server = {"server_url": "http://127.0.0.1:6790/v1/completions",
                          "path": "/models/Qwen2___5-32B-Instruct-AWQ"}
        #
        # prompt = f"<|im_start|>system\n{system_text}<|im_end|>\n<|im_start|>user\n{query_text}<|im_end|><|im_start|>assistant\n"
        rewrite_server_url = llm_server["server_url"]
        rewrite_server_headers = {
            'Content-Type': 'application/json'
        }
        rewrite_server_data = {
            'model': llm_server["path"],
            'prompt': prompt_list,
            'max_tokens': 4096,  # maximum generation length
            'top_k': 1,
            'top_p': 0.75,
            'temperature': 0,
            'stop': ["<|im_end|>"]
        }
        response = requests.post(rewrite_server_url, headers=rewrite_server_headers,
                                 data=json.dumps(rewrite_server_data))
        # return response.json()['choices'][0]['text']
        return response.json()


    if __name__ == "__main__":
        prompt_list = [f"<|im_start|>system\n{'You are the Weicheng intelligent robot'}<|im_end|>\n<|im_start|>user\n{'Who are you?'}<|im_end|><|im_start|>assistant\n"]
        answer = llm_inference(prompt_list, "qwen2.5_32b_awq")
        # print(answer)
        for i in range(len(prompt_list)):
            print(answer['choices'][i]['text'])
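Because the completion API receives raw text, the chat markers have to be written into the prompt by hand. A small helper keeps that formatting in one place. `build_chatml_prompt` is a hypothetical name, and the layout below simply mirrors the template used in the code above; it is not a verified copy of the model's official chat template.

```python
def build_chatml_prompt(system_text: str, user_text: str) -> str:
    """Format a system and a user message using the template from this section."""
    return (
        f"<|im_start|>system\n{system_text}<|im_end|>\n"
        f"<|im_start|>user\n{user_text}<|im_end|>"
        # The open assistant turn asks the model to continue as the assistant.
        f"<|im_start|>assistant\n"
    )
```

A batch can then be built with a list comprehension, for example `[build_chatml_prompt(system_text, q) for q in questions]`, where `system_text` and `questions` are placeholders for your own data.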

3. The OpenAI SDK approach

3.1 Core features

- Uses the official OpenAI Python SDK

- Calls a third-party API service (chataiapi.com)

- Supports multiple models (GPT-4o, Gemini-2.5-pro, etc.)

3.2 Key code analysis

    client = OpenAI(
        api_key=openai_api_key,
        base_url=openai_api_base,
    )
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=0.0
    )

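3.3 Advantages and use cases

**Advantages:**

- Uses the official SDK, so it is highly stable

- Handles authentication and request formatting automatically

- Supports advanced features such as streaming responses

**Use cases:**

- Calling the official OpenAI API or a compatible API

- Cases that benefit from the SDK's convenience features

- Rapid prototyping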
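The streaming support mentioned above works by passing `stream=True` to `client.chat.completions.create`, which then yields chunks whose text lives in `chunk.choices[0].delta.content`. The sketch below shows one way to assemble those chunks into a full reply. `collect_stream_text` is a hypothetical helper, and it only relies on the chunk attributes just named.

```python
from typing import Iterable


def collect_stream_text(chunks: Iterable) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices (for example a trailing usage chunk).
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        # The content is None on role-only or final chunks.
        if piece:
            parts.append(piece)
    return "".join(parts)
```

With a configured client, usage would be `collect_stream_text(client.chat.completions.create(model=model_name, messages=messages, stream=True))`. To display text as it arrives, print each `piece` inside the loop instead of collecting it.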

3.4 Full code

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    """
    Minimal API request test script.
    Keeps only the core request-sending logic.
    """
    from openai import OpenAI


    def send_test_request(client, model_name, test_message="I'm testing, reply with 'test' only"):
        """
        Send a test request.

        Args:
            client: OpenAI client
            model_name: model name
            test_message: test message

        Returns:
            str: response content
        """
        # messages = [
        #     {"role": "user", "content": [
        #         {"type": "text", "text": test_message}
        #     ]}
        # ]
        # Build the message list from the test_message argument
        messages = [{"role": "user", "content": test_message}]
        response = client.chat.completions.create(
            model=model_name,
            messages=messages,
            temperature=0.0
        )
        print(f"Response: {response}")
        return response.choices[0].message.content


    if __name__ == "__main__":
        # API configuration
        openai_api_key = "sk-WVoD66MR7b"
        openai_api_base = "https://www.api.com/v1"
        model_name = "gpt-4o"  # gemini-2.5-pro gpt-4o Claude 3.5 Sonnet o3-mini

        # Initialize the client
        client = OpenAI(
            api_key=openai_api_key,
            base_url=openai_api_base,
        )
        test_message = "Hello"

        # Send the test request
        try:
            response_content = send_test_request(client, model_name, test_message)
            print(f"Success! Response content: {response_content}")
        except Exception as e:
            print(f"Error: {e}")

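4. The legacy OpenAI library approach

4.1 Core features

- Uses the legacy module-level interface of the `openai` library (`openai.ChatCompletion`, available only in versions before 1.0) rather than the client-based SDK

- Supports both local and remote API calls

- A concise way to call the API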

4.2 Key code analysis

    openai.api_key = "sk-WVoD66MR"
    openai.api_base = "https://www.api.com/v1"
    response = openai.ChatCompletion.create(
        model=model_name,
        messages=[{"role": "user", "content": "What is your name?"}],
        max_tokens=512,
        temperature=0.7
    )

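4.3 Advantages and use cases

**Advantages:**

- Concise code that is easy to understand

- Good compatibility with a variety of API services

- Low learning cost

**Use cases:**

- Quick testing and validation

- Teaching and demos

- Simple integration needs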

4.4 Full code

    # Note: openai.ChatCompletion and openai.api_base exist only in openai < 1.0
    import openai

    # Set the API key (if the local service requires one)
    openai.api_key = "sk-WVoD66MR7bx"

    # Set the API base URL to the address of the local model service
    # openai.api_base = "http://127.0.0.1:6790/v1"
    openai.api_base = "https://www.chataiapi.com/v1"

    # Name of the model to use
    # model_name = "/models/Qwen2___5-32B-Instruct-AWQ"
    model_name = "gemini-2.5-pro"

    # Create the chat completion request
    response = openai.ChatCompletion.create(
        model=model_name,
        messages=[{"role": "user", "content": "What is your name?"}],
        max_tokens=512,
        temperature=0.7
    )
    print("response:", response)

    # Print the generated reply
    print(response['choices'][0]['message']['content'])

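5. Comparison

| Dimension | Native HTTP requests | Wrapper function | OpenAI SDK | Legacy OpenAI library |
|------|----------------|------------------|---------------|---------------------|
| Flexibility | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| Ease of use | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Maintainability | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Feature richness | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Low learning cost | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |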

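Summary

Each of these four approaches has its own strengths and suits different development scenarios. Choose according to your specific needs:

- **For flexibility**: use native HTTP requests

- **For maintainability**: use a wrapper function

- **For rich features**: use the OpenAI SDK

- **For simplicity**: use the legacy OpenAI library

Whichever approach you choose, pay attention to API key security and thorough error handling. In real projects, make the choice based on project size and the team's technology stack.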
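On the API key point, one common practice is to read the key from an environment variable instead of hard-coding it in the script. A minimal sketch, where `load_api_key` is a hypothetical helper and `OPENAI_API_KEY` is an assumed variable name that you can change to suit your deployment:

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return key
```

The result can be passed wherever the examples above use a literal key, for example `OpenAI(api_key=load_api_key(), base_url=openai_api_base)`.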


Ne0inhk
