Python 构建 AI 三工具：文档总结、代码生成与智能检索 | 极客日志

PythonAI算法

Python 构建 AI 三工具：文档总结、代码生成与智能检索

介绍使用 Python 构建三个 AI 工具的完整方案：智能文档总结器、AI 代码生成器和智能资料助手。通过封装 LLM 客户端，利用 DeepSeek/Qwen 等模型，实现 PDF/Word 解析、代码自动生成与多源信息检索。包含环境配置、核心代码实现、CLI 整合及云端部署指南，旨在提升开发效率与知识获取速度。

赛博朋克发布于 2026/4/6更新于 2026/7/2057 浏览

一、准备工作：环境与 API 配置

1.1 技术栈选择

技术组件	推荐方案	成本	说明
LLM 模型	DeepSeek / Qwen	免费/低价	国内模型，中文优秀
API 平台	硅基流动 / 魔搭社区	¥0.001/1k tokens	新用户有免费额度
文档解析	PyPDF2 / Unstructured	免费	支持 PDF/Word/Markdown
代码运行	Subprocess / Docker	免费	本地沙箱执行
搜索引擎	Bing Search API	付费（有免费层）	或用 DuckDuckGo 免费版

1.2 环境配置

# 创建虚拟环境
python -m venv ai-tools-env
source ai-tools-env/bin/activate
# Windows 用：ai-tools-env\Scripts\activate

# 安装依赖
pip install openai pypdf2 requests beautifulsoup4 python-dotenv
pip install aiohttp httpx # 异步请求支持

创建 .env 文件：

# API 配置
DEEPSEEK_API_KEY=your_deepseek_api_key
DEEPSEEK_BASE_URL=https://api.deepseek.com/v1
# 或使用硅基流动（支持多个模型）
SILICONFLOW_API_KEY=your_siliconflow_key
SILICONFLOW_BASE_URL=https://api.siliconflow.cn/v1
# 搜索 API（可选）
BING_SEARCH_API_KEY=your_bing_key

1.3 核心工具类封装

在开始之前，我们先封装一个统一的 LLM 调用类：

import os
import asyncio
from typing import List, Dict, Optional, AsyncGenerator
 dataclasses  dataclass
 openai  AsyncOpenAI
 dotenv  load_dotenv

load_dotenv()


 :
    
    role:   
    content: 

 :
    
     ():
        .api_key = api_key  os.getenv()
        .base_url = base_url  os.getenv()
        .model = model
        .temperature = temperature
        .client = AsyncOpenAI(api_key=.api_key, base_url=.base_url)

      () -> :
        
        response =  .client.chat.completions.create(
            model=.model,
            messages=[{: m.role, : m.content}  m  messages],
            temperature=kwargs.get(, .temperature),
            stream=stream,
            max_tokens=kwargs.get(, )
        )
         stream:
            full_content = 
              chunk  response:
                 chunk.choices[].delta.content:
                    content = chunk.choices[].delta.content
                    full_content += content
                    (content, end=, flush=)
             full_content
        :
             response.choices[].message.content

      () -> :
        
        response =  .client.chat.completions.create(
            model=.model,
            messages=[{: m.role, : m.content}  m  messages],
            tools=functions,
            tool_choice=
        )
         response.choices[].message


  ():
    llm = LLMClient()
    response =  llm.chat([Message(role=, content=)])
    (response)

 __name__ == :
    asyncio.run(test_llm())

相关免费在线工具

加密/解密文本
使用加密算法（如AES、TripleDES、Rabbit或RC4）加密和解密文本明文。在线工具，加密/解密文本在线工具，online
RSA密钥对生成器
生成新的随机RSA私钥和公钥pem证书。在线工具，RSA密钥对生成器在线工具，online
Mermaid 预览与可视化编辑
基于 Mermaid.js 实时预览流程图、时序图等图表，支持源码编辑与即时渲染。在线工具，Mermaid 预览与可视化编辑在线工具，online
随机西班牙地址生成器
随机生成西班牙地址（支持马德里、加泰罗尼亚、安达卢西亚、瓦伦西亚筛选），支持数量快捷选择、显示全部与下载。在线工具，随机西班牙地址生成器在线工具，online
Gemini 图片去水印
基于开源反向 Alpha 混合算法去除 Gemini/Nano Banana 图片水印，支持批量处理与下载。在线工具，Gemini 图片去水印在线工具，online
curl 转代码
解析常见 curl 参数并生成 fetch、axios、PHP curl 或 Python requests 示例代码。在线工具，curl 转代码在线工具，online

import asyncio
from typing import List, Optional
from pathlib import Path
import PyPDF2
from bs4 import BeautifulSoup
import aiohttp
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DocumentSummary:
    """文档摘要结果"""
    title: str
    summary: str
    key_points: List[str]
    reading_time: int  # 预计阅读时间（分钟）
    word_count: int
    created_at: str

class DocumentParser:
    """文档解析器"""
    @staticmethod
    async def parse_pdf(file_path: str) -> str:
        """解析 PDF 文件"""
        text = ""
        with open(file_path, 'rb') as file:
            pdf_reader = PyPDF2.PdfReader(file)
            for page in pdf_reader.pages:
                text += page.extract_text() + "\n"
        return text

    @staticmethod
    async def parse_text(file_path: str) -> str:
        """解析纯文本文件"""
        with open(file_path, 'r', encoding='utf-8') as f:
            return f.read()

    @staticmethod
    async def parse_url(url: str) -> str:
        """解析网页内容"""
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                html = await response.text()
                soup = BeautifulSoup(html, 'html.parser')
                for script in soup(['script', 'style']):
                    script.decompose()
                return soup.get_text(separator='\n', strip=True)

class TextChunker:
    """文本分块器"""
    def __init__(self, chunk_size: int = 3000, overlap: int = 200):
        self.chunk_size = chunk_size
        self.overlap = overlap

    def chunk(self, text: str) -> List[str]:
        """将文本分成多个块"""
        paragraphs = text.split('\n\n')
        chunks = []
        current_chunk = ""
        for para in paragraphs:
            if len(current_chunk) + len(para) <= self.chunk_size:
                current_chunk += para + "\n\n"
            else:
                if current_chunk:
                    chunks.append(current_chunk.strip())
                if len(para) > self.chunk_size:
                    for i in range(0, len(para), self.chunk_size - self.overlap):
                        chunks.append(para[i:i + self.chunk_size])
                current_chunk = ""
        if current_chunk:
            chunks.append(current_chunk.strip())
        return chunks

class DocumentSummarizer:
    """智能文档总结器"""
    def __init__(self, llm_client: LLMClient):
        self.llm = llm_client
        self.parser = DocumentParser()
        self.chunker = TextChunker()

    async def summarize(self, source: str, source_type: str = "file", output_format: str = "markdown") -> DocumentSummary:
        """总结文档"""
        print(f"📖 正在解析文档：{source}")
        if source_type == "url":
            text = await self.parser.parse_url(source)
            title = await self._extract_title_from_url(text)
        else:
            if source.endswith('.pdf'):
                text = await self.parser.parse_pdf(source)
            else:
                text = await self.parser.parse_text(source)
            title = Path(source).stem
        
        word_count = len(text)
        reading_time = max(1, word_count // 500)
        print(f"✅ 解析完成，共 {word_count} 字，预计阅读 {reading_time} 分钟")
        print(f"🔪 正在分块...")
        chunks = self.chunker.chunk(text)
        print(f"📦 分成 {len(chunks)} 个块")
        print(f"🤖 正在 AI 总结...")
        chunk_summaries = await self._summarize_chunks(chunks)
        print(f"🔄 正在整合摘要...")
        final_summary = await self._merge_summaries(chunk_summaries, title)
        key_points = await self._extract_key_points(final_summary)

        return DocumentSummary(
            title=title,
            summary=final_summary,
            key_points=key_points,
            reading_time=reading_time,
            word_count=word_count,
            created_at=datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        )

    async def _summarize_chunks(self, chunks: List[str]) -> List[str]:
        semaphore = asyncio.Semaphore(5)
        async def summarize_chunk(chunk: str, index: int):
            async with semaphore:
                prompt = f"""请总结以下文本的核心内容，要求：
1. 保留关键信息（数据、结论、人名等）
2. 省略细节和例子
3. 用简洁的语言表达
4. 200 字以内
文本内容：{chunk}
总结："""
                response = await self.llm.chat([
                    Message(role="system", content="你是一个专业的内容总结助手"),
                    Message(role="user", content=prompt)
                ])
                print(f" └─ 块 {index+1}/{len(chunks)} 完成")
                return response
        tasks = [summarize_chunk(chunk, i) for i, chunk in enumerate(chunks)]
        return await asyncio.gather(*tasks)

    async def _merge_summaries(self, summaries: List[str], title: str) -> str:
        combined = "\n\n".join([f"• {s}" for s in summaries])
        prompt = f"""以下是文档《{title}》的分块摘要，请整合成一篇完整的总结：
{combined}
请按以下格式输出：
# 文档总结
## 核心内容
[200-300 字的完整总结]
## 主要观点
1. [观点 1]
2. [观点 2] ...
整合后的总结："""
        response = await self.llm.chat([
            Message(role="system", content="你是一个专业的内容整合助手"),
            Message(role="user", content=prompt)
        ])
        return response

    async def _extract_key_points(self, summary: str) -> List[str]:
        prompt = f"""从以下总结中提取 5-7 个关键要点，每点不超过 20 字：
{summary}
只输出要点列表，每行一个："""
        response = await self.llm.chat([Message(role="user", content=prompt)])
        return [line.strip() for line in response.split('\n') if line.strip()]

    async def _extract_title_from_url(self, text: str) -> str:
        prompt = f"""从以下文本中提取文章标题，只返回标题：
{text[:500]}
标题："""
        response = await self.llm.chat([Message(role="user", content=prompt)])
        return response.strip()

# 使用示例
async def main_summarizer():
    llm = LLMClient()
    summarizer = DocumentSummarizer(llm)
    result = await summarizer.summarize(source="research_paper.pdf", source_type="file")
    print("\n" + "="*60)
    print(f"📄 标题：{result.title}")
    print(f"⏱️ 预计阅读时间：{result.reading_time} 分钟")
    print(f"📊 字数：{result.word_count}")
    print("\n🔑 关键要点:")
    for point in result.key_points:
        print(f" • {point}")
    print(f"\n📝 总结:\n{result.summary}")

if __name__ == "__main__":
    asyncio.run(main_summarizer())

文档类型	原始阅读时间	AI 总结时间	效率提升
论文（30 页）	60 分钟	30 秒	120 倍
技术文档	20 分钟	15 秒	80 倍
新闻文章	5 分钟	10 秒	30 倍
行业报告	45 分钟	25 秒	108 倍

import re
import subprocess
import tempfile
from typing import Dict, List, Optional, Tuple
from enum import Enum
import ast

class CodeMode(Enum):
    """代码生成模式"""
    GENERATE = "generate"
    EXPLAIN = "explain"
    OPTIMIZE = "optimize"
    DEBUG = "debug"
    TEST = "test"

@dataclass
class CodeResult:
    """代码生成结果"""
    code: str
    language: str
    explanation: str
    tests: Optional[str] = None
    warnings: List[str] = None

class CodeGenerator:
    """AI 代码生成器"""
    def __init__(self, llm_client: LLMClient):
        self.llm = llm_client
        self.quality_rules = {
            "security": [r"eval\s*\(", r"exec\s*\(", r"pickle\.loads?"],
            "performance": [r"for\s+\w+\s+in\s+range\(len\("]
        }

    async def generate(self, requirement: str, language: str = "python", mode: CodeMode = CodeMode.GENERATE, context: str = "") -> CodeResult:
        mode_prompts = {
            CodeMode.GENERATE: self._build_generate_prompt,
            CodeMode.EXPLAIN: self._build_explain_prompt,
            CodeMode.OPTIMIZE: self._build_optimize_prompt,
            CodeMode.DEBUG: self._build_debug_prompt,
            CodeMode.TEST: self._build_test_prompt,
        }
        prompt_builder = mode_prompts[mode]
        prompt = prompt_builder(requirement, language, context)
        print(f"🤖 正在生成{mode.value}...")
        response = await self.llm.chat([
            Message(role="system", content=self._get_system_prompt(language)),
            Message(role="user", content=prompt)
        ])
        code, explanation = self._parse_code_response(response, language)
        warnings = self._security_check(code)
        tests = None
        if mode == CodeMode.GENERATE:
            tests = await self._generate_tests(code, language)
        return CodeResult(code=code, language=language, explanation=explanation, tests=tests, warnings=warnings)

    def _get_system_prompt(self, language: str) -> str:
        return f"""你是一个专业的{language}程序员和教师。
输出代码时：
1. 代码必须可直接运行
2. 添加必要的注释和文档字符串
3. 遵循{language}最佳实践和 PEP8 规范
4. 包含错误处理
5. 代码后附上简洁的使用说明
输出格式：
```python
# 代码块

def _build_generate_prompt(self, requirement: str, language: str, context: str) -> str:
    if context:
        return f"""请根据以下需求生成{language}代码：

async def chat(self, user_input: str) -> str:
    intent = await self._detect_intent(user_input)
    if intent == "generate":
        result = await self.generator.generate(requirement=user_input, mode=CodeMode.GENERATE)
        output = f"```python\n{result.code}\n```\n\n"
        output += f"**说明：**\n{result.explanation}\n\n"
        if result.warnings:
            output += "**安全警告：**\n" + "\n".join(result.warnings) + "\n\n"
        if result.tests:
            output += f"**测试代码：**\n```python\n{result.tests}\n```"
        return output
    elif intent == "explain":
        code = self._extract_code_from_input(user_input)
        result = await self.generator.generate(requirement=code, mode=CodeMode.EXPLAIN)
        return result.explanation

async def _detect_intent(self, user_input: str) -> str:
    prompt = f"""判断用户意图，只返回：generate / explain / optimize / debug

def _extract_code_from_input(self, user_input: str) -> str:
    match = re.search(r'```(?:python)?\n(.*?)```', user_input, re.DOTALL)
    if match:
        return match.group(1).strip()
    return user_input


#### 3.3 代码生成能力对比

| 功能 | ChatGPT 网页版 | 本地 AI 工具 | 优势 |
|-----|--------------|-----------|------|
| **生成速度** | 3-5 秒 | 2-3 秒 | 快 40% |
| **代码可运行率** | 85% | 90%+ | 自定义优化 |
| **安全检查** | ❌ | ✅ | 内置规则 |
| **测试生成** | 需额外要求 | 自动生成 | 一站式 |
| **批量处理** | ❌ | ✅ | 脚本化 |
| **成本** | $20/月 | ¥10/月 | 省 60% |

* * *

### 四、工具三：智能资料助手

#### 4.1 系统架构

```mermaid
graph TB
A[用户提问] --> B[问题分析]
B --> C{问题类型？}
C -->|事实查询 | D[搜索引擎]
C -->|API 文档 | E[官方文档库]
C -->|StackOverflow| F[SO 搜索]
C -->|综合查询 | G[多源并行搜索]
D --> H[结果提取]
E --> H
F --> H
G --> H
H --> I[内容清洗]
I --> J[相关性排序]
J --> K[AI 总结整合]
K --> L[结构化输出]
L --> M[直接答案]
L --> N[参考链接]
L --> O[相关推荐]

import aiohttp
from typing import List, Dict, Optional
from dataclasses import dataclass
import re
from urllib.parse import quote, urljoin
import json

@dataclass
class SearchResult:
    """搜索结果"""
    title: str
    url: str
    snippet: str
    source: str  # google / bing / docs / stackoverflow
    relevance: float = 0.0

@dataclass
class ResearchResult:
    """研究结果"""
    answer: str
    sources: List[SearchResult]
    related_questions: List[str]
    confidence: float

class SearchEngine:
    """搜索引擎封装"""
    def __init__(self, bing_api_key: str = None):
        self.bing_api_key = bing_api_key or os.getenv("BING_SEARCH_API_KEY")
        self.headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}

    async def search_bing(self, query: str, count: int = 10) -> List[SearchResult]:
        if not self.bing_api_key:
            return await self._search_duckduckgo(query, count)
        url = "https://api.bing.microsoft.com/v7.0/search"
        params = {"q": query, "count": count, "responseFilter": "webpages"}
        async with aiohttp.ClientSession() as session:
            async with session.get(url, params=params, headers={"Ocp-Apim-Subscription-Key": self.bing_api_key}) as response:
                data = await response.json()
                results = []
                for item in data.get("webPages", {}).get("value", []):
                    results.append(SearchResult(title=item["name"], url=item["url"], snippet=item["snippet"], source="bing"))
                return results

    async def _search_duckduckgo(self, query: str, count: int = 10) -> List[SearchResult]:
        url = f"https://html.duckduckgo.com/html/?q={quote(query)}"
        async with aiohttp.ClientSession() as session:
            async with session.get(url, headers=self.headers) as response:
                html = await response.text()
                from bs4 import BeautifulSoup
                soup = BeautifulSoup(html, 'html.parser')
                results = []
                for result in soup.select('.result')[:count]:
                    title_elem = result.select_one('.result__a')
                    snippet_elem = result.select_one('.result__snippet')
                    url_elem = result.select_one('.result__url')
                    if title_elem and url_elem:
                        results.append(SearchResult(
                            title=title_elem.get_text(),
                            url=url_elem.get('href', ''),
                            snippet=snippet_elem.get_text() if snippet_elem else '',
                            source="duckduckgo"
                        ))
                return results

    async def search_stackoverflow(self, query: str, count: int = 5) -> List[SearchResult]:
        search_query = f"site:stackoverflow.com {query}"
        results = await self._search_duckduckgo(search_query, count)
        for r in results:
            r.source = "stackoverflow"
        return results

    async def search_docs(self, query: str, docs_domain: str, count: int = 5) -> List[SearchResult]:
        search_query = f"site:{docs_domain}{query}"
        results = await self._search_duckduckgo(search_query, count)
        for r in results:
            r.source = "docs"
        return results

class IntelligentResearcher:
    """智能研究助手"""
    def __init__(self, llm_client: LLMClient, search_engine: SearchEngine):
        self.llm = llm_client
        self.search = search_engine

    async def research(self, question: str, depth: int = 1, sources: List[str] = None) -> ResearchResult:
        print(f"🔍 正在研究：{question}")
        search_tasks = []
        if not sources or "google" in sources:
            search_tasks.append(self.search.search_bing(question))
        if not sources or "stackoverflow" in sources:
            search_tasks.append(self.search.search_stackoverflow(question))
        if self._is_technical_question(question):
            tech = await self._detect_tech_stack(question)
            if tech:
                docs_url = self._get_docs_url(tech)
                search_tasks.append(self.search.search_docs(question, docs_url))
        search_results_list = await asyncio.gather(*search_tasks)
        all_results = []
        for results in search_results_list:
            all_results.extend(results)
        print(f"📊 找到 {len(all_results)} 条相关结果")
        if depth > 1:
            all_results = await self._fetch_page_contents(all_results[:5])
        answer = await self._synthesize_answer(question, all_results)
        related = await self._generate_related_questions(question, answer)
        confidence = self._calculate_confidence(all_results)
        return ResearchResult(answer=answer, sources=all_results[:5], related_questions=related, confidence=confidence)

    def _is_technical_question(self, question: str) -> bool:
        tech_keywords = ["python", "javascript", "java", "api", "函数", "如何使用", "怎么用", "documentation", "example"]
        return any(kw in question.lower() for kw in tech_keywords)

    async def _detect_tech_stack(self, question: str) -> Optional[str]:
        prompt = f"""从以下问题中检测涉及的技术栈，只返回技术名称：
问题：{question}
技术栈（如 python、react、docker 等）："""
        response = await self.llm.chat([Message(role="user", content=prompt)])
        tech = response.strip().lower()
        tech_docs = {
            "python": "docs.python.org",
            "javascript": "developer.mozilla.org",
            "react": "react.dev",
            "vue": "vuejs.org",
            "docker": "docs.docker.com",
            "kubernetes": "kubernetes.io",
        }
        return tech_docs.get(tech)

    def _get_docs_url(self, tech: str) -> str:
        tech_docs = {
            "python": "docs.python.org",
            "javascript": "developer.mozilla.org",
            "react": "react.dev",
            "vue": "vuejs.org",
            "docker": "docs.docker.com",
        }
        return tech_docs.get(tech, "docs.python.org")

    async def _fetch_page_contents(self, results: List[SearchResult]) -> List[SearchResult]:
        async def fetch_content(result: SearchResult):
            try:
                async with aiohttp.ClientSession() as session:
                    async with session.get(result.url, headers=self.search.headers, timeout=aiohttp.ClientTimeout(total=10)) as response:
                        html = await response.text()
                        from bs4 import BeautifulSoup
                        soup = BeautifulSoup(html, 'html.parser')
                        for script in soup(['script', 'style', 'nav', 'footer']):
                            script.decompose()
                        text = soup.get_text(separator='\n', strip=True)
                        result.snippet = text[:2000] + "..."
                        result.relevance = 1.0
            except Exception as e:
                print(f" ⚠️ 获取失败 {result.url}: {e}")
        tasks = [fetch_content(r) for r in results]
        await asyncio.gather(*tasks)
        return results

    async def _synthesize_answer(self, question: str, results: List[SearchResult]) -> str:
        context = "\n\n".join([f"来源{i+1}: {r.title}\n{r.snippet}\n链接：{r.url}" for i, r in enumerate(results[:5])])
        prompt = f"""基于以下搜索结果回答问题，要求：
1. 准确引用信息来源
2. 综合多个来源的信息
3. 如果信息冲突，说明不同观点
4. 给出清晰的结构化答案
5. 标注信息来源（如 [来源 1]）
问题：{question}
搜索结果：
{context}
请给出详细答案："""
        answer = await self.llm.chat([
            Message(role="system", content="你是一个专业的研究助手，擅长综合多源信息给出准确答案"),
            Message(role="user", content=prompt)
        ])
        return answer

    async def _generate_related_questions(self, question: str, answer: str) -> List[str]:
        prompt = f"""基于以下问答，生成 3-5 个相关的深入研究问题：
问题：{question}
答案：{answer[:500]}...
请生成相关问题，每行一个："""
        response = await self.llm.chat([Message(role="user", content=prompt)])
        return [line.strip() for line in response.split('\n') if line.strip() and not line.startswith('-')][:5]

    def _calculate_confidence(self, results: List[SearchResult]) -> float:
        if not results:
            return 0.0
        base_confidence = min(1.0, len(results)/10)
        has_docs = any(r.source == "docs" for r in results)
        if has_docs:
            base_confidence = min(1.0, base_confidence + 0.2)
        return round(base_confidence, 2)

# 使用示例
async def main_researcher():
    llm = LLMClient()
    search = SearchEngine()
    researcher = IntelligentResearcher(llm, search)
    result = await researcher.research(question="Python 中 asyncio 和 multiprocessing 的区别是什么？", depth=2)
    print("\n" + "="*60)
    print("📚 研究结果")
    print("="*60)
    print(f"\n置信度：{result.confidence*100}%\n")
    print(f"答案:\n{result.answer}\n")
    print("📖 参考来源:")
    for i, source in enumerate(result.sources, 1):
        print(f"{i}. {source.title}")
        print(f" {source.url}")
        print(f" 来源：{source.source}\n")
    print("❓ 相关问题:")
    for q in result.related_questions:
        print(f" • {q}")

if __name__ == "__main__":
    asyncio.run(main_researcher())

操作	手动搜索	AI 助手	效率提升
单源查询	3 分钟	10 秒	18 倍
多源对比	15 分钟	30 秒	30 倍
技术文档查询	8 分钟	15 秒	32 倍
深度研究	1 小时 +	2 分钟	30 倍 +

import argparse
import asyncio
from pathlib import Path
import json

class AIToolsCLI:
    """AI 工具命令行界面"""
    def __init__(self):
        self.llm = LLMClient()
        self.summarizer = DocumentSummarizer(self.llm)
        self.code_assistant = InteractiveCodeAssistant(self.llm)
        self.researcher = IntelligentResearcher(self.llm, SearchEngine())

    async def run(self):
        parser = argparse.ArgumentParser(description="AI 工具集 - 你的智能助手", formatter_class=argparse.RawDescriptionHelpFormatter,
                                         epilog="""示例:
# 总结文档
python ai_tools.py summarize paper.pdf
# 生成代码
python ai_tools.py code "用 Python 写一个爬虫"
# 研究问题
python ai_tools.py research "量子计算的原理"
""")
        subparsers = parser.add_subparsers(dest='command', help='可用命令')
        sum_parser = subparsers.add_parser('summarize', help='总结文档')
        sum_parser.add_argument('file', help='文件路径或 URL')
        sum_parser.add_argument('-t', '--type', default='file', choices=['file', 'url'], help='输入类型')
        sum_parser.add_argument('-o', '--output', help='输出文件路径')
        code_parser = subparsers.add_parser('code', help='生成/处理代码')
        code_parser.add_argument('prompt', help='需求或代码')
        code_parser.add_argument('-m', '--mode', choices=['generate', 'explain', 'optimize', 'debug'], default='generate', help='处理模式')
        code_parser.add_argument('-l', '--language', default='python', help='编程语言')
        code_parser.add_argument('-x', '--execute', action='store_true', help='执行生成的代码')
        res_parser = subparsers.add_parser('research', help='研究问题')
        res_parser.add_argument('question', help='研究问题')
        res_parser.add_argument('-d', '--depth', type=int, default=1, choices=[1, 2, 3], help='研究深度')
        res_parser.add_argument('-s', '--sources', nargs='+', choices=['google', 'docs', 'stackoverflow'], help='指定搜索源')
        args = parser.parse_args()
        if not args.command:
            parser.print_help()
            return
        if args.command == 'summarize':
            await self._cmd_summarize(args)
        elif args.command == 'code':
            await self._cmd_code(args)
        elif args.command == 'research':
            await self._cmd_research(args)

    async def _cmd_summarize(self, args):
        print(f"📖 正在总结：{args.file}")
        result = await self.summarizer.summarize(source=args.file, source_type=args.type)
        output = f"""# {result.title}
**📊 统计信息**
- 字数：{result.word_count}
- 预计阅读时间：{result.reading_time} 分钟
- 生成时间：{result.created_at}
**🔑 关键要点**
{chr(10).join(f'{i+1}. {p}' for i, p in enumerate(result.key_points))}
**📝 总结**
{result.summary}"""
        if args.output:
            with open(args.output, 'w', encoding='utf-8') as f:
                f.write(output)
            print(f"✅ 已保存到：{args.output}")
        else:
            print(output)

    async def _cmd_code(self, args):
        print(f"💻 正在处理：{args.prompt[:50]}...")
        result = await self.code_assistant.generator.generate(requirement=args.prompt, language=args.language, mode=CodeMode(args.mode))
        print(f"\n```{args.language}")
        print(result.code)
        print("```\n")
        print(f"**说明**\n{result.explanation}\n")
        if result.warnings:
            print("**警告**")
            for w in result.warnings:
                print(f" {w}")
            print()
        if result.tests:
            print(f"**测试代码**\n```{args.language}")
            print(result.tests)
            print("```\n")
        if args.execute:
            print("⚡ 正在执行代码...")
            exec_result = await self.code_assistant.generator.execute_code(result.code, args.language)
            if exec_result['success']:
                print(f"✅ 执行成功\n输出:\n{exec_result['output']}")
            else:
                print(f"❌ 执行失败\n错误:\n{exec_result['error']}")

    async def _cmd_research(self, args):
        print(f"🔍 正在研究：{args.question}")
        result = await self.researcher.research(question=args.question, depth=args.depth, sources=args.sources)
        print(f""" # 研究结果
**📊 置信度**: {result.confidence*100}%
## 答案
{result.answer}
## 参考来源 """)
        for i, source in enumerate(result.sources, 1):
            print(f"{i}. **{source.title}**")
            print(f" 链接：{source.url}")
            print(f" 来源：{source.source}\n")
        if result.related_questions:
            print("## 相关问题")
            for q in result.related_questions:
                print(f"- {q}")

async def main():
    cli = AIToolsCLI()
    await cli.run()

if __name__ == "__main__":
    asyncio.run(main())

# 总结论文
python ai_tools.py summarize research_paper.pdf -o summary.md
# 生成代码并执行
python ai_tools.py code "用 Python 写一个二分查找" -x
# 解释代码
python ai_tools.py code "explain this code: `def foo(): return 1`" -m explain
# 深度研究
python ai_tools.py research "RAG 和 Fine-tuning 的区别" -d2

使用场景	月调用量	月成本	对比 ChatGPT Plus
轻度使用	10 万 tokens	¥5	省 75%
中度使用	100 万 tokens	¥50	省 60%
重度使用	1000 万 tokens	¥500	省 40%

ai-tools/
├── src/
│   ├── __init__.py
│   ├── llm.py # LLM 客户端
│   ├── summarizer.py # 文档总结器
│   ├── code_generator.py # 代码生成器
│   └── researcher.py # 研究助手
├── cli.py # 命令行入口
├── config.py # 配置管理
├── requirements.txt # 依赖列表
├── .env.example # 环境变量示例
├── README.md # 使用文档
└── examples/
    ├── example_summarize.py
    ├── example_code.py
    └── example_research.py

# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/./src/
COPY cli.py .
COPY config.py .
ENV PYTHONPATH=/app
CMD ["python", "cli.py", "--help"]

# docker-compose.yml
version: '3.8'
services:
  ai-tools:
    build: .
    env_file:
      - .env
    volumes:
      - ./data:/app/data
    ports:
      - "8000:8000"

功能方向	实现方式	难度
Web 界面	FastAPI + Vue3	⭐⭐⭐
多模态支持	GPT-4V 处理图片	⭐⭐
语音交互	Whisper + TTS	⭐⭐⭐
本地模型	Ollama + Llama3	⭐⭐⭐⭐
Agent 能力	添加工具调用	⭐⭐⭐⭐

工具	核心价值	适用场景
智能文档总结器	10 秒读完 100 页	论文研读、报告分析
AI 代码生成器	说人话写代码	快速原型、学习参考
智能资料助手	秒速精准检索	技术调研、问题解决

Python 构建 AI 三工具：文档总结、代码生成与智能检索

一、准备工作：环境与 API 配置

1.1 技术栈选择

1.2 环境配置

1.3 核心工具类封装

更多推荐文章

相关免费在线工具

二、工具一：智能文档总结器

2.1 功能设计

2.2 核心代码实现

2.3 使用效果对比

三、工具二：AI 代码生成器

3.1 功能架构

3.2 核心实现

4.2 核心代码

4.3 搜索效率对比

五、整合三大利器：打造超级 AI 助手

5.1 统一 CLI 工具

5.2 使用示例

5.3 成本分析

六、完整源码与部署指南

6.1 项目结构

6.2 部署到云端

6.3 进阶功能扩展

总结

关键收获

下一步学习

参考资源

更多推荐文章

相关免费在线工具

Python 构建 AI 三工具：文档总结、代码生成与智能检索

一、准备工作：环境与 API 配置

1.1 技术栈选择

1.2 环境配置

1.3 核心工具类封装

微信扫一扫，关注极客日志

更多推荐文章

相关免费在线工具

二、工具一：智能文档总结器

2.1 功能设计

2.2 核心代码实现

2.3 使用效果对比

三、工具二：AI 代码生成器

3.1 功能架构

3.2 核心实现

4.2 核心代码

4.3 搜索效率对比

五、整合三大利器：打造超级 AI 助手

5.1 统一 CLI 工具

5.2 使用示例

5.3 成本分析

六、完整源码与部署指南

6.1 项目结构

6.2 部署到云端

6.3 进阶功能扩展

总结

关键收获

下一步学习

参考资源

微信扫一扫，关注极客日志

更多推荐文章

相关免费在线工具