AI LLM Source Code Analysis: Building an Intelligent GitHub Q&A Assistant | 极客日志


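Summary (AI-generated): This article examines the source implementation of PeterCat, an LLM-based intelligent Q&A assistant for GitHub. The project uses the LangChain framework and combines vectorized knowledge-base construction, RAG retrieval and generation, prompt engineering, and tool-chain invocation. In the core flow, documents from a GitHub repository go through Load-Split-Embed-Store processing and are stored in a Supabase vector database; at query time, embedding matching retrieves the context, which is combined with an LLM such as OpenAI or Gemini to generate the reply. The article breaks down the agent workflow, the LLM client wrapper, the prompt template design, and the preset tool logic, shows how an LLM application can be delivered efficiently in a vertical scenario, and discusses room for later optimization in multi-agent collaboration and model fine-tuning.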

雪落无声 · Published 2025/2/6 · Updated 2026/6/3 · 18 views

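A while ago I came across a typical application of large models in a vertical domain: an intelligent Q&A assistant for GitHub. Its features include recommending high-quality repositories, explaining code snippets, filing issues, and looking up issues. For a front-end component library such as Ant Design, it can even take a prototype image and report which Ant Design atomic components the interface contains.

I tried it myself and found it genuinely convenient. Its source code is open on GitHub, and the implementation covers prompt tuning, vectorized knowledge-base construction, and LangChain tool-chain integration, each of which has plenty to learn from. This article analyzes the implementation details of the LLM part through the source code.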

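Introduction to PeterCat

As the official site puts it, PeterCat is an intelligent Q&A bot solution built for community maintainers and developers.

It lets users quickly set up an intelligent Q&A bot for a GitHub repository through a conversational flow on the platform. The built-in base capabilities are filing issues, looking up issues, replying to issues, replying to Discussions, PR Summary, Code Review, and project information queries. It can also be integrated into a project repository through a self-hosted deployment or an all-in-one application SDK.

The PeterCat platform currently hosts 9 intelligent Q&A bots for representative front-end projects.

Take Ant Design as an example. The Ant Design assistant not only helps you get started with the component library quickly; given a prototype image, it can also work out which Ant Design components could implement it, and it even recognizes charts and text accurately. The model's ability to parse image content really exceeded my expectations.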

Prototype	Recognition result
Interface screenshot	Identifies components such as Button, Input, and Form

Next, let's see how PeterCat does it.

Source Code Walkthrough

Common Industry Approach

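A vertical-domain LLM application is one that applies a large model to a specific field. It mainly addresses the problem that general-purpose models such as GPT and 通义千问 perform poorly in a specific field because they lack domain knowledge.

Such scenarios usually require feeding the model a large amount of data as a knowledge base to support its decisions.

For vertical-domain LLM applications, the common SOP consists of: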

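Vectorized knowledge-base construction
(LLM fine-tuning)
User prompt input
Vectorized keyword retrieval
Fine-grained re-ranking of the query results
LLM prompt generation
LLM intent recognition
Result generation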

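If you host your own model, fine-tuning it can further improve the quality of intent recognition.

Notice that vectorization comes up repeatedly, in both the knowledge-base construction stage and the keyword retrieval stage.

Vectorization means converting large volumes of data or text into vector representations. Vectorized data expresses the relationships and similarity between items more directly, which improves the model's training and prediction results.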
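As a small illustration of why vectors make similarity measurable, the sketch below compares toy vectors with cosine similarity. The three-dimensional vectors and their names are invented for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; the values are invented for illustration.
button_doc = [0.9, 0.1, 0.0]
form_doc = [0.8, 0.3, 0.1]
license_doc = [0.0, 0.2, 0.9]

# Two UI-component documents should be closer to each other than to a license text.
closer = cosine_similarity(button_doc, form_doc) > cosine_similarity(button_doc, license_doc)
print(closer)
```

A vector store applies the same idea at scale: it returns the stored chunks whose vectors are closest to the vector of the query.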

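The process from vectorized keyword retrieval through LLM prompt generation is what is usually called RAG (Retrieval-Augmented Generation). Its purpose is to raise the quality of the prompt finally given to the model, in other words, to ask the model better questions.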
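The retrieve-then-prompt step can be sketched as follows. This is a toy, not PeterCat's code: a word-overlap score stands in for embedding similarity, and `score`, `retrieve`, `build_prompt`, and the sample documents are hypothetical names and data.

```python
from typing import List


def score(query: str, doc: str) -> int:
    """Toy relevance score: number of distinct query words found in the doc."""
    doc_words = set(doc.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in doc_words)


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k highest-scoring docs; ties keep their original order."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Place the retrieved context in front of the user's question."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"


docs = [
    "Button triggers an operation when clicked",
    "Form collects and validates user input",
    "The project is released under the MIT license",
]
question = "how to validate user input"
prompt = build_prompt(question, retrieve(question, docs, k=1))
print(prompt)
```

The model never sees the whole knowledge base, only the few chunks that retrieval judged relevant, which is what keeps the prompt short and focused.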

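On top of this SOP, using LangChain to integrate embedding, prompt generation, and chained tool calls is roughly enough to complete an end-to-end LLM call in a vertical domain.

If the large model is the human brain, LangChain is like the limbs and torso: the model only needs to focus on its core job of prediction, while tool calls, context memory, multi-turn conversation, and similar work are left to LangChain to coordinate.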

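PeterCat Source Code Structure

With that background, let's look at the implementation details of PeterCat's LLM-related modules:

PeterCat's LLM-related code is concentrated mainly in one directory, which as a whole contains: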

server/agent

def supabase_embedding(documents, **kwargs: Any):
    from langchain_text_splitters import CharacterTextSplitter

    try:
        text_splitter = CharacterTextSplitter(
            chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP
        )
        docs = text_splitter.split_documents(documents)
        embeddings = OpenAIEmbeddings()
        vector_store = SupabaseVectorStore.from_documents(
            docs,
            embeddings,
            client=get_client(),
            table_name=TABLE_NAME,
            query_name=QUERY_NAME,
            chunk_size=CHUNK_SIZE,
            **kwargs,
        )
        return vector_store
    except Exception as e:
        print(e)
        return None
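To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. It is only an approximation for illustration: `CharacterTextSplitter` splits on a separator and then merges pieces up to the size limit, whereas this hypothetical `split_with_overlap` helper cuts at fixed character offsets.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters.

    Each window shares chunk_overlap characters with the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.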

def add_knowledge_by_doc(config: RAGGitDocConfig):
    loader = init_github_file_loader(config)
    documents = loader.load()
    supabase = get_client()
    is_doc_added_query = (
        supabase.table(TABLE_NAME)
        .select("id, repo_name, commit_id, file_path, bot_id")
        .eq("repo_name", config.repo_name)
        .eq("commit_id", loader.commit_id)
        .eq("file_path", config.file_path)
        .eq("bot_id", config.bot_id)
        .execute()
    )
    if not is_doc_added_query.data:
        is_doc_equal_query = (
            supabase.table(TABLE_NAME).select("*").eq("file_sha", loader.file_sha)
        ).execute()
        if not is_doc_equal_query.data:
            # If there is no file with the same file_sha, perform embedding.
            store = supabase_embedding(
                documents,
                repo_name=config.repo_name,
                commit_id=loader.commit_id,
                file_sha=loader.file_sha,
                file_path=config.file_path,
                bot_id=config.bot_id,
            )
            return store
        else:
            new_commit_list = [
                {
                    **{k: v for k, v in item.items() if k != "id"},
                    "repo_name": config.repo_name,
                    "commit_id": loader.commit_id,
                    "file_path": config.file_path,
                    "bot_id": config.bot_id,
                }
                for item in is_doc_equal_query.data
            ]
            insert_result = supabase.table(TABLE_NAME).insert(new_commit_list).execute()
            return insert_result
    else:
        return True
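The branching in `add_knowledge_by_doc` above amounts to a three-way decision. The sketch below replays it against an in-memory list of rows instead of Supabase; `decide_action`, `IDENTITY_KEYS`, and the sample rows are hypothetical, and only the column names are taken from the code above.

```python
from typing import Dict, List

# Columns that identify one stored document for one bot (names from the code above).
IDENTITY_KEYS = ("repo_name", "commit_id", "file_path", "bot_id")


def decide_action(rows: List[Dict[str, str]], doc: Dict[str, str]) -> str:
    """Return 'skip', 'copy', or 'embed' for a document."""
    # 1) Same repo, commit, path, and bot already stored: nothing to do.
    if any(all(row[key] == doc[key] for key in IDENTITY_KEYS) for row in rows):
        return "skip"
    # 2) Same content (file_sha) already embedded: reuse those rows.
    if any(row["file_sha"] == doc["file_sha"] for row in rows):
        return "copy"
    # 3) Content never seen before: embeddings must be computed.
    return "embed"


# Hypothetical sample data.
stored = [
    {"repo_name": "org/repo", "commit_id": "c1", "file_path": "README.md",
     "bot_id": "bot-1", "file_sha": "sha-aaa"},
]
same_doc = dict(stored[0])
new_commit_same_content = {**stored[0], "commit_id": "c2"}
changed_content = {**stored[0], "commit_id": "c3", "file_sha": "sha-bbb"}

actions = [decide_action(stored, d)
           for d in (same_doc, new_commit_same_content, changed_content)]
print(actions)
```

The middle branch is the cost saver: when a file is unchanged between commits, its existing vectors are copied under the new metadata and no embedding call is made.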

def search_knowledge(
    query: str,
    bot_id: str,
    meta_filter: Dict[str, Any] = {},
):
    retriever = init_retriever({"filter": {"metadata": meta_filter, "bot_id": bot_id}})
    docs = retriever.invoke(query)
    documents_as_dicts = [convert_document_to_dict(doc) for doc in docs]
    json_output = json.dumps(documents_as_dicts, ensure_ascii=False)
    return json_output

@register_llm_client("openai")
class OpenAIClient(BaseLLMClient):
    _client: ChatOpenAI

    def __init__(
        self,
        temperature: Optional[int] = 0.2,
        max_tokens: Optional[int] = 1500,
        streaming: Optional[bool] = False,
        api_key: Optional[str] = OPEN_API_KEY,
    ):
        self._client = ChatOpenAI(
            model_name="gpt-4o",
            temperature=temperature,
            streaming=streaming,
            max_tokens=max_tokens,
            openai_api_key=api_key,
            stream_usage=True,
        )

    def get_client(self):
        return self._client

    def get_tools(self, tools: List[Any]):
        return [convert_to_openai_tool(tool) for tool in tools]

    def parse_content(self, content: List[MessageContent]):
        return content
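The `@register_llm_client("openai")` decorator suggests a name-to-class registry. Its definition is not shown in this article, so the following is only a guess at the general pattern, not PeterCat's implementation; `_REGISTRY`, `get_llm_client`, and `EchoClient` are hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a provider name to its client class.
_REGISTRY: Dict[str, type] = {}


def register_llm_client(name: str) -> Callable[[type], type]:
    """Class decorator that records the class under the given name."""
    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


def get_llm_client(name: str, **kwargs: Any) -> Any:
    """Instantiate the client registered under name."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown LLM client: {name!r}") from None
    return cls(**kwargs)


@register_llm_client("echo")
class EchoClient:
    """Stand-in client used only to exercise the registry."""

    def __init__(self, temperature: float = 0.2):
        self.temperature = temperature


client = get_llm_client("echo", temperature=0.5)
print(type(client).__name__, client.temperature)
```

With a registry like this, supporting another provider would mean adding one decorated class, while callers keep selecting clients by name.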

from typing import Optional

CREATE_PROMPT = """
## Role:
You are a GitHub Answering Bot Creation Assistant. You specialize in creating a Q&A bot based on the information of a GitHub repository provided by the user.

## Skills:

Skill 1: Retrieve GitHub Repository Name

- Guide users to provide their GitHub repository name or URL.
- Extract the GitHub repository name from the provided GitHub URL

Skill 2: Create a Q&A Bot

- Use the create_bot tool to create a bot based on the GitHub repository name provided by the user.
- The uid of the current user is {user_id}

Skill 3: Modify Bot Configuration

- Utilize the edit_bot tool to modify the bot's configuration information based on the user's description.
- Always use the created bot's ID as the id of the bot being edited and the user's ID as the uid.
- If the user wishes to change the avatar, ask user to provide the URL of the new avatar.

## Limitations:

- Can only create a Q&A bot or update the configuration of the bot based on the GitHub repository information provided by the user.
- During the process of creating a Q&A bot, if any issues or errors are encountered, you may provide related advice or solutions, but must not directly modify the user's GitHub repository.
- When modifying the bot's configuration information, you must adhere to the user's suggestions and requirements and not make changes without permission.
- Whenever you encounter a 401 or Unauthorized error that seems to be an authentication failure, please inform the user in the language they are using to converse with you. For example:

If user is conversing with you in Chinese:
'您必须先使用 GitHub 登录 Petercat 才能使用此功能。[登录地址](https://api.petercat.ai/api/auth/login)

If user is conversing with you in English:
'You must log in to Petercat using GitHub before accessing this feature.' [Login URL](https://api.petercat.ai/api/auth/login)
"""

EDIT_PROMPT = """
## Role:
You are a GitHub Answering Bot modifying assistant. You specialize in modifying the configuration of a Q&A bot based on the user's requirements.

## Skills:

- Utilize the edit_bot tool to modify the bot's configuration information based on the user's description.
- Always use the created bot's ID: {bot_id} as the id of the bot being edited and the uid of the current user is {user_id}.
- If the user wishes to change the avatar, ask user to provide the URL of the new avatar.

## Limitations:

- Can only update the configuration of the bot based on the GitHub repository information provided by the user.
- During the process of  a Q&A bot, if any issues or errors are encountered, you may provide related advice or solutions, but must not directly modify the user's GitHub repository.
- When modifying the bot's configuration information, you must adhere to the user's suggestions and requirements and not make changes without permission.

If user is conversing with you in Chinese:
'您必须先使用 GitHub 登录 Petercat 才能使用此功能。[登录地址](https://api.petercat.ai/api/auth/login)

If user is conversing with you in English:
'You must log in to Petercat using GitHub before accessing this feature.' [Login URL](https://api.petercat.ai/api/auth/login)
"""

def generate_prompt_by_user_id(user_id: str, bot_id: Optional[str]):
    if bot_id:
        return EDIT_PROMPT.format(bot_id=bot_id, user_id=user_id)
    else:
        return CREATE_PROMPT.format(user_id=user_id)
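The prompt selection above relies on plain `str.format` placeholders. The sketch below shows the same branching with shortened, hypothetical templates (`CREATE_TEMPLATE`, `EDIT_TEMPLATE`, `choose_prompt`), plus one detail worth knowing when editing such templates: literal braces must be doubled.

```python
from typing import Optional

# Shortened stand-ins for the real prompts; the wording is illustrative only.
CREATE_TEMPLATE = "Create a bot. The uid of the current user is {user_id}"
EDIT_TEMPLATE = "Edit bot {bot_id}. The uid of the current user is {user_id}"


def choose_prompt(user_id: str, bot_id: Optional[str] = None) -> str:
    """Pick the edit prompt when a bot already exists, else the create prompt."""
    if bot_id:
        return EDIT_TEMPLATE.format(bot_id=bot_id, user_id=user_id)
    return CREATE_TEMPLATE.format(user_id=user_id)


# Literal braces (for example a JSON snippet inside a prompt) are written as {{ and }}.
JSON_HINT = 'Reply as {{"uid": "{user_id}"}}'

create_prompt = choose_prompt("u1")
edit_prompt = choose_prompt("u1", bot_id="b9")
json_hint = JSON_HINT.format(user_id="u1")
print(create_prompt)
print(edit_prompt)
print(json_hint)
```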

# Method of AgentBuilder (the class skeleton appears further below).
def _create_agent_with_tools(self) -> AgentExecutor:
    llm = self.chat_model.get_client()

    # Start from the optional Tavily search tool, then add the preset tools.
    tools = self.init_tavily_tools() if self.enable_tavily else []

    for tool in self.tools.values():
        tools.append(tool)

    # Expose the tools to the model in the provider's tool-calling format.
    if tools:
        parsed_tools = self.chat_model.get_tools(tools)
        llm = llm.bind_tools(parsed_tools)

    self.prompt = self.get_prompt()
    # Chain: map inputs -> prompt -> model -> parse tool calls or final answer.
    agent = (
        {
            "input": lambda x: x["input"],
            "agent_scratchpad": lambda x: format_to_openai_tool_messages(
                x["intermediate_steps"]
            ),
            "chat_history": lambda x: x["chat_history"],
        }
        | self.prompt
        | llm
        | OpenAIToolsAgentOutputParser()
    )

    return AgentExecutor(
        agent=agent,
        tools=tools,
        verbose=True,
        handle_parsing_errors=True,
        max_iterations=5,
    )
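Conceptually, an agent executor runs a loop: ask the model, run any tool it requests, feed the observation back, and stop at a final answer or at the iteration cap (`max_iterations=5` above). The sketch below imitates that loop with a scripted stand-in for the model. It is not LangChain's implementation; `fake_model`, `run_agent`, and the `add` tool are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, Union

ToolCall = Tuple[str, dict]        # (tool name, keyword arguments)
Step = Tuple[ToolCall, str]        # (tool call, observation text)
Decision = Union[ToolCall, str]    # a tool call, or a final answer string


def fake_model(question: str, steps: List[Step]) -> Decision:
    """Scripted stand-in for an LLM: call `add` once, then answer."""
    if not steps:
        return ("add", {"a": 2, "b": 3})
    return f"The result is {steps[-1][1]}"


def run_agent(
    question: str,
    model: Callable[[str, List[Step]], Decision],
    tools: Dict[str, Callable[..., object]],
    max_iterations: int = 5,
) -> str:
    steps: List[Step] = []  # plays the role of the agent scratchpad
    for _ in range(max_iterations):
        decision = model(question, steps)
        if isinstance(decision, str):
            return decision  # final answer
        name, kwargs = decision
        observation = str(tools[name](**kwargs))
        steps.append((decision, observation))
    return "Agent stopped: iteration limit reached"


answer = run_agent("What is 2 + 3?", fake_model, {"add": lambda a, b: a + b})
print(answer)
```

The iteration cap is what prevents a model that keeps requesting tools from looping forever.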

def agent_stream_chat(
    input_data: ChatData, 
    user_id: str,
    bot_id: str,
) -> AsyncIterator[str]:
    prompt = generate_prompt_by_user_id(user_id, bot_id)
    agent = AgentBuilder(
        chat_model=OpenAIClient(),
        prompt=prompt, tools=TOOL_MAPPING, enable_tavily=False
    )
    return dict_to_sse(
        agent.run_stream_chat(input_data)
    )
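`agent_stream_chat` wraps the agent's stream in `dict_to_sse`, whose body is not shown in this article. As a rough sketch of what turning a stream of dicts into Server-Sent Events can look like, the example below serializes each dict as one `data:` event. `to_sse` and `fake_chunks` are hypothetical and should not be read as PeterCat's actual `dict_to_sse`.

```python
import asyncio
import json
from typing import AsyncIterator, Dict, List


async def fake_chunks() -> AsyncIterator[Dict]:
    """Stand-in for the agent's streamed output."""
    for chunk in ({"content": "Hel"}, {"content": "lo"}):
        yield chunk


async def to_sse(source: AsyncIterator[Dict]) -> AsyncIterator[str]:
    """Format each dict as one SSE event: a `data:` line plus a blank line."""
    async for item in source:
        yield f"data: {json.dumps(item, ensure_ascii=False)}\n\n"


async def collect() -> List[str]:
    return [event async for event in to_sse(fake_chunks())]


events = asyncio.run(collect())
print(events)
```

Because each event is yielded as soon as its chunk arrives, a client can render the reply incrementally instead of waiting for the full answer.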

import json
import logging
from typing import AsyncGenerator, AsyncIterator, Dict, Callable, Optional
from langchain.agents import AgentExecutor
from agent.llm import BaseLLMClient
from petercat_utils.data_class import ChatData, Message
from langchain.agents.format_scratchpad.openai_tools import (
    format_to_openai_tool_messages,
)
from langchain_core.messages import (
    AIMessage,
    FunctionMessage,
    HumanMessage,
    SystemMessage,
)
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser
from langchain.prompts import MessagesPlaceholder
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.utilities.tavily_search import TavilySearchAPIWrapper
from langchain_community.tools.tavily_search.tool import TavilySearchResults
from petercat_utils import get_env_variable


TAVILY_API_KEY = get_env_variable("TAVILY_API_KEY")

logger = logging.getLogger()

async def dict_to_sse(generator: AsyncGenerator[Dict, None]):
    ...

class AgentBuilder:
    agent_executor: AgentExecutor

    def __init__(
        self,
        chat_model: BaseLLMClient,
        prompt: str,
        tools: Dict[str, Callable],
        enable_tavily: Optional[bool] = True,
    ):
        """
        @class `Builde AgentExecutor based on tools and prompt`
        @param prompt: str
        @param tools: Dict[str, Callable]
        @param enable_tavily: Optional[bool] If set True, enables the Tavily tool
        """
        self.prompt = prompt
        self.tools = tools
        self.enable_tavily = enable_tavily
        self.chat_model = chat_model
        self.agent_executor = self._create_agent_with_tools()

    def init_tavily_tools(self):
        # init Tavily
        search = TavilySearchAPIWrapper()
        tavily_tool = TavilySearchResults(api_wrapper=search)
        return [tavily_tool]

    def _create_agent_with_tools(self) -> AgentExecutor:
        ...

    def get_prompt(self):
        ...

    def chat_history_transform(self, messages: list[Message]):
        ...

    async def run_stream_chat(self, input_data: ChatData) -> AsyncIterator[Dict]:
        ...
    async def run_chat(self, input_data: ChatData) -> str:
        ...
