Using LLM Multi-Agents in AutoGen to Collect Papers and Generate Reports Automatically: A Practical Walkthrough

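I recently needed to improve a face pose estimation model. Normally I would survey the latest papers: search arXiv, check the rankings on the relevant algorithm benchmarks, and finally pick a paper and a model. Today I came across an experiment that used AutoGen to fetch data automatically and write an analysis report, and it gave me an idea: why not use AutoGen to survey the last 4 years of face pose estimation papers according to my requirements and write the report for me? At the very least this would save a good amount of time, and the final output would be a report in Chinese.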

1. Conversation Flow Design

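This task requires automatically writing code to fetch papers and their abstracts, and then writing a report based on those abstracts. The rough flow is: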

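UserAgent sends the task to PlannerAgent
PlannerAgent plans the task
ProgrammingAgent writes a program to collect the information the plan calls for and sends it to the Code Executor
Code Executor runs the code
If the program fails, the error is fed back to ProgrammingAgent, which adjusts the code and sends it to the Code Executor again
If the program succeeds, its output is passed to WriterAgent
WriterAgent writes the report from the information it was given and sends it to UserAgent for review
If the review passes, the task ends; otherwise the feedback goes back to WriterAgent for revision.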
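The steps above amount to a small state machine. As a minimal sketch, the designed hand-offs can be written as a plain transition table keyed by agent name (the names follow the agents defined in the code below; the table and `follows_flow` are illustrative, not AutoGen objects), and a sequence of speakers can then be checked against it:

```python
# Designed hand-offs: speaker -> agents allowed to speak next (illustrative only)
DESIGNED_FLOW: dict[str, list[str]] = {
    "Admin": ["Planner", "Writer"],      # give the task / ask the writer to revise
    "Planner": ["Engineer"],             # plan -> code
    "Engineer": ["Executor"],            # code -> run it
    "Executor": ["Engineer", "Writer"],  # failure -> fix the code; success -> write the report
    "Writer": ["Admin"],                 # draft -> review
}


def follows_flow(trace: list[str]) -> bool:
    """Return True if every consecutive pair of speakers is an allowed hand-off."""
    return all(nxt in DESIGNED_FLOW.get(cur, []) for cur, nxt in zip(trace, trace[1:]))


# One failed run, one successful run, then a review round
ok_trace = ["Admin", "Planner", "Engineer", "Executor", "Engineer", "Executor", "Writer", "Admin"]
print(follows_flow(ok_trace))                         # True
print(follows_flow(["Admin", "Engineer", "Writer"]))  # False: Admin never hands off to Engineer
```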

2. Conversation Implementation

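Readers already familiar with writing llm_config and instantiating ConversableAgent can skip this part. The longer system prompts have been abridged. In the code, UserAgent, PlannerAgent, ProgrammingAgent, Code Executor, and WriterAgent from the design above are named Admin, Planner, Engineer, Executor, and Writer.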

import autogen

# llm_config (model name, API key, etc.) is assumed to be defined beforehand
user_proxy = autogen.ConversableAgent(
    name="Admin",
    system_message="Give the task, and send instructions to writer to refine the blog post.",
    code_execution_config=False,
    llm_config=llm_config,
    human_input_mode="ALWAYS",
)

planner = autogen.ConversableAgent(
    name="Planner",
    system_message="Given a task, please determine ...",
    description="Given...",
    llm_config=llm_config,
)

engineer = autogen.AssistantAgent(
    name="Engineer",
    llm_config=llm_config,
    description="Write code based on the plan provided by the planner.",
)

writer = autogen.ConversableAgent(
    name="Writer",
    llm_config=llm_config,
    system_message="Writer. Please write blogs in markdown format (with relevant titles)",
    description="After all ...",
)

executor = autogen.ConversableAgent(
    name="Executor",
    description="Execute the code written by the engineer and report the result.",
    human_input_mode="NEVER",
    code_execution_config={
        "last_n_messages": 3,
        "work_dir": "coding",
        "use_docker": False,
    },
)

We use GroupChat to manage the conversation. AutoGen lets you control which agent may speak next by setting GroupChat parameters. The key parameters are:

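agents: List[Agent], the agents taking part in the conversation
max_round: the maximum number of conversation rounds allowed
speaker_selection_method: defaults to "auto", where the LLM picks the next speaker based on the agents' descriptions; "round_robin", "random", "manual", or a custom callable are also accepted
allowed_or_disallowed_speaker_transitions: a dict defining speaker hand-offs; each key is a source agent and each value is a List[Agent] of agents it may (or may not) hand the floor to
speaker_transitions_type: whether the dict above lists allowed or disallowed transitions, "allowed" or "disallowed"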
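To make the two modes of speaker_transitions_type concrete, here is a small sketch that normalizes either form into an explicit "allowed" table. It works on agent names rather than Agent objects, and `to_allowed` is a hypothetical helper, not AutoGen's own code; it also excludes self-hand-offs in the "disallowed" case, which is a simplifying assumption (AutoGen's treatment of that edge case may differ).

```python
def to_allowed(transitions: dict[str, list[str]], agents: list[str], kind: str) -> dict[str, list[str]]:
    """Normalize a transition dict to the 'allowed' form (illustrative sketch)."""
    if kind == "allowed":
        # Agents missing from the dict get no allowed successors in this sketch
        return {a: list(transitions.get(a, [])) for a in agents}
    if kind == "disallowed":
        # Every other agent not explicitly disallowed becomes an allowed next speaker
        return {a: [b for b in agents if b != a and b not in transitions.get(a, [])] for a in agents}
    raise ValueError(f"unknown speaker_transitions_type: {kind!r}")


agents = ["Admin", "Planner", "Engineer", "Executor", "Writer"]
allowed = to_allowed({"Engineer": ["Writer", "Admin"]}, agents, "disallowed")
print(allowed["Engineer"])  # ['Planner', 'Executor']
print(allowed["Admin"])     # ['Planner', 'Engineer', 'Executor', 'Writer']
```

A short "disallowed" list is handy when most hand-offs are fine and only a few must be blocked; for a strict pipeline like the one designed here, an explicit "allowed" table is easier to reason about.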

Following the designed conversation order, GroupChat is instantiated as follows:

groupchat = autogen.GroupChat(
    agents=[user_proxy, engineer, writer, executor, planner],
    messages=[],
    max_round=10,
    allowed_or_disallowed_speaker_transitions={
        user_proxy: [writer, planner],
        engineer: [executor],
        writer: [user_proxy],
        executor: [engineer, writer],  # on failure back to engineer, on success on to writer
        planner: [engineer],
    },
    speaker_transitions_type="allowed",
)
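Before spending tokens on a run, it can help to check that the agent expected to finish the job is reachable at all. The sketch below copies the transition table using agent names instead of objects (an illustrative simplification) and runs a breadth-first search. It shows that if Executor may only hand back to Engineer, then once the conversation enters the Engineer/Executor loop, Writer can never be selected.

```python
from collections import deque


def reachable(transitions: dict[str, list[str]], start: str) -> set[str]:
    """Agents that can eventually speak after `start`, following allowed hand-offs."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in transitions.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Name-based copies of two candidate tables (illustrative)
closed_loop = {
    "Admin": ["Writer", "Planner"],
    "Engineer": ["Executor"],
    "Writer": ["Admin"],
    "Executor": ["Engineer"],
    "Planner": ["Engineer"],
}
with_handoff = {**closed_loop, "Executor": ["Engineer", "Writer"]}

print("Writer" in reachable(closed_loop, "Executor"))   # False
print("Writer" in reachable(with_handoff, "Executor"))  # True
```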

Once the group is set up, instantiate the group manager, GroupChatManager. It inherits from ConversableAgent and takes parameters such as groupchat and name.

manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
task = "Use arxiv to fetch all face pose recognition papers from 2020-2024 and write a report in Chinese"
groupchat_result = user_proxy.initiate_chat(manager, message=task)

3. Running It and Problems Encountered

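Because the code is written and debugged automatically, the results are not stable. Engineer and Executor can keep debugging the code back and forth, but the overall flow is still hard to control and does not reliably get as far as handing the results to Writer for the report. Debugging showed that letting the LLM choose the next speaker automatically is not yet a mature approach: it places high demands on the LLM itself, and hard-coding the flow is more stable.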
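A hard-coded flow can still live inside GroupChat, because speaker_selection_method also accepts a callable (in AutoGen 0.2.x it receives the last speaker and the group chat and returns the next agent). The routing rule itself can be kept as a plain function over agent names, as in the hypothetical `next_speaker` below. It assumes the Executor's reply begins with `exitcode: 0` on success, which is how AutoGen's code execution output is usually formatted; verify this against your version.

```python
from typing import Optional


def next_speaker(last_speaker: str, last_content: str, has_draft: bool = False) -> Optional[str]:
    """Hypothetical fixed routing for the designed flow; returns an agent name, or None to stop."""
    if last_speaker == "Admin":
        # The first Admin turn starts planning; later Admin feedback is meant for the writer
        return "Writer" if has_draft else "Planner"
    if last_speaker == "Planner":
        return "Engineer"
    if last_speaker == "Engineer":
        return "Executor"
    if last_speaker == "Executor":
        # Successful runs go to the writer, anything else back to the engineer for a fix
        return "Writer" if last_content.startswith("exitcode: 0") else "Engineer"
    if last_speaker == "Writer":
        return "Admin"
    return None


print(next_speaker("Executor", "exitcode: 1 (execution failed)\nCode output: ..."))  # Engineer
print(next_speaker("Executor", "exitcode: 0 (execution succeeded)\nCode output: ..."))  # Writer
```

Inside the actual callable you would read the last message from groupchat.messages and map the returned name back to an agent (for example with groupchat.agent_by_name()); both are worth double-checking against the AutoGen version you use.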

4. Improvements

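Since automatic coding could not reliably produce the list of papers, an alternative is to write the arXiv-fetching code ourselves. This makes paper retrieval stable, so the LLM can be used where it is strongest: writing the report.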

4.1 Using the arxiv Library

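arXiv is a curated research-sharing platform open to everyone. In Python, the arxiv package from pip can be used to search for and retrieve papers. The package is simple and revolves around three classes: Client, Search, and Result.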

First, install the dependency:

pip install arxiv

Code to fetch papers:

import arxiv

client = arxiv.Client()
# Define the search criteria
search = arxiv.Search(
    query="head pose estimation",
    max_results=10,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)
# client.results() returns a generator; next() takes the first result
paper = next(client.results(search))
print(paper.title, paper.summary)

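Searching for papers really is that easy. Note that client.results(search) returns a Generator[Result, None, None], so results are obtained by iterating over it (for example with next() or a for loop). Also, arxiv.Search has no date-range parameter, so the only option here would be to fetch a large number of papers and filter them by date yourself, which means many more requests; this article therefore skips date filtering for now. Finally, too many papers could overflow the LLM's context window, so the number of results defaults to 10.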
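If you do want the 2020-2024 window, the over-fetch-and-filter approach mentioned above can be kept cheap by consuming the generator lazily and filtering on each result's `published` datetime (an attribute of `arxiv.Result`). The sketch below uses a stand-in dataclass instead of real results so it runs offline; `FakeResult` and `filter_by_year` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FakeResult:
    # Stand-in for arxiv.Result: only the fields this sketch touches
    title: str
    published: datetime


def filter_by_year(results, start_year: int, end_year: int, limit: int = 10) -> list:
    """Keep results published in [start_year, end_year], stopping after `limit` matches."""
    kept = []
    for r in results:  # any iterable works, including the generator from client.results()
        if start_year <= r.published.year <= end_year:
            kept.append(r)
            if len(kept) >= limit:
                break
    return kept


demo = [
    FakeResult("A", datetime(2019, 12, 31, tzinfo=timezone.utc)),
    FakeResult("B", datetime(2021, 5, 1, tzinfo=timezone.utc)),
    FakeResult("C", datetime(2024, 7, 9, tzinfo=timezone.utc)),
    FakeResult("D", datetime(2025, 1, 2, tzinfo=timezone.utc)),
]
print([r.title for r in filter_by_year(demo, 2020, 2024)])  # ['B', 'C']
```

With real results you would pass `client.results(search)` directly. Combined with sort_by=arxiv.SortCriterion.SubmittedDate (newest first by default), you could also stop as soon as a result is older than start_year.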

To make the function usable for LLM function calling, annotate it with Python typing:

from typing import TypedDict, Optional, List, Annotated

class Paper(TypedDict):
    title: str
    published: str
    summary: str

def search_arxiv(
    query: Annotated[str, "query string of arxiv"],
    max_results: Annotated[Optional[int], "the max result from arxiv"] = 10,
) -> Annotated[List[Paper], "a list of papers, each with its title, published date and summary"]:
    import arxiv

    client = arxiv.Client()
    # Run the search, ranked by relevance
    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.Relevance,
    )
    papers = []
    for result in client.results(search):
        # %m is the month; %M would be minutes
        papers.append(Paper(title=result.title, published=result.published.strftime("%Y-%m-%d"), summary=result.summary))
    return papers
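The Annotated strings are what let AutoGen describe the tool to the LLM: when the function is registered, AutoGen reads the type hints and their attached descriptions to build the tool's JSON schema. The simplified sketch below is not AutoGen's actual code; it only shows how those descriptions can be read back with the standard typing module. `describe_params` and `demo_search` are hypothetical.

```python
from typing import Annotated, Optional, get_args, get_origin, get_type_hints


def describe_params(func) -> dict[str, str]:
    """Collect the description string attached to each Annotated parameter (simplified sketch)."""
    hints = get_type_hints(func, include_extras=True)  # include_extras keeps the Annotated metadata
    out = {}
    for name, hint in hints.items():
        if name != "return" and get_origin(hint) is Annotated:
            out[name] = get_args(hint)[1]  # args are (underlying type, description, ...)
    return out


def demo_search(query: Annotated[str, "query string of arxiv"],
                max_results: Annotated[Optional[int], "the max result from arxiv"] = 10) -> list:
    return []


print(describe_params(demo_search))
# {'query': 'query string of arxiv', 'max_results': 'the max result from arxiv'}
```

This is also why the description strings are worth writing carefully: they end up in the prompt the LLM sees when deciding how to call the tool.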

4.2 Building the Conversation Flow

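Section 4.1 implemented an arXiv search function suited to LLM function calling. In this section we drive the function call with a ReAct-style flow. In ReAct, the model reasons about what to do next (Thought), decides which Action to take and with what input, executes the Action, and receives the result as an Observation; the cycle repeats until it can give a final answer.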

The prompt template is as follows:

ReAct_prompt = """
Answer the following questions as best you can. You have access to tools provided.
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take
Action Input: the input to the action
Observation: the result of the action
... (this process can repeat multiple times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: {input}
"""

Instantiate the user and assistant agents:

from autogen import AssistantAgent, UserProxyAgent, register_function

user_proxy = UserProxyAgent(
    name="User",
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
    human_input_mode="ALWAYS",
    max_consecutive_auto_reply=10,
    code_execution_config={"work_dir": "coding", "last_n_messages": 3, "use_docker": False},
)

assistant = AssistantAgent(
    name="Assistant",
    system_message="Only use the tools you have been provided with. Reply TERMINATE when the task is done.",
    llm_config=llm_config,
)
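The termination lambda above works in a boolean context, but when there is no text it returns the content itself (None or an empty string) rather than a strict bool. An equivalent named predicate, sketched below, always returns a bool and treats a missing or None content (common on messages that only carry tool calls) as "not finished". It could be passed as is_termination_msg=is_termination_msg; the function is a hypothetical replacement, not part of the original setup.

```python
from typing import Any


def is_termination_msg(msg: dict[str, Any]) -> bool:
    """True when the message text ends with TERMINATE; tool-call-only messages may have content=None."""
    content = msg.get("content") or ""
    return content.rstrip().endswith("TERMINATE")


print(is_termination_msg({"content": "Report finished.\nTERMINATE  "}))  # True
print(is_termination_msg({"content": None, "tool_calls": []}))           # False
```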

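Register the function with both agents: the assistant is the caller, which decides when to call the tool and with what arguments, and user_proxy is the executor, which actually runs it: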

register_function(
    search_arxiv,
    caller=assistant,
    executor=user_proxy,
    name="search_arxiv",
    description="Search the arxiv for the given query to get the paper",
)
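The caller/executor split is the point of register_function: the assistant only sees the tool's schema and emits a tool call, while user_proxy looks up the name and runs the Python function. The pure-Python sketch below imitates that executor side with a hypothetical registry and an offline stand-in for search_arxiv. It is not AutoGen's internal code and only assumes the OpenAI-style tool-call shape with JSON-encoded arguments.

```python
import json
from typing import Any, Callable

# Hypothetical registry: tool name -> Python callable (the executor side)
TOOLS: dict[str, Callable[..., Any]] = {}


def register(name: str):
    def deco(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return deco


@register("search_arxiv")
def fake_search_arxiv(query: str, max_results: int = 10) -> list[dict]:
    # Offline stand-in for the real search_arxiv
    return [{"title": f"{query} paper {i}", "published": "2024-01-01", "summary": "..."} for i in range(max_results)]


def execute_tool_call(tool_call: dict) -> str:
    """Run one tool call of the shape {"function": {"name": ..., "arguments": "<json>"}}."""
    fn = TOOLS[tool_call["function"]["name"]]
    kwargs = json.loads(tool_call["function"]["arguments"])
    return json.dumps(fn(**kwargs))  # the result goes back to the caller as text


call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "search_arxiv", "arguments": '{"query": "head pose estimation", "max_results": 2}'},
}
print(execute_tool_call(call))
```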

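Finally, start the conversation. Instead of a fixed string, message is the custom function react_prompt_message, which builds the prompt; extra keyword arguments passed to initiate_chat (here, question) reach it through its context argument.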

def react_prompt_message(sender, recipient, context):
    return ReAct_prompt.format(input=context["question"])

task = "Use the arxiv package to fetch all head pose recognition papers from 2020-2024 and write a report in Chinese"
chat_result = user_proxy.initiate_chat(assistant, message=react_prompt_message, question=task)

4.3 Results

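The LLM produces its Thought, its Action, and the Action Input, and the tool call in its reply (the tool_calls field) tells AutoGen to call the search_arxiv tool with the given arguments. Calling search_arxiv returns the paper information; the output is long, so it has been trimmed here for readability. Unless the LLM has a generous token limit, it is best at this point to reply with feedback asking it to lower max_results; otherwise the request easily fails with a 502 error.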
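Besides asking the model to lower max_results, the tool itself can cap how much text it returns. A rough sketch, assuming a character budget as a crude stand-in for tokens (a real tokenizer would be more accurate): it shortens each summary and drops papers once the budget is spent. `trim_papers` and the default budgets are hypothetical.

```python
def trim_papers(papers: list[dict], max_chars: int = 6000, summary_chars: int = 500) -> list[dict]:
    """Shorten each summary and stop adding papers once the total character budget is used."""
    trimmed, used = [], 0
    for p in papers:
        summary = p["summary"]
        if len(summary) > summary_chars:
            summary = summary[:summary_chars].rstrip() + "..."
        entry = {**p, "summary": summary}  # copy, so the caller's list is not modified
        size = len(entry["title"]) + len(entry["published"]) + len(summary)
        if used + size > max_chars:
            break
        trimmed.append(entry)
        used += size
    return trimmed


papers = [{"title": f"T{i}", "published": "2023-01-01", "summary": "x" * 1000} for i in range(20)]
print(len(trim_papers(papers, max_chars=2000)))  # 3
```

Applied at the end of search_arxiv, this keeps the observation size bounded no matter what max_results the model chooses.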

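Looking at the messages sent to the model, each input includes the full history of previous inputs and outputs. Even so, the final output still reasons afresh about what to do: the long input did not make the LLM lose track of the task, and it correctly concluded that the next step was to write the report.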

5. Summary and Recommendations

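This article started from the task of finding papers on head pose estimation and tried to use AutoGen's automatic coding to collect the data and write the report. We designed several agents for planning, engineering, execution, and writing, and coordinated them through a group chat in which the LLM automatically selects who responds and when to hand off. However, this flow turned out to be too dynamic: it demands a lot from the LLM and fails often.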

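In the end, we wrote the paper-search code ourselves and used the ReAct prompting paradigm to complete the report. Although ReAct executed the task efficiently, it also showed that AssistantAgent was carrying too many responsibilities; its role should be kept as simple as possible. We should stick to the principle of one agent per job, using separate agents to fetch arXiv papers and to write the document. This lets us give the writing agent a more specialized prompt, improving both efficiency and the quality of the output.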
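Following that one-agent-one-job principle, the hand-off can be made explicit: one step fetches papers (for example with search_arxiv), and a separate Writer agent receives only a focused prompt built from the results. A minimal sketch of that second step, where `build_writer_prompt` and the prompt wording are hypothetical:

```python
def build_writer_prompt(topic: str, papers: list[dict], language: str = "Chinese") -> str:
    """Turn retrieved papers into a self-contained instruction for a dedicated writer agent."""
    lines = [
        f"You are a research writer. Write a survey report on {topic} in {language}, in Markdown.",
        "Use only the papers listed below and cite each one by its number, e.g. [1].",
        "",
    ]
    for i, p in enumerate(papers, start=1):
        lines.append(f"[{i}] {p['title']} ({p['published']})")
        lines.append(f"    {p['summary']}")
    return "\n".join(lines)


prompt = build_writer_prompt(
    "head pose estimation",
    [{"title": "Example paper", "published": "2023-05-01", "summary": "An example abstract."}],
)
print(prompt)
```

The resulting string could be sent as the opening message to a Writer agent whose system message holds the formatting rules, so neither agent needs to know how the other works.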

For real deployments, keep the following in mind:

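Stability first: dynamic agent routing easily leads to endless loops or drift away from the task; a fixed flow (such as ReAct) is better suited to deterministic tasks.
Context management: paper search results can consume many tokens, so set max_results sensibly or use RAG to handle long text.
Error handling: run generated code in a sandbox (Docker, via use_docker in code_execution_config) to prevent malicious code from executing, and add a retry mechanism to cope with unstable networks.
Prompt engineering: give the Writer agent a prompt with explicit format requirements so the output follows the expected structure.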
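For the retry point, a small wrapper with exponential backoff is often enough. The sketch below is generic and hypothetical (`with_retry` is not from AutoGen or arxiv); it retries only on the exception types you list, waiting base_delay * 2**i plus random jitter between attempts.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fn(); on a listed exception, wait base_delay * 2**i plus jitter and try again."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: let the last error propagate
            time.sleep(base_delay * 2**i + random.uniform(0, base_delay))
    raise AssertionError("unreachable")


# Usage (network call not run here):
# papers = with_retry(lambda: search_arxiv("head pose estimation", max_results=10),
#                     retry_on=(Exception,))
```

Pass retry_on with the exception types your client actually raises; HTTP libraries often use their own exception classes rather than the built-in ConnectionError. The arxiv package's Client also has its own num_retries and delay_seconds settings, so a wrapper like this matters most for calls without built-in retries.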

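With these improvements, you can build a reasonably stable automated research assistant that substantially cuts the time spent manually searching for and organizing literature.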