Advanced RAG Techniques Explained: Best Practices for Optimizing Retrieval-Augmented Generation | 极客日志



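Summary (AI-generated): Advanced retrieval-augmented generation (RAG) techniques use a range of strategies to improve both retrieval and generation. This article covers the limitations of naive RAG and introduces advanced methods such as hybrid retrieval and re-ranking. It focuses on the implementation details and code labs for query expansion, query decomposition, and multimodal RAG (MM-RAG). It also lists other advanced techniques for indexing improvements, retrieval optimization, and whole-pipeline coverage, such as deep chunking, RAPTOR, and HyDE, as practical guidance and a technology-selection reference for building high-precision RAG applications.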

灰度发布 · Published 2025/2/6 · Updated 2026/6/1 · 19 views

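In this final chapter, we explore several advanced techniques for improving retrieval-augmented generation (RAG) applications. These techniques go beyond basic RAG approaches to tackle more complex challenges and achieve better results. Our starting point is the set of techniques we used in earlier chapters. We will build on them, examine their limitations, and then introduce new techniques that address those shortcomings and take your RAG practice to the next level.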

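Throughout this chapter, you will get hands-on experience implementing these advanced techniques through a series of code labs. We will cover the following topics:

Naive RAG and its limitations
Hybrid RAG/multi-vector RAG for improved retrieval
Re-ranking in hybrid RAG
Code lab 14.1 – Query expansion
Code lab 14.2 – Query decomposition
Code lab 14.3 – Multimodal RAG (MM-RAG)
Other advanced RAG techniques worth exploring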

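These techniques improve retrieval and generation by augmenting the query, breaking a question down into sub-questions, and combining multiple data modalities. We will also discuss a range of other advanced RAG techniques covering indexing, retrieval, generation, and the RAG pipeline as a whole. We start with naive RAG, the main RAG approach we reviewed in Chapter 2, which you should know well by now.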

Technical requirements

The code for this chapter is available in the GitHub repository.

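Naive RAG and its limitations

So far, we have used three types of RAG approach: naive RAG, hybrid RAG, and re-ranking. We started with what is called naive RAG. This is the basic RAG approach used in the starter code in Chapter 2, and it served as the foundation for several later code labs. The naive RAG model is the initial version of the RAG technique: it provides the basic framework for combining a retrieval mechanism with a generative model, although it is limited in flexibility and scalability.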
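To make the baseline concrete, the naive RAG loop (embed the query, rank chunks by similarity, stuff the top chunks into the prompt) can be sketched in plain Python. This is a minimal illustration, not the book's starter code: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the final LLM call is left out.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Stuff the retrieved chunks into the prompt that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


chunks = [
    "google procured renewable energy for its data centers",
    "the cafeteria menu changes weekly",
    "data centers use ai driven cooling to cut energy use",
]
query = "how do data centers save energy"
prompt = build_prompt(query, retrieve(query, chunks))
print(prompt)
```

In a real application, `embed` would call an embedding model and the ranking would run inside a vector store; only one search type is involved, which is exactly the limitation discussed next.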

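Naive RAG retrieves many fragmented chunks of context: the chunks of text that we vectorized and then place into the LLM's context window. If your chunks are too small, the resulting context is even more fragmented. This fragmentation weakens how well context and semantics are understood and captured, which reduces the effectiveness of the retrieval mechanism in a RAG application. A typical naive RAG application relies on a single type of search, usually some form of semantic search, and relying on that alone exposes these limitations. That is why we introduced a more advanced retrieval method: hybrid search.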
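The effect of chunk size on fragmentation is easy to see with a simple word-based chunker. The sketch assumes fixed-size chunking by word count with optional overlap; `chunk_words` is a hypothetical helper, and real pipelines usually count tokens or use a library text splitter. With four-word chunks, the phrase "AI driven cooling" ends up split across two chunks.

```python
def chunk_words(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of `size` words; neighbors share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this chunk already reached the end
            break
    return chunks


text = "Google reached a PUE of 1.10 in 2022 thanks to AI driven cooling"
print(chunk_words(text, size=4))
print(chunk_words(text, size=8, overlap=2))
```

Larger chunks with some overlap keep related words together at the cost of more text per chunk.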

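Hybrid RAG/multi-vector RAG for improved retrieval

Hybrid RAG extends naive RAG by using multiple vectors for retrieval instead of relying on a single vector representation of the query and documents. In Chapter 8, we explored hybrid RAG in depth and implemented it in code, not only with the mechanism LangChain recommends but also by recreating that mechanism ourselves to understand how it works internally. Hybrid RAG, also called multi-vector RAG, is not limited to semantic and keyword search (as in our code lab); it can combine any set of vector retrieval techniques that make sense for your RAG application.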

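Our hybrid RAG code lab added keyword search, which broadened our search capabilities, especially for content with weak semantic context (names, codes, internal acronyms, and similar text). This multi-vector approach lets us take a wider range of aspects of the query and the database content into account. That, in turn, can improve the relevance and accuracy of the retrieved results, which supports the generation step: the generated content becomes more relevant and informative and better aligned with the specifics of the input query. Multi-vector RAG is especially useful in applications that need precise, nuanced generated content, such as technical writing, academic research assistance, internal company documentation with many internal codes and entity references, and complex question-answering systems.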
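A keyword retriever scores exact token matches, which is why it helps with names, codes, and acronyms that embeddings tend to blur. BM25 is a common scoring function for this; the from-scratch sketch below is for illustration only and is not the retriever from the Chapter 8 lab. The document strings and the `zx-17` code are made up.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "project zx-17 cut data center water use",
    "google invests in water replenishment projects",
    "the zx-17 report is internal",
]
print(bm25_scores("zx-17 status", docs))
```

The document without the exact code scores zero, while the shorter matching document scores highest because BM25 normalizes for document length.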

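But multi-vector RAG was not the only advanced technique we explored in Chapter 8; we also applied re-ranking.

Re-ranking in hybrid RAG

In Chapter 8, alongside the hybrid RAG approach, we introduced a re-ranking method, another common advanced RAG technique. After semantic search and keyword search completed their retrieval, we re-ranked the combined results based on whether each result appeared in both lists and on its original rank in each.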
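One widely used way to implement "reward results that appear in both lists, and respect their original ranks" is reciprocal rank fusion (RRF); LangChain's `EnsembleRetriever` is documented as using weighted RRF. The sketch below is a generic weighted RRF assuming the conventional constant `k = 60`; it is not necessarily the exact re-ranking code from Chapter 8, and the document IDs are made up.

```python
from collections import defaultdict
from typing import Optional


def reciprocal_rank_fusion(
    rankings: list[list[str]],
    weights: Optional[list[float]] = None,
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with weighted RRF.

    Each appearance adds weight / (k + rank), so documents found by several
    retrievers, and near the top of each list, accumulate the highest scores.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


semantic_hits = ["d1", "d2", "d3"]
keyword_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print([doc_id for doc_id, _ in fused])  # ['d1', 'd3', 'd2', 'd4']
```

`d1` and `d3` appear in both lists and rise to the top; `d1` wins because its two ranks (1 and 2) are better than `d3`'s (3 and 1).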

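You have now seen three RAG techniques, two of them advanced! But this chapter focuses on three new advanced methods: query expansion, query decomposition, and MM-RAG. We will also point you to many other methods worth exploring, but we selected these three because they apply to a wide range of RAG applications.

We start with query expansion in the first code lab of this chapter.

Code lab 14.1 – Query expansion

The code for this lab is in the CHAPTER14-1_QUERY_EXPANSION.ipynb file in the CHAPTER14 directory of the GitHub repository.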

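Many techniques for enhancing RAG focus on improving one area, such as retrieval or generation, but query expansion has the potential to improve both. We discussed the concept of expansion in Chapter 13, but there we focused on the LLM's output. Here, we apply the concept to the model's input, expanding the original prompt with additional keywords or phrases. This can improve the retrieval model's understanding because it gives the user query more context, which raises the chances of retrieving relevant documents. Better retrieval already helps generation by supplying better context, but this approach can also produce a more effective query, which helps the LLM give a better response.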

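Query expansion with a generated answer typically works like this: you pass the user query to the LLM with a prompt asking for a preliminary answer to the question, without giving the LLM the retrieved context it would normally see in a RAG application. You then append that hypothetical answer to the original query and use the combined text for retrieval. This kind of change broadens the search while keeping it focused on the original intent.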
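The pattern used in the code below (generate a hypothetical answer, then append it to the query) can be isolated from LangChain and the LLM. In this sketch, `fake_llm` is a hypothetical stub standing in for the model call, and counting shared words is a crude assumption standing in for embedding similarity; it only shows why the joint query overlaps more with a relevant document than the bare question does.

```python
from typing import Callable


def expand_query(query: str, generate: Callable[[str], str]) -> str:
    """Append a generated hypothetical answer to the original query."""
    hypothetical_answer = generate(query)
    return f"{query} {hypothetical_answer}"


def shared_terms(a: str, b: str) -> int:
    """Crude relevance proxy: count distinct lowercase words the two texts share."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def fake_llm(query: str) -> str:
    # Hypothetical stub: a real LLM would write this answer from its own knowledge
    return "Google buys renewable energy and improves data center efficiency"


document = "In 2022 Google procured renewable energy to improve data center efficiency"
original = "What are Google's environmental initiatives?"
joint = expand_query(original, fake_llm)
print(shared_terms(original, document), shared_terms(joint, document))  # 0 6
```

The hypothetical answer may contain made-up details, but for retrieval it mainly contributes vocabulary that real documents are likely to use.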


from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

# System prompt asking for a hypothetical answer in the style of an annual
# environmental report; no retrieved context is provided at this stage.
EXPANSION_PROMPT = (
    "You are a helpful expert environmental research assistant. Provide an "
    "example answer to the given question, that might be found in a document "
    "like an annual environmental report."
)

def augment_query_generated(user_query):
    # llm is assumed to be the chat model defined earlier in the notebook
    system_message_prompt = SystemMessagePromptTemplate.from_template(EXPANSION_PROMPT)
    human_message_prompt = HumanMessagePromptTemplate.from_template("{query}")
    chat_prompt = ChatPromptTemplate.from_messages(
        [system_message_prompt, human_message_prompt])
    messages = chat_prompt.format_prompt(query=user_query).to_messages()
    # Call the model with .invoke() rather than the deprecated llm(messages)
    result = llm.invoke(messages)
    return result.content

original_query = "What are Google's environmental initiatives?"
hypothetical_answer = augment_query_generated(
    original_query)
joint_query = f"{original_query} {hypothetical_answer}"
print(joint_query)

What are Google's environmental initiatives?
In 2022, Google continued to advance its environmental initiatives, focusing on sustainability and reducing its carbon footprint. Key initiatives included:
1. **Carbon Neutrality and Renewable Energy**: Google has maintained its carbon-neutral status since 2007 and aims to operate on 24/7 carbon-free energy by 2030. In 2022, Google procured over 7 gigawatts of renewable energy, making significant strides towards this goal.
2. **Data Center Efficiency**: Google's data centers are among the most energy-efficient in the world. In 2022, the company achieved an average power usage effectiveness (PUE) of 1.10, significantly lower than the industry average. This was accomplished through advanced cooling technologies and AI-driven energy management systems.
3. **Sustainable Products and Services**…[TRUNCATED]

result_alt = rag_chain_with_source.invoke(joint_query)
retrieved_docs_alt = result_alt['context']
print(f"Original Question: {joint_query}\n")
print(f"Relevance Score:\n    {result_alt['answer']['relevance_score']}\n")
print(f"Final Answer:\n{result_alt['answer']['final_answer']}\n\n")
print("Retrieved Documents:")
for i, doc in enumerate(retrieved_docs_alt, start=1):
    print(f"Document {i}: Document ID:\n        {doc.metadata['id']} source:\n        {doc.metadata['search_source']}")
    print(f"Content:\n{doc.page_content}\n")

from IPython.display import Markdown, display
markdown_text_alt = result_alt['answer']['final_answer']
display(Markdown(markdown_text_alt))

Google has implemented a comprehensive set of environmental initiatives aimed at sustainability and reducing its carbon footprint. Here are the key initiatives:
1. Carbon Neutrality and Renewable Energy: Google has been carbon-neutral since 2007 and aims to operate on 24/7 carbon-free energy by 2030. In 2022, Google procured over 7 gigawatts of renewable energy.
2. Data Center Efficiency: Google's data centers are among the most energy-efficient globally, achieving an average power usage effectiveness (PUE) of 1.10 in 2022. This was achieved through advanced cooling technologies and AI-driven energy management systems.
…[TRUNCATED FOR BREVITY]
3. Supplier Engagement: Google works with its suppliers to build an energy-efficient, low-carbon, circular supply chain, focusing on improving environmental performance and integrating sustainability principles.
4. Technological Innovations: Google is investing in breakthrough technologies, such as next-generation geothermal power and battery-based backup power systems, to optimize the carbon footprint of its operations.
These initiatives reflect Google's commitment to sustainability and its role in addressing global environmental challenges. The company continues to innovate and collaborate to create a more sustainable future.
---- END OF OUTPUT ----

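Code lab 14.2 – Query decomposition

# Assumed to be defined in earlier notebook cells: llm, str_output_parser,
# ensemble_retriever, rag_chain_from_docs, and
# user_query = "What are Google's environmental initiatives?"
from langchain_core.load import dumps, loads
from langchain_core.prompts import PromptTemplate

prompt_decompose = PromptTemplate.from_template(
    """You are an AI language model assistant.
    Your task is to generate five different versions of
    the given user query to retrieve relevant documents from
    a vector search. By generating multiple perspectives on
    the user question, your goal is to help the user
    overcome some of the limitations of the distance-based
    similarity search.  Provide these alternative questions
    separated by newlines.
    Original question: {question}"""
)

decompose_queries_chain = (
    prompt_decompose
    | llm
    | str_output_parser
    # One query per line; drop blank lines so no empty query reaches the retriever
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])
)

decomposed_queries = decompose_queries_chain.invoke(
    {"question": user_query})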
print("Five different versions of the user query:")
print(f"Original: {user_query}")
for i, question in enumerate(decomposed_queries, start=1):
    print(f"{question.strip()}")

Five different versions of the user query:
Original: What are Google's environmental initiatives?
What steps is Google taking to address environmental concerns?
How is Google contributing to environmental sustainability?
Can you list the environmental programs and projects Google is involved in?
What actions has Google implemented to reduce its environmental impact?
What are the key environmental strategies and goals of Google?
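Splitting the model output on newlines alone can leave blank entries, and models sometimes number their lines even when asked not to. A slightly more defensive parser could look like the following; `parse_query_lines` is a hypothetical helper, not part of the lab, that could stand in for the splitting lambda at the end of `decompose_queries_chain`.

```python
import re

# Matches a leading list marker such as "1.", "2)", "-" or "*"
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s*")


def parse_query_lines(llm_output: str, max_queries: int = 5) -> list[str]:
    """Turn newline-separated LLM output into a clean list of queries."""
    queries = []
    for line in llm_output.splitlines():
        cleaned = _MARKER.sub("", line).strip()
        if cleaned:  # skip blank lines
            queries.append(cleaned)
    return queries[:max_queries]


raw = "1. What steps is Google taking?\n\n- How is Google contributing?\n  Can you list programs?\n"
print(parse_query_lines(raw))
```

Capping the result at `max_queries` also guards against a model that returns more variants than requested.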

def format_retrieved_docs(documents: list[list]):
    flattened_docs = [dumps(doc) for sublist in documents
        for doc in sublist]
    print(f"FLATTENED DOCS: {len(flattened_docs)}")
    deduped_docs = list(set(flattened_docs))
    print(f"DEDUPED DOCS: {len(deduped_docs)}")
    return [loads(doc) for doc in deduped_docs]

FLATTENED DOCS: 100
DEDUPED DOCS: 67
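`format_retrieved_docs` deduplicates by serializing each document with `dumps` and passing the strings through a `set`, which also discards the order in which documents were retrieved. If that order matters, an order-preserving variant could look like this sketch, where plain dictionaries are a simplifying stand-in for LangChain `Document` objects and `json.dumps` plays the role of `dumps`.

```python
import json


def flatten_and_dedupe(results: list[list[dict]]) -> list[dict]:
    """Flatten per-query result lists and drop duplicates, keeping first-seen order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for per_query_docs in results:
        for doc in per_query_docs:
            # A canonical JSON string serves as the identity key for each document
            key = json.dumps(doc, sort_keys=True)
            if key not in seen:
                seen.add(key)
                unique.append(doc)
    return unique


results = [
    [{"id": 1, "text": "renewable energy"}, {"id": 2, "text": "water"}],
    [{"text": "water", "id": 2}, {"id": 3, "text": "habitat restoration"}],
]
print([doc["id"] for doc in flatten_and_dedupe(results)])  # [1, 2, 3]
```

`sort_keys=True` makes the key independent of dictionary insertion order, so the two copies of document 2 are recognized as the same document.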

retrieval_chain = (
    decompose_queries_chain
    | ensemble_retriever.map()
    | format_retrieved_docs
)

docs = retrieval_chain.invoke({"question": user_query})

from langchain_core.runnables import RunnableParallel, RunnablePassthrough

rag_chain_with_source = RunnableParallel(
    {"context": retrieval_chain,
     "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

Google has implemented a wide range of environmental initiatives aimed at improving sustainability and reducing its environmental impact. Here are some key initiatives based on the provided context.
1. Campus and Habitat Restoration:
Google has created and restored more than 40 acres of habitat on its campuses and surrounding urban landscapes, primarily in the Bay Area. This includes planting roughly 4,000 native trees and restoring ecosystems like oak woodlands, willow groves, and wetland habitats.
2. Carbon-Free Energy:
Google is working towards achieving net-zero emissions and 24/7 carbon-free energy (CFE) by 2030. This involves clean energy procurement strategies such as reducing carbon emissions across its operations and supply chain. Google has also entered into long-term renewable energy contracts to ensure 100% of its data centers are powered by renewable energy sources.
3. Greener Workplaces and Facilities:
Google is working to make its workplaces and facilities more sustainable by installing features such as energy-efficient buildings, reduced carbon footprints, and better waste management practices. For instance, Google's Mountain View campus uses a blend of natural and renewable energy sources to reduce its environmental impact and improve energy efficiency.

Code lab 14.3 – MM-RAG

%pip install "unstructured[pdf]"
%pip install pillow
%pip install pydantic
%pip install lxml
%pip install matplotlib
%pip install tiktoken
!sudo apt-get -y install poppler-utils
!sudo apt-get -y install tesseract-ocr

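from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
import base64
import uuid
from IPython.display import HTML, display
from PIL import Image
import matplotlib.pyplot as plt

# Parser that turns the chat model's message into a plain string
str_output_parser = StrOutputParser()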

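MultiVectorRetriever from langchain.retrievers.multi_vector: MultiVectorRetriever is a retriever that indexes one or more vectors per document (for example, embeddings of summaries) in a vector store and returns the corresponding original documents from a separate docstore, linked by an ID. In our code, MultiVectorRetriever combines a vectorstore and a docstore into a retriever that fetches the documents relevant to a user query.
UnstructuredPDFLoader from langchain_community.document_loaders: UnstructuredPDFLoader is a document loader that uses the unstructured library to extract elements, including text and images, from PDF files. In our code, it loads and extracts the elements of the specified PDF file (short_pdf_path).
RunnableLambda from langchain_core.runnables: RunnableLambda is a utility class that wraps a function as a runnable component in a LangChain pipeline. In our code, it wraps the split_image_text_types and img_prompt_func functions as runnable components of the RAG chain.
InMemoryStore from langchain.storage: InMemoryStore is a simple in-memory store for key-value pairs. In our code, it serves as the docstore, holding the actual document content associated with each document ID.
HumanMessage from langchain_core.messages: HumanMessage is the message type for input sent from the user to the language model. In this code lab, HumanMessage is used to build the prompt messages for image summarization and image description.
base64: used to encode images as base64 strings for storage and retrieval.
uuid: used to generate universally unique identifiers (UUIDs); in our code, it creates a unique document ID for each document added to the vectorstore and docstore.
HTML and display from IPython.display: HTML creates an HTML representation of an object, and display renders objects in an IPython notebook. In our code, they are used in the plt_img_base64 function to display base64-encoded images.
Image from PIL: PIL provides functions for opening, manipulating, and saving many image file formats.
matplotlib.pyplot as plt: matplotlib is a plotting library for creating visualizations. plt is imported but not used directly in the code shown here.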
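Conceptually, the multi-vector pattern searches over one representation (here, summaries) but returns another (the original text or image element), linked by a shared ID. The toy class below mimics that idea without LangChain: `ToyMultiVectorRetriever` is hypothetical, word-overlap scoring stands in for embeddings, and a plain dictionary plays the role of `InMemoryStore`.

```python
import uuid
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Hypothetical stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def overlap(a: Counter, b: Counter) -> int:
    """Number of shared word occurrences between two bags of words."""
    return sum(min(count, b[word]) for word, count in a.items())


class ToyMultiVectorRetriever:
    """Search over summaries, but return the original items they describe."""

    def __init__(self) -> None:
        self.index: list[tuple[Counter, str]] = []  # plays the role of the vectorstore
        self.docstore: dict[str, object] = {}       # plays the role of InMemoryStore

    def add(self, summaries: list[str], originals: list[object]) -> None:
        for summary, original in zip(summaries, originals):
            doc_id = str(uuid.uuid4())  # shared key linking summary and original
            self.index.append((bag_of_words(summary), doc_id))
            self.docstore[doc_id] = original

    def invoke(self, query: str, k: int = 1) -> list[object]:
        q = bag_of_words(query)
        ranked = sorted(self.index, key=lambda item: overlap(q, item[0]), reverse=True)
        return [self.docstore[doc_id] for _, doc_id in ranked[:k]]


retriever = ToyMultiVectorRetriever()
retriever.add(
    ["offshore wind farm with many turbines", "chart of data center energy use"],
    [{"kind": "image", "b64": "<image-1>"}, {"kind": "image", "b64": "<image-2>"}],
)
print(retriever.invoke("wind turbines in the ocean"))
```

The query never touches the image data itself; it matches the summary, and the shared ID brings back the original item.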

llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

short_pdf_path = "google-2023-environmental-report-short.pdf"

embedding_function = OpenAIEmbeddings()

pdfloader = UnstructuredPDFLoader(
    short_pdf_path,
    mode="elements",
    strategy="hi_res",
    extract_image_block_types=["Image", "Table"],
    extract_image_block_to_payload=True,
)
pdf_data = pdfloader.load()

TOTAL DOCS USED BEFORE REDUCTION: texts: 78 images: 17
CATEGORIES REPRESENTED: {'ListItem', 'Title', 'Footer', 'Image', 'Table', 'NarrativeText', 'FigureCaption', 'Header', 'UncategorizedText'}
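The cell that split `pdf_data` into `texts` and `images` is not shown here. A hypothetical version might group elements by their `category` metadata, as in the sketch below; which categories count as text, and whether headers and footers are dropped, are assumptions rather than the lab's actual choices, and `Element` is a minimal stand-in for LangChain's `Document`.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Minimal stand-in for a LangChain Document produced by UnstructuredPDFLoader."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_elements(elements, skip_categories=("Header", "Footer")):
    """Split loader output into text elements and image elements.

    Image elements are those with category "Image" that carry an
    "image_base64" payload; everything else with non-empty text, except the
    skipped categories, is treated as text. Both rules are assumptions.
    """
    texts, images = [], []
    for element in elements:
        category = element.metadata.get("category")
        if category == "Image" and "image_base64" in element.metadata:
            images.append(element)
        elif category not in skip_categories and element.page_content.strip():
            texts.append(element)
    return texts, images


elements = [
    Element("2023 Environmental Report", {"category": "Title"}),
    Element("", {"category": "Image", "image_base64": "<base64 data>"}),
    Element("Page 3", {"category": "Footer"}),
    Element("We procured renewable energy.", {"category": "NarrativeText"}),
]
texts, images = split_elements(elements)
print(f"texts: {len(texts)} images: {len(images)}")
```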

if len(images) > 3:
    images = images[:3]
print(f"total documents after reduction: texts: {len(texts)} images: {len(images)}")

total documents after reduction: texts: 78 images: 3

def apply_prompt(img_base64):
    # Prompt
    prompt = """You are an assistant tasked with summarizing images for retrieval. \
        These summaries will be embedded and used to retrieve the raw image. \
        Give a concise summary of the image that is well optimized for retrieval."""
    return [HumanMessage(content=[
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url":
                    f"data:image/jpeg;base64,{img_base64}"},},
    ])]
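`apply_prompt` passes the image as a base64 data URL with a hard-coded `image/jpeg` MIME type. The helpers below show how such a URL is built from raw bytes and decoded again; they are hypothetical utilities, and the MIME type is a parameter because the correct value depends on how the image was encoded.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def from_data_url(data_url: str) -> bytes:
    """Recover the raw bytes from a base64 data URL."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)


jpeg_magic = b"\xff\xd8\xff"  # the first three bytes of every JPEG file
url = to_data_url(jpeg_magic)
print(url)  # data:image/jpeg;base64,/9j/
```

Because the loader in this lab already stores `image_base64` in each element's metadata, only the `data:...;base64,` prefix has to be added before sending an image to the model.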

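# Text elements are embedded as-is, so their raw content serves as the "summary"
text_summaries = [doc.page_content for doc in texts]
# Store the base64-encoded images and their image summaries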
img_base64_list = []
image_summaries = []
for img_doc in images:
    base64_image = img_doc.metadata["image_base64"]
    img_base64_list.append(base64_image)
    message = llm.invoke(apply_prompt(base64_image))
    image_summaries.append(message.content)

vectorstore = Chroma(
    collection_name="mm_rag_google_environmental",
    embedding_function=embedding_function
)

store = InMemoryStore()
id_key = "doc_id"
retriever_multi_vector = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=store,
    id_key=id_key,
)

def add_documents(retriever, doc_summaries, doc_contents):
    # One shared ID links each summary (searched) to its original element (returned)
    doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
    summary_docs = [
        Document(page_content=s, metadata={id_key: doc_ids[i]})
        for i, s in enumerate(doc_summaries)
    ]
    # Embed and index the summaries in the vectorstore
    retriever.vectorstore.add_documents(summary_docs)
    # Store the original elements unchanged so metadata such as "category" and
    # "image_base64" is still available to split_image_text_types later
    retriever.docstore.mset(list(zip(doc_ids, doc_contents)))

if text_summaries:
    add_documents(retriever_multi_vector, text_summaries, texts)
if image_summaries:
    add_documents(retriever_multi_vector, image_summaries, images)

def split_image_text_types(docs):
    b64_images = []
    texts = []
    for doc in docs:
        if isinstance(doc, Document):
            if doc.metadata.get("category") == "Image":
                base64_image = doc.metadata["image_base64"]
                b64_images.append(base64_image)
            else:
                texts.append(doc.page_content)
        else:
            if isinstance(doc, str):
                texts.append(doc)
    return {"images": b64_images, "texts": texts}

def img_prompt_func(data_dict):
    formatted_texts = "\n".join(data_dict["context"]["texts"])
    messages = []
    if data_dict["context"]["images"]:
        for image in data_dict["context"]["images"]:
            image_message = {"type": "image_url",
                             "image_url": {"url": f"data:image/jpeg;base64,{image}"}}
            messages.append(image_message)
    text_message = {
        "type": "text",
        "text": (
            f"""You are a helpful assistant tasked with describing what is in an image. The user will ask for a picture of something. Provide text that supports what was asked for. Use this information to provide an in-depth description of the aesthetics of the image. Be clear and concise and don't offer any additional commentary.
User-provided question: {data_dict['question']}
Text and / or images: {formatted_texts}"""
        ),
    }
    messages.append(text_message)
    return [HumanMessage(content=messages)]

chain_multimodal_rag = ({"context": retriever_multi_vector
    | RunnableLambda(split_image_text_types),
    "question": RunnablePassthrough()}
    | RunnableLambda(img_prompt_func)
    | llm
    | str_output_parser
)

user_query = "Picture of multiple wind turbines in the ocean."
chain_multimodal_rag.invoke(user_query)

'The image shows a vast array of wind turbines situated in the ocean, extending towards the horizon. The turbines are evenly spaced and stand tall above the water, with their large blades capturing the wind to generate clean energy. The ocean is calm and blue, providing a serene backdrop to the white turbines. The sky above is clear with a few scattered clouds, adding to the tranquil and expansive feel of the scene. The overall aesthetic is one of modernity and sustainability, highlighting the use of renewable energy sources in a natural setting.'

def plt_img_base64(img_base64):
    image_html = f'<img src="data:image/jpeg;base64,{img_base64}" />'
    display(HTML(image_html))
plt_img_base64(img_base64_list[1])

image_summaries[1]

'Offshore wind farm with multiple wind turbines in the ocean, text "What's inside" on the left side.'

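Other advanced RAG techniques worth exploring

Indexing improvements

Deep chunking: The quality of retrieval results often depends on how the data is chunked before it is stored in the retrieval system. With deep chunking, you use deep learning models, including transformers, to chunk the data optimally and intelligently.
Training and utilizing embedding adapters: Embedding adapters are lightweight modules trained to adapt an existing language model's embeddings to the needs of a specific task or domain without extensive retraining. Applied to RAG systems, these adapters can tailor the model's representations to the nuances of the prompt, which leads to more accurate and relevant retrieval.
Multi-representation indexing: Proposition indexing uses an LLM to generate document summaries (propositions) that are optimized for retrieval.
Recursive Abstractive Processing for Tree-Organized Retrieval (RAPTOR): A RAG system has to handle "lower-level" questions that reference specific facts found in a single document as well as "higher-level" questions that distill ideas spanning many documents. Typical kNN retrieval returns only a limited number of document chunks, which makes it hard to handle both kinds of question. RAPTOR addresses this by creating document summaries that capture higher-level concepts. It embeds and clusters the documents and then summarizes each cluster. The process repeats recursively, producing a tree of summaries in which each level captures higher-level concepts. Finally, the summaries are indexed together with the original documents, so the index covers the full range of user questions.
Contextualized Late Interaction over BERT (ColBERT): Embedding models compress text into a fixed-length (vector) representation that captures the semantic content of a document. This compression is very useful for efficient retrieval, but it puts a heavy burden on a single vector to capture every semantic nuance. In some cases, irrelevant or redundant content can dilute the semantic value of the embedding. ColBERT addresses this with finer-grained embeddings, computing a more detailed token-by-token similarity between the document and the query.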
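ColBERT's late-interaction scoring is often summarized as MaxSim: each query token embedding is matched with its most similar document token embedding, and those maxima are summed. The sketch below shows only that scoring step on toy 2-D vectors, which stand in for the per-token embeddings a BERT-based encoder would produce; it does not cover ColBERT's training or indexing.

```python
import math


def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def maxsim_score(query_vectors: list[list[float]], doc_vectors: list[list[float]]) -> float:
    """Late-interaction score: sum over query tokens of the best-matching doc token."""
    doc_unit = [normalize(d) for d in doc_vectors]
    total = 0.0
    for q in query_vectors:
        q_unit = normalize(q)
        total += max(sum(a * b for a, b in zip(q_unit, d)) for d in doc_unit)
    return total


query = [[1.0, 0.0], [0.0, 1.0]]              # two query token embeddings
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.1]]              # covers only the first one well
print(maxsim_score(query, doc_a), maxsim_score(query, doc_b))
```

Because every query token looks for its own best match, a document that covers all parts of the query outscores one that matches only part of it strongly, which a single pooled vector can blur.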
