LangChain in Practice: 9 Typical Use Cases | 极客日志


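LangChain provides the core building blocks for LLM applications. This guide demonstrates its main features through 9 examples: summarization, question answering over documents, information extraction, result evaluation, database querying, code understanding, API interaction, chatbots, and agents. Along the way it covers short- and long-text processing, vector retrieval, structured output parsing, and agent tool calling, to help developers get up to speed with LangChain for natural language processing and automation tasks.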

FlinkHero · Published 2025/2/7 · Updated 2026/7/2345 views

This article walks through 9 representative examples to get you started with LangChain from scratch.

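The 9 examples at a glance

Summarization: condense the key points of a text or chat transcript.
Question Answering over Documents: use documents as context and answer questions based on their content.
Extraction: pull structured data out of free text.
Evaluation: analyze and grade the quality of LLM output.
Querying Tabular Data: retrieve information from a database or database-like source.
Code Understanding: analyze code and extract its logic, with support for Q&A.
Interacting with APIs: have the LLM read API documentation and then call the real API to fetch live data.
Chatbots: a chatbot framework with memory (and UI interaction capability).
Agents: use LLMs to analyze tasks and make decisions, then call tools to carry those decisions out.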

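# Install the required dependencies (notebook syntax; drop the leading "!" in a regular shell)
# Note: the imports in this article follow the legacy LangChain module layout (langchain.llms, langchain.chat_models, ...);
# newer releases have moved or deprecated many of these paths, so a recent version may require different imports
!pip install langchain
!pip install openai
!pip install tiktoken
!pip install faiss-cpu

# Use your own OpenAI API key
openai_api_key = 'YOUR_API_KEY'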
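
Rather than pasting the key into the notebook, it can be read from the environment. The following is a minimal sketch, assuming the key is stored in a variable named `OPENAI_API_KEY` (the conventional name, but still an assumption here); `load_api_key` is a hypothetical helper, not part of LangChain.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

Calling `openai_api_key = load_api_key()` would then take the place of the hard-coded assignment, which keeps the key out of shared notebooks.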

I. Summarization

Handing a piece of text to an LLM and asking for a summary is one of the most common use cases.

1. Summarizing short text

# Summaries Of Short Text
from langchain.llms import OpenAI
from langchain import PromptTemplate

llm = OpenAI(temperature=0, model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)  # initialize the LLM

# Create the template
template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

# Create a LangChain prompt template; values are inserted into it later
prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)

confusing_text = """
For the next 130 years, debate raged.
Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.
'The problem is that when you look up close at the anatomy, it's evocative of a lot of different things, but it's diagnostic of nothing,' says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.
'And it's so damn big that when whenever someone says it's something, everyone else's hackles get up: 'How could you have a lichen 20 feet tall?''
"""

print("------- Prompt Begin -------")
# Fill in the template and print the final prompt
final_prompt = prompt.format(text=confusing_text)
print(final_prompt)

print("------- Prompt End -------")

output = llm(final_prompt)
print(output)
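
What the template step does can be approximated in plain Python. The sketch below is not LangChain's implementation; `SimplePromptTemplate` is a hypothetical class that discovers the placeholders in a template and refuses to format when one of them is missing.

```python
from string import Formatter


class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template: named placeholders plus a format step."""

    def __init__(self, template: str):
        self.template = template
        # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
        self.input_variables = sorted(
            {name for _, name, _, _ in Formatter().parse(template) if name}
        )

    def format(self, **values: str) -> str:
        missing = [v for v in self.input_variables if v not in values]
        if missing:
            raise KeyError(f"missing prompt variables: {missing}")
        return self.template.format(**values)


summary_prompt = SimplePromptTemplate(
    "Please summarize the following piece of text.\n\n%TEXT:\n{text}\n"
)
print(summary_prompt.format(text="For the next 130 years, debate raged."))
```

Keeping the instructions in the template and the text in a variable means the same prompt can be reused for any input.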

2. Summarizing longer text

# Summaries Of Longer Text
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

with open('wonderland.txt', 'r') as file:
    text = file.read()  # the text is Alice in Wonderland

# Print the first 285 characters of the novel
print(text[:285])

num_tokens = llm.get_num_tokens(text)

print(f"There are {num_tokens} tokens in your file")
# The full text comes to roughly 48,000 tokens,
# clearly too much to send to the LLM in a single request

text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=5000, chunk_overlap=350)
# RecursiveCharacterTextSplitter is used here, but other splitters work too
docs = text_splitter.create_documents([text])

print(f"You now have {len(docs)} docs instead of 1 piece of text")

# Set up the chain
# chain_type='map_reduce' summarizes each document separately and then combines the partial summaries into one
chain = load_summarize_chain(llm=llm, chain_type='map_reduce')  # pass verbose=True to show the run log

# Use it. This will run through the 36 documents, summarize the chunks, then get a summary of the summary.
# This is the classic map-reduce approach: split the article, summarize each part, then summarize the summaries
output = chain.run(docs)
print(output)
# Try yourself
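
The map-reduce idea can be sketched independently of LangChain. The code below is a simplified illustration, not the library's implementation: it slices the text at fixed character offsets (unlike `RecursiveCharacterTextSplitter`, which prefers to break at separators), and `summarize` is a caller-supplied function standing in for the LLM call. Both function names are hypothetical.

```python
from typing import Callable, List


def split_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Cut text into chunks of at most chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def map_reduce_summarize(
    text: str,
    summarize: Callable[[str], str],
    chunk_size: int = 5000,
    overlap: int = 350,
) -> str:
    """Summarize each chunk (map), then summarize the joined partial summaries (reduce)."""
    chunks = split_with_overlap(text, chunk_size, overlap)
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial_summaries))
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.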

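II. Question Answering over Documents

1. Q&A over short text

# In short, building a QA system with documents as context boils down to llm(your context + your question) = your answer
# Simple Q&A Example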
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

context = """
Rachel is 30 years old
Bob is 45 years old
Kevin is 65 years old
"""

question = "Who is under 40 years old?"

output = llm(context + question)

print(output.strip())

2. Q&A over long text

from langchain import OpenAI
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

loader = TextLoader('wonderland.txt')  # load a long text; Alice in Wonderland is used as input again
doc = loader.load()
print(f"You have {len(doc)} document")
print(f"You have {len(doc[0].page_content)} characters in that document")

# Split the novel into multiple chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

# Total number of characters, used to compute the average below
num_total_characters = sum([len(x.page_content) for x in docs])

print(f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")

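# Set up the embedding engine
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Embed the documents and keep the vectors together with the original text in a lightweight vector store (FAISS)
# This step sends API requests to OpenAI
docsearch = FAISS.from_documents(docs, embeddings)

# Create the QA-retrieval chain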
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

query = "What does the author describe the Alice following with?"
qa.run(query)
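# Here the retriever fetches the chunks most similar to the question and passes them, together with the question, to the LLM, which reasons over them to produce the answer
# There is plenty to tune in this step: the chunk size, the embedding engine, the retriever, and so on
# A cloud-hosted vector store is also an option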
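
The retrieve-then-stuff flow can be illustrated with a toy example. In the sketch below a bag-of-words count stands in for `OpenAIEmbeddings` and a sorted list stands in for FAISS, so it only shows the shape of the technique; all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_stuff_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    """'Stuff' the retrieved chunks into one prompt together with the question."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
```

Only the retrieved chunks reach the model, which is what keeps a long novel within the context limit.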

III. Extraction

from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate

from langchain.chat_models import ChatOpenAI


chat_model = ChatOpenAI(temperature=0, model='gpt-3.5-turbo', openai_api_key=openai_api_key)

1. Manual format conversion

# Vanilla Extraction
instructions = """
You will be given a sentence with fruit names, extract those fruit names and assign an emoji to them
Return the fruit name and emojis in a python dictionary
"""

fruit_names = """
Apple, Pear, this is an kiwi
"""

# Make your prompt which combines the instructions w/ the fruit names
prompt = (instructions + fruit_names)

# Call the LLM
output = chat_model([HumanMessage(content=prompt)])

print(output.content)
print(type(output.content))

output_dict = eval(output.content)  # convert the string manually with Python's eval (eval runs arbitrary code, so use it only on output you trust)

print(output_dict)
print(type(output_dict))
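
Because the model's reply is untrusted text, a more defensive variant of the manual conversion is to parse it as a literal rather than evaluate it. This is a sketch of that alternative, assuming the reply is a plain Python dict literal; `parse_dict_literal` is a hypothetical helper.

```python
import ast


def parse_dict_literal(raw: str) -> dict:
    """Parse a Python dict literal from model output without executing any code."""
    value = ast.literal_eval(raw.strip())  # accepts literals only; calls and names are rejected
    if not isinstance(value, dict):
        raise ValueError(f"expected a dict literal, got {type(value).__name__}")
    return value


fruits = parse_dict_literal("{'Apple': 'red', 'Pear': 'green', 'Kiwi': 'brown'}")
print(fruits["Pear"])
```

`ast.literal_eval` raises an exception on anything that is not a literal, so a reply containing a function call fails loudly instead of running.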

2. Automatic format conversion

# Parse the output to obtain structured data
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

response_schemas = [
    ResponseSchema(name="artist", description="The name of the musical artist"),
    ResponseSchema(name="song", description="The name of the song that the artist plays")
]

# The parser interprets the LLM output according to the schema defined above and returns the expected structured data
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

format_instructions = output_parser.get_format_instructions()
print(format_instructions)

# This prompt differs from the plain string prompt sent to the chat model earlier
# It is a ChatPromptTemplate that carries the parser's format instructions, so output_parser can turn the reply into a Python object
prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template("Given a command from the user, extract the artist and song names\n\n{format_instructions}\n{user_prompt}")
    ],
    input_variables=["user_prompt"],
    partial_variables={"format_instructions": format_instructions}
)

artist_query = prompt.format_prompt(user_prompt="I really like So Young by Portugal. The Man")
print(artist_query.messages[0].content)

artist_output = chat_model(artist_query.to_messages())
output = output_parser.parse(artist_output.content)

print(output)
print(type(output))
# Note that because the turbo model is used, the result is not guaranteed to be identical on every run
# Switching to a GPT-4 model may be a better choice
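
The parsing step can be pictured as follows. This is a rough sketch, assuming the model wraps a JSON object in a fenced json snippet as the printed format instructions request; it is not the code of `StructuredOutputParser`, and `parse_structured_reply` is a hypothetical name.

```python
import json
import re
from typing import Dict, List

FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def parse_structured_reply(reply: str, required_keys: List[str]) -> Dict[str, str]:
    """Extract the JSON object from a fenced snippet (or the bare reply) and check its keys."""
    match = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, reply, re.DOTALL)
    payload = match.group(1) if match else reply
    data = json.loads(payload)
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data


reply = (
    "Here you go:\n" + FENCE + "json\n"
    '{"artist": "Portugal. The Man", "song": "So Young"}\n' + FENCE
)
print(parse_structured_reply(reply, ["artist", "song"]))
```

Checking for the required keys up front turns a malformed reply into an explicit error, which is a natural point to retry the request.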

IV. Evaluation

# Embeddings, store, and retrieval
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Model, doc loader, and text splitter
from langchain import OpenAI
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Eval
from langchain.evaluation.qa import QAEvalChain

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

# Again use Alice in Wonderland as the input text
loader = TextLoader('wonderland.txt')
doc = loader.load()

print(f"You have {len(doc)} document")
print(f"You have {len(doc[0].page_content)} characters in that document")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

# Get the total number of characters so we can see the average later
num_total_characters = sum([len(x.page_content) for x in docs])

print(f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")

# Embeddings and docstore
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
docsearch = FAISS.from_documents(docs, embeddings)

chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever(), input_key="question")
# Note the input_key parameter: it tells the chain which dictionary key holds the question,
# so the chain can find the question and pass it to the LLM

question_answers = [
    {'question' : "Which animal gives Alice an instruction?", 'answer' : 'rabbit'},
    {'question' : "Who is the author of the book?", 'answer' : 'Lewis Carroll'}
]

predictions = chain.apply(question_answers)
predictions
# The chain produces predictions with the LLM; these are then compared against the answers provided above, which are trusted to be correct

# Start your eval chain
eval_chain = QAEvalChain.from_llm(llm)

graded_outputs = eval_chain.evaluate(question_answers,
                                     predictions,
                                     question_key="question",
                                     prediction_key="result",
                                     answer_key='answer')

graded_outputs
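
The structure of the grading step can be shown with a much cruder grader. The sketch below swaps the LLM judge for a case-insensitive substring check, so it illustrates only how examples, predictions, and keys line up; `grade_by_containment` is a hypothetical function, and the predictions are made-up sample data.

```python
from typing import Dict, List


def grade_by_containment(
    examples: List[Dict[str, str]],
    predictions: List[Dict[str, str]],
    question_key: str = "question",
    answer_key: str = "answer",
    prediction_key: str = "result",
) -> List[Dict[str, str]]:
    """Mark a prediction CORRECT when it contains the reference answer (case-insensitive)."""
    graded = []
    for example, prediction in zip(examples, predictions):
        expected = example[answer_key].strip().lower()
        actual = prediction[prediction_key].strip().lower()
        graded.append({
            "question": example[question_key],
            "grade": "CORRECT" if expected in actual else "INCORRECT",
        })
    return graded


sample_examples = [{"question": "Which animal does Alice follow?", "answer": "rabbit"}]
sample_predictions = [{"result": "Alice follows the White Rabbit."}]  # made-up prediction
print(grade_by_containment(sample_examples, sample_predictions))
```

An LLM grader is more forgiving of paraphrases than a substring check, which is the reason the chain-based approach uses a model as the judge.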

V. Querying Tabular Data

# Query a SQLite database in natural language; we use the San Francisco trees dataset
# Skip the following code unless you have SQLite and the database file below available
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

sqlite_db_path = 'data/San_Francisco_Trees.db'
db = SQLDatabase.from_uri(f"sqlite:///{sqlite_db_path}")

db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)

db_chain.run("How many Species of trees are there in San Francisco?")

# Cross-check the chain's answer by running the equivalent SQL directly
import sqlite3
import pandas as pd

# Connect to the SQLite database
connection = sqlite3.connect(sqlite_db_path)

# Define your SQL query
query = "SELECT count(distinct qSpecies) FROM SFTrees"

# Read the SQL query into a Pandas DataFrame
df = pd.read_sql_query(query, connection)

# Close the connection
connection.close()

# Display the result in the first column first cell
print(df.iloc[0,0])
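
The same cross-check can be tried without the dataset file. The sketch below builds a tiny in-memory table that borrows the `SFTrees` and `qSpecies` names from the query above; the `TreeID` column and the three rows are invented for illustration. `run_select_only` is a hypothetical and deliberately crude guard for SQL produced by a model.

```python
import sqlite3


def run_select_only(connection: sqlite3.Connection, sql: str) -> list:
    """Run a single SELECT statement and return all rows; reject anything else."""
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only a single SELECT statement is allowed")
    return connection.execute(statement).fetchall()


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE SFTrees (TreeID INTEGER, qSpecies TEXT)")
connection.executemany(
    "INSERT INTO SFTrees VALUES (?, ?)",
    [(1, "Acer rubrum"), (2, "Acer rubrum"), (3, "Quercus agrifolia")],  # invented rows
)

rows = run_select_only(connection, "SELECT count(distinct qSpecies) FROM SFTrees;")
print(rows[0][0])  # prints 2: two distinct species among the three invented rows
```

A prefix check like this is not a real security boundary; opening the database with a read-only connection is the sturdier option when executing generated SQL.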

VI. Code Understanding

# Helper to read local files
import os

# Vector Support
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

# Model and chain
from langchain.chat_models import ChatOpenAI

# Text splitters
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader

llm = ChatOpenAI(model='gpt-3.5-turbo', openai_api_key=openai_api_key)

embeddings = OpenAIEmbeddings(disallowed_special=(), openai_api_key=openai_api_key)

root_dir = '/content/drive/MyDrive/thefuzz-master'
docs = []

# Go through each folder
for dirpath, dirnames, filenames in os.walk(root_dir):
    
    # Go through each file
    for file in filenames:
        try: 
            # Load up the file as a doc and split
            loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')
            docs.extend(loader.load_and_split())
        except Exception:
            # Skip files that cannot be read as UTF-8 text (binaries, images, ...)
            pass

print(f"You have {len(docs)} documents\n")
print("------ Start Document ------")
print(docs[0].page_content[:300])


```python
from langchain.chains import RetrievalQA

docsearch = FAISS.from_documents(docs, embeddings)

# Get our retriever ready
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

query = "What function do I use if I want to find the most similar item in a list of items?"
output = qa.run(query)

print(output)

query = "Can you write the code to use the process.extractOne() function? Only respond with code. No other text or explanation"
output = qa.run(query)
print(output)
```

VII. Interacting with APIs

from langchain.chains import APIChain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

api_docs = """

BASE URL: https://restcountries.com/

API Documentation:

The API endpoint /v3.1/name/{name} Used to find information about a country. All URL parameters are listed below:
    - name: Name of country - Ex: italy, france
    
The API endpoint /v3.1/currency/{currency} Used to find information about the countries that use a given currency. All URL parameters are listed below:
    - currency: 3 letter currency. Example: USD, COP
    
Woo! This is my documentation
"""

chain_new = APIChain.from_llm_and_api_docs(llm, api_docs, verbose=True)

chain_new.run('Can you tell me information about france?')

chain_new.run('Can you tell me about the currency COP?')
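
The URL the chain has to produce from the documentation can be built by hand to see what is being asked of the model. The sketch below covers only that step: a lookup table replaces the LLM's reading of the docs, no request is sent, and `build_url` is a hypothetical helper.

```python
from urllib.parse import quote

BASE_URL = "https://restcountries.com"

# Endpoint templates copied from the API documentation string above
ENDPOINTS = {
    "name": "/v3.1/name/{name}",
    "currency": "/v3.1/currency/{currency}",
}


def build_url(endpoint: str, value: str) -> str:
    """Fill the endpoint template with a URL-encoded parameter value."""
    encoded = quote(value.strip(), safe="")
    return BASE_URL + ENDPOINTS[endpoint].format(**{endpoint: encoded})


print(build_url("name", "france"))
print(build_url("currency", "COP"))
```

Encoding the parameter matters as soon as a value contains spaces or slashes, for example a country name made of two words.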

VIII. Chatbots

from langchain.llms import OpenAI
from langchain import LLMChain
from langchain.prompts.prompt import PromptTemplate

# Chat specific components
from langchain.memory import ConversationBufferMemory

template = """
You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"], 
    template=template
)
memory = ConversationBufferMemory(memory_key="chat_history")

llm_chain = LLMChain(
    llm=OpenAI(openai_api_key=openai_api_key), 
    prompt=prompt, 
    verbose=True, 
    memory=memory
)

llm_chain.predict(human_input="Is a pear a fruit or vegetable?")

llm_chain.predict(human_input="What was one of the fruits I first asked you about?")
# The second question can only be answered from the first exchange, which is why memory is needed
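
Buffer memory amounts to replaying the conversation so far in every new prompt. The following is a minimal sketch of that idea, not LangChain's `ConversationBufferMemory`; `BufferedChat` is a hypothetical class and `llm` can be any function that maps a prompt string to a reply.

```python
from typing import Callable, List

TEMPLATE = """You are a chatbot that only makes jokes.

{chat_history}
Human: {human_input}
Chatbot:"""


class BufferedChat:
    """Keep the whole conversation and prepend it to every new prompt."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm
        self.turns: List[str] = []

    def predict(self, human_input: str) -> str:
        prompt = TEMPLATE.format(
            chat_history="\n".join(self.turns), human_input=human_input
        )
        reply = self.llm(prompt).strip()
        # Record both sides of the exchange for the next call
        self.turns.append(f"Human: {human_input}")
        self.turns.append(f"AI: {reply}")
        return reply
```

The buffer grows with every turn, so long conversations eventually need trimming or summarizing to stay within the model's context window.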

IX. Agents

# Helpers
import os
import json

from langchain.llms import OpenAI

# Agent imports
from langchain.agents import load_tools
from langchain.agents import initialize_agent

# Tool imports
from langchain.agents import Tool
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.utilities import TextRequestsWrapper

os.environ["GOOGLE_CSE_ID"] = "YOUR_GOOGLE_CSE_ID"
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

search = GoogleSearchAPIWrapper()

requests = TextRequestsWrapper()

toolkit = [
    Tool(
        name = "Search",
        func=search.run,
        description="useful for when you need to search google to answer questions about current events"
    ),
    Tool(
        name = "Requests",
        func=requests.get,
        description="Useful for when you need to make a request to a URL"
    ),
]

agent = initialize_agent(toolkit, llm, agent="zero-shot-react-description", verbose=True, return_intermediate_steps=True)

response = agent({"input":"What is the capital of canada?"})
response['output']

response = agent({"input":"Tell me what the comments are about on this webpage https://news.ycombinator.com/item?id=34425779"})
response['output']
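
The loop an agent runs (decide, call a tool, observe, repeat) can be sketched without any LLM. In the code below `run_agent` and `scripted_decide` are hypothetical names: the scripted function stands in for the model's decision, and the search tool returns a canned string instead of calling Google.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (tool name, tool input, observation)


def run_agent(
    question: str,
    decide: Callable[[str, List[Step]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> dict:
    """Ask `decide` for the next action until it returns ("final", answer)."""
    steps: List[Step] = []
    for _ in range(max_steps):
        action, action_input = decide(question, steps)
        if action == "final":
            return {"output": action_input, "intermediate_steps": steps}
        if action in tools:
            observation = tools[action](action_input)
        else:
            observation = f"unknown tool: {action}"
        steps.append((action, action_input, observation))
    return {"output": "stopped after reaching max_steps", "intermediate_steps": steps}


def scripted_decide(question: str, steps: List[Step]) -> Tuple[str, str]:
    """Stand-in for the LLM: search once, then answer from the observation."""
    if not steps:
        return ("Search", "capital of Canada")
    return ("final", f"Based on the search: {steps[-1][2]}")


tools = {"Search": lambda query: "Ottawa is the capital of Canada."}  # canned result
response = run_agent("What is the capital of canada?", scripted_decide, tools)
print(response["output"])
```

The `max_steps` cap plays the role of a safety stop: without it, a model that never emits a final answer would keep calling tools indefinitely.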
