Running the LLaMA-13B Model and LangChain in Google Colab | 极客日志


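AI-generated summary: This article demonstrates how to deploy the LLaMA-13B large language model in the free Google Colab environment. A quantized model is loaded through the llama.cpp library and combined with the LangChain framework to implement LLM chains, automatic routing, chat conversations, memory management, and agents. The article provides complete code examples, shows how to build local AI applications from open-source components, and analyzes resource consumption at different model sizes as well as potential security risks.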

By 不羁. Published 2026/4/6, updated 2026/5/2124 views

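This article shows how to run the LLaMA 2 13B model on a free Google Colab instance and tests several LangChain features, including building chat-based applications and using agents. All components used are open-source projects and completely free.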

LLaMA.cpp

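LLaMA.CPP is a very interesting open-source project. It was originally designed to run LLaMA models on MacBooks, but its functionality has grown far beyond that. First, it is written in pure C/C++ with no external dependencies and can run on almost any hardware (CUDA, OpenCL, and Apple Silicon are supported; it can even run on a Raspberry Pi). Second, LLaMA.CPP can be connected to LangChain, which lets us test many LangChain features for free, without an OpenAI key. Last but not least, because LLaMA.CPP runs almost anywhere, it is a good candidate for a free Google Colab instance. Google provides free access to Python notebooks with 12 GB of RAM and 16 GB of VRAM.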
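To make the memory figures above more concrete, here is a rough back-of-the-envelope estimate of how much space the weights of a quantized model take. The value of about 4.85 bits per weight for Q4_K_M is an assumption (an approximate average, not a number from the article), and the estimate ignores the KV cache and other runtime overhead.

```python
# Rough estimate of quantized model weight size.
# ASSUMPTION: Q4_K_M averages about 4.85 bits per weight (approximate figure).
GIB = 1024 ** 3


def estimate_weights_gib(n_params: float, bits_per_weight: float = 4.85) -> float:
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
    size = estimate_weights_gib(n_params)
    print(f"{name}: ~{size:.1f} GiB of weights, fits in 16 GiB VRAM: {size < 16}")
```

Under these assumptions both model sizes leave room in 16 GB of VRAM, which is consistent with running them on the free Colab GPU.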

Before using LLaMA, let's install the libraries. The installation itself is simple; we only need to enable LLAMA_CUBLAS before running pip:

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install llama-cpp-python
!pip3 install huggingface-hub
!pip3 install sentence-transformers langchain langchain-experimental
!huggingface-cli download TheBloke/Llama-2-7b-Chat-GGUF llama-2-7b-chat.Q4_K_M.gguf --local-dir /content --local-dir-use-symlinks False

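For the first test, I will use the 7B model. Here I also installed the huggingface-hub library, which lets us automatically download the 'Llama-2-7b-Chat' model in the GGUF format that LLaMA.CPP requires. I also installed the LangChain library, which will be used for further tests.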
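Before loading the model, it can be handy to confirm that the download actually produced a GGUF file. The sketch below relies only on the fact that GGUF files start with the four magic bytes `GGUF`; the helper name `looks_like_gguf` is hypothetical and not part of any library.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def looks_like_gguf(path: str) -> bool:
    """Return True if the file exists and starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


# Path used by the download command in this article
print(looks_like_gguf("/content/llama-2-7b-chat.Q4_K_M.gguf"))
```

A `False` result usually means the download failed or the path is wrong, which is cheaper to find out here than from a model loading error.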

Now let's load the model and check that it works:

from langchain.llms import LlamaCpp 
from langchain.callbacks.manager import CallbackManager 
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler 

n_gpu_layers = 40 
n_batch = 512 
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()]) 
llm = LlamaCpp( 
    model_path="/content/llama-2-7b-chat.Q4_K_M.gguf", 
    temperature=0.1, 
    n_gpu_layers=n_gpu_layers, 
    n_batch=n_batch, 
    callback_manager=callback_manager, 
    verbose=True,
)

Once the model is loaded, we can test it with a single line of code:

llm("What is the distance to the Moon? Write the short answer.")

Here I also used StreamingStdOutCallbackHandler, which gives us smooth 'streaming' output in the 'ChatGPT' style.
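The idea behind a streaming callback can be sketched without any model at all: the generator notifies a handler for every new token, and the handler writes it out immediately. `StdoutTokenPrinter` and `fake_generate` below are hypothetical stand-ins; only the method name `on_llm_new_token` mirrors the LangChain callback interface.

```python
import sys
from typing import Iterable


class StdoutTokenPrinter:
    """Toy stand-in for a streaming callback handler (hypothetical class)."""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Write each token immediately instead of waiting for the full answer
        sys.stdout.write(token)
        sys.stdout.flush()


def fake_generate(tokens: Iterable[str], handler: StdoutTokenPrinter) -> str:
    """Pretend to be a model: emit tokens one by one and notify the handler."""
    pieces = []
    for token in tokens:
        handler.on_llm_new_token(token)
        pieces.append(token)
    return "".join(pieces)


answer = fake_generate(["The ", "Moon ", "is ", "far ", "away."], StdoutTokenPrinter())
```

The full answer is still returned at the end; streaming only changes when the user sees the text.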

# Download the 13B chat model; point model_path in LlamaCpp at this file to use it
!huggingface-cli download TheBloke/Llama-2-13B-chat-GGUF llama-2-13b-chat.Q4_K_M.gguf --local-dir /content --local-dir-use-symlinks False

LangChain

1. LLM chain

from langchain.prompts import PromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.callbacks.tracers import ConsoleCallbackHandler

# Llama 2 chat prompt: system instructions inside <<SYS>>, the question before [/INST]
template = """<s>[INST] <<SYS>> Provide a correct and short answer to the question. <</SYS>> {question} [/INST]"""
prompt = PromptTemplate(template=template, input_variables=["question"])
chain = prompt | llm | StrOutputParser()
chain.invoke(
    {"question": "What is the distance to the Moon?"},
    config={
        # "callbacks": [ConsoleCallbackHandler()]  # uncomment to trace each step
    },
)
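The expression `prompt | llm | StrOutputParser()` composes steps with the `|` operator. As a minimal sketch of that idea (not LangChain's actual implementation), the hypothetical `Step` class below overloads `__or__` so that each step feeds its output into the next one; the prompt, model, and parser are replaced by toy lambdas.

```python
from typing import Any, Callable


class Step:
    """Hypothetical minimal 'runnable': wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b -> a new step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Stand-ins for the prompt, the LLM, and the output parser (all hypothetical)
fill_prompt = Step(lambda d: f"[INST] {d['question']} [/INST]")
fake_llm = Step(lambda text: {"text": f"echo: {text}"})
to_string = Step(lambda out: out["text"])

toy_chain = fill_prompt | fake_llm | to_string
result = toy_chain.invoke({"question": "What is the distance to the Moon?"})
print(result)
```

Because every step exposes the same `invoke` method, any step can be swapped out without touching the rest of the chain.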

2. Merging chains

# The stray [/INST] that preceded <</SYS>> has been removed so the tags are balanced
template2 = """<s>[INST] <<SYS>> Use the summary {summary} and give 2 one-sentence examples of practical applications of the subject. <</SYS>> [/INST]"""
prompt2 = PromptTemplate(
    input_variables=["summary"],
    template=template2,
)
# The first chain produces the summary, which is then fed into the second prompt
chain2 = {"summary": prompt | llm | StrOutputParser()} | prompt2 | llm | StrOutputParser()
chain2.invoke(
    {"question": "What is the distance to the Moon?"},
    config={
        # "callbacks": [ConsoleCallbackHandler()]  # uncomment to trace each step
    },
)
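The dictionary at the start of the merged chain runs the first chain and hands its output to the second prompt under the key `summary`. A minimal sketch of that data flow, with plain functions standing in for the two chains (all names here are hypothetical):

```python
from typing import Callable, Dict


def run_mapping(mapping: Dict[str, Callable[[dict], str]], inputs: dict) -> Dict[str, str]:
    """Run every callable in the mapping on the same inputs, keyed by name."""
    return {key: func(inputs) for key, func in mapping.items()}


def first_chain(inputs: dict) -> str:
    # Stand-in for: prompt | llm | StrOutputParser()
    return f"short answer to '{inputs['question']}'"


def second_chain(variables: Dict[str, str]) -> str:
    # Stand-in for: prompt2 | llm | StrOutputParser()
    return f"two applications based on: {variables['summary']}"


def merged_chain(inputs: dict) -> str:
    variables = run_mapping({"summary": first_chain}, inputs)
    return second_chain(variables)


merged = merged_chain({"question": "What is the distance to the Moon?"})
print(merged)
```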

3. Automatic routing

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.utils.math import cosine_similarity
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough

space_template = """<s>[INST] <<SYS>> You are an astronaut. You are great at answering questions about space. Provide a short answer to the question, understandable to a small kid. <</SYS>> {query} [/INST]""" 
math_template = """<s>[INST] <<SYS>> You are a mathematician. You are great at answering math questions. Provide a short answer to the question. <</SYS>> {query} [/INST]""" 
embeddings = HuggingFaceEmbeddings() 
prompt_templates = [space_template, math_template] 
prompt_embeddings = embeddings.embed_documents(prompt_templates)

def prompt_router(input):
    """ Find a proper template for the input """ 
    query_embedding = embeddings.embed_query(input["query"]) 
    similarity = cosine_similarity([query_embedding], prompt_embeddings)[0] 
    most_similar = prompt_templates[similarity.argmax()] 
    print("Using MATH" if most_similar == math_template else "Using SPACE") 
    return PromptTemplate.from_template(most_similar)

chain = (
    {"query": RunnablePassthrough()}
    | RunnableLambda(prompt_router)
    | llm
    | StrOutputParser()
)
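The routing step picks the template whose embedding is closest to the query embedding. The sketch below shows the same mechanism with a toy embedding based on word counts, which is an assumption made to keep the example dependency-free; it is far cruder than the sentence embeddings used above, and the template texts are shortened illustrations.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (NOT a real sentence embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


templates = {
    "space": "You are an astronaut. You answer questions about space, planets and the moon.",
    "math": "You are a mathematician. You answer math questions about numbers and equations.",
}
template_vectors = {name: embed(text) for name, text in templates.items()}


def route(query: str) -> str:
    """Return the name of the template most similar to the query."""
    query_vector = embed(query)
    return max(template_vectors, key=lambda name: cosine(query_vector, template_vectors[name]))


print(route("How far is the moon from the planets?"))
print(route("How do I solve equations with two numbers?"))
```

With real sentence embeddings the query does not need to share literal words with the template, which is what makes the approach practical.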

4. Basic chat

from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain.schema import AIMessage, HumanMessage
from langchain_experimental.chat_models import Llama2Chat

# Llama2Chat inserts the [INST] and <<SYS>> markers itself, so the system message stays plain text
sys_template = "Act as an experienced AI assistant. Write only one sentence answers."
chat_prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(sys_template),
    HumanMessage(content="Hello, how are you doing?"),
    AIMessage(content="I'm doing well, thanks!"),
    HumanMessage(content="May I ask you a question about Moon?"),
    AIMessage(content="Yes, sure."),
    HumanMessagePromptTemplate.from_template("{question}"),
])
model = Llama2Chat(llm=llm)
chain = chat_prompt | model | StrOutputParser()
chain.invoke(
    {"question": "How big is it?"},
    config={
        # "callbacks": [ConsoleCallbackHandler()]  # uncomment to trace each step
    },
)
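A chat wrapper for Llama 2 has to flatten the list of messages into a single prompt string. The sketch below follows the published Llama 2 chat layout; the function `build_llama2_prompt` is hypothetical, and the exact whitespace produced by the real wrapper may differ.

```python
from typing import List, Tuple


def build_llama2_prompt(system: str, turns: List[Tuple[str, str]], question: str) -> str:
    """Serialize a system message, past (human, ai) turns, and a new question."""
    prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
    for human, ai in turns:
        # Each finished turn is closed with </s> and a new [INST] block is opened
        prompt += f"{human} [/INST] {ai} </s><s>[INST] "
    prompt += f"{question} [/INST]"
    return prompt


chat_text = build_llama2_prompt(
    "Act as an experienced AI assistant.",
    [("Hello, how are you doing?", "I'm doing well, thanks!")],
    "How big is it?",
)
print(chat_text)
```

The system text appears only once, inside the first `[INST]` block, which is why the message passed to the wrapper should not carry these markers itself.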

5. Chat with memory and message summarization

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory, ConversationSummaryMemory, CombinedMemory, ChatMessageHistory

conv_memory = ConversationBufferMemory(memory_key="chat_history_lines", input_key="input") 
summary_memory = ConversationSummaryMemory(llm=llm, input_key="input") 
memory = CombinedMemory(memories=[conv_memory, summary_memory]) 

template = """<s>[INST] <<SYS>> Act as an experienced AI assistant. Write one-sentence answers only. <</SYS>> Summary of conversation: {history} Current conversation: {chat_history_lines} Human: {input} [/INST] """ 

summary_memory.save_context({"input":"Hi, how are you"},{"output":"Thanks, I am fine"}) 
summary_memory.save_context({"input":"May I ask you questions about Moon?"},{"output":"Yes, sure"}) 
summary_memory.load_memory_variables({}) 

prompt = PromptTemplate( 
    input_variables=["history","input","chat_history_lines"], 
    template=template,
) 
conversation = ConversationChain(llm=llm, verbose=True, memory=memory, prompt=prompt) 
conversation.run("How far is it?")
conversation.run("And what about Mars?")
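The combined memory keeps two views of the conversation: the raw lines and a running summary. A minimal sketch of that bookkeeping follows; `ToyCombinedMemory` and `naive_summarize` are hypothetical, and the summarizer is a plain function standing in for the LLM call that the real summary memory makes.

```python
from typing import Callable, Dict, List


class ToyCombinedMemory:
    """Hypothetical sketch: full transcript plus a running summary."""

    def __init__(self, summarize: Callable[[str, str], str]):
        self.lines: List[str] = []
        self.summary = ""
        self.summarize = summarize  # (old_summary, new_lines) -> new summary

    def save_context(self, user_input: str, output: str) -> None:
        new_lines = f"Human: {user_input}\nAI: {output}"
        self.lines.append(new_lines)
        self.summary = self.summarize(self.summary, new_lines)

    def load_memory_variables(self) -> Dict[str, str]:
        # Same keys as the prompt variables used in the article
        return {"history": self.summary, "chat_history_lines": "\n".join(self.lines)}


def naive_summarize(old_summary: str, new_lines: str) -> str:
    # Stand-in for the LLM call: record only the human's latest message
    latest = new_lines.splitlines()[0][len("Human: "):]
    prefix = old_summary + " " if old_summary else ""
    return prefix + f"The human said: {latest}"


memory_sketch = ToyCombinedMemory(naive_summarize)
memory_sketch.save_context("Hi, how are you", "Thanks, I am fine")
memory_sketch.save_context("May I ask you questions about Moon?", "Yes, sure")
variables = memory_sketch.load_memory_variables()
print(variables["history"])
```

In the real chain the summary is regenerated by the model after each turn, so every message costs an extra LLM call.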

6. Agent

from langchain_experimental.tools import PythonREPLTool
tool = PythonREPLTool() 
tool.run('import math; print(math.sqrt(5))')

from langchain_experimental.agents.agent_toolkits import create_python_agent 
from langchain.agents.agent_types import AgentType 

agent = create_python_agent(llm=llm, tool=tool, verbose=True, agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION) 
agent.agent.llm_chain.verbose = True 
agent.run("What is a square root of 5?")
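A Python REPL tool essentially executes a string of code and returns whatever it printed, which is also where the security risk of agents comes from. The sketch below shows that mechanism with the standard library only; `run_python` is a hypothetical helper, not the implementation used by `PythonREPLTool`.

```python
import contextlib
import io


def run_python(code: str) -> str:
    """Execute a code string and return whatever it printed.

    WARNING: exec() runs arbitrary code; never feed it untrusted input
    outside a disposable sandbox such as a throwaway Colab instance.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:  # report errors back as text, like a tool would
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue()


print(run_python("import math; print(math.sqrt(5))"))
```

Returning errors as text rather than raising them lets an agent read the message and try a corrected snippet on its next step.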

LangChain

1. LLM chain

2. Merging chains

3. Automatic routing

4. Basic chat

5. Chat with memory and message summarization

6. Agent

Conclusion
