大模型提示词进阶指南：System Prompts 与数据集分析

上个月，我有幸获得新加坡首届 GPT-4 提示工程（Prompt Engineering）大赛相关奖项，该比赛由新加坡政府科技署（GovTech）组织，汇聚了超过 400 位优秀的参与者。

提示工程（Prompt Engineering）是一门融合了艺术和科学的学科——这门学科不仅需要理解技术，还需要一定的创造力和战略思维。 以下是我在学习过程中学到的提示工程策略汇编，这些策略可以驱动任何大语言模型（LLM）精准执行需求，甚至超常发挥！

本系列文章包括以下内容，其中🔵指的是适合初学者的提示语技巧，而🔴指的是高级策略（本文的重点）：

🔵 使用 CO-STAR 框架构建提示语
🔵 使用分隔符（delimiters）将提示语分段
🔴 使用 LLM guardrails 创建 system prompts
🔴 仅使用 LLM（无需插件或代码）分析数据集

01 [🔴] 使用 LLM guardrails 创建 system prompts

在进入正题之前，需要注意的是本节只适用于具有 System Prompt 功能的 LLM，而不像基础篇和本文的其他章节那样适用于任何 LLM。最著名的 LLM 当然是 ChatGPT，因此在本节中我们将以 ChatGPT 作为示例。

1.1 围绕 System Prompt 的术语

首先，让我们来理清术语，特别是关于 ChatGPT 的三种术语的使用：这三种术语在 ChatGPT 几乎可以互换使用："System Prompts"、"System Messages"和"Custom Instructions"。这让很多人感到困惑，以至于 OpenAI 特意发布了一篇文章来解释这些术语。以下是其摘要：

"System Prompts"和"System Messages"是通过 Chat Completions API 以编程方式与 ChatGPT 进行交互时使用的术语。
另一方面，"Custom Instructions"是通过用户界面与 ChatGPT 交互时使用的术语。

不过总的来说，这三个术语指的是同一件事，所以不要被这些术语混淆了！后续部分，本文将使用"System Prompts"一词。现在，让我们进入正题！

1.2 什么是 System Prompts？

System Prompts 是一种额外的提示语（prompt），我们可以在其中提供有关 LLM 行为方式的 instructions。它被认为是额外的提示语，因为它不属于您给 LLM 的"正常"提示语（即 User Prompts）。

在聊天中，每当您给 LLM 发送新的提示语时，System Prompts 都会像过滤器一样，LLM 会在回答您的新提示语之前自动应用这些提示语。这意味着 System Prompts 在 LLM 做出回答时都会被考虑进去。

1.3 何时使用 System Prompt？

您心中可能会想到的第一个问题是：为什么我应该在 System Prompts 中提供 instruction，而不是在我向与 LLM 的新对话的第一个提示语中提供 instruction，然后再与 LLM 进行更多的对话呢？

答案是，因为 LLM 的对话记忆是有限的。在后一种情况下，随着对话的继续，LLM 很可能会"忘记"您在聊天中提供的第一条提示语，从而使这些 instruction（指令）过时。

另一方面，如果在 System Prompts 中提供了 instruction，那么这些 System Prompts 会与聊天中提供的每个新提示语一起发送。这可以确保 LLM 在聊天过程中继续接收这些 instruction，无论聊天过程变得多长。

总结：在整个聊天过程中，使用 System Prompts 提供您希望 LLM 在回答时记住的 instruction。

1.4 System Prompt 应包括哪些内容？

System Prompt 通常应包括以下类别的 instruction：

目标任务的定义（Task definition），这样 LLM 在整个对话过程中都会记住它必须做什么。
输出格式（Output format），这样 LLM 在整个对话过程中都会记住它应该如何做出回答。
防范措施（Guardrails），这样 LLM 在整个对话过程中都会记住它不应该如何做出回答。Guardrails 是 LLM governance 中的新兴领域，指的是 LLM 被允许操作的行为边界。

例如，System Prompt 可能是这样的：

You will answer questions using this text: [insert text].

You will respond with a JSON object in this format: {"Question": "Answer"}.

year of birth	marital status	income	number of children	days since last purchase	amount spent
1985	Married	Medium	2	10	50
1990	Single	High	0	30	200
...	...	...	...	...	...

> System Prompt: > I want you to act as a data scientist to analyze datasets. Do not make up information that is not in the dataset. For each analysis I ask for, provide me with the exact and definitive answer and do not provide me with code or instructions to do the analysis on other platforms. > Prompt: ## CONTEXT > I sell wine. I have a dataset of information on my customers: [year of birth, marital status, income, number of children, days since last purchase, amount spent]. > ############# ## OBJECTIVE > I want you use the dataset to cluster my customers into groups and then give me ideas on how to target my marketing efforts towards each group. Use this step-by-step process and do not use code: > > 1. CLUSTERS: Use the columns of the dataset to cluster the rows of the dataset, such that customers within the same cluster have similar column values while customers in different clusters have distinctly different column values. Ensure that each row only belongs to 1 cluster. > > For each cluster found, > > CLUSTER_INFORMATION: Describe the cluster in terms of the dataset columns. > CLUSTER_NAME: Interpret [CLUSTER_INFORMATION] to obtain a short name for the customer group in this cluster. > MARKETING_IDEAS: Generate ideas to market my product to this customer group. > RATIONALE: Explain why [MARKETING_IDEAS] is relevant and effective for this customer group. > > ############# ## STYLE > Business analytics report > ############# ## TONE > Professional, technical > ############# ## AUDIENCE > My business partners. Convince them that your marketing strategy is well thought-out and fully backed by data. > ############# ## RESPONSE: MARKDOWN REPORT > <For each cluster in [CLUSTERS]> > — Customer Group: [CLUSTER_NAME] > — Profile: [CLUSTER_INFORMATION] > — Marketing Ideas: [MARKETING_IDEAS] > — Rationale: [RATIONALE] > Give a table of the list of row numbers belonging to each cluster, in order to back up your analysis. Use these table headers: [[CLUSTER_NAME], List of Rows]. > ############# ## START ANALYSIS > If you understand, ask me for my dataset.

大模型提示词进阶指南：System Prompts 与数据集分析

01 [🔴] 使用 LLM guardrails 创建 system prompts

1.1 围绕 System Prompt 的术语

1.2 什么是 System Prompts？

1.3 何时使用 System Prompt？

1.4 System Prompt 应包括哪些内容？

更多推荐文章

相关免费在线工具

1.5 但是，"正常"的聊天提示语又是什么呢？

1.6 LLM guardrails 动态化

02 [🔴] 仅使用 LLM（无需插件或代码）分析数据集

2.1 大语言模型不擅长的数据集分析类型

2.2 大语言模型擅长的数据集分析类型

2.3 仅使用 LLM 分析 Kaggle 数据集

2.4 验证 LLM 的分析结果

2.4.1 Young Families

2.4.2 Discerning Enthusiasts

2.5 如果我们使用 ChatGPT 的高级数据分析插件会怎样呢？

2.6 那么…何时使用 LLM 分析数据集？

2.7 现在回到提示工程（prompt engineering）！

总结与建议

更多推荐文章

相关免费在线工具

Row	Year	Status	Income	Children	Days	Amount
3	1982	Married	Low	2	5	30
4	1985	Cohabiting	Medium	1	12	45
...	...	...	...	...	...	...

Row	Year	Status	Income	Children	Days	Amount
2	1975	Single	High	0	60	500
5	1990	Married	High	3	15	600
...	...	...	...	...	...	...

大模型提示词进阶指南：System Prompts 与数据集分析

01 [🔴] 使用 LLM guardrails 创建 system prompts

1.1 围绕 System Prompt 的术语

1.2 什么是 System Prompts？

1.3 何时使用 System Prompt？

1.4 System Prompt 应包括哪些内容？

微信扫一扫，关注极客日志

更多推荐文章

相关免费在线工具

1.5 但是，"正常"的聊天提示语又是什么呢？

1.6 LLM guardrails 动态化

02 [🔴] 仅使用 LLM（无需插件或代码）分析数据集

2.1 大语言模型不擅长的数据集分析类型

2.2 大语言模型擅长的数据集分析类型

2.3 仅使用 LLM 分析 Kaggle 数据集

2.4 验证 LLM 的分析结果

2.4.1 Young Families

2.4.2 Discerning Enthusiasts

2.5 如果我们使用 ChatGPT 的高级数据分析插件会怎样呢？

2.6 那么…何时使用 LLM 分析数据集？

2.7 现在回到提示工程（prompt engineering）！

总结与建议

微信扫一扫，关注极客日志

更多推荐文章

相关免费在线工具