You will answer questions using this text: [insert text].
You will respond with a JSON object in this format: {"Question": "Answer"}.
If the text does not contain sufficient information to answer the question, do not make up information and give the answer as "NA".
You are only allowed to answer questions related to [insert scope]. Never answer any questions related to demographic information such as age, gender, and religion.
各部分内容涉及的类别如下:
Task Definition: You will answer questions using this text...
Output Format: You will respond with a JSON object...
Guardrails: If the text does not contain... / You are only allowed...
1.5 但是,"正常"的聊天提示语又是什么呢?
现在你可能会想:听起来 System Prompt 中已经提供了很多信息。那我应该在聊天的"正常"提示语(即 User Prompts)中写些什么呢?
System Prompt 概述了当前的一般任务。在上面的 System Prompt 示例中,任务已被定义为只使用一段特定的文本来回答问题,并且 LLM 被指示以{"Question": "Answer"}的格式进行回答。
在这种情况下,聊天过程中的每个 User Prompt 都将简化为你希望 LLM 用文本回答的问题。例如,某个用户的提问可能是'这段文本是关于什么的?'然后 LLM 会回答说{"这段文本是关于什么的?": "这段文本是关于……"}。
但是,让我们进一步概括这个任务示例。在这种情况下,我们可以将上述 System Prompt 的第一行从:
You will answer questions using this text: [insert text].
编辑为:
You will answer questions using the provided text.
现在,每个用户在聊天时的提示语将包括进行问题回答的文本和要回答的问题,例如:
[insert text]
[insert question]
在这里,还将使用 XML 标签作为分隔符,以便以结构化的方式向 LLM 提供所需的两个信息片段。XML 标签中使用的名词"text"和"question"与 System Prompt 中使用的名词相对应,这样 LLM 就能理解标签与 System Prompt instructions 之间的关系。
总之,System Prompt 应给出总体任务 instructions,而每个 User Prompt 应提供任务执行的具体细节。例如,在本例中,这些具体的细节是 text 和 question。
1.6 LLM guardrails 动态化
上面通过 System Prompt 中的几句话添加了 guardrails。这些 guardrails 会被固定下来,在整个聊天过程中都不会改变。但是如果您希望在对话的不同阶段设置不同的 guardrails,该怎么办?
对于使用 ChatGPT Web 界面的用户来说,目前还没有直接的方法来做到这一点。不过,如果您正在通过编程方式与 ChatGPT 进行交互,那你就走运了!随着人们对构建有效的 LLM guardrail 的关注度越来越高,一些开源软件包也应运而生,它们可以让你以编程方式设置更详细、更动态的 guardrail。
> System Prompt:> I want you to act as a data scientist to analyze datasets. Do not make up information that is not in the dataset. For each analysis I ask for, provide me with the exact and definitive answer and do not provide me with code or instructions to do the analysis on other platforms.> Prompt:## CONTEXT> I sell wine. I have a dataset of information on my customers: [year of birth, marital status, income, number of children, days since last purchase, amount spent].> ############### OBJECTIVE> I want you use the dataset to cluster my customers into groups and then give me ideas on how to target my marketing efforts towards each group. Use this step-by-step process and do not use code:>
> 1. CLUSTERS: Use the columns of the dataset to cluster the rows of the dataset, such that customers within the same cluster have similar column values while customers in different clusters have distinctly different column values. Ensure that each row only belongs to 1 cluster.>
> For each cluster found,>
> CLUSTER_INFORMATION: Describe the cluster in terms of the dataset columns.
> CLUSTER_NAME: Interpret [CLUSTER_INFORMATION] to obtain a short name for the customer group in this cluster.
> MARKETING_IDEAS: Generate ideas to market my product to this customer group.> RATIONALE: Explain why [MARKETING_IDEAS] is relevant and effective for this customer group.
>
> #############
## STYLE
> Business analytics report
> #############
## TONE
> Professional, technical
> #############
## AUDIENCE
> My business partners. Convince them that your marketing strategy is well thought-out and fully backed by data.
> #############
## RESPONSE: MARKDOWN REPORT
> <Foreachclusterin [CLUSTERS]>
> — Customer Group: [CLUSTER_NAME]> — Profile: [CLUSTER_INFORMATION]
> — Marketing Ideas: [MARKETING_IDEAS]> — Rationale: [RATIONALE]> Give a table of the list of row numbers belonging to each cluster, in order to back up your analysis. Use these table headers: [[CLUSTER_NAME], List of Rows].
> #############
## START ANALYSIS
> If you understand, ask me for my dataset.
GPT-4 的回答如下,接下来我们将数据集以 CSV 字符串的形式传递给它。
随后,GPT-4 将按照我们要求的 markdown 格式回复分析结果。
2.4 验证 LLM 的分析结果
为了简洁起见,我们将挑选 LLM 生成的 2 个客户群体进行验证,比如 Young Families 和 Discerning Enthusiasts。
2.4.1 Young Families
LLM 总结的该人群特征:1980 年后出生,已婚或同居,收入中等偏低,有孩子,经常进行小额购买。
LLM 将数据集中的这些行聚类到了 Young Families 这个群体中:3、4、7、10、16、20