Llama - Prompting

This article is translated and adapted from: Prompting
https://llama.meta.com/docs/how-to-guides/prompting/

A linked notebook shows examples of the techniques discussed in this section.

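Prompt engineering is a technique used in natural language processing (NLP) to improve the performance of language models by giving them more context and information about the task at hand. It involves creating prompts, which are short pieces of text that provide additional information or guidance to the model, such as the topic or genre of the text it will generate. With prompts, the model can better understand what kind of output is expected and produce more accurate and relevant results. In Llama 2, the context size, measured in number of tokens, doubled from 2048 to 4096.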

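Crafting Effective Prompts

Crafting effective prompts is an important part of prompt engineering. Here are some tips for creating prompts that will help improve the performance of your language model: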

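Be clear and concise: Your prompt should be easy to understand and provide enough information for the model to generate relevant output. Avoid jargon or technical terms that may confuse the model.
Use specific examples: Providing specific examples in your prompt helps the model better understand what kind of output is expected. For example, if you want the model to generate a story about a particular topic, include a few sentences about the setting, characters, and plot.
Vary the prompts: Using different prompts can help the model learn more about the task at hand and produce more diverse and creative output. Try different styles, tones, and formats to see how the model responds.
Test and refine: Once you have created a set of prompts, test them on the model to see how it performs. If the results are not as expected, refine the prompts by adding more detail or adjusting the tone and style.
Use feedback: Finally, use feedback from users or other sources to continually improve your prompts. This can help you identify areas where the model needs more guidance and adjust accordingly.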

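Explicit Instructions

Detailed, explicit instructions produce better results than open-ended prompts. You can think of explicit instructions as rules and restrictions on how the model responds to your prompt.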

Stylization

Explain this to me like a topic on a children's educational network show teaching elementary students.

I'm a software engineer using large language models for summarization. Summarize the following text in under 250 words:

Give your answer like an old timey private investigator hunting down a case step by step.

Formatting

Use bullet points.

Return as a JSON object.

Use less technical terms and help me apply it in my work in communications.

Restrictions

Only use academic papers.

Never give sources older than 2020.

If you don't know the answer, say that you don't know.

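Here is an example of giving explicit instructions that produce more specific results by restricting responses to recently created sources: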

Explain the latest advances in large language models to me.
#  More likely to cite sources from 2017

Explain the latest advances in large language models to me. Always cite your sources.
Never cite sources older than 2020.
# Gives more specific advances and only cites sources from 2020
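
Explicit instructions of this kind can also be attached to a question programmatically. The following is a minimal sketch in Python; the helper name `with_instructions` is hypothetical and is not part of any Llama API.

```python
def with_instructions(question, *instructions):
    """Append explicit instructions (style, format, restrictions) to a question."""
    parts = [question.strip()]
    # Skip empty instructions so the prompt does not gain stray spaces.
    parts.extend(i.strip() for i in instructions if i.strip())
    return " ".join(parts)


prompt = with_instructions(
    "Explain the latest advances in large language models to me.",
    "Always cite your sources.",
    "Never cite sources older than 2020.",
)
print(prompt)
```

Keeping the restrictions as separate arguments makes it easy to test the same question with and without each rule.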

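Prompting Using Zero- and Few-Shot Learning

A shot is an example or demonstration of the kind of prompt and response you expect from a large language model. The term originates from training computer vision models on photographs, where one shot was one example or instance that the model used to classify an image.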

Zero-Shot Prompting

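Large language models like Meta Llama can follow instructions and produce responses without having previously seen an example of the task. Prompting without examples is called "zero-shot prompting".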

Text: This was the best movie I've ever seen! 
The sentiment of the text is:

Text: The director was trying too hard.
The sentiment of the text is:

Few-Shot Prompting

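Adding specific examples of your desired output generally results in more accurate, more consistent output. This technique is called "few-shot prompting". In this example, the generated response follows our desired format, which gives a more nuanced sentiment classifier that reports confidence percentages for positive, neutral, and negative.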

You are a sentiment classifier. For each message, give the percentage of positive/neutral/negative. Here are some samples:
Text: I liked it
Sentiment: 70% positive 30% neutral 0% negative
Text: It could be better
Sentiment: 0% positive 50% neutral 50% negative
Text: It's fine
Sentiment: 25% positive 50% neutral 25% negative

Text: I thought it was okay

Text: I loved it!

Text: Terrible service 0/10
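
A few-shot prompt like the one above can be assembled from a list of labeled examples. The sketch below is one possible way to do it in Python; the function name `build_few_shot_prompt` and the `Text:` / `Sentiment:` layout are assumptions chosen to mirror the example, not a required format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, labeled examples, then the new input."""
    lines = [instruction]
    for text, sentiment in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {sentiment}")
    # Leave the final label open so the model completes it in the same format.
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


examples = [
    ("I liked it", "70% positive 30% neutral 0% negative"),
    ("It could be better", "0% positive 50% neutral 50% negative"),
    ("It's fine", "25% positive 50% neutral 25% negative"),
]
prompt = build_few_shot_prompt(
    "You are a sentiment classifier. For each message, give the percentage of "
    "positive/neutral/negative. Here are some samples:",
    examples,
    "I thought it was okay",
)
print(prompt)
```

Calling the function again with "I loved it!" or "Terrible service 0/10" as the query reuses the same examples for each new message.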

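Role-Based Prompts

Role-based prompting creates prompts based on the role or perspective of the person or entity being addressed. This technique is useful for generating more relevant and engaging responses from language models.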

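Pros:

Improves relevance: Role-based prompting helps the language model understand the role or perspective of the person or entity being addressed, which can lead to more relevant and engaging responses.
Increases accuracy: Providing additional context about the role or perspective of the person or entity being addressed can help the language model avoid mistakes or misunderstandings.

Cons:

Requires effort: It takes more effort to gather and provide the necessary information about the role or perspective of the person or entity being addressed.

Example: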

You are a virtual tour guide currently walking tourists through the Eiffel Tower on a night tour. Describe the Eiffel Tower to your audience, covering its history, the number of people visiting each year, the amount of time it takes to do a full tour, and why so many people visit this place each year.

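Chain-of-Thought Technique

The chain-of-thought technique provides the language model with a series of prompts or questions to help guide its thinking and produce a more coherent and relevant response. This technique is useful for generating more thoughtful and well-reasoned responses from language models.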

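Pros:

Improves coherence: Helps the language model think through a problem in a logical, structured way, which can lead to more coherent and relevant responses.
Increases depth: Providing a series of prompts or questions can help the language model explore a topic more deeply and thoroughly, which may lead to more insightful and informative responses.

Cons:

Requires effort: The chain-of-thought technique takes more effort to create and provide the necessary prompts or questions.

Example: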

You are a virtual tour guide from 1901. You have tourists visiting Eiffel Tower. Describe Eiffel Tower to your audience. Begin with
1. Why it was built
2. Then by how long it took them to build
3. Where were the materials sourced to build
4. Number of people it took to build
5. End it with the number of people visiting the Eiffel Tower annually in the 1900s, the amount of time it takes to complete a full tour, and why so many people visit this place each year.
Make your tour funny by including 1 or 2 funny jokes at the end of the tour.

Self-Consistency

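LLMs are probabilistic, so even with chain-of-thought, a single generation may produce an incorrect result. Self-consistency improves accuracy by selecting the most frequent answer from multiple generations (at the cost of higher compute):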

John found that the average of 15 numbers is 40.
If 10 is added to each number then the mean of the numbers is?
Report the answer surrounded by three backticks, for example: ```123```

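Running the above several times and taking the most commonly returned value as the answer applies the self-consistency approach.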
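
The majority vote itself is simple to sketch. The Python below assumes the answers are wrapped in three backticks, as the prompt requests. The `sample_responses` list is hypothetical stand-in data; in practice each entry would come from a separate sampled generation of the model.

```python
import re
from collections import Counter

FENCE = "`" * 3  # the three-backtick delimiter requested in the prompt


def extract_answer(response):
    """Return the text between the first pair of triple backticks, or None."""
    pattern = re.escape(FENCE) + r"(.*?)" + re.escape(FENCE)
    match = re.search(pattern, response, re.DOTALL)
    return match.group(1).strip() if match else None


def self_consistent_answer(responses):
    """Pick the most frequent extracted answer across several generations."""
    answers = [a for a in map(extract_answer, responses) if a]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical generations, two correct and one wrong.
sample_responses = [
    f"Adding 10 to each number raises the mean by 10. {FENCE}50{FENCE}",
    f"The new mean is {FENCE}50{FENCE}",
    f"15 * 40 = 600, plus 10 gives {FENCE}610{FENCE}",
]
print(self_consistent_answer(sample_responses))
```

Responses that contain no delimited answer are ignored, so a malformed generation does not count toward the vote.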

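Retrieval-Augmented Generation

Common facts are generally available from today's large models out of the box (that is, using only the model weights). More specific data is unlikely to be available, though. For example: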

What is the capital of  California?
# The capital of California is Sacramento...

What was the temperature in Menlo Park on December 12th, 2023?
# I'm just an AI, I don't have access to real-time or historical weather data...

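Retrieval-Augmented Generation, or RAG, describes the practice of including information in the prompt that has been retrieved from an external database. It is an effective way to incorporate facts into an LLM application and is more affordable than fine-tuning, which may also negatively affect the base model's capabilities.

This can be as simple as a lookup table or as sophisticated as a vector database containing all of your company's knowledge: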

Given the following information about temperatures in Menlo Park:
2023-12-11 :  52 degrees Fahrenheit
2023-12-12 :  51 degrees Fahrenheit
2023-12-13 :  55 degrees Fahrenheit
What was the temperature in Menlo Park on 2023-12-12?

# Sure! The temperature in Menlo Park on 2023-12-12 was 51 degrees Fahrenheit.

What was the temperature in Menlo Park on 2023-07-18 ?

# Sorry, I don't have information about the temperature in Menlo Park on 2023-07-18. The information provided only includes temperatures for December 11th, 12th, and 13th of 2023.
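
The lookup-table case can be sketched in a few lines of Python. The table `TEMPERATURES_F`, the helpers `retrieve` and `build_rag_prompt`, and the one-day retrieval window are all assumptions made for illustration; a real application would query its own data store.

```python
from datetime import date

# Hypothetical lookup table standing in for an external data source.
TEMPERATURES_F = {
    date(2023, 12, 11): 52,
    date(2023, 12, 12): 51,
    date(2023, 12, 13): 55,
}


def retrieve(day, window=1):
    """Return the records within `window` days of the requested date."""
    return {
        d: t for d, t in sorted(TEMPERATURES_F.items())
        if abs((d - day).days) <= window
    }


def build_rag_prompt(day):
    """Place the retrieved facts ahead of the question in the prompt."""
    facts = retrieve(day)
    lines = ["Given the following information about temperatures in Menlo Park:"]
    lines += [f"{d.isoformat()} : {t} degrees Fahrenheit" for d, t in facts.items()]
    lines.append(f"What was the temperature in Menlo Park on {day.isoformat()}?")
    return "\n".join(lines)


print(build_rag_prompt(date(2023, 12, 12)))
```

When nothing is retrieved, as for 2023-07-18, the prompt carries no facts, and the model can report that the information is missing rather than guess.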

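Program-Aided Language Models

LLMs, by nature, are not good at performing calculations. But while they are bad at arithmetic, they are very good at code generation. Program-Aided Language leverages this by instructing the LLM to write code that solves the calculation task.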

Python code to calculate: ((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))

Here is the Python code to calculate the given expression:

((-5 + 93 * 4 - 0) * (4**4 + -7 + 0 * 5))

# Calculate the expression inside the parentheses
a = -5 + 93 * 4 - 0
b = 4**4 + -7 + 0 * 5

# Multiply a and b
result = a * b

# Print the result
print(result)

Running the code directly gives the correct result.
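
The generated code still has to be executed somewhere to obtain the result. As a minimal sketch, and not a method the guide prescribes, the helper below evaluates a model-written arithmetic expression by walking its syntax tree with an operator whitelist, instead of calling `eval()` or `exec()` on untrusted text. The name `eval_arithmetic` and the set of allowed operators are assumptions made for illustration.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def eval_arithmetic(expression):
    """Evaluate a numeric Python expression without calling eval() or exec()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


# The expression as the model wrote it in Python (note ** for exponentiation).
generated = "((-5 + 93 * 4 - 0) * (4**4 + -7 + 0 * 5))"
print(eval_arithmetic(generated))
```

For this expression the two parenthesized terms are 367 and 249, so the printed result is 91383. Function calls, names, and attribute access raise `ValueError` rather than being executed.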

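Limiting Extraneous Tokens

A common challenge is generating a response without extraneous tokens (for example, "Sure! Here's more information on...").

By combining a role, rules and restrictions, explicit instructions, and an example, the model can be prompted to generate the desired response.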

You are a robot that only outputs JSON.
You reply in JSON format with the field 'zip_code'.
Example question: What is the zip code of the Empire State Building? 
Example answer: {'zip_code': 10118}
Now here is my question: What is the zip code of Menlo Park?

# "{'zip_code': 94025}"
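
Even a well-constrained reply needs checking before downstream code relies on it. Note that the example reply uses single quotes, which strict JSON does not allow. The sketch below, with the hypothetical helper `parse_reply`, tries strict JSON first and falls back to a Python literal; this is one possible approach, not part of the original guide.

```python
import ast
import json


def parse_reply(reply):
    """Parse a model reply that should contain a single JSON object.

    Tries strict JSON first, then falls back to a Python literal, since the
    example reply uses single quotes, which strict JSON does not allow.
    Raises ValueError or SyntaxError if the reply is not a usable object.
    """
    # Drop surrounding whitespace and any wrapping double quotes.
    text = reply.strip().strip('"')
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        value = ast.literal_eval(text)
    if not isinstance(value, dict) or "zip_code" not in value:
        raise ValueError("reply does not contain a 'zip_code' field")
    return value


print(parse_reply("{'zip_code': 94025}"))
print(parse_reply('{"zip_code": 94025}'))
```

A reply that still begins with conversational text fails to parse, which gives the caller a clear signal to retry or tighten the prompt.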