LLM RAG in Practice (2): Automating Knowledge Work with LlamaIndex + Metaphor

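      State-of-the-art large language models (LLMs) such as ChatGPT, GPT-4, and Claude 2 have remarkable reasoning capabilities that unlock a wide range of use cases, from insight extraction to question answering to general workflow automation. However, their ability to retrieve contextually relevant information is limited. Retrieval-augmented generation (RAG) systems pair an LLM with an external storage solution over a static knowledge source.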


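Going further requires two things:

  1. A general abstraction that lets an LLM intelligently perform a variety of tasks over data, in both a "read" and a "write" fashion;

  2. A good search engine suited to LLM use.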


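       Data agents have access to a rich set of tools on LlamaHub, ranging from the Gmail API to a SQL database API to basic tools such as Bing search. We have shown that they can perform end-to-end tasks, from sending emails and scheduling meetings to automating custom support insight extraction. However, there has never been a tool designed specifically for LLM use.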



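      The Metaphor API is designed to connect your LLM to the internet. It lets you run fully neural, highly semantic searches over the internet and also get clean HTML content from the results.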


Found an amazing article I read about the history of Rome’s architecture: [LINK]

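       Because the model is trained to predict links from the way people talk about them, the end result is a completely different way of searching the internet: you search as if you were about to share the link you want. This is a little unintuitive at first, but searching this way can return extremely high-quality results. With LlamaIndex you do not have to worry about it, because by default queries are converted into a Metaphor prompt.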
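To make the "search as if you were sharing a link" idea concrete, here is a minimal sketch that rewrites a plain query into that style. The helper `to_link_prompt` and its wording are hypothetical and written only for illustration; this is not Metaphor's actual autoprompt logic, which runs on the service side.

```python
def to_link_prompt(query: str) -> str:
    """Rephrase a plain query as the sentence someone might write before a link.

    Hypothetical helper for illustration only; it does not reproduce
    Metaphor's real query conversion.
    """
    # Drop surrounding whitespace and trailing sentence punctuation.
    topic = query.strip().rstrip("?.!")
    # End with a colon so that the "next token" would naturally be a link.
    return f"Here is a great resource about {topic}:"


print(to_link_prompt("history of Rome's architecture?"))
```

The trailing colon is the important part: the prompt ends exactly where a shared link would appear.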


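  • You can search fully semantically, for example with feelings or complex descriptors;
  • You can restrict the search to the type of entity you want: companies, articles, or people;
  • You can find content that Google handles poorly, perhaps because keywords are not the right tool, or perhaps simply because Google does not care about returning good results for that kind of content.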

PS: To learn more, you can read the full Metaphor API blog post (https://platform.metaphor.systems/blog/building-search-for-the-post-chatgpt-world)


LlamaHub provides a Metaphor API integration that exposes the following five tools to an agent.

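  • Search: the entry point to Metaphor. The agent can query the Metaphor search engine in natural language. A query can also carry additional parameters, such as the number of results to return, domains to include or exclude, and a date filter;
  • Retrieve Documents: given the IDs of documents returned by a search, retrieves the content of those documents;
  • Search and Retrieve Documents: combines the functionality of "Search" and "Retrieve Documents";
  • Find Similar: directly calls the endpoint provided by Metaphor and returns a list of documents similar to a given URL;
  • Current Date: a function that returns the current date. On its own it has nothing to do with Metaphor's API, but it may be called beforehand to work out the correct date filter to pass to some of Metaphor's endpoints.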
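The role of the Current Date tool is easiest to see with a small example: once the agent knows today's date, it can derive a date filter such as "the last month". The sketch below is an illustration using only the standard library. The function name `one_month_window` is hypothetical, and the keys `start_published_date` and `end_published_date` are taken from the agent log shown later in this article.

```python
import calendar
from datetime import date


def one_month_window(today: date) -> dict:
    """Return ISO date strings covering the calendar month that ends on `today`."""
    # Step back one calendar month, wrapping January to the previous December.
    if today.month > 1:
        year, month = today.year, today.month - 1
    else:
        year, month = today.year - 1, 12
    # Clamp the day so that, for example, March 31 maps to the last day of February.
    day = min(today.day, calendar.monthrange(year, month)[1])
    return {
        "start_published_date": date(year, month, day).isoformat(),
        "end_published_date": today.isoformat(),
    }


print(one_month_window(date(2023, 8, 20)))
```

In the agent workflow the LLM performs this arithmetic itself after calling the date tool; the function only spells out the calculation.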




3.1 Testing the Metaphor tools


# Set up Metaphor tool
from llama_hub.tools.metaphor.base import MetaphorToolSpec

metaphor_tool = MetaphorToolSpec(
    api_key='your-key',
)

# convert tool spec to a list of tools
metaphor_tool_list = metaphor_tool.to_tool_list()
for tool in metaphor_tool_list:
    print(tool.metadata.name)


metaphor_tool.search('machine learning transformers', num_results=3)


[{'title': 'On the potential of Transformers in Reinforcement Learning',
  'url': 'https://lorenzopieri.com/rl_transformers/',
  'id': 'ysJlYSgeGW3l4zyOBoSGcg'},
 {'title': 'Transformers: Attention in Disguise',
  'url': 'https://www.mihaileric.com/posts/transformers-attention-in-disguise/',
  'id': 'iEYMai5rS9k0hN5_BH0VZg'},
 {'title': 'Transformers in Computer Vision: Farewell Convolutions!',
  'url': 'https://towardsdatascience.com/transformers-in-computer-vision-farewell-convolutions-f083da6ef8ab?gi=a1d0a9a2896c',
  'id': 'kX1Z89DdjSvBrH1S1XLvwg'}]
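Each search result carries an `id`, which is what the document retrieval tool expects as input. As a small illustration, the sketch below pulls the IDs and the site domains out of a result list. It assumes the list-of-dicts shape shown in the output above, and the helper names `document_ids` and `ids_by_domain` are hypothetical.

```python
from urllib.parse import urlparse

# Sample results copied from the search output above.
results = [
    {"title": "On the potential of Transformers in Reinforcement Learning",
     "url": "https://lorenzopieri.com/rl_transformers/",
     "id": "ysJlYSgeGW3l4zyOBoSGcg"},
    {"title": "Transformers: Attention in Disguise",
     "url": "https://www.mihaileric.com/posts/transformers-attention-in-disguise/",
     "id": "iEYMai5rS9k0hN5_BH0VZg"},
    {"title": "Transformers in Computer Vision: Farewell Convolutions!",
     "url": "https://towardsdatascience.com/transformers-in-computer-vision-farewell-convolutions-f083da6ef8ab?gi=a1d0a9a2896c",
     "id": "kX1Z89DdjSvBrH1S1XLvwg"},
]


def document_ids(results):
    """Collect the IDs that a follow-up document retrieval call would need."""
    return [item["id"] for item in results if "id" in item]


def ids_by_domain(results):
    """Map each result's host name to its document ID."""
    return {urlparse(item["url"]).netloc: item["id"] for item in results}


print(document_ids(results))
print(ids_by_domain(results))
```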

3.2 Setting up an OpenAI agent with Metaphor


from llama_index.agent import OpenAIAgent

# Pass the full Metaphor tool list to the agent (the tools are not wrapped at this stage)
agent = OpenAIAgent.from_tools(
    metaphor_tool_list,
    verbose=True,
)


print(agent.chat('What are the best restaurants in toronto?'))


=== Calling Function ===
Calling function: search with args: {
    "query": "best restaurants in Toronto"
}
[Metaphor Tool] Autoprompt string: Here's a link to the best restaurant in Toronto:
Got output: [{'title': 'Via Allegro Ristorante - Toronto Fine Dining Restaurant', 'url': 'https://viaallegroristorante.com/', 'id': 'EVlexzJh-lzkVr4tb2y_qw'},
 {'title': 'The Senator – Home', 'url': 'https://thesenator.com/', 'id': 'dA3HVr5P8E0Bs7nH2gH7ZQ'},
 {'title': 'Home - The Rushton', 'url': 'https://therushton.com/', 'id': '6Je-igG-i-ApqISC5XXmGQ'},
 {'title': 'Location', 'url': 'https://osteriagiulia.ca/', 'id': 'HjP5c54vqb3n3UNa3HevSA'},
 {'title': 'StockYards | Stockyards Toronto', 'url': 'https://www.thestockyards.ca/', 'id': 'Pffz-DQlOepqVgKQDmW5Ig'},
 {'title': 'Select A Restaurant', 'url': 'https://www.torontopho.com/', 'id': 'DiQ1hU1gmrIzpKnOaVvZmw'},
 {'title': 'Home | Kit Kat Italian Bar & Grill', 'url': 'http://www.kitkattoronto.com/', 'id': 'kdAcLioBgnwzuHyd0rWS1w'},
 {'title': 'La Fenice', 'url': 'https://www.lafenice.ca/', 'id': 'M-LHQZP6V40V81fqLFAQxQ'},
 {'title': 'Le Phénix', 'url': 'https://www.lephenixto.com/', 'id': 'spCTcFr0GHlFUTzyngfRVw'},
 {'title': 'ITALIAN, INSPIRED.', 'url': 'https://figotoronto.com/', 'id': 'OvBcTqEo1tCSywr4ATptCg'}]
========================
Here are some of the best restaurants in Toronto:
1. [Via Allegro Ristorante](https://viaallegroristorante.com/)
2. [The Senator](https://thesenator.com/)
3. [The Rushton](https://therushton.com/)
4. [Osteria Giulia](https://osteriagiulia.ca/)
5. [Stockyards](https://www.thestockyards.ca/)
6. [Toronto Pho](https://www.torontopho.com/)
7. [Kit Kat Italian Bar & Grill](http://www.kitkattoronto.com/)
8. [La Fenice](https://www.lafenice.ca/)
9. [Le Phénix](https://www.lephenixto.com/)
10. [Figo](https://figotoronto.com/)
You can visit their websites for more information. Enjoy your dining experience in Toronto!



print(agent.chat('tell me more about Osteria Giulia'))
=== Calling Function ===
Calling function: retrieve_documents with args: {
  "ids": ["HjP5c54vqb3n3UNa3HevSA"]
}
Got output: […]
========================
Osteria Giulia is a restaurant located at 134 Avenue Road in Toronto, Ontario. You can contact them at 416.964.8686 or via email at info@osteriagiulia.ca (for general inquiries only, no reservation requests via email).
The restaurant's operating hours are from Monday to Saturday, from 5:00pm to 11:00pm. On Sundays, the restaurant is available for private bookings.
Parking is available on Avenue Road and Davenport Road.
You can follow Osteria Giulia on Instagram [@osteriagiulia](https://www.instagram.com/osteriagiulia). They also have a sister restaurant called Giulietta, which you can visit at [giu.ca](https://giu.ca) or on Instagram [@giulietta972](https://www.instagram.com/giulietta972).
Please note that the information provided is based on the available document and may be subject to change. It is recommended to visit their official website or contact them directly for the most up-to-date information.

3.3 Avoiding context window issues (advanced)





from llama_index.tools.tool_spec.load_and_search.base import LoadAndSearchToolSpec

# The search_and_retrieve_documents tool is the third in the tool list, as seen above
wrapped_retrieve = LoadAndSearchToolSpec.from_defaults(
    metaphor_tool_list[2],
)


# Just pass the wrapped tools and the current_date utility (the fifth tool in the list)
agent = OpenAIAgent.from_tools(
    [*wrapped_retrieve.to_tool_list(), metaphor_tool_list[4]],
    verbose=True,
)
print(agent.chat('Can you summarize everything published in the last month regarding news on superconductors'))


=== Calling Function ===
Calling function: current_date with args: {}
Got output: 2023-08-20
========================
=== Calling Function ===
Calling function: search_and_retrieve_documents with args: {
    "query": "superconductors",
    "start_published_date": "2023-07-20",
    "end_published_date": "2023-08-20"
}
[Metaphor Tool] Autoprompt: "Here is an interesting article about superconductors:
Got output: Content loaded! You can now search the information using read_search_and_retrieve_documents
========================
=== Calling Function ===
Calling function: read_search_and_retrieve_documents with args: {
    "query": "superconductors"
}
Got output: Superconductors are materials that can perfectly conduct electricity. They are used in a variety of applications, such as particle accelerators, nuclear fusion devices, MRI machines, and maglev trains. However, so far, no superconductor has been proven to work at ambient pressures and temperatures. On July 22, scientists in South Korea published research claiming to have solved this problem with a material called LK-99, which has an electrical resistivity that drops to near zero at 30 degrees Celsius (86 degrees Fahrenheit).
========================
In the last month, there have been developments in the field of superconductors. Scientists in South Korea have published research on a material called LK-99, which has the ability to conduct electricity with near-zero resistance at a temperature of 30 degrees Celsius (86 degrees Fahrenheit). This breakthrough could potentially lead to the development of superconductors that work at ambient pressures and temperatures, opening up new possibilities for various applications such as particle accelerators, nuclear fusion devices, MRI machines, and maglev trains.

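The log above follows a two-step pattern: a load call stores the retrieved content and returns only a short acknowledgement, and a separate read call then queries the stored content, so the full documents never enter the agent's context window. The sketch below imitates that pattern in plain Python. The class `LoadAndSearchSketch` is a toy written for this illustration, with naive word overlap standing in for a real index; it is not how `LoadAndSearchToolSpec` is implemented.

```python
import re


class LoadAndSearchSketch:
    """Toy stand-in for a load/read tool pair; not LlamaIndex's implementation."""

    def __init__(self):
        self._chunks = []

    def load(self, documents, chunk_size=200):
        """Store documents as fixed-size chunks and return only a short notice."""
        for text in documents:
            for start in range(0, len(text), chunk_size):
                self._chunks.append(text[start:start + chunk_size])
        return "Content loaded! You can now search the information."

    def read(self, query, top_k=1):
        """Return the top_k stored chunks sharing the most words with the query."""
        terms = set(re.findall(r"\w+", query.lower()))
        ranked = sorted(
            self._chunks,
            key=lambda chunk: len(terms & set(re.findall(r"\w+", chunk.lower()))),
            reverse=True,
        )
        return ranked[:top_k]


store = LoadAndSearchSketch()
notice = store.load([
    "Superconductors are materials that can perfectly conduct electricity.",
    "Toronto has many well-reviewed Italian restaurants.",
])
print(notice)
print(store.read("superconductors electricity"))
```

The point of the design is that `load` returns a fixed-size message regardless of how much text was fetched, and only the few chunks selected by `read` are passed back to the LLM.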


[1] https://blog.llamaindex.ai/llamaindex-metaphor-towards-automating-knowledge-work-with-llms-5520a32efa2f

