g4f's sibling project: g4l (gpt4local) is here!

2024-04-12 20:00:02

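As is well known, g4f is very capable. For installation details, see the CSDN blog post "gpt4free带来了更好的chatgpt体验！" (gpt4free brings a better ChatGPT experience).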

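Happily, its sibling project gpt4local has now been released too. g4l is a high-level Python library that lets you run language models through llama.cpp bindings. It is a sister project of @gpt4free and likewise provides AI capabilities, but g4l does not need to go online to call OpenAI resources: once a model has been downloaded, it runs entirely locally, with no network connection required.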

Project page: GitHub - xtekky/gpt4local: Openai-style, fast & lightweight local language model inference w/ documents

Installation

First install the llama.cpp Python bindings:

pip3 install -U llama-cpp-python

Clone the source code

git clone https://github.com/gpt4free/gpt4local
# Alternatively, use the mirror repository
git clone https://atomgit.com/skywalk/gpt4local

Install g4l

cd gpt4local
pip install -r requirements.txt
python3 setup.py install
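
To confirm that both packages ended up importable, a quick check along these lines may help. It is only a sketch: it probes the import names `llama_cpp` (the module installed by llama-cpp-python) and `g4l` (the name used in the usage examples in this post), and the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Report the status of the two packages this post relies on.
    for name in ("llama_cpp", "g4l"):
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

If either line reports MISSING, rerun the corresponding install step in the same Python environment.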

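Download a model

Download the model you want in GGUF format from HuggingFace. TheBloke's page hosts a wide range of quantized .gguf models.

You can also try the mirror site: HF-Mirror - Huggingface 镜像站 (a Hugging Face mirror).

The author recommends mistral-7b-instruct (v2) and orca-mini-3b. The latter can be downloaded from gpt4all.

The author considers Qwen1.5-72B-Chat (千问) the best of the open-source models. It is available at: https://huggingface.co/Qwen/Qwen1.5-72B-Chat-GGUF/tree/main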

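Models can be quantized, for example to q2_0, q4_0, q5_0, or q8_0. Higher quantization bit widths (4 bits or more) generally preserve more quality, while lower levels compress the model further, which can cause a significant loss of quality. The standard quantization level is q4_0.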
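
To get a feel for why fewer bits cost quality, here is a toy round-trip experiment. It is a deliberate simplification, not the actual GGUF scheme (real formats quantize weights in small blocks, each with its own scale); the function names and the sample weights are invented for illustration.

```python
def quantize_roundtrip(values: list[float], bits: int) -> list[float]:
    """Quantize to signed integers of `bits` width with one shared scale, then dequantize."""
    levels = 2 ** (bits - 1) - 1               # e.g. 7 for 4-bit, 127 for 8-bit
    peak = max(abs(v) for v in values) or 1.0  # avoid a zero scale for all-zero input
    scale = peak / levels
    return [round(v / scale) * scale for v in values]


def mean_abs_error(values: list[float], bits: int) -> float:
    """Average absolute difference between the original and round-tripped values."""
    restored = quantize_roundtrip(values, bits)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Made-up sample "weights", only to compare error across bit widths.
weights = [0.12, -0.53, 0.98, -0.07, 0.31, -0.88, 0.44, 0.05]
for bits in (2, 4, 5, 8):
    print(f"{bits}-bit: mean abs error = {mean_abs_error(weights, bits):.4f}")
```

On this sample the error shrinks as the bit width grows, which mirrors the quality-versus-size trade-off described above.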

As a rule of thumb, a 7B model needs about 8 GB of RAM and a 13B model about 16 GB.
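
A back-of-the-envelope estimate shows where figures like these come from. The sketch below counts only the weights (parameters times bits per weight); the function name is made up, and real usage will be higher because quantized formats store extra scale data and the runtime needs memory for context and buffers.

```python
def approx_weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


for name, params in (("7B", 7), ("13B", 13)):
    for bits in (4, 8):
        size = approx_weight_size_gb(params, bits)
        print(f"{name} at {bits} bits/weight: ~{size:.1f} GB of weights")
```

Under these assumptions a 4-bit 7B model has roughly 3.5 GB of weights, so the 8 GB guideline leaves headroom for everything else on the machine.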

Usage

Basic usage

from g4l.local import LocalEngine

engine = LocalEngine(
    gpu_layers = -1,  # use all GPU layers
    cores      = 0    # use all CPU cores
)

response = engine.chat.completions.create(
    model    = 'orca-mini-3b-gguf2-q4_0',
    messages = [{"role": "user", "content": "hi"}],
    stream   = True
)

for token in response:
    print(token.choices[0].delta.content or "", end="", flush=True)

Note: the model parameter must match the file name of the .gguf model you placed locally, without the .gguf extension!
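
Since the name has to match exactly, it can help to list the valid values straight from disk. The helper below is a small sketch; the function name and the `./models` directory are assumptions, so point it at wherever your .gguf files actually live.

```python
from pathlib import Path


def available_model_names(models_dir: str) -> list[str]:
    """Return the .gguf file names in `models_dir` with the extension stripped."""
    root = Path(models_dir)
    if not root.is_dir():
        return []  # nothing to list if the directory does not exist
    return sorted(p.stem for p in root.glob("*.gguf"))


if __name__ == "__main__":
    # "./models" is an assumed location, not something confirmed by this post.
    for name in available_model_names("./models"):
        print(name)
```

Each printed name can be passed as the model argument as-is.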

Chatting with documents

from g4l.local import LocalEngine, DocumentRetriever

engine = LocalEngine(
    gpu_layers = -1,  # use all GPU layers
    cores      = 0,   # use all CPU cores
    document_retriever = DocumentRetriever(
        files       = ['einstein-albert.pdf'], 
        embed_model = 'SmartComponents/bge-micro-v2', # https://huggingface.co/spaces/mteb/leaderboard
    )
)

response = engine.chat.completions.create(
    model    = 'mistral-7b-instruct',
    messages = [
        {
            "role": "user", "content": "how was einstein's work in the laboratory"
        }
    ],
    stream   = True
)

for token in response:
    print(token.choices[0].delta.content or "", end="", flush=True)

Document retrieval

from g4l.local import DocumentRetriever

engine = DocumentRetriever(
    files=['einstein-albert.txt'], 
    embed_model='SmartComponents/bge-micro-v2', # https://huggingface.co/spaces/mteb/leaderboard
    verbose=True,
)

retrieval_data = engine.retrieve('what inventions did he do')

for node_with_score in retrieval_data:
    node = node_with_score.node
    score = node_with_score.score
    text = node.text
    metadata = node.metadata
    page_label = metadata.get('page_label', 'n/a')  # plain-text files may not carry a page label
    file_name = metadata['file_name']
    
    print(f"Text: {text}")
    print(f"Score: {score}")
    print(f"Page Label: {page_label}")
    print(f"File Name: {file_name}")
    print("---")
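
Retrieval usually returns some weak matches too, so it can be worth filtering on the score before using the passages. The sketch below assumes only that each result exposes a `.score` attribute as in the loop above; `filter_by_score`, the threshold, and the fake results are illustrative, and a sensible cut-off depends on the embedding model.

```python
from types import SimpleNamespace


def filter_by_score(results, min_score: float) -> list:
    """Keep results scoring at least `min_score`, best match first."""
    kept = [r for r in results if r.score is not None and r.score >= min_score]
    return sorted(kept, key=lambda r: r.score, reverse=True)


# Fake results standing in for retriever output, so the snippet runs on its own.
fake_results = [
    SimpleNamespace(score=0.41, node=SimpleNamespace(text="passage A")),
    SimpleNamespace(score=0.78, node=SimpleNamespace(text="passage B")),
    SimpleNamespace(score=None, node=SimpleNamespace(text="passage C")),
]

for result in filter_by_score(fake_results, min_score=0.5):
    print(f"{result.score:.2f}  {result.node.text}")
```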

Original article: https://blog.csdn.net/skywalk8163/article/details/137656318