Large Models: Fine-Tuning alpaca-lora and Deploying Inference

2024-04-27 22:46:03

Contents

What is LoRA
- Key parameters
- Advantages of LoRA
Fine-tuning deployment
Inference deployment

What is LoRA


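Key parameters

lora_rank (int, optional): the rank of the low-rank matrices used in LoRA fine-tuning.
lora_alpha (float, optional): the scaling coefficient applied to the LoRA update.
lora_dropout (float, optional): the dropout rate used in LoRA fine-tuning.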
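To make the roles of the rank and the scaling coefficient concrete, here is a minimal pure-Python sketch of the standard LoRA formulation, h = W x + (lora_alpha / lora_rank) * B (A x). The helper names matvec and lora_forward and the toy dimensions are illustrative assumptions, not code from alpaca-lora, and dropout is left out because it only applies during training.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha, lora_rank):
    """Return W x + (lora_alpha / lora_rank) * B (A x).

    W is the frozen d_out x d_in weight, A is lora_rank x d_in,
    and B is d_out x lora_rank. Only A and B would be trained.
    """
    scaling = lora_alpha / lora_rank
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, lora_rank, lora_alpha = 6, 4, 2, 16.0  # toy sizes, chosen for illustration
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(lora_rank)]
B = [[0.0] * lora_rank for _ in range(d_out)]  # B starts at zero
x = [1.0] * d_in

# With B = 0 the adapter contributes nothing, so the output equals W x.
assert lora_forward(W, A, B, x, lora_alpha, lora_rank) == matvec(W, x)
```

A larger lora_rank gives the update more capacity at the cost of more trainable parameters, while lora_alpha / lora_rank controls how strongly the update is mixed into the frozen output.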

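Advantages of LoRA

The main advantage of LoRA is that it trains faster and uses less memory, so it can run on consumer-grade hardware.
LoRA is also efficient in multi-GPU training, where its speed advantage shows up in two ways:

Compute efficiency: LoRA only computes gradients and optimizer updates for the injected low-rank matrices, so each step is cheaper than in full fine-tuning. In multi-GPU training this smaller workload is spread across the GPUs, which speeds up training further.
Communication efficiency: communication is often the bottleneck in multi-GPU training. With LoRA, only the parameters of the injected matrices need to be exchanged between GPUs, so far less data is transferred than in full fine-tuning and less time is spent communicating. As a result, LoRA is usually faster than full fine-tuning in multi-GPU training. Specifically, LoRA can lower the hardware requirement by as much as 3x, which makes training more efficient.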
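The compute and communication argument can be checked with simple arithmetic. The sketch below counts the trainable LoRA parameters and the size of one set of gradients, which is roughly what data-parallel training has to synchronize per step. The shapes are assumptions for illustration: a LLaMA-7B-like model with 32 layers and hidden size 4096, rank 8, and adapters on two projection matrices per layer. The function names are hypothetical.

```python
def lora_trainable_params(num_layers, hidden_size, rank, modules_per_layer):
    """Count parameters in the LoRA A (rank x hidden) and B (hidden x rank) matrices."""
    per_module = rank * hidden_size + hidden_size * rank
    return num_layers * modules_per_layer * per_module


def gradient_mib(num_params, bytes_per_value=4):
    """Size in MiB of one full set of gradients (fp32 by default)."""
    return num_params * bytes_per_value / 2**20


# Assumed shapes: 32 layers, hidden size 4096, rank 8, 2 adapted projections per layer.
lora_params = lora_trainable_params(
    num_layers=32, hidden_size=4096, rank=8, modules_per_layer=2
)
full_params = 7_000_000_000  # rough size of a 7B model, for comparison only

print(f"LoRA trainable parameters: {lora_params:,}")
print(f"Share of a 7B model:       {lora_params / full_params:.4%}")
print(f"LoRA gradients per step:   {gradient_mib(lora_params):,.0f} MiB")
print(f"Full gradients per step:   {gradient_mib(full_params):,.0f} MiB")
```

Under these assumptions the adapters hold about 4.2 million parameters, a small fraction of a percent of the base model, which is why both the optimizer work and the gradient traffic shrink so much.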

Fine-tuning deployment

Download the project

git clone https://github.com/tloen/alpaca-lora.git


Switch to the project directory

cd alpaca-lora

Activate the conda environment

source activate
conda activate alpaca-lora

Download the model

https://huggingface.co/decapoda-research/llama-7b-hf

Place the model at: /data/sim_chatgpt/llama-7b-hf

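Download the fine-tuning dataset

This dataset is a cleaned version of the Stanford Alpaca data, although the exact cleaning procedure is not known.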

https://huggingface.co/datasets/yahma/alpaca-cleaned

Place the fine-tuning data at: /data/datasets/alpaca-cleaned
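Each record in the dataset has an instruction, an optional input, and an output. The sketch below shows how such a record is typically turned into a training prompt. The template strings are the Alpaca-style prompts as I recall them, so treat them as an assumption and compare them with the template files in your checkout; build_prompt is a hypothetical helper, and the output value is a placeholder rather than the dataset's real answer.

```python
import json

# Alpaca-style templates (assumed wording; verify against the project's templates).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record):
    """Format one {instruction, input, output} record as a prompt string."""
    if record.get("input"):
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


record = json.loads(
    '{"instruction": "Give three tips for staying healthy.", '
    '"input": "", "output": "<reference answer from the dataset>"}'
)

# During training the reference output is appended after "### Response:".
print(build_prompt(record) + record["output"])
```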

Start fine-tuning

nohup python -u finetune.py \
    --base_model '/data/sim_chatgpt/llama-7b-hf' \
    --data_path '/data/datasets/alpaca-cleaned' \
    --output_dir './lora-alpaca' \
    >> log.out 2>&1 &
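If you prefer to assemble the launch command from a script, for example to vary the LoRA hyperparameters, a minimal sketch could look like this. build_finetune_command is a hypothetical helper. The flags --base_model, --data_path, and --output_dir come from the command above, while lora_r, lora_alpha, and lora_dropout are flag names as I recall them from the project's README and should be verified against finetune.py in your checkout.

```python
import shlex


def build_finetune_command(base_model, data_path, output_dir, **overrides):
    """Build the argv list for finetune.py; each override becomes '--name value'."""
    command = [
        "python", "-u", "finetune.py",
        "--base_model", base_model,
        "--data_path", data_path,
        "--output_dir", output_dir,
    ]
    for name, value in overrides.items():
        command += [f"--{name}", str(value)]
    return command


command = build_finetune_command(
    "/data/sim_chatgpt/llama-7b-hf",
    "/data/datasets/alpaca-cleaned",
    "./lora-alpaca",
    # Assumed flag names and values; check finetune.py before relying on them.
    lora_r=8,
    lora_alpha=16,
    lora_dropout=0.05,
)
print(shlex.join(command))

# To launch in the background and append output to log.out, similar to
# "nohup ... >> log.out 2>&1 &", something like this could be used:
# import subprocess
# with open("log.out", "a") as log:
#     subprocess.Popen(command, stdout=log, stderr=subprocess.STDOUT,
#                      start_new_session=True)
```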

Failure 1

Cause

Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model.

The log.out file shows that there is not enough GPU memory.

Analysis

Run nvidia-smi to check GPU memory usage.
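The same check can be scripted so that a launch script can refuse to start when too little GPU memory is free. This is a sketch that assumes nvidia-smi is on the PATH; parse_gpu_memory and query_gpu_memory are hypothetical helper names, and the sample text is made up so the parser can be tried on a machine without a GPU.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines (values in MiB) into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        gpus.append({"index": index, "used_mib": used, "free_mib": total - used})
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and parse its output; needs an NVIDIA driver installed."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Made-up sample in the same CSV shape as the query output.
sample = "0, 22140, 24576\n1, 310, 24576\n"
for gpu in parse_gpu_memory(sample):
    print(gpu)
```

On a real machine, call query_gpu_memory() instead of parsing the sample, and pick a GPU whose free_mib is large enough for the quantized model.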

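Failure 2

RuntimeError: expected scalar type Half but found Float

Before the change

trainer.train(resume_from_checkpoint=resume_from_checkpoint)

After the change

In finetune.py, add the line with torch.autocast("cuda"): directly above the trainer.train(...) call, and indent that call one level under it.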

with torch.autocast("cuda"):
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)

Then start fine-tuning again.
Note: fine-tuning takes a long time. Progress is written to the log.out file.
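Because the run takes a long time, it can help to pull the loss values out of log.out from time to time. The sketch below assumes the trainer writes dictionary-style lines such as {'loss': ..., 'learning_rate': ..., 'epoch': ...}, mixed with progress-bar output; that format is an assumption about the logging, and extract_losses is a hypothetical helper. The sample text is made up.

```python
import ast
import re

# Matches a flat {...} dictionary that contains a 'loss' key.
LOG_DICT = re.compile(r"\{[^{}]*'loss'[^{}]*\}")


def extract_losses(log_text):
    """Return (epoch, loss) pairs from dict-style log lines in log_text."""
    points = []
    for match in LOG_DICT.finditer(log_text):
        try:
            entry = ast.literal_eval(match.group(0))
        except (ValueError, SyntaxError):
            continue  # skip entries that are not plain literals (for example nan)
        points.append((entry.get("epoch"), entry["loss"]))
    return points


# Made-up sample lines in the assumed format; real values will differ.
sample_log = (
    "{'loss': 1.9, 'learning_rate': 0.0001, 'epoch': 0.1}\n"
    " 10%|#         | 20/200 [05:00<45:00]\n"
    "{'loss': 1.2, 'learning_rate': 0.0003, 'epoch': 0.2}\n"
)
print(extract_losses(sample_log))
```

For a real run, pass the file contents instead, for example extract_losses(open("log.out").read()), and check that the loss is trending downward.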

Inference deployment

In generate.py, set share=True so that the demo can be reached from the public internet.

python generate.py \
    --load_8bit \
    --base_model '/data/sim_chatgpt/llama-7b-hf' \
    --lora_weights './lora-alpaca/checkpoint-2000'

Note: the ./lora-alpaca directory contains several checkpoints, such as checkpoint-800, checkpoint-1000, and checkpoint-2000. Pick whichever one you want to load.
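Choosing a checkpoint can be automated. The sketch below lists the checkpoint-<step> subdirectories and returns the one with the highest step; list_checkpoints and latest_checkpoint are hypothetical helpers. The step is compared as a number because plain string sorting would place checkpoint-800 after checkpoint-2000.

```python
import re
from pathlib import Path


def list_checkpoints(output_dir):
    """Return (step, path) pairs for checkpoint-<step> subdirectories, sorted by step."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    found = []
    for path in root.iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", path.name)
        if match and path.is_dir():
            found.append((int(match.group(1)), path))
    return sorted(found)


def latest_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint, or None if there is none."""
    checkpoints = list_checkpoints(output_dir)
    return checkpoints[-1][1] if checkpoints else None


# Prints None when ./lora-alpaca does not exist or holds no checkpoints yet.
print(latest_checkpoint("./lora-alpaca"))
```

The returned path can then be passed as the --lora_weights argument of generate.py.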

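If launching fails with an error saying the share link cannot be created, downgrade gradio, for example: pip install gradio==3.13
After a minute or two, the public URL is printed.

Open the public URL in a browser and ask a question:
Using "Give three tips for staying healthy." from the instruction (string) column of "https://huggingface.co/datasets/yahma/alpaca-cleaned" as the question, the answer shown on the web page is close to the dataset's output field, which suggests the model learned well from fine-tuning.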


Original source: https://blog.csdn.net/weixin_42504788/article/details/138254278