Running the Mixtral 8x7B large language model in 4-bit/8-bit


0. Background

A typical personal computer cannot realistically run Mixtral 8x7B in float16, so I load the model with 4-bit or 8-bit quantization instead.
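A rough weight-only estimate shows why float16 is out of reach. The sketch assumes about 46.7B total parameters for Mixtral 8x7B, which is the figure Mistral AI published. It counts only the weights. KV cache, activations, quantization metadata, and any layers that bitsandbytes leaves unquantized all push real usage higher.

```python
# Rough weight-only memory estimate for Mixtral 8x7B.
# Assumption: ~46.7e9 total parameters (Mistral AI's published figure).
# Ignores KV cache, activations, quantization metadata and unquantized layers.
PARAMS = 46.7e9

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(params: float, dtype: str) -> float:
    """Return the approximate weight memory in GiB for the given dtype."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: ~{weight_memory_gib(PARAMS, dtype):.1f} GiB")
```

Under these assumptions, the weights alone need about 87 GiB in float16. They need about 43.5 GiB in 8-bit and about 21.7 GiB in 4-bit.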

In my tests, inference was noticeably faster in 4-bit, while 8-bit inference was still very slow.

The inference framework used here is FastChat (fastchat).

1. Modify the code

vi fastchat/model/model_adapter.py

Before:

class MistralAdapter(BaseModelAdapter):
    """The model adapter for Mistral AI models"""

    def match(self, model_path: str):
        return "mistral" in model_path.lower() or "mixtral" in model_path.lower()

    def load_model(self, model_path: str, from_pretrained_kwargs: dict):
        model, tokenizer = super().load_model(model_path, from_pretrained_kwargs)
        model.config.eos_token_id = tokenizer.eos_token_id
        model.config.pad_token_id = tokenizer.pad_token_id
        return model, tokenizer

After:

class MistralAdapter(BaseModelAdapter):
    """The model adapter for Mistral AI models"""

    def match(self, model_path: str):
        return "mistral" in model_path.lower() or "mixtral" in model_path.lower()

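    def load_model(self, model_path: str, from_pretrained_kwargs: dict):
        # Replaces the default loader so that Mixtral can be loaded quantized.
        # Uses AutoTokenizer / AutoModelForCausalLM from transformers (model_adapter.py already imports them).
        # model, tokenizer = super().load_model(model_path, from_pretrained_kwargs)
        tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
        if "mixtral" in model_path.lower():
            # Quantized load through bitsandbytes (the bitsandbytes package must be installed).
            model = AutoModelForCausalLM.from_pretrained(
                model_path,
                low_cpu_mem_usage=True,
                trust_remote_code=True,
                # attn_implementation="flash_attention_2",  # optional, requires the flash-attn package
                # load_in_8bit=True,  # for 8-bit: uncomment this line and comment out load_in_4bit
                load_in_4bit=True,
                **from_pretrained_kwargs,
            )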
        else:
            model = AutoModelForCausalLM.from_pretrained(
                model_path,
                low_cpu_mem_usage=True,
                trust_remote_code=True,
                **from_pretrained_kwargs,
            )
        model.config.eos_token_id = tokenizer.eos_token_id
        model.config.pad_token_id = tokenizer.pad_token_id
        return model, tokenizer
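The modified MistralAdapter switches between 4-bit and 8-bit by commenting lines in and out. The sketch below is a hypothetical alternative that is not part of FastChat. Its helpers `mixtral_quant_kwargs` and `build_load_kwargs` are illustrative names, and it assumes you pick the bit width as a parameter. The helpers build a single kwargs dict and hand it to `from_pretrained`. Merging into one dict also avoids the `TypeError: got multiple values for keyword argument` that the original call would raise if `from_pretrained_kwargs` already contained, say, `low_cpu_mem_usage`. Note that newer transformers releases deprecate passing `load_in_4bit`/`load_in_8bit` directly in favour of `quantization_config=BitsAndBytesConfig(...)`.

```python
from typing import Any, Dict, Optional


def mixtral_quant_kwargs(model_path: str, bits: Optional[int]) -> Dict[str, Any]:
    """Hypothetical helper: extra from_pretrained kwargs for quantizing Mixtral.

    Non-Mixtral models (or bits=None) get no quantization flags, matching the
    else-branch of the modified adapter.
    """
    if "mixtral" not in model_path.lower() or bits is None:
        return {}
    if bits == 4:
        return {"load_in_4bit": True}
    if bits == 8:
        return {"load_in_8bit": True}
    raise ValueError(f"unsupported bit width: {bits}")


def build_load_kwargs(
    model_path: str, from_pretrained_kwargs: Dict[str, Any], bits: Optional[int] = 4
) -> Dict[str, Any]:
    """Hypothetical helper: merge defaults, quantization flags and caller kwargs.

    Later entries win, so values already in from_pretrained_kwargs override the
    defaults instead of raising a duplicate-keyword TypeError.
    """
    kwargs: Dict[str, Any] = {"low_cpu_mem_usage": True, "trust_remote_code": True}
    kwargs.update(mixtral_quant_kwargs(model_path, bits))
    kwargs.update(from_pretrained_kwargs)
    return kwargs
```

Inside `load_model`, the Mixtral branch and the non-Mixtral branch could then share one call: `AutoModelForCausalLM.from_pretrained(model_path, **build_load_kwargs(model_path, from_pretrained_kwargs, bits=4))`.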

Done!
