Running Large Models on a Local CPU: A Closer Look at How Large Models Work

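Large models, especially the large language models that have become popular in recent years such as GPT and Llama, have achieved impressive results across many applications. Model sizes have grown just as quickly, however, which makes it hard for ordinary developers and researchers to run these models in a local environment. For beginners and non-specialists this is a major obstacle. So how can these models be run on a local CPU, without relying on expensive GPUs or cloud computing resources?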

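To run on a local CPU, we can build a mini-LLM (a lightweight language model). Such a model keeps the basic structure of a large model but has far fewer parameters. Reducing the model's depth (the number of layers) or its width (the hidden dimensions) yields a model light enough to run in a local environment.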
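To make the effect of depth and width concrete, here is a rough parameter-count sketch. It assumes a LLaMA-style decoder layout (bias-free linear layers, a gated MLP with three projections, two RMSNorm weights per layer, one final RMSNorm, and an untied output layer); the function name `count_decoder_params` is hypothetical, and the real Baichuan2 code may differ in details. The "large" values are illustrative numbers for a 7B-class model, not taken from this article.

```python
def count_decoder_params(hidden_size, intermediate_size, num_hidden_layers,
                         vocab_size, tie_word_embeddings=False):
    """Rough parameter count for a LLaMA-style decoder-only model.

    Assumption: bias-free linear layers, gated MLP (gate/up/down),
    two RMSNorm weights per layer and one final RMSNorm.
    """
    embed = vocab_size * hidden_size                   # token embedding table
    attention = 4 * hidden_size * hidden_size          # Q, K, V and output projections
    mlp = 3 * hidden_size * intermediate_size          # gate, up and down projections
    norms = 2 * hidden_size                            # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    final_norm = hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    return embed + num_hidden_layers * per_layer + final_norm + lm_head


# Tiny configuration: width 8, MLP width 16, a single layer.
tiny = count_decoder_params(hidden_size=8, intermediate_size=16,
                            num_hidden_layers=1, vocab_size=125696)

# Illustrative 7B-class configuration (assumed values, for comparison only).
large = count_decoder_params(hidden_size=4096, intermediate_size=11008,
                             num_hidden_layers=32, vocab_size=125696)

print(f"tiny:  {tiny:,} parameters")
print(f"large: {large:,} parameters")
```

Under these assumptions the tiny configuration comes to roughly two million parameters, nearly all of them in the embedding and output layers, because the vocabulary size is left unchanged while the transformer layer itself shrinks to a few hundred weights.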

Taking Baichuan2 (https://github.com/baichuan-inc/Baichuan2) as an example, the configuration can be changed to the following:

{
    "architectures": [
        "BaichuanForCausalLM"
    ],
    "auto_map": {
        "AutoConfig": "configuration_baichuan.BaichuanConfig",
        "AutoModelForCausalLM": "modeling_baichuan.BaichuanForCausalLM"
    },
    "tokenizer_class": "BaichuanTokenizer",
    "bos_token_id": 1,
    "eos_token_id": 2,
    "hidden_act": "silu",
    "hidden_size": 8,
    "initializer_range": 0.02,
    "intermediate_size": 16,
    "max_position_embeddings": 4096,
    "model_max_length": 4096,
    "model_type": "baichuan",
    "num_attention_heads": 4,
    "num_hidden_layers": 1,
    "pad_token_id": 0,
    "rms_norm_eps": 1e-06,
    "_from_model_config": true,
    "tie_word_embeddings": false,
    "torch_dtype": "bfloat16",
    "transformers_version": "4.29.2",
    "use_cache": true,
    "vocab_size": 125696
}

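Here num_hidden_layers has been changed to 1, num_attention_heads to 4, intermediate_size to 16, and hidden_size to 8. Next, the CUDA-related parts of the code need to be modified; once that is done, the code runs on the local machine.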
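One way to find the CUDA-related code that needs changing is to scan the source for GPU-specific calls. The following is a minimal sketch using only the standard library; the pattern list is illustrative rather than exhaustive, the helper name `find_cuda_references` is hypothetical, and the sample snippet is made up for demonstration rather than copied from the Baichuan2 repository.

```python
import re

# Patterns that commonly tie PyTorch code to a GPU (illustrative, not exhaustive).
CUDA_PATTERNS = [
    r"\.cuda\(",                    # tensor.cuda() / model.cuda()
    r"torch\.cuda\.",               # torch.cuda.* utilities
    r"""device\s*=\s*["']cuda""",   # device="cuda" keyword arguments
    r"""\.to\(\s*["']cuda""",       # .to("cuda")
]
_CUDA_RE = re.compile("|".join(CUDA_PATTERNS))


def find_cuda_references(source):
    """Return (line_number, stripped_line) pairs for lines that mention CUDA."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if _CUDA_RE.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical inference snippet used only to demonstrate the scan.
sample = '''\
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)
model = model.cuda()
inputs = tokenizer(text, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs)
'''

for number, line in find_cuda_references(sample):
    print(f"line {number}: {line}")
```

Each reported line can then be adapted for CPU, for example by dropping the `.cuda()` call or by moving tensors with `.to("cpu")` instead.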

The complete code is available here:
