The Allen Institute for AI (AI2) recently released OLMoE, a new large-scale language model built as a sparse mixture-of-experts (MoE) model with 7 billion parameters, of which only 1 billion are active per input token, significantly reducing inference cost and memory requirements. OLMoE comes in two versions: the general-purpose OLMoE-1B-7B and the instruction-tuned OLMoE-1B-7B-Instruct, which match other large models on benchmarks and even surpass some larger ones. AI2 emphasizes that OLMoE is fully open source, which is especially rare among MoE models and provides a valuable resource for academic research.
OLMoE adopts a sparse mixture-of-experts (MoE) architecture with 7 billion total parameters, of which only 1 billion are used per input token. It comes in two versions: the general-purpose base model OLMoE-1B-7B and the instruction-tuned OLMoE-1B-7B-Instruct.
Unlike most other mixture-of-experts models, which are closed source, AI2 emphasizes that OLMoE is completely open source. The paper notes that "most MoE models are closed-source: although some disclose model weights, there is extremely limited information on their training data, code, or recipes," which has made these models inaccessible to many academic researchers.
AI2 research scientist Nathan Lambert said on social media that OLMoE will help with policymaking, providing a starting point as H100 clusters come online in academia. He also noted that the release of OLMoE is part of AI2's goal of developing open-source models whose performance is comparable to that of closed models.
For the model architecture, AI2 chose fine-grained routing with 64 small experts, of which only eight are activated at runtime. Experiments show that OLMoE matches other models in performance while significantly reducing inference cost and memory footprint. OLMoE also builds on AI2's earlier open-source model OLMo 1.7-7B and supports a 4,096-token context window. Its training data comes from multiple sources, including Common Crawl, Dolma CC, and Wikipedia.
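To make the routing idea concrete, the sketch below shows fine-grained top-k expert routing in plain PyTorch: 64 small experts, with only eight activated per token. This is an illustrative toy under assumed dimensions and layer shapes, not AI2's actual implementation.

```python
# Illustrative sketch of fine-grained top-k MoE routing (not AI2's actual code).
# Assumptions: 64 experts, 8 active per token, toy hidden size of 1024.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy sparse MoE layer: 64 small experts, top-8 routing per token."""
    def __init__(self, hidden=1024, n_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        # The router scores every expert for every token.
        self.router = nn.Linear(hidden, n_experts, bias=False)
        # Each expert is a small feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 2 * hidden), nn.GELU(), nn.Linear(2 * hidden, hidden))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden)
        probs = F.softmax(self.router(x), dim=-1)       # (num_tokens, n_experts)
        weights, idx = probs.topk(self.top_k, dim=-1)   # keep only the 8 highest-scoring experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique().tolist():       # run each selected expert on its tokens
                mask = idx[:, k] == e
                out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(5, 1024)
print(layer(tokens).shape)  # torch.Size([5, 1024]); only 8 of the 64 experts run per token
```

The key point the sketch illustrates is that the full parameter count (all 64 experts) is much larger than the parameters actually used for any single token (the 8 selected experts), which is what lowers inference cost relative to a dense model of the same total size.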
In benchmark tests, OLMoE-1B-7B outperforms many existing models with a similar parameter count, and even surpasses larger models such as Llama2-13B-Chat and DeepSeekMoE-16B.
One of AI2's goals is to provide researchers with more fully open-source AI models, including mixture-of-experts architectures. Although many developers use the MoE architecture, AI2 believes most other AI models fall far short of being sufficiently open.
Hugging Face: https://huggingface.co/collections/allenai/olmoe-66cf678c047657a30c8cd3da
Paper: https://arxiv.org/abs/2409.02060
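For readers who want to try the released checkpoints, here is a minimal usage sketch with the Hugging Face transformers library. The repository ID below is an assumption based on the collection linked above, as is OLMoE support in the installed transformers version.

```python
# Minimal usage sketch. Assumptions: the repo ID below matches a checkpoint in the
# linked collection, and the installed transformers version supports OLMoE.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0924-Instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain what a sparse mixture-of-experts model is."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```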
Key points:
- The new open-source model OLMoE released by AI2 is competitive in both performance and cost.
- OLMoE adopts a sparse mixture-of-experts architecture that effectively reduces inference cost and memory requirements.
- AI2 is committed to providing fully open-source AI models to promote academic research and development.
In short, the open-source release of OLMoE is a major advance in the field of large-scale language models. Its high performance, low cost, and fully open-source nature will greatly promote further research and application in both academia and industry. AI2's move reflects its firm commitment to open-source AI development, and more contributions of this kind can be looked forward to in the future.