The strong performance of Apple's M4 chip is accelerating the development of local artificial intelligence. Exo Labs has used multiple M4-equipped Mac devices to build a low-cost, high-performance local AI computing cluster and successfully run several large language models (LLMs), pointing the way to more economical and more private AI solutions for individuals and enterprises. The editor of Downcodes walks you through this breakthrough.
In generative artificial intelligence, Apple's efforts have seemed focused mainly on mobile devices, particularly the latest iOS 18. However, the new Apple M4 chip, featured in the recently released Mac Mini and MacBook Pro, has demonstrated that it can effectively run the most capable open-source large language models (LLMs) currently available, such as Meta's Llama 3.1 405B, Nvidia's Nemotron-70B, and Alibaba's Qwen2.5-Coder-32B.
Exo Labs, a startup founded in March 2024, is committed to "democratizing access to artificial intelligence." Its co-founder Alex Cheema has successfully built a local computing cluster from multiple M4 devices.
He connected four Mac Mini M4s ($599 each) to a MacBook Pro M4 Max ($1,599), running Alibaba's Qwen2.5-Coder-32B on Exo's open-source software. The entire cluster cost roughly US$5,000, which is extremely cost-effective compared with a single Nvidia H100 GPU priced at US$25,000 to US$30,000.
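As a rough back-of-the-envelope check on the figures above (all prices are taken from the article; the article's ~US$5,000 total presumably also covers networking and peripherals, which are not itemized):

```python
# Hardware prices as reported in the article (USD).
MAC_MINI_M4 = 599
MACBOOK_PRO_M4_MAX = 1_599
H100_LOW, H100_HIGH = 25_000, 30_000

# Four Mac Minis plus one MacBook Pro.
cluster_cost = 4 * MAC_MINI_M4 + MACBOOK_PRO_M4_MAX
print(f"Listed hardware cost: ${cluster_cost:,}")  # $3,995 before networking etc.
print(f"One H100 costs {H100_LOW / cluster_cost:.1f}x"
      f"-{H100_HIGH / cluster_cost:.1f}x as much")
```

Even against the article's rounded ~$5,000 figure, a single H100 costs five to six times as much as the entire five-machine cluster.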
The benefits of a local computing cluster over a cloud service are clear: running AI models on devices controlled by the user or enterprise reduces costs while improving privacy and security. Cheema said Exo Labs is continuously improving its enterprise-grade software, and several companies already use Exo software for local AI inference; he expects this trend to gradually extend to more individuals and enterprises.
Exo Labs' recent success rests on the strong performance of the M4 chip, which Apple bills as having "the world's fastest GPU core."
Cheema revealed that Exo Labs' Mac Mini M4 cluster runs Qwen2.5-Coder-32B at 18 tokens per second and Nemotron-70B at 8 tokens per second. This shows that users can handle AI training and inference tasks efficiently without relying on cloud infrastructure, making AI more accessible to privacy- and cost-sensitive consumers and enterprises.
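To put those throughput numbers in perspective, here is a small sketch converting tokens per second into approximate English words per minute. The ~0.75 words-per-token ratio is a common rule of thumb for English text, not a figure from the article:

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English; actual ratio varies by tokenizer

def words_per_minute(tokens_per_second: float) -> float:
    """Convert token throughput to approximate English words per minute."""
    return tokens_per_second * WORDS_PER_TOKEN * 60

# Throughput figures reported for the Exo Labs cluster.
for model, tps in [("Qwen2.5-Coder-32B", 18), ("Nemotron-70B", 8)]:
    print(f"{model}: {tps} tok/s ~ {words_per_minute(tps):.0f} words/min")
```

At 18 tokens per second, the cluster generates on the order of 800 words per minute, several times faster than most people read, which is comfortably interactive for chat and coding use.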
To further support this wave of local AI innovation, Exo Labs plans to launch a free benchmarking website offering detailed hardware-configuration comparisons, helping users choose the best LLM setup for their needs and budget.
Project entrance: https://github.com/exo-explore/exo
Exo Labs' success demonstrates the huge potential of Apple's M4 chip for local AI applications, and suggests that individuals and enterprises will enjoy a more convenient, economical, and private AI experience in the future. This will further promote the adoption of artificial intelligence and bring new opportunities to many industries. Look forward to more surprises from Exo Labs!