The strong performance of Apple's M4 chip is driving innovation in local AI computing. Exo Labs cleverly linked multiple M4-equipped Mac devices into a cost-effective local AI compute cluster and successfully ran several large open-source language models (LLMs), including Llama 3.1 405B, Nemotron 70B, and Qwen 2.5 Coder 32B. This breakthrough not only lowers the cost of AI applications but also significantly improves data privacy and security, bringing a more convenient and secure AI experience to individuals and enterprises. This article explores Exo Labs' innovative practice and the important role of the M4 chip in local AI computing.
In generative AI, Apple's efforts have seemed to focus primarily on mobile devices, especially the latest iOS 18. However, the new Apple M4 chip in the latest Mac Mini and MacBook Pro releases delivers enough performance to effectively run the most powerful open-source foundation large language models (LLMs), such as Meta's Llama 3.1 405B, Nvidia's Nemotron 70B, and Qwen 2.5 Coder 32B.
Exo Labs, a startup founded in March 2024 and dedicated to "distributed AI access," has now built a local compute cluster from multiple M4 devices, led by co-founder Alex Cheema.
He connected four Mac Mini M4s ($599 each) to one MacBook Pro M4 Max ($1,599) and ran Alibaba's Qwen 2.5 Coder 32B through Exo's open-source software. The entire cluster cost about $5,000, which is extremely cost-effective compared to a single Nvidia H100 GPU priced at $25,000 to $30,000.
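A back-of-envelope sketch (not from the article) of why pooling several Macs matters here: a 32-billion-parameter model needs roughly 64 GB for its weights alone at 16-bit precision, far more than the 16 GB of unified memory in a base Mac Mini M4.

```python
# Rough memory footprint of LLM weights at common precisions.
# Assumption (not from the article): weights dominate memory use;
# bytes per parameter are 2 (FP16/BF16), 1 (INT8), 0.5 (4-bit quantization).

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

qwen_params = 32e9  # Qwen 2.5 Coder 32B

for label, bpp in [("FP16", 2.0), ("INT8", 1.0), ("4-bit", 0.5)]:
    print(f"{label}: ~{weight_memory_gb(qwen_params, bpp):.0f} GB")
# FP16: ~64 GB, INT8: ~32 GB, 4-bit: ~16 GB
```

Even aggressively quantized, a 32B model sits near the memory ceiling of a single entry-level machine, which is why distributing the model across the cluster's combined memory makes the setup viable.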
The benefits of local compute clusters over cloud services are obvious. Running AI models on devices controlled by the user or enterprise reduces costs while improving privacy and security. Cheema said Exo Labs is continuously improving its enterprise-grade software, that several companies already use Exo software for local AI inference, and that this trend will gradually expand to more individuals and enterprises.
Exo Labs' recent success rests on the powerful performance of the M4 chip, which Apple bills as having the "world's fastest CPU core."
Cheema revealed that Exo Labs' Mac Mini M4 cluster runs Qwen 2.5 Coder 32B at 18 tokens per second and Nemotron 70B at 8 tokens per second. This shows that users can handle AI training and inference tasks efficiently without relying on cloud infrastructure, making AI more accessible to privacy- and cost-sensitive consumers and businesses.
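To put those throughput figures in perspective, a quick calculation (the 500-token reply length is an illustrative assumption, not from the article):

```python
# Estimate how long the cluster takes to generate a reply of a given length
# at the reported decoding speeds.

def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second

# A hypothetical 500-token reply at the reported cluster speeds:
print(f"Qwen 2.5 Coder 32B (18 tok/s): {generation_seconds(500, 18):.0f} s")
print(f"Nemotron 70B (8 tok/s): {generation_seconds(500, 8):.1f} s")
# Roughly 28 s and 62.5 s respectively
```

In other words, the cluster delivers interactive, if not instantaneous, response times for typical chat-length outputs.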
To further support this wave of local AI innovation, Exo Labs plans to launch a free benchmarking website offering detailed hardware configuration comparisons, helping users choose the best LLM setup for their needs and budget.
Project page: https://github.com/exo-explore/exo
Key points:
Exo Labs successfully runs powerful open-source AI models on a local compute cluster built from Apple M4 devices.
Running AI models locally reduces costs, improves privacy and security, and avoids dependence on cloud services.
Exo Labs will launch a benchmarking website to help users choose the right hardware configuration for AI tasks.
Exo Labs' success stories provide a new direction for the development of local AI computing, and also indicates that AI applications will become more popular in the future, benefiting more individuals and enterprises. The strong performance of the M4 chip and the convenience of Exo Labs open source software have jointly promoted the democratization of AI technology and deserve continuous attention.