Hugging Face has released SmolVLM, a lightweight vision language model small enough to run on devices such as mobile phones, yet it outperforms the Idefics 80B model, which is roughly 300 times larger. The release points toward broader, lower-cost AI deployment, reducing enterprises' computing costs and improving processing efficiency. For small businesses and startups, SmolVLM offers a way to build complex computer vision applications quickly and at lower cost.
Hugging Face has launched SmolVLM, a vision language model that is small enough to run on devices such as mobile phones and that still outperforms predecessors requiring large data centers.
The SmolVLM-256M model needs less than 1GB of GPU memory, yet it outperforms its predecessor Idefics 80B, a model about 300 times its size, marking a significant advance in practical AI deployment.
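As a rough sanity check on these figures, the arithmetic can be sketched as follows. The parameter counts (80 billion and 256 million) are read off the model names; the 2 bytes per parameter is an assumption (16-bit weights), and the estimate covers weights only, not activations or runtime overhead.

```python
# Rough sanity check of the size claims.
# Assumptions (not stated in the article): weights are stored in 16-bit
# precision (2 bytes per parameter), and only weights are counted.
IDEFICS_PARAMS = 80e9     # Idefics 80B
SMOLVLM_PARAMS = 256e6    # SmolVLM-256M
BYTES_PER_PARAM = 2       # assumed fp16/bf16 storage

size_ratio = IDEFICS_PARAMS / SMOLVLM_PARAMS
weights_gib = SMOLVLM_PARAMS * BYTES_PER_PARAM / 2**30

print(f"Idefics 80B is about {size_ratio:.1f}x larger")
print(f"SmolVLM-256M weights: about {weights_gib:.2f} GiB")
```

Under these assumptions the ratio lands slightly above 300, and the weights alone fit in well under 1GB, which is consistent with the figures quoted above.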
According to Andrés Marafioti, a machine learning research engineer at Hugging Face, SmolVLM also brings enterprises significant reductions in computing costs. "Idefics 80B, which we released in August 2023, was the first open-source video language model; SmolVLM achieves a 300-fold reduction in size while improving performance," Marafioti said in an interview with VentureBeat.
The launch of SmolVLM comes at a time when enterprises face high computing costs in deploying AI systems. The new models come in two sizes, 256M and 500M parameters, and process images and other visual content at speeds that were previously out of reach. The smallest version can process up to 16 examples per second while using only 15GB of memory at a batch size of 64, making it especially suitable for businesses that need to process large volumes of visual data. For a mid-sized company that processes 1 million images per month, this translates into considerable annual savings in computing costs.
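To make the savings claim concrete, here is a back-of-the-envelope sketch of the compute time implied by those numbers. The throughput (16 per second) and volume (1 million per month) come from the article; sustained throughput on a single accelerator is an assumption, and `yearly_cost` is a hypothetical helper that takes whatever hourly price applies, since the article gives none.

```python
# Back-of-the-envelope compute time for the workload described above.
# From the article: 16 images per second, 1 million images per month.
# Assumption: that throughput is sustained on a single accelerator.
IMAGES_PER_MONTH = 1_000_000
IMAGES_PER_SECOND = 16

seconds_per_month = IMAGES_PER_MONTH / IMAGES_PER_SECOND
hours_per_month = seconds_per_month / 3600
hours_per_year = hours_per_month * 12


def yearly_cost(hourly_rate_usd: float) -> float:
    """Yearly cost for a caller-supplied (hypothetical) hourly price."""
    return hours_per_year * hourly_rate_usd


print(f"{hours_per_month:.1f} accelerator-hours per month")
print(f"{hours_per_year:.1f} accelerator-hours per year")
```

At that rate, a month's worth of images fits in well under a single day of accelerator time, which is where the cost advantage over a much larger model would come from.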
In addition, IBM has partnered with Hugging Face to integrate the 256M model into Docling, its document processing software. Although IBM has abundant computing resources, the smaller model lets it process millions of documents efficiently and at lower cost.
The Hugging Face team reduced model size without sacrificing performance through innovations in both the vision and language components. They replaced the original 400M-parameter vision encoder with a 93M-parameter version and applied more aggressive token compression. These changes allow small businesses and startups to launch complex computer vision products quickly, with significantly lower infrastructure costs.
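The article does not say how the token compression works. One common approach in vision models is a space-to-depth rearrangement, which merges neighbouring image tokens into fewer, wider tokens. The sketch below illustrates that general idea in plain Python; it is not confirmed to be SmolVLM's method, and `compress_tokens` and the ratio `r` are hypothetical names chosen for this example.

```python
from typing import List

Token = List[float]


def compress_tokens(grid: List[List[Token]], r: int) -> List[List[Token]]:
    """Merge each r x r block of neighbouring tokens into one wider token.

    Hypothetical illustration: an H x W grid of D-dim tokens becomes an
    (H/r) x (W/r) grid of (D*r*r)-dim tokens, so the token count drops
    by a factor of r*r while no values are discarded.
    """
    height, width = len(grid), len(grid[0])
    if height % r or width % r:
        raise ValueError("grid sides must be divisible by r")
    out = []
    for top in range(0, height, r):
        row = []
        for left in range(0, width, r):
            merged: Token = []
            # Concatenate the block's tokens in row-major order.
            for dy in range(r):
                for dx in range(r):
                    merged.extend(grid[top + dy][left + dx])
            row.append(merged)
        out.append(row)
    return out


# A 4 x 4 grid of 2-dim tokens (16 tokens) becomes a 2 x 2 grid of 8-dim tokens.
grid = [[[float(y), float(x)] for x in range(4)] for y in range(4)]
small = compress_tokens(grid, r=2)
print(len(small) * len(small[0]), "tokens of width", len(small[0][0]))
```

The trade-off in a scheme like this is that the language model sees fewer tokens per image, which cuts sequence length and memory, at the price of each token carrying a wider vector.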
SmolVLM's training dataset contains 170 million examples, nearly half of them devoted to document processing and image captioning. These developments not only reduce costs but also open up new applications for enterprises, including much stronger visual search.
Hugging Face's result challenges the conventional assumption that capability scales with model size. SmolVLM shows that small, efficient architectures can also deliver strong performance. The future of AI development may lie less in ever-larger models and more in flexible, efficient systems.
Blog post: https://huggingface.co/blog/smolervlm
Key points:
SmolVLM, launched by Hugging Face, can run on mobile phones and outperforms the Idefics 80B model, which is roughly 300 times larger.
SmolVLM helps enterprises significantly reduce computing costs, with processing speeds of up to 16 examples per second.
The technological innovations of this model allow small businesses and startups to launch complex computer vision products in a short time.
SmolVLM suggests that AI applications will become more widely accessible, with small businesses and individual developers able to put capable AI to work in more fields. Its combination of light weight and strong performance challenges common assumptions about AI models and points to a new direction for the technology.