Ultra-fast text-to-speech model Lightning: ultra-low latency, 100 milliseconds to generate 10 seconds of audio

Author: Eve Cole | Update Time: 2024-11-29 14:41:01

The editor of Downcodes has learned that the American AI startup smallest.ai has launched Lightning, a new text-to-speech (TTS) model that generates up to 10 seconds of audio in about 100 milliseconds. This is a major step for TTS technology: it should reduce the cost of developing and running voice robots, improve accessibility, and benefit developers around the world. Lightning supports multiple accents in English and Hindi, with more languages planned, and is priced competitively at $0.02 per minute.

smallest.ai, an AI startup headquartered in San Francisco, California, recently launched Lightning, a text-to-speech (TTS) model that can generate up to 10 seconds of audio in 100 milliseconds. The technology lets developers around the world build highly realistic voice robot applications with very low latency, reducing implementation costs and improving application accessibility.

Lightning currently supports multiple accents in English and Hindi, and the team plans to add more languages quickly to meet market demand. The model is priced at just US$0.02 (approximately INR 1.6) per minute, giving voice bot developers a highly cost-effective option, with the running cost of the application kept below INR 1 per minute. This reduces the cost of building voice robots while expanding market accessibility.
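The per-minute price translates into a monthly bill with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the exchange rate of INR 80 per US dollar is an assumption implied by the article's own figures (US$0.02 is given as roughly INR 1.6), not a quoted rate.

```python
def synthesis_cost(audio_minutes, usd_per_minute=0.02, inr_per_usd=80.0):
    """Estimate the cost of generating a given number of minutes of audio.

    usd_per_minute defaults to the US$0.02 figure quoted in the article.
    inr_per_usd is an assumed rate (1.6 / 0.02), not an official one.
    Returns a (usd, inr) tuple.
    """
    usd = audio_minutes * usd_per_minute
    return usd, usd * inr_per_usd


# Example: a voice bot that speaks for 1,000 minutes in a month.
usd, inr = synthesis_cost(1000)
print(f"about US${usd:.2f} (INR {inr:.0f})")
```

Passing a different `usd_per_minute` makes it easy to compare pricing tiers without changing the rest of the calculation.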

Traditional TTS models rely on streaming and WebSocket connections, which increase server load and complicate scaling. Lightning instead uses a simple REST API design that delivers audio in about 100 milliseconds, avoiding the server pressure caused by continuous streaming. This combination of fast processing and cost efficiency makes it a significant alternative in the voice robot industry.
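To illustrate the request/response pattern described here, the following is a minimal sketch of a single REST call that returns a whole audio clip at once, with no socket kept open. The endpoint URL, the JSON field names (`text`, `voice`) and the bearer-token header are all hypothetical placeholders; the article does not describe the actual Lightning API, so consult the vendor's documentation for the real interface.

```python
import json
import urllib.request

# Hypothetical endpoint: the article does not give the real URL.
EXAMPLE_ENDPOINT = "https://api.example.com/v1/tts"


def build_tts_request(text, voice, api_key, endpoint=EXAMPLE_ENDPOINT):
    """Build one self-contained POST request for a whole utterance.

    The field names "text" and "voice" are assumptions for illustration.
    """
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize(text, voice, api_key, endpoint=EXAMPLE_ENDPOINT, timeout=5.0):
    """Send the request and return the complete audio as bytes (no streaming)."""
    request = build_tts_request(text, voice, api_key, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Because each call is stateless, the server can handle it like any other HTTP request and scale behind an ordinary load balancer, which is the scalability advantage the article attributes to the REST design.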

Lightning’s product features can be summarized as follows:

1. Speed and efficiency. Billed as the world's fastest text-to-speech model, Lightning generates 10 seconds of ultra-realistic audio in 100 milliseconds, enabling real-time speech synthesis for applications that need a rapid response.

2. Compactness and compatibility. The model requires less than 1 GB of video memory (VRAM), so it is small enough to run on most consumer and edge devices, reducing hardware requirements.

3. Multi-language support. Lightning currently supports multiple accents in English and Hindi, and the team plans to add more languages quickly to meet the needs of global users.

4. Highly customizable. A special style diffuser adjusts the audio style according to user needs, making the generated speech more natural and expressive.

5. Easy integration. A simple REST API lets developers quickly integrate the Lightning model into existing systems, eliminating the need for complex WebSocket connections.

6. Friendly pricing. Pricing starts at US$0.04 per minute and is suitable for all types of enterprises, with customized pricing plans available for high-volume users.
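The speed claim in item 1 can be restated as a real-time factor: how many seconds of audio are produced per second of compute. The short sketch below works through that arithmetic. The function names are hypothetical, and the network delay and latency budget in the example are assumed values chosen only for illustration, not measurements from the article.

```python
def real_time_factor(audio_ms, generation_ms):
    """Return how many times faster than playback the audio is generated."""
    return audio_ms / generation_ms


def fits_latency_budget(generation_ms, network_ms, budget_ms):
    """Check whether generation plus network delay stays within a budget."""
    return generation_ms + network_ms <= budget_ms


# Figures from the article: 10 seconds of audio generated in 100 milliseconds.
print(real_time_factor(10_000, 100))  # 100.0

# Assumed values: 150 ms of network round trip and a 300 ms response budget.
print(fits_latency_budget(100, 150, 300))  # True
```

A factor of 100 means the synthesis step itself is a small part of the total response time, so in practice network and application overhead are likely to dominate the delay a caller hears.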

smallest.ai was founded by IIT Guwahati alumni Sudarshan Kamath and Akshat Mandloi. Kamath said smallest.ai’s low-price strategy is driven by their focus on data quality and model efficiency. “Our model is much smaller than competitors such as ElevenLabs, but we achieve high-quality speech output through highly refined data,” he explained.

Voice bot developers who gained early access to Lightning reported an 8x reduction in operating costs while improving audio quality. In addition to real-time voice bot applications, Lightning can also be used to create voiceovers for audiobooks and social media content on platforms such as Instagram and YouTube. Non-developers can also access Lightning through the Waves Speech platform and experience features such as voice cloning and accent conversion, which are currently in beta.

In an exclusive interaction with Analytics India Magazine, Kamath said: "When we started building, we realized that the existing models required for voice bots were not mature enough for Indian languages. Existing models for non-English languages were simply not up to production requirements."

In June this year, smallest.ai also launched the AWAAZ model, which supports voice cloning from short audio clips at a competitive price. The model is designed for scalable applications in regional language markets and provides enterprise-grade security and compliance. When asked about its mission, Kamath said: "Why are a billion people not communicating with an AI voice on a daily basis, despite huge advances in voice AI technology? This is the question we strive to solve."

Project entrance: https://smallest.ai/blog/lightning-fast-text-to-speech

The emergence of the Lightning model undoubtedly sets a new benchmark for speech synthesis technology. Its high efficiency, low cost and easy integration will promote the popularity and innovation of voice robot applications and bring new opportunities to more developers and enterprises. The editor of Downcodes hopes that Lightning will support more languages and functions in the future, bringing a more convenient and better voice experience to users around the world.