React / Vanilla Speech Highlight is a powerful library for integrating text-to-speech and real-time word/sentence highlighting into your web applications. It supports audio files, the Text-to-Speech API, and the Web Speech Synthesis API, making it ideal for creating interactive, accessible, and dynamic user experiences.
Try the Demo: React Speech Highlight
Vanilla JS is also supported. The package has a bundle size of 45 KB, and you can easily combine it with an existing website, for example one that uses jQuery.
Read API_VANILLA.md to see the differences.
Try the demo: Vanilla Speech Highlight
Watch the YouTube video about implementing Vanilla Speech Highlight for JavaScript text-to-speech tasks.
Built with React Native CLI. Try the demo Android app.
Do you want another implementation? Just ask me via Discord: albirrkarim
This is the documentation for the web version.
Table Of Contents
Recently, I wanted to implement text-to-speech on my website, with highlighting of the word and sentence being spoken.
I searched the internet but couldn't find an npm package that solves all the TTS problems.
I just wanted a powerful package that is flexible and has good voice quality.
Overall, the text-to-speech task comes with problems (see the details in PROBLEMS.md), whether you use Web Speech Synthesis or an audio file.
Using Web SpeechSynthesis
It has problems such as a robot-like sound, limited device support, etc.
Using paid subscription text-to-speech synthesis API
Good, human-like voices require AI model inference, so it doesn't make sense to do that on the client side.
That is where speech synthesis API providers like ElevenLabs, Murf AI, Open AI, Amazon Polly, and Google Cloud play their role.
But they don't provide an npm package for highlighting.
Then I found Speechify, but I couldn't find any docs about an npm package that integrates with their service. It is also a paid subscription service.
Searching again, I found ElevenLabs. It is free for up to 10,000 characters per month, and the quota resets the next month. Cool, right? So I decided to use it as the speech synthesis API in my project. This platform also doesn't provide a React npm package to highlight its audio, but it provides streaming audio output that can be used to produce a transcript timestamp (when each word is spoken in the audio), as someone has already demonstrated.
In production you must calculate costs to decide which TTS API provider to choose. Services that can stream audio are promising for word highlighting, but they also come at a high price. Cheap TTS APIs usually don't have many features.
ElevenLabs produces good-quality voices and has many features, but in production it is more expensive than Open AI TTS, and cost is an important matter in production.
So I decided to make this npm package, which combines the various methods above to achieve all the good things and throw away the bad things. All logic is done on the client side (look at the overview above). There is no need for advanced backend hosting.
My package combines the built-in Web SpeechSynthesis and an audio file (optional) to run.
When you prefer or fall back to an audio file, you can achieve high-quality sound and remove all the compatibility problems of the built-in Web SpeechSynthesis.
How can you automatically get the audio file for some text? You can use ElevenLabs, Murf AI, Open AI, Amazon Polly, Google Cloud, or any other TTS API, as long as it can produce an audio file (mp3, mp4, wav, etc.). For details, see AUDIO_FILE.md. On the demo website I provide an example using ElevenLabs, and you can even try your own audio file there.
This package just takes text and an audio file as input, so you are free to use any TTS API that can produce an audio file, whether an expensive one or a cheap one, depending on your cost considerations.
How does this package know the timing of the spoken word or sentence in the played audio? It can detect the spoken word and sentence on the client side.
This package is a one-time payment. No subscription. Who likes subscriptions? I don't either. See how to purchase below.
If you are an entrepreneur, I'm sure you have some crazy use cases for this package.
Interactive Blog
Imagine that you have a long article with a TTS button. When it plays the text-to-speech, users can see how far the article has been read. Your article will be SEO-ready because this package supports Server-Side Rendering (SSR).
Web AI Avatar / NPC
In the demo I provide, you can see a 3D avatar from readyplayer.me come alive, playing an idle animation while its mouth synchronizes with the highlighted text-to-speech. This works because the package has React state that represents the currently spoken viseme. The viseme list I use in the demo is Oculus OVR LipSync.
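As a rough sketch of how a "current viseme" value could drive an avatar's mouth: the snippet below turns one viseme name into a set of morph target weights. The viseme subset, the `visemeWeights` helper, and the `viseme_` prefix are all assumptions made for illustration; they are not this package's API, so check the names against your own avatar model.

```typescript
// Illustrative subset of viseme names; a real avatar exposes more of them.
const VISEMES = ["sil", "PP", "aa"] as const;
type Viseme = (typeof VISEMES)[number];

// Returns one weight per morph target: 1 for the current viseme, 0 for the rest.
// The "viseme_" prefix is an assumed naming convention, not a confirmed one.
function visemeWeights(current: Viseme): Record<string, number> {
  const weights: Record<string, number> = {};
  for (const v of VISEMES) {
    weights[`viseme_${v}`] = v === current ? 1 : 0;
  }
  return weights;
}

// Example: the mouth shape to apply while "aa" is being spoken.
const mouth = visemeWeights("aa");
```

In a real scene you would copy these weights onto the avatar's morph targets on every viseme change, usually with some smoothing so the mouth does not snap between shapes.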
Language Learning App With Real Human Voice
Look at example 6 in the demo. It is an example of using a real human voice for text-to-speech. If your local language is not supported by the TTS API, you can use this package with a voice recorded by a real person, which sounds more natural than a TTS API.
Academic Text Reader
The problem with TTS on academic text is that it contains math equations, formulas, and symbols whose displayed form differs from their pronunciation (see). So we made a pronunciation correction engine that uses the Open AI API to work out how each term should be pronounced.
Relation Highlight and word level highlighting of youtube transcript
It has a YouTube iframe with the YouTube transcript on the right. When you play the video, the transcript is highlighted. The highlighting is based on the current time of the played video; the package follows the time.
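Time-based highlighting can be sketched as a lookup from the player's current time to a transcript entry. The `TimedWord` shape and the `findActiveWordIndex` helper below are hypothetical names for illustration, not the package's actual types, and the sketch assumes the word timings are already known and sorted.

```typescript
// Hypothetical shape of one transcript entry (times in seconds).
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

// Binary search for the word whose [start, end) interval contains `time`.
// Assumes `words` is sorted by start time and entries do not overlap.
function findActiveWordIndex(words: TimedWord[], time: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const w = words[mid];
    if (time < w.start) hi = mid - 1;
    else if (time >= w.end) lo = mid + 1;
    else return mid;
  }
  return -1; // between two words, or outside the transcript
}

// Tiny sample transcript used only for this example.
const transcript: TimedWord[] = [
  { text: "Hello", start: 0.0, end: 0.4 },
  { text: "world", start: 0.5, end: 0.9 },
];
```

With a YouTube iframe, you could poll the IFrame Player API's `player.getCurrentTime()` on a timer and pass the result to this function to decide which word to highlight.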
Relation Highlight feature: when you hover over a word, the related words are highlighted too. For example, when you hover over a Chinese word, the pinyin and English words are highlighted too, and vice versa. How does it work? See.
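One simple way to picture the hover behavior is to give every token a relation group and highlight all tokens in the hovered token's group. How the package actually derives the relations is not covered here; the `Token` shape, `buildGroupIndex`, and `relatedTokenIds` below are hypothetical and assume the groups are already known.

```typescript
// Hypothetical data model: each token carries the id of its relation group.
interface Token {
  id: string;
  text: string;
  groupId: string;
}

// Index token ids by relation group so lookups on hover stay cheap.
function buildGroupIndex(tokens: Token[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const t of tokens) {
    const ids = index.get(t.groupId) ?? [];
    ids.push(t.id);
    index.set(t.groupId, ids);
  }
  return index;
}

// Returns the ids of all tokens to highlight when `hoveredId` is hovered,
// including the hovered token itself. Unknown ids yield an empty list.
function relatedTokenIds(
  tokens: Token[],
  index: Map<string, string[]>,
  hoveredId: string
): string[] {
  const hovered = tokens.find((t) => t.id === hoveredId);
  return hovered ? (index.get(hovered.groupId) ?? []) : [];
}

// Sample tokens: Chinese, pinyin, and English rows sharing group ids.
const tokens: Token[] = [
  { id: "zh-0", text: "你好", groupId: "g1" },
  { id: "py-0", text: "nǐ hǎo", groupId: "g1" },
  { id: "en-0", text: "hello", groupId: "g1" },
  { id: "zh-1", text: "世界", groupId: "g2" },
  { id: "py-1", text: "shìjiè", groupId: "g2" },
  { id: "en-1", text: "world", groupId: "g2" },
];
const groupIndex = buildGroupIndex(tokens);
```

Because the lookup goes through the shared group id, hovering the pinyin or the English word highlights the same set as hovering the Chinese word, which gives the "vice versa" behavior.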
Video player with auto generate subtitle
Case: you only have an audio or video file without a text transcript. Our package can generate the transcript from the audio file, or even translate the transcript into another language. The subtitles can be highlighted while the video plays, and you may want to show subtitles in two languages at once and highlight both based on the meaning of the words.
In the preview video above, the original language of the video is Italian, and I also show the English translation. The system highlights both based on meaning.
The Italian word bella
means beautiful in English.
Go to this video demo page.
Your use case here
Just ask me what you want to make. The package architecture is scalable enough to support various features.
See API.md and EXAMPLE_CODE.md, which contain simple example code.
The full example code and implementation examples use the source code of the demo website, which is included when you buy this package.
This package is written in TypeScript. You don't have to read all the docs here, because the package now supports JSDoc and VS Code IntelliSense. What is that? Simply put, when you hover your mouse over a variable or function, VS Code shows a popup (a simple tutorial) describing what the function is about, with examples, params, etc.
Just use the source code from the demo website and you can readily understand the package.
The changelog contains information about new features, accuracy improvements, bug fixes, and what you should do when the version is updated.
See CHANGELOG.md
There's no refund.
I love feedback from my customers. You can write on the Issues tab so that, when I have time, I can try to solve it and deliver it in the next update.
Still worried? See the reviews on Product Hunt.
Well, I need money to fund the research. Making a complex package costs a lot of time and, of course, money.
Building the LLM engines, which combine prompt engineering and efficient algorithms to save Open AI API cost, requires repeated testing, and each test costs API calls.
I also provide support via live private chat with me on Discord (username: albirrkarim). Are there any services out there doing that?
This package is a base package that can be used for various use cases. I have made a lot of money with this package. The limit is your entrepreneurship skill.
With the higher price I maintain the scarcity of the functionality.
Tell me your problems or difficulties, and I will show you the way to solve them.
I provide real-time support myself on Discord.
Just buy it, remove the headache, and focus on your project.
Yes, if you are a student or teacher, you can get a discount. Just show me your student card or teacher card.
Yes, if you help me by voting for this package on Product Hunt.
You can see the docs in this repo. This package is written in TypeScript and tested using Jest to ensure quality.
You don't have to read all the docs here, because the package now supports VS Code IntelliSense. What is that? Simply put, when you hover your mouse over a variable or function, VS Code shows a popup (a simple tutorial) describing what the function is about, with examples, params, etc.
Just use the source code from the demo website and you can readily understand the package.
Yes, it can. Just ask ChatGPT and explain your problem.
Example :
"My project uses webpack and the code uses JSX. I want to use TSX code alongside the JSX. How can I do that?"
Go to Vanilla Speech Highlight.
I made a demo that outputs the viseme to console.log. Just open the browser console and play the prefer audio example (English), and you will see the word and viseme at the current timing of the played TTS.
Just see the demo
Try using Prefer or Fallback to Audio File; see AUDIO_FILE.md
or
Try changing the speech synthesis or language settings on your device.
If you use a smartphone (Android):
Make sure you have installed Speech Recognition & Synthesis.
If step 1 doesn't work, try downloading Google Keyboard, then set the dictation language. Wait a few minutes (your device will automatically download the voice), then restart your smartphone.
Your device will download that voice first; then it will have that voice locally.
Try using Prefer or Fallback to Audio File; see AUDIO_FILE.md
Yes, see
This package optionally requires the Open AI API to perform text-to-speech tasks better (it solves many of the problems I describe in PROBLEMS.md).
But if you don't want to use the Open AI API, it can still work. See the FAQ entry on what dependencies this package uses.
NPM dependencies:
For React Speech Highlight: see the peerDependencies in the package.json in this repo.
Once you build this package, you only need the npm packages listed in peerDependencies: only React.
For Vanilla Speech Highlight: no dependencies, just use the vanilla JS file.
AI dependencies:
This package optionally requires the Open AI API to perform text-to-speech tasks better (it solves many of the problems I describe in PROBLEMS.md).
Optionally, use any TTS API that can produce an audio file for better sound quality, such as ElevenLabs, Murf AI, Open AI, Amazon Polly, Google Cloud, or any other TTS API, as long as it can produce an audio file (mp3, mp4, wav, etc.). For details, see AUDIO_FILE.md.
Yes. See the details in TEST.md
or you can try using Prefer or Fallback to Audio File; see AUDIO_FILE.md
It just works. A simple explanation is in the introduction above.
The architecture is scalable; just ask me what feature you want.
See LLM_ENGINE.md
No, because my package handles the batching system and the pronunciation system, and provides the text so that the TTS API can produce an audio file that can be used for highlighting.
You can use a caching strategy to cache the request responses, for both the Open AI API and the TTS API that produces the audio file.
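A minimal sketch of such a cache, assuming the API call can be wrapped in an async function keyed by its input text. The `Fetcher` type, `withCache`, and `fakeTts` are hypothetical names for illustration; `fakeTts` stands in for a real TTS or Open AI request.

```typescript
// Hypothetical signature: takes text, resolves to the API response.
type Fetcher<T> = (text: string) => Promise<T>;

// Wraps a fetcher so identical inputs trigger only one underlying request.
// Storing the promise (not the value) also de-duplicates concurrent calls.
function withCache<T>(fetcher: Fetcher<T>): Fetcher<T> {
  const cache = new Map<string, Promise<T>>();
  return (text: string) => {
    const hit = cache.get(text);
    if (hit) return hit;
    const pending = fetcher(text).catch((err) => {
      cache.delete(text); // do not keep failed requests in the cache
      throw err;
    });
    cache.set(text, pending);
    return pending;
  };
}

// Stand-in for a real TTS request; counts how often it is actually called.
let calls = 0;
const fakeTts: Fetcher<string> = async (text) => {
  calls += 1;
  return `audio-for:${text}`;
};

const cachedTts = withCache(fakeTts);
const first = cachedTts("Hello world");
const second = cachedTts("Hello world"); // served from the cache
```

This cache lives in memory only. For production you would more likely persist the responses (for example on your server or in storage you control) so that repeated visits do not pay for the same API call again.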
For individual developers, freelancers, or small businesses.
The price is USD $200. Too expensive? See the demo website, where there may be a discount for you, or fill in this form to get notified when there is an offer.
After payment, you’ll be invited to my private repository, where you’ll have access for one year, including all updates during that time.
For continued access in subsequent years, you can pay USD $50 annually to remain in the private repository.
What you get
The demo website (Next js based)
The package repo (React Speech Highlight)
The package repo (Vanilla Speech Highlight)
I know this package is complex, and some features require architecture and advanced programming skills to use.
So I am making a full screencast tutorial on how to use this kind of advanced weapon.
It covers everything from installation to examples of advanced implementations and more.
The price is a subscription of $5 / month. (Coming soon)
For those who already have a business and want a solid package that can be used for the long term.
The price is USD $700.
What you get
The price is USD $150.
What you get
Contains: YouTube relation transcript highlight, Video auto-generate transcript, Streaming TTS
Contains: Backenify LLM engines
React GPT Web Guide ($100) + React Speech Highlight ($200, discounted to $50) = $150
What you get
I accept various payment methods:
GitHub Sponsors
Choose the One Time tab, select the option, and follow the instructions from GitHub.
So this package is the answer for those who are looking for: