Text to Speech with JavaScript

Text-to-speech (TTS) is an assistive technology that has gained much popularity over recent years. It is also referred to as 'read aloud' technology because TTS pronounces all the written words.

TTS has been incorporated into many websites, apps, and digital devices. It is a notable alternative to plain text, extending the reach of content and broadening the audience. Today, TTS has grown beyond an alternative to text and has taken on other roles, education among them.
Though written text still reigns supreme in classical teaching, TTS's popularity rests largely on its advantages over static text:

  • Helps people with reading difficulties
  • Offers convenience
  • Complements alternative learning styles
  • Makes written content accessible by reading it aloud

How Text to Speech works

Most TTS functionality is built in: browsers, apps, and many other pieces of software ship with it. For example, Google Docs has an accessibility setting that gives readers the option to 'Turn on screen reader support'.
You can also download TTS software to your device or enable it on a browser page on demand. This approach is mainly useful for pages or apps without built-in TTS.
TTS comes in various forms. Some tools highlight words as they read them. Controls such as start, stop, pause, and cancel give you, the reader, control over how it helps you. You can also switch between male and female reading voices.

In this article, we will look at text-to-speech APIs for websites using JavaScript.

Why JavaScript?

JavaScript is a modern programming language used across nearly all web-related technology. It is often called the language of the web.
Combined with HTML5, JavaScript has access to the DOM and a wide range of browser APIs. This makes it easier to add functionality to a website, including text-to-speech powered by the Web Speech API.

Web Speech API

Web Speech API allows us to incorporate voice data or speech into web apps. It has two distinct functionalities – Speech Synthesis (text-to-speech) and Speech Recognition.

Speech Synthesis lets apps read text aloud on the user's device. Its SpeechSynthesis interface is the controller for the Web Speech API text-to-speech service.

Speech recognition is different from text-to-speech. In TTS the program reads the text for you, while speech recognition allows you to interface with your application using direct voice commands.

Getting Started with SpeechSynthesis

SpeechSynthesis is the controller interface; its properties and methods govern how text is converted into speech.
To convert text to speech, we only need to create an instance of the SpeechSynthesisUtterance class, configure its properties, and pass it to the controller.

let speech = new SpeechSynthesisUtterance(); 

SpeechSynthesisUtterance has six properties:

  • lang: Gets and sets the language of the utterance.
  • pitch: Sets the pitch of the utterance. It ranges from 0 to 2 (0 is the lowest and 2 is the highest). We can adjust it using a slider.
  • rate: Sets the rate of the utterance. It ranges from 0.1 to 10 (0.1 is the slowest and 10 is the fastest). We can adjust it using a slider.
  • volume: Sets the volume of the utterance. It ranges from 0 to 1 (0 is the lowest and 1 is the highest). We can adjust it using a slider.
  • text: Gets and sets the text to be synthesized.
  • voice: Gets and sets the speaking voice.
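
The ranges listed above can be enforced before slider values reach the utterance. The following is a minimal sketch rather than part of the Web Speech API: `UTTERANCE_RANGES`, `clamp` and `applyUtteranceSettings` are hypothetical helper names, and only the `pitch`, `rate` and `volume` property names come from the API.

```javascript
// Ranges as listed in the article for pitch, rate and volume.
const UTTERANCE_RANGES = {
  pitch: { min: 0, max: 2 },
  rate: { min: 0.1, max: 10 },
  volume: { min: 0, max: 1 },
};

// Hypothetical helper: keep a number inside [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copy pitch/rate/volume onto an utterance-like object,
// converting slider strings to numbers and skipping anything non-numeric.
function applyUtteranceSettings(utterance, settings) {
  for (const [name, range] of Object.entries(UTTERANCE_RANGES)) {
    const value = Number(settings[name]);
    if (settings[name] !== undefined && Number.isFinite(value)) {
      utterance[name] = clamp(value, range.min, range.max);
    }
  }
  return utterance;
}
```

In a browser, the first argument would be a real utterance, for example `applyUtteranceSettings(new SpeechSynthesisUtterance("Hi"), { rate: 20 })`, which leaves the rate at 10.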

SpeechSynthesis provides methods such as:

  • .cancel(): Like stop; it removes all the utterances from the utterance queue
  • .getVoices(): Gets the voices available on the Web Speech API synthesizer
  • .pause(): Pauses an utterance
  • .resume(): Resumes a paused utterance
  • .speak(): Reads an utterance aloud
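
The pause and resume methods pair naturally into a single toggle. The sketch below assumes an object shaped like `window.speechSynthesis`, which also exposes boolean `paused` and `speaking` properties; `togglePlayback` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: resume if paused, pause if speaking, otherwise do nothing.
// `synth` is expected to look like window.speechSynthesis.
function togglePlayback(synth) {
  // Check `paused` first, because a paused synthesizer can still report `speaking`.
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  if (synth.speaking) {
    synth.pause();
    return "paused";
  }
  return "idle";
}
```

In a page, a single button could call `togglePlayback(window.speechSynthesis)` and use the returned label to update its caption.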

To simply convert a text to speech, use:

<script> 
let speaknow = new SpeechSynthesisUtterance('Hello world!'); 
window.speechSynthesis.speak(speaknow); 
</script>
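
When later code needs to know that the text has finished playing, the same call can be wrapped in a Promise. This is a sketch under assumptions: `speakAsync` is a hypothetical helper name, and it relies on the utterance's `onend` and `onerror` handlers. The synthesizer and the utterance constructor are passed in as parameters so the function can also run outside a browser.

```javascript
// Hypothetical helper: speak `text` and resolve when the utterance ends.
// `synth` and `Utterance` default to the browser globals but can be injected.
function speakAsync(
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  return new Promise((resolve, reject) => {
    if (!synth || typeof Utterance !== "function") {
      reject(new Error("Speech synthesis is not available"));
      return;
    }
    const utterance = new Utterance(text);
    // Resolve once playback finishes; reject with the reported error code.
    utterance.onend = () => resolve(utterance);
    utterance.onerror = (event) => reject(new Error(String(event.error)));
    synth.speak(utterance);
  });
}
```

In a browser, `speakAsync("Hello world!").then(() => console.log("done"))` would log once the utterance has finished.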

Since not all browsers support the API, we do a check for this:

<html> 
<body>
  <button onclick="play()">Play</button> 
</body> 
</html> 

<script>
function play() { 
  if ('speechSynthesis' in window) { 
    let working = new SpeechSynthesisUtterance("This is working"); 
    window.speechSynthesis.speak(working); 
  } 
  else{ 
    document.write("Browser not supported") 
  } 
} 
</script> 
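
The same check can be pulled into a small reusable function. The sketch below is illustrative: `canSpeak` is a hypothetical name, and it additionally verifies that the `SpeechSynthesisUtterance` constructor exists, which the snippet above takes for granted.

```javascript
// Hypothetical helper: report whether a window-like object exposes the
// two pieces of the Web Speech API that text-to-speech needs.
function canSpeak(scope = globalThis) {
  return (
    scope != null &&
    "speechSynthesis" in scope &&
    typeof scope.SpeechSynthesisUtterance === "function"
  );
}
```

A page could call `canSpeak(window)` once on load and hide the Play button when it returns false, instead of replacing the page content with `document.write`.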

Next, we will create a simple demo with HTML, CSS, and JS to show how you can implement Web Speech API in browsers and websites.

<html> 
<head> 
  <meta charset="utf-8" /> 
  <meta http-equiv="X-UA-Compatible" content="IE=edge"> 
  <title>Web API TTS</title> 
  <meta name="viewport" content="width=device-width, initial-scale=1"> 
</head> 
<body> 
  <div> 
    <h3> Select Voices </h3> 
    <select id="voices"> 
      <option> option 1 </option> 
    </select> 
  </div> 
   
  <div id="vpr"> 
    <h5> Volume </h5> 
    <input type="range" min="0" max="1" value="1" step="0.1" id="volume" /> 
    <span id="vol-label">1</span> 
     
    <h5> Rate </h5> 
    <input type="range" min="0.1" max="10" value="1" step="0.1" id="rate" /> 
    <span id="rate-lab">1</span> 
     
    <h5> Pitch </h5> 
    <input type="range" min="0" max="2" value="1" step="0.1" id="pitch" /> 
    <span id="pitch-lab">1</span> 
  </div> 
   
  <textarea rows="9" cols="60" name="description" id="lines">Enter text here...</textarea><br> 
   
  <button class="buttons" style="background: green;" id="speak"> Speak </button> 
  <button class="buttons" style="background: orange" id="pause"> Pause </button> 
  <button class="buttons" style="background: lightgreen" id="resume"> Resume </button> 
  <button class="buttons" style="background: red" id="cancel"> Cancel </button> 
</body> 
</html>

CSS

html, body{ 
  height: 100% 
} 
select{ 
  padding: 3px; 
  margin: 10px 0; 
} 
#vpr { 
  display:inline-block; 
  padding: 30px 10px; 
} 
.buttons{ 
  display: inline-block; 
  padding: 0.6em 1.5em; 
  margin: 0 0.3em 0.3em 0; 
  border-radius: 5px; 
  box-sizing: border-box; 
  font-family: 'Roboto', sans-serif; 
  font-weight: 400; 
  font-size: 14px; 
  color: black; 
  text-align: center; 
}

JavaScript

// First we initialize new SpeechSynthesisUtterance object 
let tts = new SpeechSynthesisUtterance(); 
 
// Setting the Speech Language 
tts.lang = "en"; 
 
//Populating the select dropdown with the list of available voices on Web Speech API 
let speechvoices = []; // global array of available voices 
 
window.speechSynthesis.onvoiceschanged = () => { 
  // To get the list of voices using getVoices() function 
  speechvoices = window.speechSynthesis.getVoices(); 
  // Default to the first voice, then populate the select element 
  tts.voice = speechvoices[0]; 
 
  let select_voice = document.getElementById("voices"); 
  speechvoices.forEach((voice, i) => (select_voice.options[i] = new Option(voice.name, i))); 
}; 
 
//SETTING THE CONTROLS - SPEAK, PAUSE, RESUME AND CANCEL 
//SPEAK 
//first we get the value of the textarea or document 
document.getElementById("speak").addEventListener("click", () => { 
  tts.text = document.getElementById("lines").value; 
  //then we implement the speechsynthesis instance 
  window.speechSynthesis.speak(tts); 
}); 
 
//PAUSE 
document.getElementById("pause").addEventListener("click", () => { 
  // Pause the speechSynthesis instance 
  window.speechSynthesis.pause(); 
}); 
 
//RESUME 
document.getElementById("resume").addEventListener("click", () => { 
  // Resume the paused speechSynthesis instance 
  window.speechSynthesis.resume(); 
}); 
 
//CANCEL 
document.getElementById("cancel").addEventListener("click", () => { 
  // Cancel the speechSynthesis instance 
  window.speechSynthesis.cancel(); 
}); 

//TO SET THE VOLUME, PITCH, AND RATE 
//Volume  
//We get the volume value from the input 
document.getElementById("volume").addEventListener("input", () => { 
  const vol = document.getElementById("volume").value; 
  // Set volume property of the SpeechSynthesisUtterance instance 
  tts.volume = vol; 
  // Updating the volume label 
  document.querySelector("#vol-label").innerHTML = vol; 
}); 

//RATE 
// We get the rate Value from the input 
document.getElementById("rate").addEventListener("input", () => { 
  const rate = document.getElementById("rate").value; 
  // Set rate property of the SpeechSynthesisUtterance instance 
  tts.rate = rate; 
  // Updating the rate label 
  document.getElementById("rate-lab").innerHTML = rate; 
}); 

//PITCH 
// We get the pitch Value from the input 
document.getElementById("pitch").addEventListener("input", () => { 
  const pitch = document.getElementById("pitch").value; 
  // Setting the pitch property of the SpeechSynthesisUtterance instance 
  tts.pitch = pitch; 
  // Updating the pitch label 
  document.getElementById("pitch-lab").innerHTML = pitch; 
});

Although we have populated the drop-down with voices, the utterance will not switch to the selected voice unless we listen for the select element's change event.

// This changes the voice of the speaker or utterance to the selected voice 
document.getElementById("voices").addEventListener("change", () => { 
  tts.voice = speechvoices[document.getElementById("voices").value]; 
}); 
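
Instead of always starting with the first voice in the list, a page can choose a sensible default for a given language. The sketch below assumes voice objects shaped like the entries returned by getVoices(), with `name`, `lang` and `default` properties; `pickVoice` is a hypothetical helper name, and the underscore handling is a precaution for platforms that may report tags such as "en_US".

```javascript
// Hypothetical helper: choose a voice for a language tag such as "en" or "en-GB".
// `voices` is an array shaped like the result of speechSynthesis.getVoices().
function pickVoice(voices, lang) {
  const wanted = lang.toLowerCase();
  // Normalise tags so that "en_US" and "en-US" compare equal.
  const tagOf = (voice) => String(voice.lang).replace("_", "-").toLowerCase();
  const matches = voices.filter((voice) => {
    const tag = tagOf(voice);
    return tag === wanted || tag.startsWith(wanted + "-");
  });
  // Prefer the platform default among the matches, then the first match,
  // and finally fall back to the first voice of any language.
  return matches.find((voice) => voice.default) || matches[0] || voices[0] || null;
}
```

For example, `tts.voice = pickVoice(window.speechSynthesis.getVoices(), "en")` would select an English voice when one is installed.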

Browser Compatibility

The Web Speech API's SpeechSynthesis interface is supported by Chrome, Edge, Firefox, Opera, and Safari. Internet Explorer does not support this API. The onvoiceschanged event handler (a property, not a method) is the only part not supported by Safari and Opera.
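
Where onvoiceschanged is unavailable, the voice list can still be read by calling getVoices() directly. The following is a minimal sketch of one way to cover both cases: `loadVoices` is a hypothetical helper name and the two-second timeout is an arbitrary assumption. It returns the list immediately when it is already populated, otherwise waits for a `voiceschanged` event, and gives up after the timeout.

```javascript
// Hypothetical helper: resolve with the available voices, whether or not
// the browser ever fires a "voiceschanged" event.
function loadVoices(synth, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const ready = synth.getVoices();
    if (ready.length > 0) {
      resolve(ready);
      return;
    }
    let done = false;
    const finish = () => {
      if (done) return;
      done = true;
      clearTimeout(timer);
      // Read the list again; it may still be empty if the timeout fired.
      resolve(synth.getVoices());
    };
    const timer = setTimeout(finish, timeoutMs);
    if (typeof synth.addEventListener === "function") {
      synth.addEventListener("voiceschanged", finish, { once: true });
    }
  });
}
```

In a browser this would be used as `loadVoices(window.speechSynthesis).then((voices) => { /* fill the drop-down */ })`.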

ResponsiveVoice JS

ResponsiveVoice is a text-to-speech API that supports more than 51 languages.

ResponsiveVoice JS, which is powered by LearnBrite, defines a selection of smart voice profiles. It knows which voice to enable on which device, so the experience stays consistent wherever the user decides to use it.
To get started with ResponsiveVoice, we have to add the following line of JS to the <head> of our HTML page:

<script src="https://code.responsivevoice.org/responsivevoice.js?key=YOUR_UNIQUE_KEY"></script>

ResponsiveVoice has functions like speak(), cancel(), voiceSupport(), getVoices(), isPlaying(), pause(), resume() and setDefaultVoice().
The speak() method takes the text to read, followed by two optional arguments: [string voice] and [object parameters].

To simply use the speak() function:

<script>
responsiveVoice.speak("hello world"); 
</script>

This line brings up a prompt box in the browser that asks for permission to speak.

To check for browser support, we use voiceSupport(), which returns true or false:

<script> 
if (responsiveVoice.voiceSupport()) { 
  responsiveVoice.speak("The browser supports this"); 
}
</script> 
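
The two support checks in this article can be combined so that a page decides once which engine to use. This is only a sketch: `chooseEngine` is a hypothetical helper name and the returned labels are arbitrary strings chosen for illustration.

```javascript
// Hypothetical helper: decide which engine a page can use.
// `scope` is a window-like object; the return value is a plain label.
function chooseEngine(scope = globalThis) {
  const rv = scope.responsiveVoice;
  // Prefer ResponsiveVoice when its script is loaded and reports support.
  if (rv && typeof rv.voiceSupport === "function" && rv.voiceSupport()) {
    return "responsivevoice";
  }
  // Otherwise fall back to the browser's own Web Speech API, if present.
  if ("speechSynthesis" in scope) {
    return "native";
  }
  return "none";
}
```

Calling `chooseEngine(window)` after the page has loaded would return one of the three labels.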

We will create a demo HTML page, with a text area and buttons to test out the functionalities of ResponsiveVoice and demonstrate how you can implement ResponsiveVoice with TTS on websites.

ResponsiveVoice HTML

<html> 
<head>
  <meta charset="utf-8" /> 
  <meta http-equiv="X-UA-Compatible" content="IE=edge"> 
  <title>ResponsiveVoice JS TTS</title> 
  <meta name="viewport" content="width=device-width, initial-scale=1"> 
  <script src="https://code.responsivevoice.org/responsivevoice.js?key=YOUR_UNIQUE_KEY"></script>   
</head> 
<body>
  <textarea rows="9" cols="60" name="description" id="lines">Enter text here...</textarea><br> 
   
  <button class="buttons" style = "background: green;" onclick = 'speak();'> Speak </button> 
  <button class="buttons" style = "background: orange" onclick = 'pause();'> Pause </button> 
  <button class="buttons" style = "background: lightgreen" onclick = 'resume();'>Resume </button> 
  <button class="buttons" style = "background: red" onclick = 'cancel();'> Cancel </button> 
</body> 
</html> 

ResponsiveVoice CSS

html, body{ 
  height: 100%
}
.buttons{ 
  display: inline-block; 
  padding: 0.6em 1.5em; 
  margin: 0 0.3em 0.3em 0; 
  border-radius: 5px; 
  box-sizing: border-box; 
  font-family: 'Roboto', sans-serif; 
  font-weight: 400; 
  font-size: 14px; 
  color: black; 
  text-align: center; 
}

ResponsiveVoice JavaScript

function speak(){ 
  let speaknow = document.getElementById('lines').value; 
  // rate, volume and pitch are passed together in a single parameters object 
  responsiveVoice.speak(speaknow, "UK English Male", {rate: 1.2, volume: 1, pitch: 2}); 
} 
function pause(){ 
  responsiveVoice.pause() 
} 
function resume(){ 
  responsiveVoice.resume() 
} 
function cancel(){ 
  responsiveVoice.cancel(); 
}
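
The parameters object can also carry callbacks. The sketch below assumes that speak() accepts `onstart` and `onend` entries in that object; treat this as an assumption to confirm against the ResponsiveVoice documentation. `speakWithStatus` is a hypothetical helper name, and the library object is passed in as `rv` so the function does not depend on a global.

```javascript
// Hypothetical helper: read `text` and report progress through `onStatus`.
// `rv` is expected to look like the global responsiveVoice object.
function speakWithStatus(rv, text, voice, onStatus) {
  const parameters = {
    rate: 1.2,
    volume: 1,
    pitch: 2,
    // Assumption: speak() calls these when playback starts and ends.
    onstart: () => onStatus("started"),
    onend: () => onStatus("finished"),
  };
  rv.speak(text, voice, parameters);
  return parameters;
}
```

On the demo page this could be called as `speakWithStatus(responsiveVoice, text, "UK English Male", (status) => console.log(status))`.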

Mobile devices sometimes prevent browsers from playing audio without a user gesture. ResponsiveVoice can listen for a click and take it as the user gesture required by the browser.
This click hook is enabled with:

responsiveVoice.enableWindowClickHook();

If listening for a click is not possible, responsiveVoice.clickEvent() can be called directly from any user-gesture handler, and it will grant ResponsiveVoice the required permission.

We call it using:

responsiveVoice.clickEvent(); 
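
One way to call clickEvent() from gestures other than a click is to listen for them once. The sketch below is illustrative only: `grantOnFirstGesture` is a hypothetical helper name, and the default event names ("touchend" and "keydown") are assumptions about which gestures a page might want to accept.

```javascript
// Hypothetical helper: call rv.clickEvent() on the first of several gestures.
// `target` is any EventTarget (for example `document`); `rv` looks like responsiveVoice.
function grantOnFirstGesture(target, rv, eventNames = ["touchend", "keydown"]) {
  let granted = false;
  const handler = () => {
    if (granted) return;
    granted = true;
    rv.clickEvent();
    // One gesture is enough, so stop listening on every event name.
    eventNames.forEach((name) => target.removeEventListener(name, handler));
  };
  eventNames.forEach((name) => target.addEventListener(name, handler));
  return handler;
}
```

On a page this would be wired up once with `grantOnFirstGesture(document, responsiveVoice)`.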

For more on ResponsiveVoice, see its official documentation.

Additional Scope

setTextReplacements(Array replacements)

ResponsiveVoice provides setTextReplacements(), which takes an array of replacement objects. Each object names the text to replace, the text to replace it with, and optionally the voice profiles or system voices it applies to. This command is useful for specifying words or expressions with several pronunciations.

  • searchvalue: the text to be replaced; supports regular expressions (required)
  • newvalue: the replacement text (required)
  • collectionvoices: voice name (from the ResponsiveVoice collection) for which the replacement will be applied; it can be a single name or an array of names (optional)
  • systemvoices: voice name (from the system voices collection) for which the replacement will be applied; it can be a single name or an array of names (optional)

responsiveVoice.setTextReplacements([{ 
  searchvalue: "man", 
  newvalue: "boy", 
}]);

A replacement only applies when the search text actually appears in the content being read, whether that is a text area or a document. If the text does not contain the word 'man', nothing is replaced with 'boy'.

To specify replacements only on certain voice profiles:

responsiveVoice.setTextReplacements([{ 
  searchvalue: "man", 
  newvalue: "boy", 
  collectionvoices:  "UK English Female" 
}]);
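
When there are many replacements, the array can be generated from a simple word map. The sketch below is a convenience wrapper, not part of ResponsiveVoice: `buildReplacements` is a hypothetical helper name, and it only produces objects with the `searchvalue`, `newvalue` and `collectionvoices` fields described above.

```javascript
// Hypothetical helper: turn { from: to } pairs into the array shape
// that setTextReplacements() is described as accepting.
function buildReplacements(pairs, collectionvoices) {
  return Object.entries(pairs).map(([searchvalue, newvalue]) => {
    const entry = { searchvalue, newvalue };
    // Only add the optional voice filter when one was supplied.
    if (collectionvoices !== undefined) {
      entry.collectionvoices = collectionvoices;
    }
    return entry;
  });
}
```

The result would then be passed on as `responsiveVoice.setTextReplacements(buildReplacements({ man: "boy" }, "UK English Female"))`.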

Browser Compatibility

This API is compatible with modern browsers that support HTML5.

Limitations of TTS in JavaScript

  • Not available in every language
  • Speech synthesis consumes more processing power than displaying static text
  • Voices are emotionless and unnatural
  • Pronunciation depends on the speaker and cannot be adjusted

Conclusion

In this article, we looked at two contemporary methods of implementing TTS in a website using JavaScript. These two methods cover the basic features of TTS. They are convenient for users and give them more control over how they consume text. You can implement these methods with ease and build an online story reader of your own.
These are some of the basics of working TTS into your website, webshop, or blog page. Some apps require something fancier and more extensive; in that case, you may need the robust skills of a serious developer.

Author

Ivan Georgiev | Software Engineer

Ivan is an extremely intelligent and knowledgeable professional with a hunger for improvement and continuous learning. Being versatile in a number of programming languages, he picks up on new tech in a matter of hours. He is a person that masters challenges by investigation and clear, structured execution, bringing in his experience and innovative ideas to solve problems quickly and efficiently. Ivan is also a Certified Shopware Engineer.