Authored By: Anup Gosavi, CEO, Spext
Voice as a medium is growing exponentially with the spread of voice assistants such as Alexa, Google Home and, now, Facebook Portal. Demand for voice content, in both audio and video formats, has boomed over the past five years, especially in video webinars, interviews and online training courses. Even podcasts, a relatively new medium, produce 4 lakh (400,000) hours of episodes every month.
This demand is expected to rise exponentially over the next five years, and to meet it, a number of startups are relying on developments in artificial intelligence and speech algorithms. Let us take a look at five brands that are redefining the voice industry across the globe.
Storyline: Storyline makes it easy for anyone to create skills (apps) for Alexa without learning how to code. It features a simple drag-and-drop interface that non-technical people can use to create voice-based games, trivia or briefings such as daily news. It is the most popular skill-creation platform and powers more than 6% of the skills available on Alexa.
Lyrebird: Named after the bird that can mimic the human voice, Lyrebird is a Canadian company that does speech synthesis, essentially converting text into human-sounding speech. Lyrebird allows you to create entire sentences or conversations in a new voice. This is what we call TTS (text to speech). While there are obvious ways this can be misused, Lyrebird claims to have built-in security features so that only authorized sounds can be changed.
Voicery: Voicery is a Y Combinator-backed startup that creates ultra-realistic, human-sounding synthetic voices. A person records a few minutes of audio, and Voicery’s deep learning algorithms can then create a realistic, human-sounding voice from it.
Otter.ai: Otter is a note-taking and collaboration app that business people, students and journalists can use to get more value from meetings. It records meetings and converts them to text automatically in real time. Once a conversation is converted to text, you can search, edit and share it with your teammates. A lot of valuable content gets lost in voice recordings. With Otter, organizations and teams can make sure that content is searchable and accessible across the entire organization.
Spext: Spext, a Bangalore-based company, wants to make interacting with voice media as easy as working with text. It converts voice to text automatically and then syncs the transcript accurately with the spoken words. That means when you delete a sentence in the transcript, it deletes the corresponding section of the media. It is much easier to use than traditional waveform-based editing software. What’s more amazing is that you can type new words into the transcript and change the spoken words in the voiceover. It is like Photoshop for voice, and while there are obvious ethical issues, the technology can be very valuable for correcting mistakes in recordings or personalizing voice content at scale.
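To picture how deleting text can delete media, here is a minimal sketch in Python. It is not Spext's implementation. It assumes a speech-to-text step has already attached a start and end time (in seconds) to every word, and the names `Word`, `cut_ranges` and `keep_ranges` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def cut_ranges(words, deleted):
    """Turn deleted word indices into merged (start, end) time ranges."""
    ranges = []
    i = 0
    while i < len(words):
        if i not in deleted:
            i += 1
            continue
        # Extend the run while the next word is also deleted.
        j = i
        while j + 1 < len(words) and (j + 1) in deleted:
            j += 1
        ranges.append((words[i].start, words[j].end))
        i = j + 1
    return ranges


def keep_ranges(cuts, duration):
    """Return the parts of the recording that survive the cuts."""
    kept = []
    cursor = 0.0
    for start, end in cuts:
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept


# Illustrative data only: a three-word transcript with made-up timings.
transcript = [
    Word("Hello", 0.0, 0.5),
    Word("um", 0.5, 0.8),
    Word("world", 0.8, 1.3),
]
cuts = cut_ranges(transcript, deleted={1})  # the user deletes "um"
kept = keep_ranges(cuts, duration=1.3)
print(kept)
```

An audio or video tool would then stitch the kept ranges together. Real products also have to smooth the joins and handle pauses between words, which this sketch ignores.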