Will AI voice shake the audio production industry as we know it?


Imagine you are listening to an audiobook you just downloaded. At first, everything sounds as it should be. An impactful voice narrates the intro as you would expect in any high-end audio production. You are eager to listen to more of your favorite book.

Further into the narration, something jars your ears and just doesn’t sit right. The narrator pronounces some names incorrectly. OK, maybe it was just a mistake and the audio proofers missed a couple of mispronunciations. You dismiss it and keep listening.

The narrator continues, and the gaps between phrases become more inconsistent while the inflection of some words wavers. Understandable, maybe the narrator wasn’t directed on how to inflect certain words and it slipped by. You dismiss those inconsistencies and mistakes yet again.

As the narration continues, the cadence of the narrator’s voice is smooth at first, then it snags on a syllable, yet again pulling you out of the audiobook. OK, this is getting a little distracting now. What is going on?

All of a sudden you notice more and more audio artifacts in the voice that you cannot ignore anymore. Why does this audio have so many issues? It sounds like it’s digital. That’s because it is an AI voice! Cue the dramatic music.



It’s as we have always thought: robots and AI are going to take over everything. Well, the truth is that AI has been here for a while and you just haven’t noticed until now. AI has been around since the 1950s, when it was in its infancy alongside the first computers. By the late 1990s, personal computers were showing up in businesses and home offices.

Fast forward to the dot-com era between 1998 and 2000, when the internet boomed. We didn’t see these computers as AI or as intelligent at the time. Computers were machines that helped us do our jobs faster and more efficiently. They were tools to better our lives, and they blended into the background of everyday tasks.

The Future of AI Unveiled In The Early 90s And We Didn’t Notice

I remember watching a news segment in the early 90s. It showcased the idea that the computer would one day be as small as a pocket calculator and fit in the palm of your hand. It would be as powerful as the computers on the market at the time. The newscaster explained that it might even be able to make phone calls and guide you with navigation. No way! That seemed like a sci-fi movie to me, and I will admit I even chuckled when I heard it.

The first touchscreen phone, the IBM Simon, was unveiled as a prototype in 1992 and went on sale in 1994, though most of us only remember the iPhone, which first came out in 2007. This was a big shift toward where we are today, as we hold AI in the palm of our hands on a daily basis.

Fast forward to the present, and it seems that computers and AI have evolved from assisting us to thinking on their own. AI is learning, thinking, and advancing at a rapid rate. Welcome to the future in the present.

AI is now used in many different areas such as healthcare, finance, marketing, and robotics. As technology continues to advance, the potential applications for AI will only increase over time. Now AI has advanced into the entertainment and audio-visual production industry.

AI And The Audio-Visual Industry

With movies, computer-generated imagery has actually been around for some time now. CGI appeared briefly in Star Wars in 1977 and helped pave the way for modern filmmaking. Tron in 1982 was one of the first feature films to use CGI extensively, and Toy Story in 1995 was the first fully computer-animated feature film. Now almost every action film has CGI-generated scenes and we don’t even notice it. Unless the CGI is really bad.

Fast forward to today and we see the rise of AI in audio production speeding up dramatically. AI voices have been getting better and better in the last few years. No longer is an AI voice a creepy, monotone, computer-sounding replica of the human voice. Now it sounds much more realistic and believable. It is not yet perfect, and you can still spot an AI voice if you listen closely, but it’s starting to get really good. So good it’s almost scary.

AI-generated text-to-voice is now a staple for website articles. You just press a play button on an article and it reads to you in a natural-sounding human voice. It’s not perfect, but I will admit I use it on my website articles as it’s just so convenient for users. I could foresee this technology being used for audiobooks and narration as it evolves. Well, here we are, it is happening.

How Will AI Voice Change Audio Production As We Know It?

Over the next 3 to 5 years, AI-generated voices will only get better, to the point where we might not even notice they are digital. With that said, what does that mean for the audio production industry and the professionals who rely on recording and producing for a living? Will these jobs become obsolete and our livelihoods be dismantled? I think the answer is yes and no.

There is no doubt the audio production industry will change drastically in the next 5 years with AI, to the point where we may not recognize it. But that is what everyone said about the digital recording revolution 20 years ago. It didn’t kill the industry; it evolved it into a new platform and revamped the recording studio as we know it. It opened doors to the masses that had never been reachable in the past. I feel the AI era will follow the same scenario.

We will see AI move production from the physical, hands-on realm to the virtual. This has already happened in the last 10 years with the DAW, audio plugins, and “virtual instruments”. Analog instruments such as guitars, drums, strings, and bass moved to the virtual world of software-based instrument emulation, as did audio hardware through hardware emulation software.

In the beginning, these virtual instruments sounded nothing like their analog counterparts, which is roughly where the AI voice is now. Today virtual instruments are a staple in almost all modern music and sound very realistic. From film scores to mainstream music, software-based instruments sound like the real thing. The same will happen for AI voice.

For AI in music and audio, I see the same thing happening for jobs and professions. The jobs will adapt to these new technologies. Will the professions be the same? The answer is probably no. I think you will see less one-on-one interaction with clients and more remote production assistance. New professions will be created to operate AI software and facilitate the work. Of course, with more accessibility, more unprofessional work and media will be out there. There will still be a place for professionals who know how to use the technology correctly.

Adapt To Technology Or Be Left In The Dust

As of 2021, in the wake of the pandemic, 95% of my work and clientele come to me online, and the work is facilitated remotely. My business is almost totally virtual. I had to evolve it from a brick-and-mortar production studio into a fully online operation. After 2020 we all had to adapt and migrate to doing business online.

After migrating everything to an automated online platform, I saw my business skyrocket compared to the previous years. How is my online business workflow automated? You guessed it, with AI.

Did the new AI technology along with the pandemic kill my business? No, I would say it was the total opposite. But if I hadn’t taken the time to foresee the changes that were about to happen, I would have been left in the dust and eventually would have closed up shop. Change is inevitable in your business and life. The question is, how do you stay relevant?

Book Authors Will Still Want to Narrate Their Audiobooks

Many of the audiobooks I edit and master presently are from authors who have narrated their own books. I see something special about listening to an audiobook narrated by the author. It just has a personal touch to it that you can’t get with a generic AI voice at the moment.

With this in mind, I feel that self-narrating authors will still need professional recording, editing, proofing, and mastering for their audiobooks. At least for the next 5 years or so. I say this because in the future authors will be able to synthesize their voice with AI and use it to generate their own audiobooks. The AI probably won’t read exactly as the author would, but close enough. This would be helpful for authors who have many previously published books not yet in audiobook format.

How Will AI Help Enhance Traditional Voice Narration Workflow?

New AI software is being developed that will change the film and audio production industry as we know it. In the past, if you wanted to release a film in different languages, for example, you had to hire a voice talent in each language and record ADR for the entire movie. This is a lengthy process, not to mention an added cost to the film.

With AI voice technology, you could synthesize an actor’s voice, essentially making a digital representation of it, and use it to read a translated script for a movie. With CGI you will be able to composite a mouth over the actor, matching the lips perfectly to the new language. Essentially, you would have the actor’s original voice and lips speaking a different language from the one in which the film was originally produced. This would almost eliminate the need to have the actor record ADR in the studio. This is kind of scary to think about, but also very interesting.