Captioning, Dubbing or Lip Syncing. Which one do you need?

When distributing video content globally, choosing the right localization technique (captioning, dubbing, or lip syncing) is crucial for engaging international audiences. Each method has its own advantages and can dramatically affect how your content is received.

Captioning offers a cost-effective way to provide accessible translations, while dubbing replaces the original audio with voice-overs in another language, enhancing viewer immersion. Lip syncing goes a step further, meticulously aligning the audio in the target language with the speakers’ lip movements for a seamless viewing experience. This blog post will help you understand these options better and decide which one best suits your project’s needs.

In a nutshell, here are short descriptions of each one:

Captioning: Displays translated text on screen, aiding accessibility and understanding without altering original audio; ideal for diverse audiences.

Dubbing: Replaces original speech with voice-overs in the target language, maintaining narrative flow and emotional resonance for native viewers.

Lip Syncing: Syncs dubbed audio with the speakers’ lip movements, creating the illusion that characters are naturally speaking the target language.

Elon Musk Lip Syncs Elton John Song, implemented by AI technology

Dubbing on the big screen

Dubbing is hardly a new technique. Anyone who has watched a 1960s spaghetti western like Sergio Leone’s The Good, the Bad and the Ugly has seen dubbing used in movies. With the exception of star actors like Clint Eastwood and Lee Van Cleef, most of the actors were speaking Italian. Their scenes were dubbed by English-speaking voice actors.

Dubbing in that era was a more labor-intensive and meticulous process than it is with today’s digital techniques. How was it done?

Translation and Adaptation: Scripts were first translated from the original language to the target language. The translation had to maintain the dialogue’s length and rhythm to match the actors’ mouth movements as closely as possible.
Voice Casting: Voice actors were carefully selected not just for their vocal qualities, but also for their ability to match the on-screen actor’s voice, intonation, and acting style.
Rehearsals: Voice actors often rehearsed their lines while watching the film to better synchronize their timing with the lip movements and actions of the characters.
Recording: Unlike today’s digital recording, audio was recorded on magnetic tape. Voice actors performed in sound booths, often watching the film scene repeatedly, delivering their lines in sync with the characters’ mouth movements. This required multiple takes to achieve perfect timing.
Mixing and Editing: The recorded audio was then mixed, edited, and manually synchronized with the film’s original soundtrack. Sound engineers played a crucial role in adjusting the volume and clarity of the dubbed voices to blend smoothly with the film’s original background noise and music.
Final Review: The dubbed film was reviewed, and any sections that were not properly synchronized were sent back for re-recording.

This manual and repetitive process required significant skill and patience, contributing to the unique character of dubbed films from that era.

Dubbing in 2024

Today, dubbing movies and TV shows is far more technologically advanced and efficient than it was in the 1960s. Here’s how it’s typically done now:

Translation and Script Adaptation: Similar to earlier methods, the original script is translated into the target language. However, today’s script adapters use software to ensure the translated dialogue matches the lip movements and screen duration as closely as possible, while also retaining the context and cultural nuances.
Voice Casting: The process of selecting voice actors remains crucial, but now casting can be done globally thanks to digital communication. Directors often look for voice actors who can match the original actors’ emotional tones as well as their vocal characteristics.
Digital Workstations: Voice actors perform in soundproof booths, but unlike in the past, they now have digital scripts and the video playing on a screen in front of them. Software tools allow them to see the waveform of the original audio, helping them match the timing and intonation more precisely.
Synchronization Software: Software helps synchronize the dubbed voice with the actors’ lip movements. Digital audio workstations such as Pro Tools or Logic Pro display the audio alongside the video timeline, which makes it easier to check lip-sync accuracy and reduces the need for numerous takes.
Recording: With digital technology, recording is more flexible and can be adjusted on the fly. Multiple takes can be easily edited, mixed, and matched without the physical limitations of tape.
Mixing and Mastering: The final mix includes not just the voice but also the ambient sounds and music. Sound engineers use digital audio workstations to blend these elements seamlessly, ensuring that the dub feels natural and maintains the integrity of the original soundtrack.
Quality Control: Before final approval, the dubbed version undergoes rigorous quality control to check for errors in synchronization, translation, and overall sound quality, ensuring the dubbed version meets the production standards.
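To illustrate the script adaptation step above, here is a minimal sketch of how a tool might flag translated lines that are too long for the time the speaker is on screen. It is not how any particular product works. The speaking rate of 15 characters per second, the 10% tolerance, and the names `Line`, `estimated_duration`, and `fit_report` are all assumptions made for this example.

```python
from dataclasses import dataclass

# Assumed average speaking rate; real rates vary by language and speaker.
CHARS_PER_SECOND = 15.0


@dataclass
class Line:
    start: float      # when the original line starts, in seconds
    end: float        # when the original line ends, in seconds
    translated: str   # the translated dialogue for that time slot


def estimated_duration(text, chars_per_second=CHARS_PER_SECOND):
    """Roughly estimate how long it takes to speak the text, ignoring whitespace."""
    spoken = "".join(ch for ch in text if not ch.isspace())
    return len(spoken) / chars_per_second


def fit_report(lines, tolerance=0.10):
    """Return (text, seconds needed, seconds available, fits) for each line."""
    report = []
    for line in lines:
        available = line.end - line.start
        needed = estimated_duration(line.translated)
        fits = needed <= available * (1 + tolerance)
        report.append((line.translated, round(needed, 2), round(available, 2), fits))
    return report


if __name__ == "__main__":
    script = [
        Line(0.0, 2.0, "Hola, ¿cómo estás?"),
        Line(2.0, 3.0, "Muchísimas gracias por venir a nuestra presentación"),
    ]
    for text, needed, available, fits in fit_report(script):
        status = "OK" if fits else "TOO LONG, consider a shorter adaptation"
        print(f"{needed:>5}s needed / {available:>5}s available  {status}: {text}")
```

Lines flagged as too long would go back to the script adapter for a shorter phrasing. A character count is only a rough proxy for speaking time, so a human reviewer still has the final say.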

These technological advancements have significantly improved the speed, efficiency, and quality of the dubbing process, allowing studios to release multilingual versions of films and shows more rapidly to a global audience.

Why is captioning used more than dubbing in corporate videos?

Generally, dubbing tends to be more expensive than captioning. Here’s why:

Voice Talent Costs: Dubbing may require hiring multiple voice actors, since most corporate videos feature a range of speakers, both male and female. Paying several voice actors for their sessions can become quite costly, especially for high-quality talent.
Technical Resources and Studio Time: Dubbing may require recording sessions in sound studios. This involves not just the rental of the space but also using sophisticated recording equipment and employing sound engineers and technicians to manage the recording process and ensure quality.
Synchronization and Post-Production: Dubbing must be precisely synchronized with the actors’ lip movements and the video’s timing. This process requires more advanced editing and sound engineering, including adjusting the dubbed audio to fit the video seamlessly, which can be time-consuming and requires specialized software and skills.
Quality Control and Revisions: Ensuring the dubbed version maintains the emotional and contextual integrity of the original often necessitates multiple revisions. This quality control process can add to the cost, as it might involve re-recording certain parts to get them just right.

On the other hand, video captioning and subtitling involve fewer steps and less expensive resources:

Transcription and Translation: The process involves transcribing the original audio and then translating the text, which, while requiring skilled translators, is much less costly than hiring voice talent.
Timing and Placement: Captions must be correctly timed to match the spoken words and placed on the screen without obstructing important visual elements. However, the software used for this is typically less costly than dubbing equipment.
Simpler Production: The process doesn’t require expensive studio time or extensive post-production audio editing, which significantly reduces costs.
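To give a sense of how lightweight the timing step can be, here is a minimal sketch that turns timed, translated text into a caption file. It assumes SubRip (.srt) as the output format, which this post does not specify, and the function names `format_timestamp` and `to_srt` are made up for this example.

```python
def format_timestamp(seconds):
    """Convert seconds to the SubRip timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SubRip text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        if end <= start:
            raise ValueError(f"Cue {index} ends before it starts")
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    cues = [
        (0.0, 2.5, "Bienvenidos a nuestra empresa."),
        (2.5, 5.0, "Hoy presentamos un nuevo producto."),
    ]
    print(to_srt(cues))
```

Each cue is a number, a start and end time, and the caption text. The resulting file can be loaded alongside the video in most players, with no studio session or audio editing involved.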

Thus, for projects where budget constraints are a consideration, captioning often presents a more cost-effective alternative to dubbing.

How AI is changing the media localization landscape

The emergence of AI, together with its subset of machine translation (MT), has given rise to a new generation of products that combine translation, dubbing and lip syncing under one umbrella.

Adobe has announced that it is actively developing this technology (see video below). Given Adobe’s leadership and size, it is well positioned to become a major player in this area.

Other products include Rask AI, Dubverse and LipDub AI. Many new startup companies offer online solutions and affordable monthly licenses. As with any new technology, there are still bugs that need to be ironed out. And some of the players will peter out. But this area shows promise and will certainly impact the world of film and media in the near future.