How to Fix Common Audio/Video Sync Problems

In the early days of motion pictures, audio sync was easy: there wasn’t any. When you’re dealing with silent films, you have plenty of room to play fast and loose with frame rates.

The first hand-cranked cameras used in the industry could shoot footage at rates anywhere from 16 to 18 frames per second; there was no standardization. When the finished silent movies were screened for audiences, they were often played back considerably faster than that, at rates over 20 frames per second.

This system allowed the studios to save money on film stock, and let the movie theaters earn more money by turning audiences over at a healthy clip.

But with the birth of the “talkies”, we quickly started to standardize our frame rates to make accommodations for audio. Throw sound into the picture, and all of a sudden people start to notice when Charlie Chaplin starts sounding like Mickey Mouse.

Video Frame Rates for Audio People

Even when sound was first added to picture, workflow remained fairly straightforward for a little while.

In the U.S., we began to standardize the speed of film at 24 frames per second in the mid 1920s. This allowed for smooth motion capture and reliable audio sync, and it worked nicely with the 60Hz AC frequency coming out of our power outlets.

On the consumer end, movie houses figured out that they could cut down on “flicker” by simply flashing each of these 24 frames twice, for a total of 48 distinct illuminations per second.

Complications began with the advent of television. 24 frames per second may have looked pretty good in the low light of movie theaters, but with the greater brightness of TV sets, it caused noticeable flicker.

To combat this, 30 frames per second quickly became the U.S. standard for black-and-white video broadcast. On top of that, “interlacing”, a method of drawing each frame as two successive “fields” of alternating scan lines, was used to achieve a full 60 fields of illumination per second and cut down on flicker even further.

Converting to this new format wasn’t terribly difficult. A process called “Telecine Transfer” was invented for the U.S. market. The 24 frames of a film could be converted to 30 frames of video through “2:3 pulldown”: the frames would alternately be drawn either 2 or 3 times each, effectively stretching 4 frames of film across 5 frames of video.
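
The 2:3 cadence is easier to see when it is written out. What follows is only a rough sketch, assuming that each "draw" is one interlaced field and that two consecutive fields make one video frame; the function name pulldown_2_3 and the frame labels A through D are made up for illustration.

```python
from itertools import cycle


def pulldown_2_3(film_frames):
    """Spread film frames across video frames using a 2, 3, 2, 3... field cadence."""
    fields = []
    # Alternate between holding a film frame for 2 fields and for 3 fields.
    for frame, repeats in zip(film_frames, cycle((2, 3))):
        fields.extend([frame] * repeats)
    # Pair consecutive fields into video frames (a leftover odd field is ignored).
    return [tuple(fields[i:i + 2]) for i in range(0, len(fields) - 1, 2)]


video = pulldown_2_3(["A", "B", "C", "D"])
print(video)
# [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]
```

Four film frames come out as five video frames, two of which mix fields from neighboring film frames. Repeat that 6 times per second and 24 film frames fill 30 video frames.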

Europe, however, came to settle on a standard of 25fps, which made good sense for their AC frequency of 50Hz. They ended up sticking with this frame rate for both film and video, even if their TVs exhibited a bit more flicker early on.

Since the Europeans adopted a single frame rate for both formats, no conversion was necessary for their own domestic productions. But when American films were shown on TV in Europe, the stations would simply speed them up by about 4%.

This leads us to Common Sync Problem #1: Audible Pitch Shift.

Today, digital technology allows us to speed up sound without raising its pitch. Although this is supposed to be part of the contemporary U.S.-Europe conversion process, it’s not always done as it should be.

If you’re dealing with audio that has been converted from one of these film frame rates to the other, keep an ear open for audible pitch shift. It happens less often now than in the past, but it’s still worth listening for.

At 4%, a change in pitch can be noticeable in a way that a change in motion is not. This becomes even more apparent in cases where the internal camera sound is properly pitch-corrected but externally recorded audio is not.

Keep in mind that this 4% pitch drift can occur in situations where sound and picture appear to be in proper sync. Test tones, like the conventional 1kHz “2 pop”, can help you evaluate sync and pitch on completed projects. When dealing with raw footage, particularly on smaller-budget foreign-market projects, you may want to refer back to the original unconverted rate, and adjust pitch if needed.
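
To put a number on that shift, here is a small back-of-the-envelope sketch. It assumes the audio was simply sped up along with the picture from 24fps to 25fps with no pitch correction; the function name speedup_pitch_shift is hypothetical.

```python
import math


def speedup_pitch_shift(source_fps, playback_fps):
    """Return (speed ratio, percent change, pitch change in semitones)."""
    ratio = playback_fps / source_fps
    percent = (ratio - 1) * 100
    # There are 12 equal-tempered semitones per doubling of frequency.
    semitones = 12 * math.log2(ratio)
    return ratio, percent, semitones


ratio, percent, semitones = speedup_pitch_shift(24, 25)
tone_hz = 1000 * ratio  # where a 1kHz reference tone would land

print(f"speed-up: {percent:.2f}%")          # speed-up: 4.17%
print(f"pitch: +{semitones:.2f} semitones")  # pitch: +0.71 semitones
print(f"1kHz tone reads: {tone_hz:.1f} Hz")  # 1kHz tone reads: 1041.7 Hz
```

Under those assumptions, a 1kHz tone that measures closer to 1042Hz is a strong hint that the audio went through an uncorrected 24-to-25 speed-up.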

The Coming of Color

Things really started to get complicated when color entered the picture. In the U.S., the 30fps video standard just didn’t leave enough extra bandwidth to include color information along with the picture.

The solution was almost brilliantly simple: slow down the video frame rate by 0.1%. No one would notice the difference in picture or in pitch, but this new frame rate of 29.97 would free up enough bandwidth to include color in the broadcast.
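
A 0.1% slowdown sounds harmless, but it adds up. The sketch below is only an illustration: it assumes the exact NTSC color rate of 30000/1001 frames per second (the figure usually rounded to 29.97) and asks how long an hour's worth of 30fps frames takes to play at that rate. The variable names are made up.

```python
from fractions import Fraction

NOMINAL_FPS = Fraction(30)
NTSC_COLOR_FPS = Fraction(30000, 1001)  # exact rate, commonly rounded to 29.97

# How much slower is the color rate than the nominal 30fps?
slowdown_percent = float((1 - NTSC_COLOR_FPS / NOMINAL_FPS) * 100)

# One hour of frames counted at 30fps, played back at the color rate.
frames_in_nominal_hour = NOMINAL_FPS * 3600
playback_seconds = frames_in_nominal_hour / NTSC_COLOR_FPS
extra_seconds = float(playback_seconds - 3600)

print(f"slowdown: {slowdown_percent:.4f}%")          # slowdown: 0.0999%
print(f"extra running time per hour: {extra_seconds} s")  # extra running time per hour: 3.6 s
```

At these rates the picture falls 3.6 seconds behind a real-time clock every hour, which is far more than anyone would tolerate between sound and picture.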

Ironically, it’s the brilliant simplicity and transparent fidelity of this solution that leads to so many of our synchronization headaches today.

And that’s what leads us to Common Sync Problem #2: Sound Drifts Against Picture.

  • Jonathan S. Abrams

    Regarding The Coming of Color:

    The difference between the Black and White 30 fps and Color 29.97 fps (ignoring non-drop and drop frame for the moment) was deemed necessary to maintain compatibility with black and white televisions when color broadcasts were transmitted.

    The argument when color was developed was that the frequency of the color subcarrier would create beating with the sound subcarrier that would be visible on some black and white television sets. The sound carrier, however, is frequency modulated. Therefore, beating would have only occurred at a specific frequency. A GE engineer determined that if the frame rate was dropped by 0.1% (from 30 to 29.97), the beating would be reduced and compatibility would be maintained.

    As a result of this change, if 60Hz AC leaks into a video signal, bars appear to roll through the picture about every 17 seconds. Technicians I have worked with over the years describe this phenomenon as a video ground hum.

    The equation that has driven audio and video engineers mad by creating this non-whole number for video sync is: [(number of scanning lines per frame•frames per second)/2]•455=color subcarrier frequency.

    When the appropriate numbers are inserted, it becomes: [(525•29.97)/2]•455 –> (15,734.25/2)•455 –> 7,867.125•455 –> 3,579,542

    The NTSC adopted this equation, and could not change the lines per frame (or all TV sets would be obsolete), so they changed the frame rate. The idea behind the number 455 is frequency interleaving of the video and color signals, which would minimize interference between brightness (luminance) and color (chrominance) data. Because 455 is odd, the subcarrier is an odd multiple of half the line rate.

    Maintaining compatibility with some black and white sets when audio at a specific frequency was transmitted has created synchronization headaches ever since color video was introduced.

    Regarding Sample Rates for CDs:

    The sampling rate of 44.1kHz was chosen for CDs because the number of used lines in an NTSC picture frame will divide evenly into 44,100. The total line count in NTSC is 525, and 35 of them are blank. That leaves 490 lines for the picture. 44,100/30 yields 1470 samples per frame. With 490 lines per frame, the samples per line is 1470/490, or 3.

    Regarding Timecode:

    In NTSC black and white timecode (30fps), the total number of frames per hour is 108,000 (30fps•60sec•60min). When the frame rate is reduced to 29.97 for NTSC color, there are 0.03 fewer frames per second. This causes the time displayed on a timecode reader to run slightly behind real time. The math is (30-29.97)•60sec•60min=108 frames.

    To make the 29.97 fps timecode match elapsed time, two (2) frame numbers are dropped from the count at the start of every minute except those ending in zero (00, 10, 20, 30, 40, 50). The remaining 54 minutes of each hour are each missing two frame numbers from the count, and 54•2=108, which compensates for the difference. Many readers and generators indicate drop frame timecode by using semicolons instead of colons to separate the hours, minutes, seconds, and frames numbers.

    Most of this information is part of a larger paper I wrote, which is available at https://files.nyu.edu/jsa226/public/timecode.pdf.

  • mp4guy

    Great post on audio/video sync problems.
    If you wanna fix it after encoding,
    here are some ways to do it:

    http://newbrotricks.blogspot.in/2014/02/blog-post_3092.html or

    http://lifehacker.com/5910943/fix-out+of+sync-audio-in-vlc-with-a-keyboard-shortcut

  • Ryan Petrus

    To sync up multiple camera angles, I’d suggest trying out PluralEyes (http://pluraleyes.com).

    And if you’re just shooting on 1 camera, check out DreamSync (http://dreamsyncapp.com). It’s not as cumbersome or expensive as PluralEyes and gets the job done for smaller quick projects.

    DreamSync is a standalone application that’s built for novice users as well as professionals. It syncs your footage and audio into one single clip so that it can then be imported into applications like iMovie, Windows Movie Maker, Adobe Premiere, Final Cut X, or any other editing suite.

    Both apps are effective depending on your editing workflow and how much (or little) time you want to dedicate to learning another interface for syncing audio/video footage.

  • Scritti Politti

    Interlacing was not done to reduce flicker. It was because they didn’t think a raster could draw an entire frame fast enough. That turned out to be wrong, but here we are with our new digital “advanced” TV system still dealing with this pathetic hack.

    Not to mention the bullshit non-integer frame rates.

  • keyboardes

    Avdshare Video Converter can change the frame rate of an MP4 file, for example, and it can also change the frame rates of AVCHD, MTS, M2TS, MXF, XAVC, ProRes, MPG, AVI, FLV, MOV, WMV, MKV and almost all other video formats.