How to achieve rock-solid picture sync in Nuendo 12?

Hi all, I sound design and mix a monthly 3-minute animated series for kids, and I'm hoping someone can point out what I'm missing when it comes to getting frame-accurate sync in my Audio Mixdown exports. Here's my rig and what I'm doing:

My system:

I’m running Nuendo 12.0.50 on a Windows 10 PC (custom-built Intel i9 9980XE / ASUS mobo)

Project Resolution: 96kHz / 32-bit Float (almost always)
Video Card: Nvidia RTX 2070 Super (using ‘best quality’ setting profile)
Audio Interface: RME UCX (using ASIO drivers, of course - 6 analog outputs for 5.1)
Monitor Controller: 7.1 StudioComm Model 78/79 (analog)
Monitor / Room Calibration: miniDSP DDRC-88A (analog I/O but there is a 48kHz ADC/DAC conversion internally to apply the room calibration profile; any latency from that should only affect my own local monitoring slightly)
Picture Monitor: I use an inexpensive TCL 1080p HDTV as a picture monitor, fed directly via HDMI from my video card

My process:

1) For the series, I convert every 1920 x 1080 video file I receive into a ProRes 422 (Proxy) workprint in an .MOV container using XMedia Recode, keeping the same frame rate as the original video (in my case, these are always 24 fps). The videos typically do not have any existing audio to start (these are animated episodes for a kids show), so there are no OMF/AAFs to deal with.
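
(Side note for anyone who prefers a scriptable route instead of XMedia Recode: the same conversion can be sketched with ffmpeg. Filenames below are placeholders, and I'm assuming ffmpeg's prores_ks encoder, whose profile 0 corresponds to ProRes 422 Proxy.)

```python
import subprocess

# Placeholder filenames; assumes ffmpeg is on PATH for the commented-out call.
cmd = [
    "ffmpeg", "-i", "episode_from_studio.mov",
    "-c:v", "prores_ks",   # ffmpeg's ProRes encoder
    "-profile:v", "0",     # profile 0 = ProRes 422 (Proxy)
    "-an",                 # these deliveries carry no audio anyway
    "workprint.mov",       # frame rate is inherited from the source (24 fps here)
]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually transcode
```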

2) I place each video into my template at a start point of 01:00:00:00. My video overlay timecode always aligns with the burned-in TC from the video.

3) Using the Catch-N-Sync app from my iPhone, I have already compensated for my inexpensive TV’s display latency with Catch-N-Sync’s test videos (as well as Sync One 2’s test videos) of matching codec and frame rates (ProRes / 24 fps). Using that app, I have found that my inexpensive TCL HDTV has a ~47ms latency offset, which is already plugged into Nuendo’s video settings. Now, if I understand it correctly, this offset does NOT affect the still, frame-by-frame scrubbing image; ONLY the real-time in-session video playback. While cutting / editing, I always place time-sensitive sound effects on the timeline by scrubbing to their exact start positions (footsteps, bodyfalls, etc.), so this 47ms real-time playback offset shouldn’t affect that process. When I play the video back, it always looks perfectly in-sync.
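
(For anyone following the numbers: 47 ms is a bit over one frame at 24 fps, which matters when reasoning about which direction things can shift. Quick arithmetic, nothing Nuendo-specific:)

```python
fps = 24
frame_ms = 1000 / fps                 # ~41.67 ms per frame at 24 fps
display_latency_ms = 47.0             # measured TV latency from Catch-N-Sync
lag_in_frames = display_latency_ms / frame_ms
print(round(frame_ms, 2), round(lag_in_frames, 2))   # ~41.67 ms, ~1.13 frames
```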

4) When I export my audio mixdowns, I use no pullup or pulldown settings for these straight-24 fps videos. I always export them by selecting the workprint video on the video track, set my locators, and export the mixes and stems via locators (as opposed to cycle markers).

5) When I re-import my exported stereo WAV mixes to check them, they align perfectly with the audio in my project.

6) When I receive feedback / revision requests, I always re-convert the updated video from the animation studio into another ProRes workprint and run a Video Cut Detection on it to catch any altered shots. Often I'll also check the position of the re-imported audio mix that auto-extracts from it against my project audio. Typically, that extracted mix is about a half-frame 'later' than my project's mix (roughly 21.5ms). I don't mess with it, as this seems to be true of any video render I receive back once it's married to picture. I just make my revisions and re-export new audio mixdowns using the same project settings / positioning as before (always 48kHz / 24-bit WAVs).
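
(Quick sanity check on that 'half-frame' figure: the ~21.5 ms I measure is close to a true half frame at 24 fps.)

```python
fps = 24
half_frame_ms = 1000 / fps / 2   # a true half frame at 24 fps
print(round(half_frame_ms, 2))   # 20.83 ms, close to the ~21.5 ms I measure
```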

So, here’s the problem: the client that receives and reviews the videos says that my audio mix is typically early by 1 to (sometimes) 2 frames. EARLY (!!!).

They mainly notice the louder transient moments, but I’m betting the overall mix feels a bit off for them. I am not sure how they are judging this, but my guess is they simply play the file back in a video player app like VLC, etc.

The video files they review are typically delivered in H264 / .mp4 with the audio having been converted to AAC from the animation studio’s final export stage (normal stuff).

I cannot understand why my audio mixdowns would ever appear early when, as I mentioned above, even the re-imported, auto-extracted audio from an updated video render from the animation studio has the whole mix running about 21.5ms ‘later’ than my project’s audio mix (???).

As a test, I played one of the client review videos back on my MacBook Pro using VLC and captured it with Catch-N-Sync to check the transient sounds they are claiming are 'early' (taking care to account for the change in mic distance: 5 feet at my studio listening position vs. the 1 foot I was recording from at the laptop).

All of the transient sounds that were mentioned in the feedback were literally starting about 15ms earlier than I have them placed in the project (not a full frame, as suggested, but apparently enough for them to feel like they’re playing too early).

What are my options here? Do I need to change the start location of the ProRes video files on the video tracks in my projects to compensate for this? Is there another offset setting within Nuendo that I can use that affects the audio mixdown exports I am unaware of?

Thanks for any help in advance,

  • Rodney

To be honest, it sounds to me like you’ve done everything you can to make sure everything is in sync. So it would be helpful if your customers would disclose how they test your projects. Maybe their technology is out of sync? (Have other customers ever complained?)

Is the audio out of sync from the beginning? Or only over time? Is the offset always identical? Or variable?

Have you tested whether this shift also occurs when you use Nuendo’s video export function?

Do you do this encoding, or do your clients?
AAC? Have you ever tested with a different codec?

Thanks for responding, MAS. Answers below:

  1. The audio seems to be consistently out of sync; their feedback is that it's a 1-to-2-frame offset, not drift.

  2. I do not export any videos, as my projects are 96kHz / 32-bit and Nuendo requires 48kHz to do so.

  3. The animation studio does the encoding. We haven’t tested other codecs that I’m aware of. I will convert the WAV to an AAC / m4a file manually and re-import for comparison.
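
(For that manual test in point 3, I'll probably use something like ffmpeg. Filenames here are placeholders, and the encoder choice is an open question, since different AAC encoders handle their leading delay differently:)

```python
import subprocess

# Placeholder filenames; assumes ffmpeg with its built-in AAC encoder is installed.
cmd = [
    "ffmpeg", "-i", "final_mix_48k_24b.wav",
    "-c:a", "aac",        # encoder choice affects any leading-delay behavior
    "-b:a", "256k",
    "aac_sync_test.m4a",
]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to run the encode
```

Re-importing aac_sync_test.m4a next to the original WAV should show whether the encode/decode round trip shifts anything.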

Weird. You could try making a test project at 48k / 24-bit and see how it works for them?
Another option is buying/renting a dedicated video output device like the ones Blackmagic and AJA make. They are Thunderbolt or USB 3 and provide very solid sync (at least on my Mac).

A question on 96-32b, is this a demand by the client or did you just choose to do it this way? What are the advantages?

Isn't the H.264 codec always slightly out of picture/audio sync? Isn't that why we always try to use ProRes or DNxH_ files in our sessions?

Hi rodney_gates,

Maybe this topic could shed some light on it for you.

I have said it a million times …
Every project that is sent out for collaboration with whoever must have:

  1. A white flash of 1 frame in the leader (typically 48 frames before start), and ideally a white flash after the end of the episode. That enables the audio engineer to put a 2pop in sync with the white flash in the video.
  2. BITC.
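
For anyone new to the convention, the arithmetic is trivial at 24 fps (assuming a 48 kHz session and the usual one-frame 1 kHz beep):

```python
fps = 24
sample_rate = 48000
leader_frames = 48                    # white flash 48 frames before program start
pop_offset_s = leader_frames / fps    # the 2pop sits 2.0 s before first frame of action
pop_len_samples = sample_rate // fps  # a one-frame beep = 2000 samples at 48 kHz
print(pop_offset_s, pop_len_samples)  # 2.0 2000
```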

This has been the industry standard for decades.
How easy can it be?



True story:

We started working on a film and received video without BITC and 2pop.
We refused to start working before our requirements (BITC and 2pop) were met, and got into a massive argument with the director about it.
Apparently the video studio & editor were working for a flat fee, and that budget was burned.
Finally they did send us the correct video.

During the production, the director was making notes for the composer. (Who lived in Slovenia)
Part of the notes were clarified with timecode (at 24.10 I need an upbeat Ragtime) and parts were clarified with a description of the situation (when the car turns around the corner, please go “dark and mysterious”).

The composer had sent the individual pre-production parts to the director, who reviewed them without putting them under the final movie. (Remember, studio and editor were out of sight until final mastering.)
After approval, the score was recorded by a live orchestra in Prague.

When we received the score, I couldn’t make sense of it. Neither did the composer.
But hey … “That’s what the director ordered” …

During the first screening of the premix, the director went into a rage, blaming us that we did it all wrong and that nothing was in sync.

Long story short:
The director had been making notes for the composer on his laptop, using a version of the video that came directly from color grading… without BITC.
What happened is that color grading worked on the picture cut of the film, which did not include some logos placed before the opening credits.

So the director was making notes, using the timecode of his QT player.

Thereby all of the composer's cues based on timecode were 4 seconds (the missing logos) out of sync, while the parts that were cued by describing the scene were in sync.

Result: the complete score was unusable.

That's how important BITC and 2pops are.



The main reason for my thread really didn’t have anything to do with ProRes-converted audio imported into Nuendo and being out of sync.

The videos I receive usually do not have any audio in them at all (it’s an animated show coming directly from the animation studio that I completely sound design and mix). They always have BITC (but no 2pop), which I am aligned with perfectly.

The issue is that the audio mix I export is playing back early once I deliver it and the animation studio merges that stereo audio mix with their final, deliverable video. About 15ms early, according to Catch-N-Sync, when I play the mp4 file on my MacBook Pro in VLC and record it with my phone (it's audible as well).

At this point I am about to simply add a 15ms head to my locators when exporting so that it will play in sync in the final h264 mp4 file that is delivered, unless there is some other offset parameter I am missing in Nuendo (?).

Mind me saying, that doesn't make any sense.

If you export your mix with the audio 2pops included, then there is no way they can place it “incorrectly”. They also have to align the audio 2pop to their white frame in the video.

I suppose you have re-imported or re-recorded your final mix into your project (to which it should line up). That alone confirms that your export is correct. Anything that happens after your delivery is not your problem.

When you say they have no 2pop, then just make one.
Put an audio 2pop at a specific place in the video. For example, the last black frame before picture start. That way, they can align it correctly with the video.

That is a complete other issue, which has nothing to do with your final mix.
This is an encoding/decoding issue.
Especially when using a H264.

For correct playback, audio is always delayed within a compressed video file.
So the issue can be in two places (or both at the same time):
-The encoding to H264 is done with a crappy encoder
-The media player on the playback machine is not correctly compensating for the audio delay.

What happens after H264 encoding is NOT YOUR PROBLEM.
A crappy H264 can never be used as a reference or a master.

If your audio lines up correctly at the master @ the studio, then your job is done correctly.
That’s the only question you have to ask your bosses: “does my audio master line up with the video master”?



Reading this thread, I was going to chime in with more details, but Fredo nails it. You CANNOT determine sync based on an MP4 (H264) decode's playback; there are too many factors at play. It's interesting that your client is that sensitive to 15ms-early audio. Was there a lot of ADR in this project? Production audio usually has a nice window of 2-4 frames before it's even noticeable to the eye, in my experience. My hat's off to you, because I bet they've had a lot of notes on other aspects of your work too. Best of luck with it. If you put a 2pop beep and they line it up properly, it's definitely not your problem. If it were a QC department, they would have the project open in an editor and would frame-scrub to see the actual waveform transient lining up with a clear sync point.


2pops end to end, even starting before audio is added, sounds like a solid idea.

Some of the info you drop definitely puts me in mind of the notorious “AAC priming samples” (google it if you are not familiar).
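
To put a rough number on it (hedging here: the exact delay varies by encoder; 2112 samples is the figure commonly quoted for AAC-LC, and well-behaved muxers signal it so the player can trim it):

```python
priming_samples = 2112    # commonly quoted AAC-LC encoder delay; varies by encoder
sample_rate = 48000
priming_ms = priming_samples * 1000 / sample_rate
print(priming_ms)         # 44.0
```

That 44 ms is suspiciously close to one frame at 24 fps, so if any player or muxer in the chain mishandles the trim, a roughly one-frame shift is exactly what you'd expect.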

Despite your attention to detail with Catch-N-Sync, timing irregularities could easily creep into this workflow. A good 2 pop discipline by everyone involved should eliminate that.


I remember @Fredo posted a 'specification' that gets sent to clients for correct deliverables (AAF from Premiere etc.)… it seems to have disappeared from this thread… or was it posted somewhere else? I need it for a project, because my last project was a mess and I had to roundtrip through PT…
Any chance I can have a peek or is it a trade secret?

No secrets.

Temple - Deliverables_lite.pdf (75.3 KB)



Thanks, very informative!