Hi all, I sound design and mix a monthly 3-minute animated series for kids, and I'm hoping someone can point out what I'm missing when it comes to getting frame-accurate sync on my Audio Mixdown exports. Here's my rig and what I'm doing:
My system:
I’m running Nuendo 12.0.50 on a Windows 10 PC (custom-built Intel i9 9980XE / ASUS mobo)
Project Resolution: 96kHz / 32-bit Float (almost always)
Video Card: Nvidia RTX 2070 Super (using ‘best quality’ setting profile)
Audio Interface: RME UCX (using ASIO drivers, of course - 6 analog outputs for 5.1)
Monitor Controller: 7.1 StudioComm Model 78/79 (analog)
Monitor / Room Calibration: miniDSP DDRC-88A (analog I/O but there is a 48kHz ADC/DAC conversion internally to apply the room calibration profile; any latency from that should only affect my own local monitoring slightly)
Picture Monitor: I use an inexpensive TCL 1080p HDTV as a picture monitor, fed directly via HDMI from my video card
My process:
1) For the series, I convert every 1920 x 1080 video file I receive into a ProRes 422 (Proxy) workprint in an .MOV container using XMedia Recode, keeping the same frame rate as the original video (in my case, these are always 24 fps). The videos typically do not have any existing audio to start (these are animated episodes for a kids show), so there are no OMF/AAFs to deal with.
2) I place each video into my template at a start point of 01:00:00:00. My video overlay timecode always aligns with the burned-in TC from the video.
3) Using the Catch-N-Sync app on my iPhone, I have already compensated for my inexpensive TV's display latency using its test videos (as well as Sync One 2's) of matching codec and frame rate (ProRes / 24 fps). With that app, I found that my TCL HDTV has a ~47ms latency offset, which is already plugged into Nuendo's video settings. If I understand correctly, this offset does NOT affect the still, frame-by-frame scrubbing image; ONLY real-time in-session video playback. While cutting / editing, I always place time-sensitive sound effects on the timeline by scrubbing to their exact start positions (footsteps, bodyfalls, etc.), so this 47ms real-time playback offset shouldn't affect that process. When I play the video back, it always looks perfectly in sync.
4) When I export my audio mixdowns, I use no pullup or pulldown settings for these straight-24 fps videos. I always export them by selecting the workprint video on the video track, set my locators, and export the mixes and stems via locators (as opposed to cycle markers).
5) When I re-import my exported stereo WAV mixes to check them, they align perfectly with the audio in my project.
6) When I receive feedback / revision requests, I always re-convert the updated video from the animation studio into another ProRes workprint and run a Video Cut Detection on it to catch any altered shots. Oftentimes I'll check the position of the re-imported audio mix that auto-extracts from it against my project audio. Typically, that extracted audio mix is about a half-frame 'later' than my project's mix (roughly 21.5ms). I don't mess with it, as this seems to hold for any video render I receive back once it's married to picture. I just make my revisions and re-export new audio mixdowns using the same project settings / positioning as before (always 48kHz / 24-bit WAVs).
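For reference, here's the simple frame math I'm using to sanity-check these offsets (just arithmetic; the 24 fps rate and my two measured numbers are the only inputs):

```python
# Convert measured offsets (ms) to frames for a straight 24 fps project.
FPS = 24
FRAME_MS = 1000 / FPS  # ~41.67 ms per frame

offsets_ms = {
    "TV display latency (Catch-N-Sync)": 47.0,
    "auto-extracted studio-render mix, 'late'": 21.5,
}

for label, ms in offsets_ms.items():
    print(f"{label}: {ms} ms = {ms / FRAME_MS:.2f} frames")
```

So my TV latency works out to just over one frame, and the 'late' extracted mix to roughly half a frame, which matches what I described above.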
So, here’s the problem: the client that receives and reviews the videos says that my audio mix is typically early by 1 to (sometimes) 2 frames. EARLY (!!!).
They mainly notice the louder transient moments, but I’m betting the overall mix feels a bit off for them. I am not sure how they are judging this, but my guess is they simply play the file back in a video player app like VLC, etc.
The video files they review are typically delivered in H264 / .mp4 with the audio having been converted to AAC from the animation studio’s final export stage (normal stuff).
I cannot understand why my audio mixdowns would ever appear early when, as I mentioned above, even the re-imported, auto-extracted audio from an updated video render from the animation studio has the whole mix running about 21.5ms ‘later’ than my project’s audio mix (???).
As a test, I played one of the client review videos back on my MacBook Pro using VLC and captured it with Catch-N-Sync to check the transient sounds they are claiming are 'early' (taking care to change the app's distance setting from the 5 feet of my studio listening position to the 1 foot I was recording from at the laptop).
All of the transient sounds mentioned in the feedback were starting about 15ms earlier than I have them placed in the project (not a full frame, as suggested, but apparently enough for them to feel like they're playing too early).
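To put that measurement in frame terms (plain arithmetic again; the 15ms is my measured value from the laptop capture):

```python
# How big is a 15 ms lead relative to one frame at 24 fps?
frame_ms = 1000 / 24          # ~41.67 ms per frame at 24 fps
early_frames = 15 / frame_ms  # my measured 15 ms lead
print(f"{early_frames:.2f} frames early")  # prints "0.36 frames early"
```

So what I measured is about a third of a frame, well short of the 1-2 frames the client is reporting.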
What are my options here? Do I need to change the start location of the ProRes video files on the video tracks in my projects to compensate for this? Is there another offset setting within Nuendo that I can use that affects the audio mixdown exports I am unaware of?
Thanks for any help in advance,
- Rodney