It’s partly a practical thing. I always render all mixes and stems from outputs only, and the only thing I have on the outputs is, at most, a very conservative safety limiter (that hardly does any work). So it’s a final point in the signal chain that I never touch or automate. In Control Room in Nuendo I tap the different mix and stem output buses as monitor sources, so I always hear exactly what’s getting rendered and can switch quickly between full mix, mix minus narration, M&E etc. It’s also tidier because it’s easier to color code the type of track (outputs) and show/hide all of them. Just neater.
I route everything into “food groups” that are group tracks. So all dialog and the dialog reverb go to one group track, sfx and fx reverb get their own group, etc., and then the groups go to the required outputs (mixes / stems).
Why separate outputs per stem? Because by definition in sound-to-picture the stems have to be separate, the networks usually use a few (?) different recurring combinations, and it’s the only thing that makes sense in terms of time management. When I get source audio from the video editor all I have to do is put dialog on dialog tracks in my template, nat sound on nat, and so on. Since it’s already routed in my template I automatically get what I need where I need it, in the correct outputs.
When it’s time to deliver final content I just select multiple outputs and export everything at once. I don’t need to solo things and export and then solo something else etc. It’s all pre-routed in the template.
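Here’s a rough sketch of the idea in Python, just to show the logic. The track, group and output names and which group feeds which output are made up for illustration; the real combinations depend on the network spec, and this is not anything Nuendo exposes as an API.

```python
# Hypothetical template: every source track lands in one "food group",
# and every group is pre-routed to a fixed set of outputs.
TRACK_TO_GROUP = {
    "dialog 1": "DIA",
    "dialog reverb": "DIA",
    "narration": "NAR",
    "nat sound": "NAT",
    "sfx 1": "SFX",
    "fx reverb": "SFX",
    "music 1": "MX",
}

# Illustrative routing only; actual stems/mixes vary per deliverable spec.
GROUP_TO_OUTPUTS = {
    "DIA": {"Full Mix", "Mix Minus Narration", "DIA Stem"},
    "NAR": {"Full Mix", "NAR Stem"},
    "NAT": {"Full Mix", "Mix Minus Narration", "M&E"},
    "SFX": {"Full Mix", "Mix Minus Narration", "M&E", "SFX Stem"},
    "MX": {"Full Mix", "Mix Minus Narration", "M&E", "MX Stem"},
}


def tracks_feeding(output):
    """Return the sorted track names whose group is routed to `output`."""
    return sorted(
        track
        for track, group in TRACK_TO_GROUP.items()
        if output in GROUP_TO_OUTPUTS[group]
    )


def export_plan(selected_outputs):
    """Map each selected output to the tracks it will contain."""
    return {out: tracks_feeding(out) for out in selected_outputs}


if __name__ == "__main__":
    # Selecting several outputs at once, no soloing involved.
    for out, tracks in export_plan(["Full Mix", "M&E"]).items():
        print(out, "<-", tracks)
```

The point is that the routing is decided once in the template, so “exporting” is only a matter of picking outputs.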
Actually for a fair number of shows there are “bleeps” (sine wave) that replace specific curse words, and in the international version of the same show the curses are left in. So what I do is I have separate dialog tracks called “dia curses” or whatever it is, and they go to a dialog group that contains only those words but has the same processing as the main clean dialog group. The clean dialog group goes to both the censored and uncensored full mix outputs. The curse dialog group (curse words only) goes to only the uncensored full mix output. The bleeps go to only the censored full mix. (The same applies to the other mixes/stems.)
This way I can again just edit the dialog once and cut out curse words and move them to a separate dedicated track, paste a bleep in the bleep track (goes to censored full mix only) and then I don’t have to do anything else. It’s all done.
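The censored/uncensored split described above can be sketched the same way. Again, the group and output names are hypothetical, and this only models the routing logic, not anything in Nuendo itself.

```python
# Hypothetical routing for the two full-mix versions.
CENSOR_ROUTING = {
    "DIA clean": {"Full Mix censored", "Full Mix uncensored"},
    "DIA curses": {"Full Mix uncensored"},  # curse words only
    "Bleeps": {"Full Mix censored"},        # sine-wave bleeps only
}


def groups_in(output):
    """Return the sorted groups that feed the given output."""
    return sorted(g for g, outs in CENSOR_ROUTING.items() if output in outs)


def check_versions():
    """Sanity-check that curses and bleeps never end up in the wrong mix."""
    censored = set(groups_in("Full Mix censored"))
    uncensored = set(groups_in("Full Mix uncensored"))
    assert "DIA curses" not in censored
    assert "Bleeps" not in uncensored
    assert "DIA clean" in censored & uncensored
    return censored, uncensored


if __name__ == "__main__":
    check_versions()
    print(groups_in("Full Mix censored"))
    print(groups_in("Full Mix uncensored"))
```

Moving a word to the curse track and pasting a bleep is the only per-word work; both versions fall out of the routing.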
At least once per year I have a client with a ‘dirty’ dialog track who gives me new specs / deliverables last minute mandating both censored and uncensored, and this way it’s as easy as ticking a box in the export dialog and everything is routed correctly. Of course I need to know ahead of time which words are acceptable for the given network.
Actually I have two templates. The ‘current’ one is stereo-up. So it’s a stereo focused template, and I have a few groups that ‘upmix’ to 5.1 if needed. It’s not really the best way to work, it’s mostly legacy routing and a ‘quick and dirty fix’ if someone asks for it. The other reason I have the 5.1 output as the main output is that my monitoring system is 5.1, and I prefer to edit dialog on the center channel. So all dialog and narration tracks in my template go to a separate “EDIT” output that is mono, and when I choose that in CR it goes to my center speaker only. I can switch to hearing dialog only on the center with just a quick key command; I never need to solo the specific track or dialog VCA or whatever.
Then my slightly older 5.1 template is an actual 5.1 template where food groups are 5.1 channels wide instead of stereo, and that then has automatic downmixing to stereo which sounds basically identical in terms of balance. Last year I did a show with original language and Spanish dub and both 5.1 and stereo for each, plus all the usual stems, and that too was exported all at once. That ended up being a wider project though because of the double languages, but doable.
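For anyone wondering what a 5.1 to stereo downmix does in principle, here’s a minimal per-sample sketch. The coefficients are an assumption on my part for illustration (the common -3 dB on center and surrounds, LFE left out); they are not necessarily what my template or Nuendo’s downmixer uses.

```python
import math

# Assumed gains: -3 dB (about 0.707) for center and surrounds.
CENTER_GAIN = math.sqrt(0.5)
SURROUND_GAIN = math.sqrt(0.5)


def downmix_sample(l, r, c, lfe, ls, rs):
    """Fold one 5.1 sample frame down to a stereo (left, right) pair.

    The LFE channel is accepted but deliberately not mixed in,
    which is an assumption of this sketch.
    """
    left = l + CENTER_GAIN * c + SURROUND_GAIN * ls
    right = r + CENTER_GAIN * c + SURROUND_GAIN * rs
    return left, right


if __name__ == "__main__":
    # Center-only dialog ends up equally in both stereo channels.
    print(downmix_sample(0.0, 0.0, 1.0, 0.0, 0.0, 0.0))
```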
Really the only thing to look out for when mixing this way is making sure level automation between “food groups” is done in the right place. Some deliverables require undipped music, others don’t, for example, so that determines where you drop music for dialog and / or narration. But as long as that’s done correctly from the start it’s a quick way to work.
I’m using Nuendo so I’m not sure if or how it translates, but sure.
PS: I’ve been playing poker and drinking Rye whiskey so if there’s nonsense above you now know why…