I guess I’m still a relative newbie to this. I’m editing a YouTube video where I’ve done my own recording and will do my own mixing, etc.
I’ve been using clip gain to try to level my dialogue, but for a 30-minute video it’s taken me about 4-6 hours to go through and bring everything up. Admittedly, I’ve gotten a bit granular with editing lines, more so than most people are going to care about.
So, my question is, when doing a rough clip gain pass at the beginning of my session, how much is just enough before I activate my signal chain? Should I:
Focus only on taming the louder phrases/portions (those that fall outside my range of -14 to -10 dBFS) and ignore anything that’s passable? Or…
Clip gain the entire track, event by event, so it sits in the pocket of my range before processing? Or…
Loudness-normalize my clips, and then clip gain anything that’s too loud?
I’m trying to make it more efficient to edit/do a round trip so it doesn’t take me excessive amounts of time to edit a video. Also, is it frowned upon to use an auto-gain plugin for something like this?
As annoyingly cliche as this is, I really do think there’s more than one way to skin a cat. (sorry/not sorry cat-lovers)
Leveling dialog with clip gain for YouTube should probably not take that long for a 30-minute video, in my opinion. Of course, without hearing what the actual dialog is like, who knows, but if it’s literally just level you’re dealing with, you should get through it much faster, I think.
I have my signal chain set so that I do cleanup and EQ first, then level automation, then dynamics plugins. This means that I can ride levels into the dynamics. That takes some getting used to, and a lot of people absolutely don’t work this way. But the point is that I’ve figured out a way that works for me where I can intuitively foresee (or fore-hear) what the dynamics will do to my signal after my level automation. This means that all of my event-leveling can be done with broader strokes than penciling in clip-gain curves, so most of the time I’m adjusting the level of several words or sentences at a time. So for me it is a trade-off where instead of drawing clip gain I ride my physical fader. “Same” difference, since my dynamics come after in both cases.
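To put rough numbers on why riding level into the dynamics lets you work in broader strokes, here is a minimal sketch of a compressor's static curve. It assumes an idealized hard-knee compressor with no makeup gain, attack, or release; the function name, the -20 dB threshold, and the 2:1 ratio are made up for illustration and are not anyone's actual settings.

```python
def compressor_output_db(input_db, threshold_db=-20.0, ratio=2.0):
    """Static output level (dB) of an idealized hard-knee compressor.

    Assumptions for illustration only: no makeup gain, no attack/release,
    and hypothetical default threshold and ratio values.
    """
    if input_db <= threshold_db:
        # Below the threshold the signal passes through unchanged.
        return input_db
    # Above the threshold, every `ratio` dB of input yields 1 dB of output.
    return threshold_db + (input_db - threshold_db) / ratio


# Two phrases that differ by 6 dB going in...
quiet_phrase = compressor_output_db(-12.0)
loud_phrase = compressor_output_db(-6.0)

# ...differ by only 3 dB coming out at a 2:1 ratio.
print(quiet_phrase, loud_phrase, loud_phrase - quiet_phrase)
```

With these made-up settings, a 6 dB difference before the compressor shrinks to 3 dB after it, which is why a coarse fader ride ahead of the dynamics can be enough without correcting every word.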
Now, when I get more detailed it’s typically down to the syllable, but I still use a flat level change rather than a drawn curve: I split out the syllable, apply a crossfade, select the newly created event, and use key commands to quickly change its level. I’m still on v13, but once I move to a later version I might start drawing instead.
Anyway, I say this because even though some people prefer using clip gain lines to smooth out levels, that’s not the only way to go about it. You can do that, you can do something else, or you can do a bit of everything.
To answer your more specific question: I would probably start by focusing on things that are problematic that your dynamics signal chain can’t address. I’m guessing that’ll be things that are too loud/soft.
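The "only fix what's out of range" idea can be sketched in a few lines of Python. This is just an illustration: the function name and the phrase levels are made up, the -14 to -10 dBFS window is the range from the original question, and how you measure each phrase's level (peak, RMS, short-term loudness) is left open.

```python
def clip_gain_offsets(levels_db, low=-14.0, high=-10.0):
    """Return the gain change (dB) each phrase needs to land inside [low, high].

    Phrases already inside the window get 0.0, i.e. they are left alone.
    The window defaults are the range mentioned in the question.
    """
    offsets = []
    for level in levels_db:
        if level > high:
            offsets.append(high - level)  # negative value: turn it down
        elif level < low:
            offsets.append(low - level)   # positive value: bring it up
        else:
            offsets.append(0.0)           # passable: don't touch it
    return offsets


# Hypothetical measured phrase levels in dBFS.
phrases = [-12.0, -6.5, -18.0, -10.0, -15.5]
print(clip_gain_offsets(phrases))
```

Only three of the five example phrases get any adjustment here; the rest are left for the dynamics chain to handle.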
As for an auto-gain plugin: I tried one once and couldn’t get it to do the job either faster or better than I do it manually. So I never bothered incorporating it into my workflow.
Nothing wrong with using an auto-gain plugin! FWIW, here is my method for a series I mix: first I run an automation pass in real time, then go back and smooth it out, so probably about an hour in this scenario. Then I apply some light compression and maybe an auto-gain plugin after that. I’m in a calibrated room, so I can tell when my loudness is over or under, but you could just have your loudness meter open and keep an eye on it. Definitely faster than clip gain IMO.
That’s honestly not a bad idea, doing audio cleanup first then applying dynamics processing to the more problematic/loud/overly soft portions. I’ll try that out next time. Cheers!
Among many other great features, I use Melodyne’s Auto Dynamics for quick, reliable volume matching of vocals and dialogue on a word- or even syllable-based level.