Stereo to Surround - how do I do this please?

Joey_Kapish · November 17, 2024, 7:29pm

I didn’t initially want to respond because I didn’t want to derail the post/topic but because I got so many replies I want clarify. It seems like dialog in films today sits more than just the center (as-oppossed-to dead in the center). So to make it clear, yes, dialog is generally in the center of the field but field of dialog (itself) is more than just dead in the center.

I have noticed that the volume of dialog in films has drastically been raised (especially over the past 20 years or so). The dynamics of the audio on films on DVD’s is way different than the audio from streaming today… For example audio from a scary film from a DVD 20 years ago has much more variation in dynamics than audio from streaming. A further example explaining this is the dialog volume of a scary film (from a DVD audio) from 20 years ago volume is a lot lower and the sfx(bangs and thuds) is a lot louder and much more dynamically diverse, whereas today (on streaming platforms) the volume for audio dialog has a lot more gain and generally a lot louder. The dynamics between sfx and dialog (on streaming platform audio) is still there but the diversity in the dynamics (compared to the audio from a 2004 DVD) is definitely not the same. The audio from 2004 definitely had a lot more diversity in the dynamics when it came to audio.

So to sum up any misconceptions or any misunderstandings, the audio for dialog in films produced may be based around the center field but it’s definitely not directly dead in the center… I can’t quite remember which film it was (that I noticed this) but I was observing a particular film and there was a scene (in that film) where the actual music (itself) was panned dead in the center (meaning that they purposefully panned all the left and right stereo information directly to mono) assuming for the purposes to give the dialog scene more weight. Just to be clear, I’m not referring to the actual “score”, I’m specifically talking about billboard music where in the scene itself there was high action fighting and in-between that fighting uptempo music started playing (to fuel the audiences adrenaline), and at the same time the dialog of the characters was also active, however the dialog was on the same level as the music itself and either had the same stereo field (as the music itself) or the dialog had a much wider stereo field than the actual music itself. The audio of dialog in films today is not mono and it seems to take up much more than just the center field.

Yes! It seems like there is compression (maybe stereo enhancement) on the dialog to fatten up the stereo field on the dialog, but it’s definitely not directly dead in the center.

MattiasNYC · November 17, 2024, 9:04pm

But what are some good examples of that?

I literally can’t remember any recent movie at all where dialog wasn’t dead-center pretty much all the time except for specific effects or a touch of reverb. It is really easy to check - just pull up a movie in 5.1 and mute the center. I just checked:

Alien: Covenant
Inception
John Wick
Tenet
Furiosa
Dune: Part Two

Dune 2 is a good example of how dry dialog is dead-center and if they’re sitting on a sand dune talking there is nothing else in the other four channels. If there is a scene where they are in a large cavern on the other hand there’s some reverb bouncing around.

Inception on the other hand has plenty of scenes where it would be justified to have reverb (i.e. in a garage or warehouse) but it’s still just natural location dialog only in the center.

None of those that I checked had dry/direct dialog anywhere but the center.

Robert_Niessner · November 18, 2024, 11:07am

Joey_Kapish:

I have noticed that the volume of dialog in films has drastically been raised (especially over the past 20 years or so). The dynamics of the audio on films on DVD’s is way different than the audio from streaming today… For example audio from a scary film from a DVD 20 years ago has much more variation in dynamics than audio from streaming. A further example explaining this is the dialog volume of a scary film (from a DVD audio) from 20 years ago volume is a lot lower and the sfx(bangs and thuds) is a lot louder and much more dynamically diverse, whereas today (on streaming platforms) the volume for audio dialog has a lot more gain and generally a lot louder. The dynamics between sfx and dialog (on streaming platform audio) is still there but the diversity in the dynamics (compared to the audio from a 2004 DVD) is definitely not the same. The audio from 2004 definitely had a lot more diversity in the dynamics when it came to audio.

It would make sense to raise dialog levels for streaming to better fit the way people tend to watch nowadays. Most people watch movies on their phones, tablets and Laptops. Also modern flat TVs do have very bad sound due to their thinness and they also do a lot of audio processing per default. Most devices do dynamic range compression too.

Personally I know many young people who even have no TV set at all.

But it would help for this discussion if you could name a few movies you have found to have a spread out dialog, so we can analyse those tracks.

You should also tell us if your observation stems from a movie theater or from your home theater and if the latter, how was that setup.

Sunnyman · November 18, 2024, 3:18pm

This is an interesting thread to me.
While reading this, the following questions arose:

Does the unmixing of the spectral components of speech or an instrument preserve the phase information exactly? Or will there be slight errors? And how would those errors accumulate over more and more unmixing separations?

Here the idea: Imagine there are five singers, each with a very distince tone (so to make it “easy” for the algorithm to localize the spectral components), but with different positions around the 180° semicircle in front.
Two for instance on the far left and right, than the other three apart. So at 0°, 45°, 90°, 135° and 180° are the voices.
Will the unmixing algorithm preserve those angles?

When I have time, I’ll check this, just out of curiosity.

Robin_Lobel · November 18, 2024, 5:35pm

There should be no loss of phase information : each pixel you see in the spectrogram has phase information associated to it, and it’s kept while separating those pixels into different layers (either with unmixing modules or when extracting using manual selection tools).

ctreitzell · November 20, 2024, 12:18am

This happens in all video related media and e-learning vids

this is a facet of mixing with the the eyes, IMO

Putting voice recordings and normalized SFX and Music mixed with the eyes alone is what I believe creates 90% of these volume mis-matches…it is literally bad mixing.

I was discussing this with a colleague just the other day…“what volume to set the music?”

well, what volume is everything else? It’s all relative and different mixes are at different volumes…some sounds are more distracting that others as a backing for talking

neilwilkes · November 25, 2024, 2:04pm

I would argue that the opposite is the reality given the massive number of times I have literally struggled to even hear dialogue due to the score hammering away on just about every scene in so-called ‘big productions’ trying to tell me how I should react to each & every scene (and oh, what joy I feel when I run across a production without the non-stop blaring of the same sample libraries - or what sounds like the same sample libraries, anyway).
DVD’s of 20 years ago were (at least to my ears & way of thinking) far superior to the modern ‘slammed to death’ productions where the whole thing has the dynamic range of a lightbulb. It also goes to prove that this fad of making everything suitable for people watching on tablets or telephones when what really should be done is to create the mix for the intended release media - you know, what we used to understand as ‘mastering’ in the days before it came to mean ‘someone’s last chance to utterly wreck what the mix engineer did and make it bangin’ loud instead. But I digress.

The centre channel belongs to the dialogue. Period. It simply must otherwise how are you supposed to dub for different languages?
Yes, you can create an effect or illusion of it being spread wider by routing to a subgroup & adding some effects such as a widener (but for the love of all please be sparing with it)

Exactly. You make my case above for me with this observation - thank you!

Yes, and as explained above it’s called ‘effects’ and again as explained above the dialogue must be dead centre otherwise you’re just not dubbing it for global release.

MattiasNYC · November 25, 2024, 2:33pm

I think in the vast majority of cases the dubs are based on dedicated M&Es (plus optionals), not a full mix minus center. It’s really a separate issue.

But in principle I agree that dialog belongs in the center 99% of the time and everything else is basically an “effect” that has to be creatively justified.

Robert_Niessner · November 26, 2024, 2:06pm

6 years ago I had a heated discussion with the audio mixing artist (won’t call him “engineer”) of the feature film from a friend of mine. While the mixer called himself a professional we had all sorts of issues with his work.

Dialog levels were all over the place - from hard to hear up to unbearable load.
He was insisting that there are no loudness rules for cinema…

So we were watching the movie the first time in the cinema for a test screening and I used a loudness meter to check the sound. At one point I got a peak measurement of 123dB and my ears began to hurt for a while. It was almost impossible to sit through the rest of the test screening. The DCP showed -12 LUFs (while [EDIT: -18 LUFS <= incorrect] -21 LUFS would be the norm for action movies).

And still that guy insisted there is nothing wrong…

From that day finally my director friend let go of that sound studio and I salvaged the audio as much as possible. We only had the 5.1 mix and some older stems - so I had to doctor around the final mix. That guy had pushed a lot of sound into the center too.

When we got a deal with a world wide distributor, initially we were not able to offer M&E stems for dubbing. So I took a few weeks and edited all dialog out of everything in the 5.1 mix by using iZotope RX. Really hated that guy afterwards.
Spectralayers would have made my life so much easier back then.

neilwilkes · November 26, 2024, 2:46pm

Having done my share of rescue jobs I feel your pain.

MattiasNYC · November 26, 2024, 2:52pm

That would equal roughly 100dB average level across the program. Seems awfully loud even for an action movie. Is that really “the norm”?

ctreitzell · November 26, 2024, 2:56pm

Yikes! I would be gone at 100 dB…fingers in ears

this fellow is clearly mistaken, devices have tolerances as we all know…including human ears

ctreitzell · November 26, 2024, 3:01pm

I mix just under -18 LUFS peak for the online e-learning stuff we do and previous podcast work I was doing…streaming levels is what I thought -18 LUFS is meant for as a target…in a cinema theater I’d expect -18 LUFS to clear the room!

I’m mixing my documentary to -22LUFS and I think that is still far too high…I’ve read about many mixing for cinema to -24 LUFS peak

MattiasNYC · November 26, 2024, 3:23pm

“LUFS” isn’t a measurement of peak loudness though, it’s an average. That’s why I said that -18LUFS seems very, very loud as an average even for an action movie (i.e. probably not the norm).

To me that doesn’t really make any sense.

ctreitzell · November 26, 2024, 3:29pm

indeed, I understand…nevertheless my metering is set to “Peak” and LUFS…granted I use FabFilter ProL for my loudness metering and never touch the limiter threshold

If you don’t mind my asking, what makes sense to you? I have no interest in being argumentative, just looking to improve my rudimentary understanding

MattiasNYC · November 26, 2024, 3:39pm

For cinema it makes sense to mix on a calibrated mix stage with no “standard” for the mix, just for calibration. The ceiling would be -2 or -1dBFS True Peak, whatever it is in the Dolby spec. Loudness of the mix is whatever sounds good. LUFS is what it is, no need to shoot for a specific value for theatrical releases.

When I said it didn’t make sense I meant that actual peak values would be either just a pure numerical sample peak, or the calculated “True Peak” value, and it would not be “LUFS”. The closest you get to a short term value using the loudness algorithms (BS.1770-1/2/3/4, “LUFS”) is “Momentary” which is similar to VU if memory serves me right. So, not peak either.

When people said they mixed for cinema to -24 LUFS they probably meant just average loudness, not peak. In addition to that I bet a lot of those people were mixing shorts, not feature films. For reference Netflix spec is -27LUFS on dialog / dialog gated, if memory serves me correctly.

Robert_Niessner · November 26, 2024, 3:42pm

Don’t forget that cinemas can and do set individual levels for each movie in each auditorium.

Robert_Niessner · November 26, 2024, 3:57pm

Sorry, yes you are right. I mixed that one up in my mind (got distracted by several calls during writing the post).
-18 LUFS is the standard for trailers because they get played at slightly lower levels. The standard for action movies is -21 LUFS. For rom-coms it would be around -24 LUFS

I’ve edited the initial text to reflect the correction.

ctreitzell · November 26, 2024, 4:20pm

thanks @MattiasNYC MattiasNYC and @Robert_Niessner for these numbers

if one could afford it, yes please! sadly, that is not currently possible on a small box, self-funded project

In the film course being taught at the university my company was working with, audio was not really taught,just a suggested -6dB peak for cinema (yikes!)…I’ve always mixed to -10dB for TV broadcast and later streaming…I do not want to feel a tablet vibrating out of my hands or TV monitor speakers buzzing.

nor do I find calibrating my home “studio” workstation to 88dB an option; which blows my head off…that will always be too quiet IME even 78dB is too loud IME

complicated stuff, frankly

Sam_Hocking · November 26, 2024, 4:32pm

I’ll chime in on what I know in context to the OP as I do quite a lot of work for Atmos engineers.
Tools like Penteo typically are what’s called ‘Upmixers’ They take lower channel count ‘mixed’ audio and attempt to upmix the direct signal to the direct speakers and the diffuse to the surround/height speakers. Various ways that’s done, be it mid/side expansion, fft filtering, intensity distribution, beamforming etc.

Tools like SL11’s ‘Unmix’ isn’t an Upmixer, it’s a Source Separator, so this could leave you with e.g. vocal, bass, drums, other stems. You can then place those stems into a multichannel audio track in SpectraLayers. Flat formats like Atmos and Surround are mono interleaved and can be played as channels to matching speaker positions.

Using tools like the Dolby Atmos Renderer either built into DAWs or Standalone from Dolby, allow the multichannel audio track to be used within an Atmos/Surround DAW project and/or you can take unmixed stems from SL11 and use them as objects / bed channels in the 3D/Surround room.

When e.g. Dolby Atmos is rendered to a .wav it’s an ADM BWF format. Essentially every Object and Bed channel in the DAW is rendered to the ADM so that it can be used to generate file or streamed Atmos content.
As an Atmos ADM BWF is simply a fancy .wav file with metadata, SL11 will probably be able to at least get at the audio pcm tracks within it, but without the metadata it’s limited. Typically you would re-render the Atmos ADM BWF to a plain multichannel .wav first and then that would be played direct to speakers or used on multichannel tracks in your multichannel DAW just like any other mono or stereo audio track can or imported as is to SL11 for editing,