Unmix Vocals confused by band intro

I’m new to SpectraLayers 7 Pro and am having some problems with a simple unmix of vocals. When I unmix the entire song, the first phrase of the vocals does not get separated properly. However, if I unmix the song without the band intro, it seems OK.

Attached are brief clips with and without the intro to illustrate the issue. When I do Layer > Unmix Stems… with Vocal on one layer and all else on a separate layer, the clip without the intro works fine, but the clip with the intro seems to miss a lot of the first phrase.

Is it getting confused by the saxophones or other instruments in the intro?

I tried different sensitivities (0.0, +0.3, -0.3) and I get the same result. Any ideas how to fix this without having to cut the entire song into pieces?

I’m on a Mac running macOS Catalina 10.15.7.
Archive.zip (653 KB)

I always tell myself not to expect miracles from stem-separation technology. With that in mind, I occasionally use it for mix rebalancing, and I mean occasionally. I don’t separate the stems.

The first question is, do you want to isolate the vocal or isolate the band?

It sounds very likely that the original recording was mono, and somewhere along the way somebody “stereoized” it, i.e. used some kind of pseudo-stereo processor and added reverb and some other “remastering” magic tricks (which is often done so the labels can resell a recording). The problem is that this confuses the hell out of any model trained to recognise the human voice! Added to that is the presence of instruments in the vocal frequency range (saxophones), which play underneath the vocal in a fashion typical of the big band style.

If you want to isolate the vocal, my first instinct would be to mix down to mono, or perhaps use just one channel (pick the one with the least reverb), or play with an M/S decoder, channel phase, etc. to try to minimize whatever “restoration” was done to the original recording. Render that mono mix to a new file and then try the unmix on that file again.
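
For anyone curious what the mono mixdown and M/S decode actually do to the samples, here is a minimal sketch in plain Python. It assumes the pseudo-stereo effect was added as an equal-and-opposite component in the two channels, which may well not be how this particular recording was processed; the function name `to_mid_side` and the toy sample values are made up for illustration and have nothing to do with SpectraLayers itself.

```python
def to_mid_side(left, right):
    """Return (mid, side) for two equal-length lists of float samples.

    mid  = (L + R) / 2  -> the mono mixdown
    side = (L - R) / 2  -> whatever differs between the channels
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


# Toy signal (hypothetical values): a mono source plus a widening
# component that was added to the left and subtracted from the right.
centre = [0.5, -0.25, 0.125, 0.0]
widen = [0.125, 0.125, -0.25, 0.5]
left = [c + w for c, w in zip(centre, widen)]
right = [c - w for c, w in zip(centre, widen)]

mid, side = to_mid_side(left, right)
# In this idealized case the mono sum recovers the original source
# and the side signal contains only the widening component.
```

In practice the cancellation is only partial, since added reverb and other processing are usually not perfectly out of phase between the channels, so it is worth listening to the mid, the side, and each channel on its own before picking one to unmix.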

The goal was to remove the vocals sufficiently for band practice purposes. So some residuals are OK.

Given that the singer is Sarah Vaughan, I would guess the original is mono. That’s a good observation about the “stereoizer” confusing the vocal recognition AI. I mixed down to mono as you suggested and worked with that. It pulled out the vocals just fine. Some of the muted horns in the introduction got cut too, but that was easy to fix by selecting them and doing a “Cut to Layer Below”.

Thanks for your help.

Good to hear! If you’ve isolated the vocal sufficiently from the mono, it might be possible to use the old trick of flipping the phase and mixing it back with the original stereo, playing with gain, panning and phase to minimize it.
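
As a rough sketch of that trick in plain Python: invert the polarity of the isolated vocal and sum it into each channel of the stereo file, with a gain and a pan control to tweak by ear. The function `subtract_vocal`, its simple linear pan rule, and the toy sample values are all assumptions for illustration, not anything SpectraLayers provides. It also assumes the vocal is sample-aligned with the stereo file.

```python
def subtract_vocal(left, right, vocal, gain=1.0, pan=0.0):
    """Mix a polarity-inverted mono vocal into a stereo pair.

    gain scales the inverted vocal; pan runs from -1.0 (cancel in the
    left channel only) through 0.0 (both equally) to +1.0 (right only).
    The pan rule here is a simple linear one chosen for illustration.
    """
    if not (len(left) == len(right) == len(vocal)):
        raise ValueError("all signals must have the same length")
    gain_l = gain * min(1.0, 1.0 - pan)
    gain_r = gain * min(1.0, 1.0 + pan)
    # Subtracting is the same as adding the polarity-inverted signal.
    out_l = [l - gain_l * v for l, v in zip(left, vocal)]
    out_r = [r - gain_r * v for r, v in zip(right, vocal)]
    return out_l, out_r


# Toy signal (hypothetical values): a centred vocal on top of a band.
vocal = [0.25, -0.5, 0.125]
band_l = [0.125, 0.0, -0.25]
band_r = [0.0, 0.25, 0.5]
left = [b + v for b, v in zip(band_l, vocal)]
right = [b + v for b, v in zip(band_r, vocal)]

out_l, out_r = subtract_vocal(left, right, vocal)
# With a perfect vocal estimate, only the band remains in each channel.
```

With a real unmixed vocal the estimate is never sample-exact, so expect the vocal to be reduced rather than removed, which should still be fine for band practice.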

Good idea. I’ll try that. Thanks.