Unmix Song: differences between SL11 and SL12, and FFT settings make no difference? + Feature Request

Hi,

Was working on getting some elements from a song unmixed to do some touchups for a re-remaster project.

I mostly wanted to extract the bass, but I wanted to find the best possible method, so I set out to run a multi-variable test, comparing the results of different combinations of parameters.

For me, the FFT, Resolution, and Window settings don’t really seem to make much of an audible difference to the output, at least for Unmix Song. Do others share my conclusion?

I started the test by first unmixing the Drums, Guitar, Bass (and Other), and then, further into the testing, I tried extracting just the Bass. There was maybe some slight difference there.

But between FFT 128, 3072, and 32768 - I’m not really hearing any difference.

Where I am hearing a difference is between the different quality modes in the process window: Fast, Balanced, and Extreme (SL11) / High (SL12).

Beyond that, I hear differences between the SL11 and SL12 quality modes.

But again, within a given mode, different FFT sizes make no difference, whether it’s FFT 128 Fast versus FFT 32768 Fast, or the same comparison in Extreme/High.
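
One way to check a "no difference" result by measurement rather than by ear is a null test: subtract one render from the other and look at what is left. The sketch below is only an illustration, assuming NumPy and two exported stems already loaded as time-aligned mono float arrays at the same sample rate (file loading is left out). The function name `null_test_db` and the stand-in signals are made up for the example.

```python
import numpy as np

def null_test_db(render_a, render_b):
    """Level of (a - b) relative to a, in dB. Very negative means near-identical."""
    n = min(len(render_a), len(render_b))          # compare the overlapping part only
    a = np.asarray(render_a[:n], dtype=np.float64)
    b = np.asarray(render_b[:n], dtype=np.float64)
    residual_rms = np.sqrt(np.mean((a - b) ** 2))
    reference_rms = np.sqrt(np.mean(a ** 2))
    if residual_rms == 0:
        return -np.inf                             # the two renders cancel completely
    if reference_rms == 0:
        return np.inf                              # reference is silent, other is not
    return 20 * np.log10(residual_rms / reference_rms)

# Stand-in signals (hypothetical): replace with the two exported bass stems.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
render_fft_128 = np.sin(2 * np.pi * 110 * t)
render_fft_32768 = render_fft_128.copy()
print(null_test_db(render_fft_128, render_fft_32768))
```

If two renders made with different FFT settings cancel completely, the setting had no effect on that process.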

I will say, SL11 and SL12 are very different. With this particular project (maybe because of the distorted electric guitars), SL12 produced a much more audible artifact, an upfront clicky noise. The noise is a lot less noticeable in Fast mode, but it’s even less noticeable in SL11.

That being said, SL12 seems to bring out a lot more high-frequency content in the extraction than SL11, so maybe the artifact is a result of that? For this particular project, though, I preferred the SL11 result, in which the noise was a lot further back and not as clicky.

SL12 did in fact isolate the bass much better than SL11, so it may just be a case of removing this clicky artifact with another post-process.

Feature Request:

Include the previous version’s processing modes as legacy options, e.g. let us select an SL11 mode in Unmix Song from a drop-down menu.

Hopefully in the future, the AI is able to do artifact self cleanup.

Unmixing to stems is not a spectral process and does not use FFT.
However, subsequent spectral editing of the unmixed stems uses FFT.

Is there a list of what processes are spectral and which are not?

Thanks

Page 17 in the user manual.

And my word, does FFT size make a world of difference to transform clarity in the spectrogram. Sincerely, thanks so much for schooling me on this a couple of months back; I was clueless. The results of my manual editing have improved massively.

It would be great if each window had some sort of unlit/lit icon that lets the user know whether it is FFT-based or not. Even better, each module could offer its own recommended settings that the user could open from a menu and select, with the master display options changing accordingly.

I agree. It has been suggested a couple of times that using two different colors for the tools would be a great help in telling whether they are FFT size dependent or not; they could be blue and yellow or whatever. And a similar signal for the modules.

As for ”unmixing” vs. ”stem separation”, the stem separation module is called ”Unmix song” so I guess ”unmixing” would be a correct term.

Broadly speaking, tools which directly edit the graphic display of the spectrogram are spectral tools, and they rely on FFT.
The graphic display represents audio data which has been transformed into graphic data via the Fast Fourier Transform (FFT). This is why changing the various FFT settings changes the graphic display.
The method is “transform to graphic data -> edit the graphic data -> transform back to audio data when finished”. That’s spectral editing.
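
A minimal sketch of that round trip, assuming NumPy and a plain Hann-windowed short-time FFT with overlap-add. This is an illustration of the general idea, not how SpectraLayers implements it; the names `stft`, `istft`, `FFT_SIZE`, and `HOP`, the test tones, and the 2 kHz cutoff are all made up for the example.

```python
import numpy as np

FFT_SIZE = 1024                          # assumed frame length in samples
HOP = 256                                # assumed step between frames (75% overlap)
WINDOW = np.hanning(FFT_SIZE + 1)[:-1]   # periodic Hann window

def stft(signal):
    """Audio -> spectral data: window each overlapping frame, then FFT it."""
    frame_count = 1 + (len(signal) - FFT_SIZE) // HOP
    frames = np.stack([signal[i * HOP : i * HOP + FFT_SIZE] * WINDOW
                       for i in range(frame_count)])
    return np.fft.rfft(frames, axis=1)   # shape: (frames, FFT_SIZE // 2 + 1)

def istft(spectrum):
    """Spectral data -> audio: inverse FFT each frame, then overlap-add."""
    frames = np.fft.irfft(spectrum, n=FFT_SIZE, axis=1) * WINDOW
    length = FFT_SIZE + HOP * (len(frames) - 1)
    output = np.zeros(length)
    weight = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * HOP
        output[start : start + FFT_SIZE] += frame
        weight[start : start + FFT_SIZE] += WINDOW ** 2
    return output / np.maximum(weight, 1e-12)   # undo the summed window gain

# Transform, edit, transform back.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 5000 * t)

spectrum = stft(audio)
freqs = np.fft.rfftfreq(FFT_SIZE, d=1 / sample_rate)
spectrum[:, freqs > 2000] = 0            # the "edit": erase everything above 2 kHz
edited = istft(spectrum)                 # back to audio, 5 kHz tone removed
```

Changing `FFT_SIZE` changes the shape of `spectrum` (more or fewer frequency bins per frame), which is the same reason the display changes when the FFT settings do.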

”Recommended settings” for each FFT-dependent tool wouldn’t really work, as the optimal FFT settings often depend on the individual task, not the tool itself. In other words, the settings depend on the audio you are attempting to improve and the method you are using, not just the tool you’ve chosen.
Longer selections of audio will likely need different settings than short selections. Low frequencies typically need different settings to high frequencies. Broadband selection will likely need different settings to narrowband selection.
The best way to develop a solid instinct for tweaking FFT settings is, firstly, to learn as much as you can about how FFT works, particularly the various methods used to adjust accuracy. Secondly, lots of trial and error: it can be a simple editing method, but it’s also very deep, and there’s no substitute for spending time at the coalface, trying your own adjustments. A bit of hard-earned knowledge can easily be the difference between success and failure when spectral editing.
As Todd says in a post above, your results will improve hugely if you put in the work.
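
The reason the best setting depends on the material comes down to a trade-off that can be put in numbers: a larger FFT gives narrower frequency bins but a longer frame, so it blurs timing. A rough sketch, assuming a 44.1 kHz sample rate and using the three FFT sizes mentioned earlier in the thread (the function name `fft_tradeoff` is made up for the example):

```python
def fft_tradeoff(fft_size, sample_rate=44100):
    """Return (bin width in Hz, frame length in ms) for one FFT size."""
    bin_width_hz = sample_rate / fft_size      # frequency detail per bin
    frame_ms = 1000 * fft_size / sample_rate   # stretch of time each frame covers
    return bin_width_hz, frame_ms

for size in (128, 3072, 32768):
    hz, ms = fft_tradeoff(size)
    print(f"FFT {size:>5}: {hz:7.2f} Hz per bin, {ms:7.2f} ms per frame")
```

At 44.1 kHz, a 128-point FFT has bins roughly 345 Hz wide over a frame of about 3 ms, which is good for locating a click but useless for telling bass notes apart. A 32768-point FFT has bins about 1.3 Hz wide but each frame spans roughly 0.74 s, which is good for low-frequency detail and poor for timing.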

Presets aren’t meant to be exact; they are meant to be averages. If you’re using FFT processes for speech, it should be easy enough to have a few ballpark averages.