Better source separation ability with higher sample rate?

Maybe it is obvious but I haven’t noticed before.
Today I unmixed a simple song from a hi-res (96/24) release bought from HDtracks.com.
I have unmixed the same song before, ripped from a CD.
Drums, electric bass, two distorted electric guitars, male vocal.
The track was “Behind the Wall of Sleep” from Black Sabbath’s first album.

Comparing the results, I noticed the stem separation was tremendously better with the hi-res track. SL10 had no problem at all delivering crystal-clear stems: the drums sounded very powerful, with shiny cymbals and no artifacts. Even the electric bass was fully separated without any artifacts, heavy and distinct. Same for guitar and vocals. Very impressive!
With the 44.1/16 track, by contrast, the separated layers had lots of unwanted spill from one to another, and neither the drums nor the bass were as consistent in volume or power as with the hi-res version.

It’s not that there was a very big difference in how the original (not yet unmixed) stereo files sounded; there wasn’t.

Is it common knowledge that the separation process behaves considerably better with a higher sample rate?
I don’t think the greater bit depth was what made the result better, since I usually don’t notice much difference in separation quality between 16 and 24 bit. I might be wrong there though.

Sorry if this is a stupid question, but I couldn’t tell from the phrasing in your post:

Did your previous source separation happen with exactly the same software and version?

Yes indeed.

That makes yours a very interesting observation.

If the CD was an original store-bought CD, and not privately created from a compressed file (mp3 or whatever), then the next question might be whether the more recent version was re-mixed and/or re-mastered, or whether it really differs only in sample rate and bit depth.

Assuming the original masters were recorded and mastered on tape, there might be a significant difference between the old and the new in how the tape-based material eventually made it into the digital domain.

A better mastering chain going into the published digital mix would seem like a good candidate for expecting a better separation result.

One would probably have to know considerable details about the old as well as the new digitization process to fully understand all the differences.

In addition, as we get older, “trust your ears” is becoming an increasingly problematic and downright erroneous phrase.

I’m terribly grateful for all of the analysis tools that give some visual indication of (at least some of) what’s going on in frequencies that are no longer “in reach” for me.

A 96/24 song will first be resampled to 44.1/32 (or maybe 24) before demixing in SpectraLayers, as is the case for almost all demixing models out there.
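To make the resampling step concrete, here is a minimal sketch of taking 96 kHz audio down to 44.1 kHz. It is not how SpectraLayers does it internally (that isn't documented in this thread); it uses a simple Fourier-method resampler with NumPy, and the function name `fft_resample` is made up for illustration. In practice a dedicated resampler such as `scipy.signal.resample_poly` would be the more usual choice.

```python
import numpy as np

def fft_resample(x, sr_in=96000, sr_out=44100):
    """Downsample a mono signal by truncating its spectrum (Fourier method).

    Assumption: the signal is treated as periodic over its length, which is
    fine for a sketch but can cause edge artifacts on real recordings.
    """
    x = np.asarray(x, dtype=np.float64)
    n_in = x.shape[0]
    n_out = int(round(n_in * sr_out / sr_in))
    spectrum = np.fft.rfft(x)
    # Keep only the bins that the lower sample rate can represent;
    # everything above the new Nyquist frequency is discarded.
    spectrum = spectrum[: n_out // 2 + 1]
    y = np.fft.irfft(spectrum, n=n_out)
    # Compensate for the different FFT lengths so amplitudes are preserved.
    return y * (n_out / n_in)

# One second of a 1 kHz tone at 96 kHz becomes 44100 samples at 44.1 kHz.
t = np.arange(96000) / 96000
tone_96k = np.sin(2 * np.pi * 1000 * t)
tone_44k1 = fft_resample(tone_96k)
print(tone_44k1.shape)
```

Whatever resampler is used, the point is the same: content above 22.05 kHz in the 96 kHz file never reaches a model that runs at 44.1 kHz.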

The datasets the models are trained on, however, are usually 44.1/32 or 24-bit, because they are compiled from stems, which are typically kept at their high bit depth. As bit depth = dynamic range, I guess the inference is theoretically helped by using 24 or 32-bit input, because that is what the model is trained with, so the dynamic range within the model better matches your input.
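For a rough feel of how bit depth maps to dynamic range, the textbook figure for an ideal integer PCM quantizer with a full-scale sine is about 6.02 dB per bit plus 1.76 dB. A small sketch (the function name is just for illustration, and this says nothing about 32-bit float, which behaves differently):

```python
import math

def dynamic_range_db(bits):
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine."""
    # 20*log10(2**bits) is ~6.02 dB per bit; 10*log10(1.5) is the ~1.76 dB term.
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

That gives roughly 98 dB for 16-bit and roughly 146 dB for 24-bit. Real recordings and converters never reach the 24-bit figure, so this is an upper bound on what the format can carry, not a measure of any particular file.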

I can’t say I’ve personally noticed a difference to the point that things demixed much differently, but it is a bit different. I always try to get the master as 24-bit, unless a 16-bit CD is all they have and the masters are lost.

I’d imagine it could depend on the amount of processing that lies between the now 50+ year old master tape and each of the two versions (CD and hi-res audio file).

For example:

  • the age and condition of the tape at the time it was digitized
  • the resolution that was used when it was digitized
  • the quality of the A/D conversion used to digitize it

I’ve heard many CDs that have vinyl clicks, for example. Then there are the “remastered” CDs that just have compression and filtering applied, and so on. Digital filtering in particular has most likely introduced phase “smearing”. I’d imagine any unmixing process would struggle with these.

On the other hand, if someone has access to pristine digital transcriptions from an original, first-generation master tape, originally digitized at a bit depth higher than 16 and without dithering, then, yes, I’d expect better unmixing results.

It really boils down to the provenance of the input material. I’ll admit I’m sceptical of commercial offerings of hi-res audio, but this may be a way to prove that there really is a difference, at least for some products from some vendors.

I read somewhere yesterday (GitHub?) that files are downsampled to 44.1 kHz when training the AI, but I didn’t know SL also does that.

If the AI was trained with 44.1 material (and if SL10 didn’t downsample), would that necessarily mean the separation algorithms in SL10 wouldn’t benefit from a higher sample rate?

I looked up my CD and I see now that it is an older remaster (1996, from the original master tape) than the hi-res version (2014), which may perhaps explain the stunning difference. 🙂

Load the 44.1 CD rip into SpectraLayers, then load the 96/24 file (set SpectraLayers to 44.1 in the audio preferences so both files are at 44.1), and do a null test. If you’re left with little, they probably both came from the same digital source. My assumption, though, is that the CD version is crushed and the HDtracks one isn’t, so even though both pass through inference at 44.1/32, there is a massive difference in dynamic range, and things like the sharp transients of the drums will separate better from the HDtracks version.
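For anyone who wants to try the null test outside SpectraLayers, here is a minimal sketch with NumPy. It assumes both versions are already loaded as float arrays at the same sample rate and are sample-aligned (in practice the alignment is the hard part, and it is not handled here). The function names are made up for illustration.

```python
import numpy as np

def rms_db(x):
    """RMS level in dB, with a floor so silence does not produce -inf."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20 * np.log10(max(rms, 1e-12))

def null_test_db(a, b):
    """Residual level (dB relative to `a`) after subtracting gain-matched `b`.

    Assumes `a` and `b` share a sample rate and are already time-aligned.
    """
    n = min(len(a), len(b))
    a = np.asarray(a[:n], dtype=np.float64)
    b = np.asarray(b[:n], dtype=np.float64)
    # Least-squares gain that best matches b to a, so a simple level
    # difference between the two releases does not spoil the null.
    gain = np.sum(a * b) / max(np.sum(b * b), 1e-12)
    residual = a - gain * b
    return rms_db(residual) - rms_db(a)

# Toy check: a quieter copy of the same signal nulls almost completely,
# while unrelated material does not null at all.
rng = np.random.default_rng(0)
version_a = rng.standard_normal(44100)
print(null_test_db(version_a, 0.5 * version_a))
print(null_test_db(version_a, rng.standard_normal(44100)))
```

A strongly negative result suggests the same digital source; a result near 0 dB means the two versions differ substantially (different master, different processing, or simply misaligned files).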

Yes, I believe the hi-res version from 2014 is simply better mastered than the CD version.
I don’t know where HDtracks gets its hi-res files from, but the CD says it was remastered in 1996 from the master tape.

I just found the original CD and the Remaster downloaded from Tidal and they are quite different so it’s probably that. Strangely the CD gave slightly better drum separation so there might even be a third version for HD Tracks.

Demucs is trained on audio stems, not images, but the inference in AI demixing models is an FFT comparison, on frequency amplitude only and/or phase. I guess Steinberg might have tweaked the original dataset, but Demucs itself is certainly trained on MUSDB18 (SigSep).

It would be nice if @Robin_Lobel could chime in and give his opinion on the matter. 🙏

It’s likely the difference in mastering was the most important factor resulting in a better demixing. However, higher bit depth can certainly help too. Frequencies beyond what a 44.1 kHz sample rate can carry will indeed go to the Non Unmixed layer in the case of the Unmix Song model.
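A model working at a 44.1 kHz sample rate can only represent content up to 22.05 kHz (the Nyquist frequency), so anything above that in a 96 kHz file is outside its range. As a rough sketch, this estimates how much of a mono file's energy sits up there. The function name and the test signals are made up for illustration.

```python
import numpy as np

def energy_fraction_above(x, sample_rate, cutoff_hz=22050.0):
    """Fraction of total spectral energy above `cutoff_hz` (mono signal)."""
    x = np.asarray(x, dtype=np.float64)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[freqs > cutoff_hz].sum() / total)

# Toy example at 96 kHz: an audible 1 kHz tone plus an equally loud
# 30 kHz tone that a 44.1 kHz model could not represent.
sr = 96000
t = np.arange(sr) / sr
audible = np.sin(2 * np.pi * 1000 * t)
ultrasonic = np.sin(2 * np.pi * 30000 * t)
print(energy_fraction_above(audible + ultrasonic, sr))
```

On real music the fraction above 22.05 kHz is normally tiny, which fits the view that mastering and bit depth matter more here than the extra bandwidth.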

Thanks for clarifying!

Out of sheer curiosity, I picked a commercial track that I have on original vinyl from 1972, on CD from the first release in that format in 1985, and as a 24-bit/192kHz commercial download, and made comparisons.

I still have both the original vinyl, which I had transcribed at 24-bit/44.1kHz, and the CD, which I had ripped using WaveLab. With the levels carefully matched, I cannot tell the difference, sonically, between any of these. I would have to conclude that they are all derived from the same source, presumably the same master tape. I do not hear any differences in the mix, and there is nothing to separate them musically.

There are of course slight differences if compared forensically, such as low level noise and the occasional click on the vinyl transcription, and the files do not null, but the residuals are well below audible.

To my surprise, both 24-bit files fared better than the 16-bit CD rip in “Unmix song”, and the clear winner was the 24-bit/192kHz one! This was particularly noticeable with the drums, where the snare had a noticeably different character and the crash cymbals were separated out clearly; the bass (Fender Precision) was also clear and undistorted.

So, given that I’d selected a track which had no perceivable differences in mastering or mixing across the three versions, I’ve come to the conclusion that both increased resolution (bit depth) and possibly increased sampling frequency used to capture the original source make it easier for the AI models to perform effective separation.

I have been quite surprised by this, and, while I don’t usually do separations of commercial tracks, it’s certainly worth considering for those who do. For archivists, the mantra of always choosing the best available format for transcription (in the belief that future technology will be able to avail of it) is now proven advice.
