I’ve been using stem splitting as part of my music transcription workflow. In Logic Pro, it works almost perfectly with every song I’ve tried. Vocals, bass and drums are really well isolated, with only occasional small omissions.
Recently, I purchased SL11 Pro so that I can have the same workflow in Cubase, as the new Score Editor makes it a great choice for music transcription. I tested it (Vocals, Drums, Bass, Other) with a song that Logic handles really well and found the following issues with SL11:
1. The violin (dirty gipsy style) leaked completely into the vocal layer. It can easily be separated manually, but that’s not the point.
2. Everything except the vocals leaked significantly and audibly into the bass layer. This is also a regression from SL10, which handled bass separation in the same song better in this respect.
3. A few notes were lost from the bass layer. Those are admittedly short and tricky notes, but Logic got them right.
4. While the extracted bass sounds really clear in Logic, it is quite distorted in SL.
5. This one is hard to explain as I lack the proper vocabulary, but everything sounds somehow weird and muddier after unmixing compared to Logic.
Either I am missing something obvious, or SL performs significantly worse than even algorithms based on Spleeter.
That being said, on some rare occasions, or with a very specific genre or combination of instruments, the Fast mode can work better than the High/Extreme mode, so you might give it a try as well.
Just listen to the bass in “November Rain” and you can clearly hear what I described in point 2. There is a significant amount of non-bass instruments in the SL case, but not in the other 4 apps. And I believe that SL10 doesn’t have the same issue and sounds more or less like the other 4.
I still think that there are some fundamental issues with automatic unmixing, especially in SL11.
But to give credit where credit is due, SL is simply amazing for manual editing. For example, if there is an accordion melody line that prevents the rhythm guitar from being clearly heard, no problem. Simply use the harmonic selection tool and remove the accordion.
In my opinion, the art of demixing is to keep looking at and experimenting with all the available technologies, and to accept that some demixing tools work better than others, some work better on certain styles of music, and some work better with a different ordering of the demixing steps. For example: do you begin by separating vocals, drums and bass, or instruments/other first? Where can you afford spectral holes/imprints and where can’t you? Is this muddier sound acceptable in this stem, or should it be in another? All tools and approaches have strengths and weaknesses, and each suits some jobs better than others.
I would agree that the demixing in SL11 isn’t the best overall, but it does have some best-in-class processes, both for demixing/AI and for more traditional DSP. The DIY and less commercial source separation community is currently advancing the technology and models so fast that I do fear tools like SpectraLayers, Logic, RX, RipX etc. will always remain a few years behind. The general SDR rankings have shown that trend over the last few years. But big industry is slow, and that’s expected, as it has more red tape to get through before a release.
I listened to a bunch of examples on that site, and from that sample I really don’t feel I could say that one or another is the best or the worst, or that this one is better than that one.
Everything felt really hit or miss all around, with HIGH program dependency and very varied results.
But I don’t think that spill is the be-all and end-all in automatic stem separation, because the musical and timbral quality of the results is equally or more important.
Manually removing some spill is much easier than recovering a mangled timbre.
I’m not saying that SL is better overall, but it certainly doesn’t sound “fundamentally flawed” to me, from those examples.
It seems to be better in some ways/songs/elements and worse in others.
SL11, RipX, RX, Logic etc. are all quite far behind, and similar to each other, in my testing as an ‘overall’ evaluation of each. From my observations over the years, the industry works like this:
1. Universities do the hard work, creating the algorithms and trained models, and release them as open source, free or commercial.
2. Third parties like Apple, Steinberg, RipX, iZotope etc. commercialize it by simply integrating the above.
3. The open-source/DIY community continues development iteratively from where the universities and third parties stop, and essentially leapfrogs the commercial tools.
This is why you never see commercial separation tools at the top of the SDR rankings: they are a snapshot of a moment in time, not the best that is possible right now.
Sometimes you can even see the difference in quality between commercial and non-commercial tools visually. For example, here’s Logic Pro (top) vs. ViperX BS Roformer on a vocal separation:
Look at how much cleaner the separation is in ViperX than in Logic. It looks much more like the original stem would have looked, and so it has a much higher SDR, as less distortion (phase differences etc.) has come through.
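For anyone unfamiliar with the SDR numbers mentioned here, this is a rough sketch of what the metric measures. It assumes the simplest common definition (energy of the true stem divided by energy of the difference between the true stem and the extracted one, in dB); actual leaderboards may use variants of this, and the toy "bass" and "other" signals and the function name `sdr_db` are made up purely for illustration.

```python
import math

def sdr_db(reference, estimate, eps=1e-12):
    """Signal-to-distortion ratio in dB (simple energy-ratio definition).

    reference: samples of the true, isolated stem
    estimate:  samples of the stem produced by the separation tool
    Higher is better; leakage, lost notes and distortion all lower it.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps avoids division by zero / log of zero for silent or perfect stems
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))

# Toy example (hypothetical signals): a 55 Hz "bass" and a 440 Hz "other" part
sample_rate = 8000
n = sample_rate  # one second of audio
bass = [math.sin(2 * math.pi * 55 * t / sample_rate) for t in range(n)]
other = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(n)]

# A clean extraction lets almost nothing else through; a leaky one lets half of it through
clean_bass = [b + 0.01 * o for b, o in zip(bass, other)]
leaky_bass = [b + 0.50 * o for b, o in zip(bass, other)]

print(f"clean extraction: {sdr_db(bass, clean_bass):.1f} dB")
print(f"leaky extraction: {sdr_db(bass, leaky_bass):.1f} dB")
```

Note that this needs the original isolated stem as a reference, so it only works on test material where the real stems are available. It also lumps spill and timbral damage into one number, which is part of why two tools with similar SDR can still sound quite different.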