Full Audio Mix to MIDI?

Hey. Is it possible to take a single audio event (with keyboard, bass, sax, drums, percussion, what have you… all in the same, single audio event) and extract that single audio event to various MIDI tracks (one MIDI track for each instrument)?

It seems like the Audio to MIDI feature is only for monophonic, single track, individual stems?

Thank you in advance.

Right now, no, but we’re close. To quote Arthur C. Clarke, “Any sufficiently advanced technology is indistinguishable from magic”, and what you’re asking for (in April 2024) still lies in the realm of magic. However, the technology is advancing at a rapid pace.

In Cubase currently, you can use SpectraLayers to separate the mixed audio into individual audio tracks, and from there you can use VariAudio to generate MIDI tracks. However, neither process is perfect, and both depend on the nature and quality of the source material.
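As a rough illustration of that second step (turning an already separated, monophonic line into MIDI notes), here is a minimal Python sketch. It is not how VariAudio works internally. It simply assumes some pitch detector has already produced one frequency estimate per 10 ms frame, and the names `hz_to_midi` and `pitch_track_to_notes` are made up for the example. The only fixed fact in it is the standard mapping where A4 = 440 Hz is MIDI note 69.

```python
import math

def hz_to_midi(freq_hz):
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def pitch_track_to_notes(pitch_track, frame_sec=0.01):
    """Group consecutive frames with the same pitch into note events.

    pitch_track: one frequency in Hz per frame, or None for silent frames
                 (hypothetical output of a monophonic pitch detector).
    Returns a list of (midi_note, start_seconds, duration_seconds).
    """
    notes = []
    current = None  # MIDI note of the run in progress, None while silent
    start = 0       # frame index where the current run began
    # The trailing None is a sentinel that flushes the last note.
    for i, freq in enumerate(list(pitch_track) + [None]):
        note = hz_to_midi(freq) if freq else None
        if note != current:
            if current is not None:
                notes.append((current,
                              round(start * frame_sec, 3),
                              round((i - start) * frame_sec, 3)))
            current, start = note, i
    return notes

# Two frames of A4, one silent frame, two frames of middle C.
print(pitch_track_to_notes([440.0, 440.0, None, 261.63, 261.63]))
```

Note the built-in assumption of a single frequency per frame. That is exactly why this kind of conversion only works on monophonic material, and why a full mix has to be separated into stems first.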


Thank you. I saw in this thread, https://forums.steinberg.net/t/convert-audio-finished-song-to-midi-tracks, that user @Johnny_Moneto mentioned RipX DAW (hitnmix.com). Have you tried this program?

RipX has a free demo. Try it. There are also many other free alternatives available, and you should look at some of the demos on YouTube (search “stem separation”).

At the end of the day, it really depends on what your source material is: some material can be very cleanly separated, almost as good as having the original multitracks from the recording studio, while other material can defy all efforts. Some applications may separate bass tracks well, others may be better for drums, etc.

AI sound separation technology relies on how well the “models” have been “trained”; if the training material fed into the system did not teach the model what a saxophone sounds like, then the resulting model will not be able to separate out a saxophone track. Different applications use different models; that’s where the competition is right now.

Almost everything that’s currently available is focussed on vocals, then drums (specifically bass drums and snares), then bass. Anything beyond that may or may not work. For example, if a sax line lies in the frequency range of a vocal and exhibits certain features, it may be successfully separated out using a model trained on solo vocal recordings; Coleman Hawkins might work, whereas Wayne Shorter might not.

So what’s the best? It’s the answer you don’t want to hear: it depends.

So try the demos.
