Full orchestral score transcription from raw audio files

String instruments are incredibly nuanced. Even today, sample libraries and modelled instruments haven’t come close to what is possible in terms of tone color and complexity. The way a string player bows and chooses contact point and bow angle, which string they play on, even the brand of strings, the instrument, the bow hair, the humidity in the air, the acoustics of the room… the complexities are endless and compound exponentially.

And none of that accounts for the fact that string instruments, being particularly homogeneous in tone color, blend easily as a section. Throw in the added complexity of dividing players across registers (such as half the celli playing higher than the violas, or doubling a melody at the unison with the violins), and hopefully you can appreciate that it’s much, much easier said than done: as Janus said above, it’s like separating the eggs from the omelette.

And then there’s full orchestral music, with all that tone color and the complexity of doubling at the unison, where entirely new-sounding timbres are created by the composite effect of multiple instruments together… asking a computer to do that for anything but very straightforward, strict four-part writing would not, I think, pass the Turing test.

Stem separation works more easily with pop music because those elements are quite often much simpler, less layered, and more tonally distinct from one another (separating a human voice from a bass synth from drums, for example, is likely a lot simpler algorithmically than separating a violin from a viola from a cello).
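
To put a toy number on “tonally distinct”: here is a minimal sketch (not a real separation algorithm, and the instrument names are only labels for synthetic signals) comparing how much the magnitude spectra of two sources overlap. A broadband noise burst and a low tone occupy almost disjoint frequency bins, while two harmonically related tones share many of the same bins, which is exactly what makes them hard to mask apart.

```python
# Toy illustration: cosine similarity of magnitude spectra as a crude
# "how separable are these two sources?" number (0 = disjoint, 1 = identical).
import numpy as np

sr = 8000                       # sample rate (Hz)
t = np.arange(sr) / sr          # one second of time samples

def mag_spectrum(x):
    m = np.abs(np.fft.rfft(x))
    return m / np.linalg.norm(m)

def overlap(a, b):
    # dot product of unit-norm magnitude spectra
    return float(np.dot(mag_spectrum(a), mag_spectrum(b)))

rng = np.random.default_rng(0)
drums  = rng.normal(size=sr)                                    # broadband, unpitched
bass   = np.sin(2 * np.pi * 55 * t)                             # low pitched tone
cello  = sum(np.sin(2 * np.pi * 220 * k * t) for k in range(1, 6))  # harmonics of 220 Hz
violin = sum(np.sin(2 * np.pi * 440 * k * t) for k in range(1, 6))  # harmonics of 440 Hz

print(f"drums vs bass:   {overlap(drums, bass):.3f}")    # nearly disjoint
print(f"cello vs violin: {overlap(cello, violin):.3f}")  # heavy overlap
```

The octave-related “cello” and “violin” share every other harmonic, so their spectra overlap heavily; the noise and the tone barely overlap at all.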

4 Likes

@kayquarii, While I agree that it would be great if audio files could be converted into Dorico (or another format), I have to ask: how are you creating the “audio file of [your] full score”? In my limited vision, I would assume (and you know how dangerous that is!) that it’s some kind of DAW. Couldn’t you export a MIDI file and import that directly? (I’m not trying to be a smart @$$ — just can’t conceive why there’s a problem. :thinking:)

Cheers!

1 Like

Steinberg SpectraLayers has an audio unmixing feature, and it is impressive. But I don’t think it will be added to Dorico in the near future. And I hope it won’t be, because transcribing music is the best educator there is. Most of all, it gives so much joy.

:grinning: one day a singer will put a very unreadable part on my stand and say it is correct because the software made it.

1 Like

I haven’t found it super useful for separating jazz mixes, but maybe I just need more experience with it. It is pretty amazing at removing drums though! I’m in the middle of transcribing the complete tentet arrangement of Jimmy Heath’s composition Nails, and removing the drums definitely helps me hear the voicings more clearly.

Original:

Drums removed with SpectraLayers:

5 Likes

Apple’s Logic Pro has a similar feature called Stem Splitter, which in my experience works amazingly well.

2 Likes

This aspect strikes me as non-trivial.

1 Like

Just tried that and it does a great job of removing drums as well!

1 Like

No.
Direct and diffuse components can be rather reliably “disaggregated” now:

Frankly, it doesn’t sound like you have much experience of these tools (if any).

We can separate the eggs from the omelette. If a human can hear something in a recording, there’s no reason why a machine can’t perceive the same things if it’s had enough training.

There is nothing exceptional about separating the sounds of the instruments in a quartet and the fixed format should make it much easier to separate the parts (you don’t mention stereo) than it is with the variable ensembles found in pop music.

I don’t know enough about machine learning to be able to explain the process but the statistical algorithms used in machine learning would seem to have a quite different nature compared with the relatively simple procedures familiar from traditional programming. Separating the eggs from the omelette with that approach did seem to be impossible (just as it’s hard to imagine a single plug-in chain being able to extract the drums and the voice from every pop song). With machine learning, the processes seem to have much more in common with the way we humans hear and analyse music.
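
The “relatively simple procedures familiar from traditional programming” can be made concrete. Here is a minimal sketch (my own toy, using synthetic signals, not anything from SpectraLayers or Demucs) of classic fixed-mask spectral separation: keep only the frequency bins where the target lives. It works beautifully when the target and the interference occupy different bins, and it is precisely this approach that has nothing to offer once two instruments share the same bins.

```python
# Toy "traditional" separation: a fixed binary spectral mask. Recovers a tone
# buried in broadband noise because their spectra barely overlap -- the same
# trick cannot split two sources occupying the same frequency bins.
import numpy as np

sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)                    # target
noise = np.random.default_rng(3).normal(size=sr)      # interference
mix = tone + noise

spec = np.fft.rfft(mix)
freqs = np.fft.rfftfreq(len(mix), 1 / sr)
mask = np.abs(freqs - 440) < 5                        # keep only bins near 440 Hz
recovered = np.fft.irfft(spec * mask, n=len(mix))

corr = np.corrcoef(recovered, tone)[0, 1]
print(f"correlation with the clean tone: {corr:.3f}")
```

Because the mask is just a frequency gate, it is transparent and cheap; a learned separator instead has to model what each source statistically sounds like, which is why it can handle overlapping material at all.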

It’s extraordinary how removing the drums can change one’s perceptions. Transcriptions of bass lines that I would have sworn were absolutely correct can sometimes turn out to be surprisingly inaccurate.

SpectraLayers can go one step further and Unmix drum parts (i.e. one stem per drum/cymbal). That can be a great time saver for transcriptions. I really do think the automatic transcription of drum parts is not far away.
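
As a rough idea of why automatic drum transcription feels within reach: once the drum stem is isolated, the first step is just onset detection. The sketch below is a deliberately minimal energy-based detector run on fake synthetic “drum hits” (decaying noise bursts). It is not how SpectraLayers works internally, which is proprietary; real systems use far more robust onset functions and then classify each hit.

```python
# Minimal energy-based onset detection on a synthetic drum track:
# an onset is a frame whose RMS energy crosses the threshold upward.
import numpy as np

sr = 8000
audio = np.zeros(4 * sr)
hit_times = [0.5, 1.5, 2.25, 3.0]          # seconds -- a fake drum pattern
rng = np.random.default_rng(1)
for t0 in hit_times:
    i = int(t0 * sr)
    burst = rng.normal(size=800) * np.exp(-np.arange(800) / 150)  # decaying noise "hit"
    audio[i:i + 800] += burst

def detect_onsets(x, sr, frame=256, threshold=0.3):
    # frame-wise RMS energy; report times where it jumps past the threshold
    n = len(x) // frame
    rms = np.sqrt(np.mean(x[:n * frame].reshape(n, frame) ** 2, axis=1))
    onsets = np.flatnonzero((rms[1:] > threshold) & (rms[:-1] <= threshold)) + 1
    return onsets * frame / sr

print(np.round(detect_onsets(audio, sr), 2))
```

On this clean synthetic input it recovers all four hit times to within a frame or two; real recordings need adaptive thresholds, but the skeleton is the same.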

Obviously there’s a limit to the separation that can be done at the moment but clearing things out of the way can make an enormous difference and the usual plug-ins can then be much more effective in highlighting the details one’s trying to hear.

3 Likes

It seems Logic is also using Demucs behind the scenes. It’s strange that its use doesn’t always seem to be acknowledged (the licence would seem to require this).

2 Likes

The Melodyne people could have done something but it doesn’t seem to be of much interest to them.

Older versions used to have very basic staff notation (pitches only). I found it useful but they told me they thought it was too basic to include.

I’m also curious about how one gets from “this is my workflow” to asking for this “feature”.
To me, this sounds more like an “I’d take someone’s audio and pirate it into a score” desire than a “my workflow” issue.

2 Likes

My trick for transcribing bass lines is to play them up an octave. Yes, everything sounds like “Alvin and the Chipmunks,” but it gets the bass out of the mud and avoids some stretch-tuning issues. I miss some pull-offs and other techniques, but then I go back and confirm at pitch.
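
For anyone curious why this sounds like chipmunks: in its crudest form, the octave shift is just double-speed playback, which doubles every frequency and halves the duration. A toy sketch (dedicated transcription tools use fancier time/pitch-scaling algorithms that keep the duration, but the frequency effect is the same):

```python
# Crude octave-up "chipmunk" shift: keeping every other sample is equivalent
# to playing the audio back at twice the speed, doubling all frequencies.
import numpy as np

sr = 8000
t = np.arange(2 * sr) / sr                 # two seconds
bass_line = np.sin(2 * np.pi * 110 * t)    # a low A (110 Hz), muddy on small speakers

sped_up = bass_line[::2]                   # 2x playback speed = one octave up, half as long

def peak_hz(x, sr):
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    return freqs[np.argmax(spectrum)]

print(peak_hz(bass_line, sr))   # ~110 Hz
print(peak_hz(sped_up, sr))     # ~220 Hz: same line, one octave higher
```
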

6 Likes

SpectraLayers is quite good but it does not work miracles.

Is there any AI tool that just turns music to notation without additional processing? None that I have seen.

As to the silly notion that this would be simple if only a developer would get on it: yeah, right…

“Everything is easy for the person not doing it.”

3 Likes

I was wondering about this too - if there’s an orchestral recording to be converted, it was played from a score which very likely still exists.

2 Likes

Perhaps, but orchestral “take down” of Hollywood scores that no longer exist in paper form is a thing. Of course, it has to be done above board and it is. Conrad Pope told me that it is how he started in the film orchestration business.

Still, I would rather use my ears than spend time correcting what a computer does. But I’m sure some tracks would respond pretty well to this. Also, I have done a lot of take-down exercises for orchestra, and while the results were acceptable, the original full scores were definitely different from my versions. Baroque material and chamber music fared better!

1 Like

It might be fun to see what an AI transcription tool could do with something like Stockhausen’s Gruppen.

2 Likes

You are correct: I have some, but not a whole lot of, experience with these tools. But you might have missed my larger point, which was that it’s difficult even for the most trained human ear to pick apart a homogeneous section, as I mentioned. There are numerous examples throughout the orchestral literature where half the cellos play the melody on their A string, at the unison with the violins. Even a trained ear may not be able to audibly separate this, especially in a very large orchestral string section with lots of divisi going on. In many ways this is the point of some orchestrations: to create a blend that is seamless and homogenized.

There are also complex blends of orchestration in which new sounds are created by the composite of instruments: an oboe, trumpet, harp, vibraphone, and violin all playing the same part in unison create a sonic effect unlike any one of those instruments on its own. When done right, it becomes a complete, cohesive melodic unit. Depending on the context, a trained ear may be able to “separate this omelette,” but not always!

As for the examples provided above, the reason drums are easier to separate (I will hazard a guess) is that their sonic quality is so different: highly transient and unpitched. The algorithms, just like the human ear, can pick that apart. Hearing drums alongside a brass section, you can easily tell a cymbal or a kick drum apart from a trumpet. We have far more difficulty separating a flugelhorn doubled at the unison with a trumpet and a trombone. And is the trumpet in Bb or in C? Etc., etc.
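
One simple number captures that transient/unpitched distinction: spectral flatness, the ratio of the geometric to the arithmetic mean of the power spectrum. It sits well above zero for noise-like material and collapses to essentially zero for pitched tones. A toy sketch on synthetic signals (a real separator learns far richer cues than this single statistic, and the instrument names below are only labels):

```python
# Spectral flatness: geometric mean / arithmetic mean of the power spectrum.
# Noise-like (drum-like) signals score high; pitched tones score near zero.
import numpy as np

sr = 8000
t = np.arange(sr) / sr

def flatness(x):
    p = np.abs(np.fft.rfft(x)) ** 2 + 1e-12      # power spectrum, floored to avoid log(0)
    return float(np.exp(np.mean(np.log(p))) / np.mean(p))

snare = np.random.default_rng(2).normal(size=sr)                           # unpitched noise
trumpet = sum(np.sin(2 * np.pi * 466 * k * t) / k for k in range(1, 8))    # pitched, harmonic

print(f"snare-ish flatness:   {flatness(snare):.3f}")    # clearly nonzero
print(f"trumpet-ish flatness: {flatness(trumpet):.6f}")  # essentially zero
```

That gap is why a drum stem can be peeled off fairly reliably, while two harmonic brass instruments at the unison present the algorithm with nearly identical statistics.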

In the end, I am not entirely sure what the point or gain of such a tool would be, because invariably you would still have to go in and correct a lot manually. It is also unlikely to be a worthwhile financial investment for developers to pour into something as specific and niche as separating complex orchestrations into notation. What’s the ROI on something like that? I’m really not sure.

1 Like

I tried Melodyne years ago. I was curious to see if I could generate a decent XML file from a recording of a piano improvisation. It was not very helpful.
In the rare cases when I wish to transcribe something (or a part of something) that I improvised and recorded, using Transcribe to slow it down and then just listening and writing it out is the only sure way I have found.