Music scanning (recognition), AI and subscription models

I hope this doesn’t come across as too much of a stream of consciousness. I have used music scanning software for well over a decade. There are many perfectly legal reasons for wanting to do this. Sometimes I want to base a new project, in part, on some public domain music. Often, I have purchased arrangements that either contain mistakes, or I may need a different transposition for a substitute instrument. Sometimes I need to produce a decent rendering of the music. Sometimes I need to alter an arrangement to a different key, or perhaps a different length, to match the needs of a service. There are many legitimate reasons for wanting to turn printed music into MusicXML.

Sadly, the state of the art is very poor. There are few products out there. None ever work completely accurately on anything but the simplest of pieces. None of the products show much commitment to development and support. This becomes a vicious cycle. If the tools don’t work very well, then there is not much market, and therefore no money to justify further development.

It seems to me that in a world that is swimming with AI, music recognition may be the perfect case for applying AI. It should be noted that what passes for “AI” today is, in many cases, simply a developer invoking an existing large language model (LLM). The developer isn’t doing any “AI development” per se, just using a “black box.” In other cases, specialized neural nets are developed for specific applications. We see that in SpectraLayers and now Cubase 15 with the stem separation features. Stem separation is completely unrelated to LLMs and required people to create new nets from scratch. I doubt that Steinberg had a hand in developing these stem separation nets, but certainly Steinberg is becoming familiar with the process.

To develop a new neural net, one needs training data, where there is a source and a known outcome. We have that by the millions in music notation. We have a practically infinite supply of printed music, and much of that has a digital equivalent that could be used to train the nets. I am not saying that is easy. But I am saying that if anybody ever gets this right, it will be transformational in how we composers, arrangers, songwriters, soundtrack builders, and engravers go about our jobs.
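To make the idea concrete, here is a minimal sketch (in Python, with entirely hypothetical names and placeholder data) of what such supervised training pairs could look like: a rendered page image as the input and the known-correct MusicXML as the target. A real pipeline would engrave actual MusicXML back into page images, ideally with added scan noise, so the net learns to read imperfect originals.

```python
# Hypothetical sketch: assembling (page image, MusicXML) training pairs
# for an optical music recognition model. The data here is placeholder
# bytes/strings, not real renderings.

from dataclasses import dataclass

@dataclass
class TrainingPair:
    page_image: bytes   # rendered page (e.g. PNG bytes) - placeholder here
    musicxml: str       # the known-correct digital encoding of that page

def make_pairs(scores):
    """Turn (rendered_page, musicxml) tuples into training pairs.

    In a real pipeline, rendered_page would be produced by engraving the
    MusicXML to an image, optionally with augmentation (skew, blur,
    scan artifacts) so the model generalizes to imperfect scans.
    """
    return [TrainingPair(page_image=img, musicxml=xml) for img, xml in scores]

# Toy demonstration with two fake "scores"
pairs = make_pairs([
    (b"<png bytes page 1>", "<score-partwise>...</score-partwise>"),
    (b"<png bytes page 2>", "<score-partwise>...</score-partwise>"),
])
print(len(pairs))  # 2
```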

I would think Yamaha and Steinberg should have more than a passing interest in this. A really effective scanner/converter could feed right into Dorico, Cubase and probably other Yamaha products.

Given what we have seen of other AI developments, I have no doubt that this could be accomplished if sufficient resources were available. And that comes down to the business case. What we are seeing with many products (look at the new Canva/Affinity announcement, for example) is a base product that might be available on a perpetual license, but all the AI stuff requires a subscription.

Many of us absolutely deplore monthly subscriptions. I curse myself every month when I pay the cable TV bill (I should be cutting the cable – long story.) OTOH, I am willing to pay for valuable functions. I only object to being charged every month if I am not using the service.

So, FWIW, I believe an AI-based music scanning product could be offered using a “credit” model instead of a subscription model. That is, maybe we could pay 50 cents for each page successfully scanned, and we might buy 100 pages of credits at a time. I would find that perfectly acceptable because I can value my time saved for each page I do not have to enter by hand.
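The arithmetic of the credit model described above can be sketched in a few lines. This is purely illustrative; the prices, bundle size, and class names are taken from the proposal or invented for the example.

```python
# Illustrative sketch of the pay-per-page "credit" model: 50 cents per
# successfully scanned page, purchased in bundles of 100 pages.

PRICE_PER_PAGE = 0.50   # dollars per page, as suggested above
BUNDLE_PAGES = 100      # pages per credit bundle

bundle_cost = PRICE_PER_PAGE * BUNDLE_PAGES  # cost of one bundle, in dollars

class CreditAccount:
    def __init__(self):
        self.pages = 0

    def buy_bundle(self):
        self.pages += BUNDLE_PAGES

    def charge_page(self):
        # Charge only on a *successful* scan, per the proposal;
        # a failed scan would not call this method.
        if self.pages == 0:
            raise RuntimeError("no credits left")
        self.pages -= 1

acct = CreditAccount()
acct.buy_bundle()
acct.charge_page()
print(bundle_cost, acct.pages)  # 50.0 99
```

The key point is that the user pays nothing in months when the service sits idle, which is exactly the objection to subscriptions.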

Anyway, all of this is to say, I think that somebody ought to be doing this. We have lived with the existing half-baked tools far too long. I believe somebody will eventually do this, and it seems to fit the Yamaha/Steinberg business better than most.

Any thoughts?

I agree with much of your thinking.

The killer issue in my mind is the lack of a good financial business case.

There are already serious questions about return on investment for many apparently mainstream use cases for machine learning. The case for relatively more niche applications is presumably even worse.

Maybe the best hope for what you’re looking for is not a for-profit company, but a volunteer open-source project that develops something like this for the love of it, rather than the need to get paid.

But those kinds of projects are difficult to will into existence unless you’re a software developer with the required expertise, patience, and spare time and energy, who happens to share that specific niche passion.

I agree with all of that. AI today is where “the Internet” was in 1998. Capabilities were very limited. People dreamed of being able to do much more, but we lacked core building blocks, particularly in the area of transactional systems, which are what drive an economic business case.

The nice thing about music recognition as an AI application is that the domain is more or less finite. Yes, people in academic settings keep trying to introduce additional notational symbology, most of it being unnecessary. But the mainstream of music engraving has been pretty stable for a couple of centuries, and there is an enormous amount of training data available.

There are also breakthroughs still happening that greatly reduce the cost of training systems. The Chinese DeepSeek results seemed to deliver comparable function with 90% less compute. There have been other big optimizations, such as techniques for pruning irrelevant nodes from a neural net. The big companies are still approaching this as a brute-force competition – who can build the biggest data centers with the most NVIDIA chips. But over the next few years, I expect this mindset to give way to more elegant approaches that lower the cost.
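One of the cost-reducing ideas mentioned above, removing irrelevant nodes, can be sketched in miniature as magnitude pruning: zeroing out the weights with the smallest absolute values. This is a toy illustration on a plain Python list; real systems prune tensors at scale and usually fine-tune afterwards to recover accuracy.

```python
# Toy sketch of magnitude pruning: zero the given fraction of weights
# with the smallest absolute values, keeping the rest unchanged.

def prune_smallest(weights, fraction):
    """Return a copy of weights with the smallest-|w| fraction zeroed."""
    n_prune = int(len(weights) * fraction)
    # indices ordered from smallest to largest magnitude
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(order[n_prune:])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.002]
pruned = prune_smallest(w, 0.5)  # zeroes the three smallest-magnitude weights
print(pruned)  # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```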

I hope Steinberg is giving this serious consideration. I should point out that if there were a very successful music recognition solution, especially one that can deal with handwritten scores, it could become an important part of the input workflow for notation apps, and maybe even DAWs.