Wishes for next-gen VariAudio

Pitch shifting based on AI,
which would make it possible to

  • change notes in ultra quality, without any loss of quality;
  • keep all the sibilants, breathing, air, and other details, even if the note is shifted by a whole octave.

It would be a NEW ERA, next GEN, a breakthrough!
It would also save a lot of time on recording additional vocal takes for harmonies and other effects.

I hope you have talented engineers who know how to do it. But if you have no ideas at all, here is one: the AI would not only “recreate” information but also analyze the entire audio clip. In the case of vocals, for example, it would study the timbre of the voice across all notes, so that changed notes sound the way they would in a real performance.

I am very inspired by the capabilities of AI in content creation. For example, the new noise reduction in Photoshop for RAW photos is on a whole different level! I really hope the audio industry will start making things of the same high quality, and not just slap an “AI” sticker on some poorly working, shameful, and useless “something”.

I think the problem with pitch-shifting sibilance, breath, and air is that they’re not tonal content; they’re more like transient and noise content.
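To make the point concrete, here is a toy numpy sketch of my own (nothing to do with VariAudio’s actual code): a naive resampling-style pitch shift moves *everything* up, so a sibilant-like noise band around 5–7 kHz lands at 10–14 kHz after an octave up. That is exactly why unvoiced detail sounds wrong after large shifts when it is treated like tonal content.

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr

# Toy "vocal": a 220 Hz tone plus band-limited noise around 5-7 kHz
# (a crude stand-in for a sibilant).
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 220 * t)
noise = rng.standard_normal(sr)
spec = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(sr, 1 / sr)
spec[(freqs < 5000) | (freqs > 7000)] = 0  # keep only the 5-7 kHz band
sibilant = np.fft.irfft(spec, n=sr)
y = tone + 0.5 * sibilant / np.abs(sibilant).max()

def naive_octave_up(x):
    """Pitch up one octave by keeping every other sample
    (played back at the original rate, everything doubles in frequency)."""
    return x[::2]

def band_energy_ratio(x, lo, hi):
    """Fraction of spectral energy between lo and hi Hz,
    interpreting x at the original sample rate sr (i.e. as played back)."""
    mag = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x), 1 / sr)
    return mag[(f >= lo) & (f <= hi)].sum() / mag.sum()

shifted = naive_octave_up(y)

# The sibilant band has moved from 5-7 kHz to 10-14 kHz along with the tone:
before = band_energy_ratio(y, 5000, 7000)
after = band_energy_ratio(shifted, 10000, 14000)
```

A musically correct result would keep the sibilant band where it was and shift only the tonal part, which is what makes the problem hard for simple algorithms.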

@Jari_Junttila lol, you don’t understand at all what AI is or how much it differs from simple algorithms that merely process “tonal content.”

I am active with AI stuff, so I know quite a bit, but you have to understand that using AI in VariAudio would require quite a lot of processing power and long waiting times, especially for high-quality results.

@Jari_Junttila Thank you, Captain Obvious!
No one is saying the algorithms should be replaced completely; in that Photoshop feature you can choose between the old noise reduction and the new one.
Nothing prevents VariAudio from running in standard mode with the simple algorithms while also offering a menu item, something like “ultra quality”, that activates the AI analysis. So you always have a choice, and if something doesn’t satisfy you, just use the old one…
And by the way, this processing job could be offloaded to the GPU, which would not only speed up the analysis but would also be the more appropriate approach, so it might not even affect the analysis time.

I did not say it is a bad idea.
You are now misunderstanding my writing on purpose.
I know really well how AI and models work, and the different AI accelerators as well.
The thing is, you would have to train models for it first, and for that you need content. Since there is no tonal content in sibilance and breath, how would you get data to train something that doesn’t exist…
I’m all ears, as usual.

No need to be rude.
We all have wishes, and I wish Cubase could become sentient and be the new president.
Jokes aside, AI is powerful, no question about it. However, AI is such a different approach that, as you said, something hard to do might turn out to be quite easy with AI, and something that seems easy might not be easy at all.
I work at a company that develops audio AI focused on music production (like the other 324526 companies doing the same thing at the moment). You know what, AI is weird, and the current diffusion-model strategies from the visual domain can’t be translated to audio without reworking them from the ground up.
I wish for the same thing as you do, so I can get a huge bonus hehe. Let’s hope one day we will get there :slight_smile:

@Jari_Junttila The answer to your question is partially given in my first post: some of the information would be taken from the analyzed file itself. All the other technical aspects should be worked out by professionals, but you can teach an AI many things, for example physical modeling from real-life examples: air pressure, movement, and other laws of physics that are “shown” to the AI from reality (one primitive example is reverb impulse responses), and so teach the AI to understand how something sounds and how it changes, etc…
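As a concrete version of the reverb-impulse example above (a toy numpy sketch of my own, not anything Steinberg has proposed): convolving a dry signal with an impulse response produces a (dry, wet) pair, and pairs like that are the kind of physically grounded data a model could in principle be trained on.

```python
import numpy as np

sr = 16000
rng = np.random.default_rng(1)

# Synthetic impulse response: exponentially decaying noise, a crude stand-in
# for a measured room reverb (a physical process captured as data).
t = np.arange(int(0.3 * sr)) / sr
ir = rng.standard_normal(len(t)) * np.exp(-t / 0.05)
ir /= np.abs(ir).max()

# Dry source: a short 440 Hz burst.
n = int(0.1 * sr)
dry = np.sin(2 * np.pi * 440 * np.arange(n) / sr)

# Convolution applies the "room" to the dry signal; (dry, wet) pairs like
# this are training examples showing a model how a physical space behaves.
wet = np.convolve(dry, ir)

# The wet signal keeps ringing after the dry burst has ended.
tail_energy = np.sum(wet[n:] ** 2)
```

Real measured impulse responses, rather than synthetic ones, would be used in practice; the sketch only shows the shape of the data.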

But anyway, this is an ideas topic, not an engineering topic about implementation or a work proposal. If I had working models or code, I would not create this topic; I would go and sell them to companies. Do not confuse work and business with free ideas.

Then vote for this topic!
No one says that it is simple, but I am convinced that even the technologies that exist today, if we feed a bunch of data into them, would perform, if not perfectly, then at least better than the simple algorithm built from plain math formulas that exists now.
So yes, before that day comes, we first have to give birth to the idea! The idea comes first, because we cannot start moving toward an implementation without focusing on a specific goal. So this topic is the first step: the idea and the goal. What remains is to find talented engineers, and Steinberg has enough money for that. :smiley:

If I’m not mistaken, this type of deep learning is already in Steinberg’s wheelhouse if you look at the latest version of Spectral Layers. So I don’t think it is too far-fetched to imagine a version of VariAudio in the near future with the features described.

@mlib Yes, I also hope for it.
But unfortunately, the algorithms of “Spectral Layers” for separating a track into stems do not work very well. For example, the service Mo*ses handles this many times better; 99% perfect, even. So everything is already possible today.