Omnivocal in Cubase 15 is fantastic

I was expecting this, but Omnivocal’s native implementation is stellar—even though it currently offers just male and female voices. A few more varieties would be great, maybe even a voice morpher like IK’s new offering.
I’m almost tempted to sell my Dreamtonics bundle.

AI-powered stem separation is interesting, but UltraShape really caught my eye—I can already see myself using it.

Ripple Edit didn’t make it this time, but I’m hoping voice cloning debuts in N15.

Nonetheless, a heartfelt congratulations to Steinberg—the last two releases have been spectacular.

Best,
RM


Hi… I just installed it but I can’t figure out how to enter lyrics. Can you provide a brief description? Thanks!

I figured it out…

Very unstable for me. It worked, but as soon as I made some minor changes it stopped working. Also, when playing MIDI I can hear a crackling sound, but once rendered to audio it’s fantastic.

IMO the plugin is diabolical. You are paying money to teach AI how to write songs while it spies on you, so it can sell itself better to other users. :zany_face:


Is Omnivocal based on an AI engine? Does it require an Internet connection or is it local? I wonder if it will support different voice models in the future, like ACE Studio, Synthesizer V and Cantai. It would be amazing for demos.

I use Emvoice One, and it is very good at what it does, but having it work the way it does in C15, i.e. via text in the Key Editor, would be even better.

AI is going to train on whatever society deems morally acceptable, so I personally don’t care. Uniqueness is the key, and the fact that you can rehearse before involving real singers just saves time at the end of the day.

What I want to know, however, is: will this be coming to Nuendo?

I am tempted to get the C15 trial, but N14 is running quite stably (the trial, though I have a download code from the recent special), so I don’t want to upset my system with beta software. I can’t afford to maintain both Cubase and Nuendo, so I am hoping this “feature” will be shared between the applications.

When you use something like ChatGPT, it isn’t “learning” from you in real time. The learning and training happen on the developer side beforehand. What you’re interacting with is a closed, pre-trained model. AI companies spend a lot of money training these models — they’re not going to let random user input directly change or “pollute” that core system.

So when you use AI inside software — whether it’s Backbone, Omnivocal, whatever — you’re just using an already trained, locked model. That’s also why it only offers, say, two voices and still needs your text input: those are the options it was shipped with.

Think of it like a synth: instead of playing notes, it outputs language. It’s basically a text-synth.
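
To make the “closed, pre-trained model” point concrete, here is a minimal sketch (nothing to do with Steinberg or Yamaha code, just an illustration using the open-source Hugging Face transformers library and the small GPT-2 model): the weights were fixed on the developer side, and a user prompt only drives inference, never a weight update.

```python
# Minimal illustration of a "locked" pre-trained model:
# training happened on the developer side; at the user's end the
# weights are frozen and prompts only drive inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()  # inference mode, no training behaviour

prompt = "Verse 1:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():  # gradients off: nothing the user types changes the model
    output_ids = model.generate(**inputs, max_new_tokens=20)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# The model on disk is exactly what it was when it shipped:
# like a synth preset, it plays back what it was trained with.
```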

This is why AI companies with lots of users but little funding don’t develop, while AI companies with tons of funding on the dev side do; they’re growing from the dev side, not the user side.

An AI company with a $1 million investment and 100 million users is going to lose to an AI company with a $3 billion investment and 1 user.

OmniVocal is the start of a revolution. Where it ends up will be a synth with 1,000 singers who can all sing 7 octaves and can follow, or deviate slightly from, any notes you play or key into an editor, which you then fine-tune using articulations to get the desired effect per voice.

If you can play a piano, the future is yours; the better you are at articulating a voice through piano playing, the more you will accomplish.


Exactly 100% this. The whole “my AI music plug-in is learning from me and spying on me” conspiracy theory is getting old, especially while one has Google all up in their business everywhere, anywhere, anytime. Lots of people don’t understand how LLMs work. It’s not as glamorous as Skynet… not yet.


Misconception > The AI is learning from 100,000 users playing a piano/keyboard at various levels, from amateur to advanced.

Reality > The AI is being trained on 10 top pianists and thousands of examples of top-level piano playing on the dev side.


The funny thing is that Yamaha actually started this “revolution” somewhere north of 20 years ago – I know I reviewed a product based on their Vocaloid technology (Zero-G’s Vocaloid MIRIAM) for the (now long defunct) CakewalkNet ezine back in October 2004. The Vocaloid version at that point was 1.0.5.12 – they’re up to Vocaloid 6 these days.

It would surprise me if Omnivocal didn’t share some lineage with modern Vocaloid. Certainly the idea of entering MIDI notes then attaching words and articulations to those notes was already there in Vocaloid at the time, though the editor was standalone, and you had to render audio from there or Rewire it to get it to play in the DAW (I was using SONAR at that time). Omnivocal will definitely be way more convenient. Hopefully with way better results, too. :slight_smile:


I can see users creating basic vocal edits in OmniVocal and then moving them into SUNO to generate AI-based vocal variations from that source.

If I were Steinberg, I’d be looking to lean hard into this space — it’s going to be significant. The strongest products solve the widest pain points, and one of the biggest challenges producers and composers face is simply finding a suitable vocalist, or any vocalist at all.

A refined, integrated AI-vocal workflow would attract a large number of users, because it directly meets an existing demand and a clear creative desire.

The mission statement:

“Meeting demands. Exceeding imagination.”


Would it not depend on how far downstream you are from the source?

Is Vocaloid the engine here? I’ve read a couple of guys claiming it is.

I’m mulling waiting for Nuendo 15 rather than updating both Cubase 14 and Nuendo 14… the singer thing doesn’t hold much interest for me as a feature… although I did preorder re-sing and have dabbled with Suno & other vocal-generating thingies… mostly to be aware of what’s going on.

Unless you find a way to import a library, it seems likely that the synthesis side of things is from Japan, since in my experience with Emvoice One I have found that the phoneme libraries need to be recorded again when errors or limitations are found with respect to vowel creation.

I don’t know, one way or the other. However, given the virtual instrument is branded as Yamaha, rather than Steinberg, and Yamaha’s long-term vocal synthesis product/technology is Vocaloid, I’d be surprised if there weren’t at least some technology underneath Omnivocal that has roots in Vocaloid. There was a discussion on this sort of thing in the Steinberg Lounge a few months back, and I gathered from that discussion that Vocaloid has progressed quite a bit over the years.

When watching the video demos Dom Sigalas did of Omnivocal, I noticed that, when he was entering English-language lyrics, the result also displayed phonetic spellings that reminded me of what Vocaloid used to do. However, I suspect most any vocal synthesis software would need to do a similar thing. It made me wonder, though, if Omnivocal would allow overriding this, for example by entering the phonetic spellings directly. There are obvious cases where that could be necessary, such as heteronyms (words that are spelled the same but pronounced differently, e.g. bass the fish versus bass the instrument or vocal range), as well as cases where it might just be desirable (e.g. words two speakers from different backgrounds might pronounce differently, such as “Boston” by someone from Boston versus someone from most anywhere else).
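
Purely to illustrate why such an override matters (this is a hypothetical sketch, not Omnivocal’s actual engine, dictionary format, or API), a lyric-to-phoneme step with a per-word override for heteronyms might look something like this:

```python
# Hypothetical lyric-to-phoneme lookup with user overrides.
# The function, dictionary, and ARPAbet-style spellings are illustrative only.

DICTIONARY = {
    "bass":   "B AE1 S",            # default guess: the fish (rhymes with "mass")
    "boston": "B AA1 S T AH0 N",
}

def lyrics_to_phonemes(words, overrides=None):
    """Map each lyric word to phonemes, letting a user-supplied spelling win."""
    overrides = overrides or {}
    result = []
    for word in words:
        key = word.lower()
        # an explicit phonetic spelling from the user overrides the dictionary
        result.append(overrides.get(key, DICTIONARY.get(key, "<unknown>")))
    return result

# "bass" the instrument needs a different pronunciation than the default:
print(lyrics_to_phonemes(["bass"]))                                  # ['B AE1 S']
print(lyrics_to_phonemes(["bass"], overrides={"bass": "B EY1 S"}))   # ['B EY1 S']
```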

I haven’t done anything with Vocaloid in a very long time (probably at least the late 2000s, if not a bit earlier). When I reviewed Vocaloid MIRIAM, I know I used it to get a female vocal on an early demo of one of my songs. The only “production” use I made of Vocaloid, though, was using Vocaloid LOLA (also from Zero-G) to subtly layer some background vocals with my own on one song (“Undertow” – it’s on all the streaming services). It is that latter type of use that most interests me at this point. (I really could have used it in a recording I did earlier in the year where I was trying to simulate a choir with just my own vocals. Not only were there issues of range, but also just wanting different timbres, where creative audio processing, even with tools like iZotope’s Nectar Backer module, fell short of achieving the results I was hoping for.)

That was always an issue for Emvoice, particularly since the only way to get a word is via a dictionary which, on the whole, is large enough; but there is a phoneme dictionary as well, which more often than not produces the correct pronunciation.

The early vocals that Emvoice made, e.g. Lucy & Jay, in my view need to be replaced, and I know I could do this today and start with a Cubase trial, but alas, I own Nuendo.

On the topic of training a genAI model, all stated above is correct: models get trained on datasets, not real-time user input.

However, let us not be naive and assume that our input is lost in the static. Certainly the terms of usage will state that they will not sell data to other companies and will mind our privacy. But seeing where things are going in the USA, it is safe to say that anything you put into any internet-connected service will be stored, analysed and used to train datasets. This is the most valuable data (user-interaction information), as opposed to vast troves of randomly scraped data.

Omnivocal sounds interesting and ahead of the curve of current genAI tools; it does, however, still sound uncanny to me. But that can still change.

All future workflows till 2027 would be something like this:
ChatGPT → Suno → stem separation → OmniVocal → voice cloning → remixing → mastering
Or: old tape recordings → stem separation → OmniVocal → voice cloning → remixing → mastering

2028 → singularity

or

  1. OmniVocal (to create a vocal designed from your melody input and text)
  2. SUNO (to convert that OmniVocal vocal into vocal variations)
  3. Cubase (Sample Editor to straighten out the vocal against the tempo, and VariAudio or Melodyne to make corrections)
  4. Then export and send to a real singer to redo the vocal (if this is your desire).

Cubase will be better than SUNO at voice regeneration in the long run; they need it for Nuendo, and YAMAHA is already invested in this behind the scenes. The multi-billions of YAMAHA will make this happen. By 2030 SUNO will be completely controlled by the major music labels and used for them and them alone, with users probably paying $50–$100 to use the service. The majors want to use the tech for themselves (paying users being used like guinea pigs to make the rehash remixes, and paying the majors while doing it!!! WOW, getting you to pay them to make their music, which they charge you to listen to on Spotify, while underpaying you and stealing yours to feed illegally into SUNO) to create thousands of rehashes of music they already own, to get even richer than they already are.

Backbone already has AI within it, to create drum sounds and re-imagine sounds you put into it.

The vocal department will evolve and get better over time. OmniVocal is a beta; obviously YAMAHA is now using advanced AI in the background to slowly bring it to market, first through Cubase, then to everyone willing to pay.

YAMAHA and their AI ventures

While that may happen, we could be on the verge of seeing creativity tools that democratize Hollywood-level production for anyone with just a prompt. But the flip side is terrifying: job displacement across music and film, copyright nightmares on steroids, deepfakes that feel alive, and AI-composed symphonies capable of manipulating emotions in ways we don’t yet understand.

And this may be just around the corner—by 2028–30 at the latest. SUNO is a toddler compared with what’s coming. These emerging systems aren’t based on Google’s transformer AI model but on multimodal generative synthesis, neuromorphic computation, and quantum assistance. The potential is frightening.