What’s your take on Omnivocal?

Hey! I made a video about Omnivocal today, had a blast making it and thinking about the topic, but I’m super curious what you guys think about vocal synthesis (AI vocals) and how Omnivocal plays a role in this. Have fun watching and I look forward to your reactions :).

As far as I can tell, Steinberg and Yamaha aren’t marketing Omnivocal as AI vocals – the word doesn’t appear in the Cubase 15 new features information, nor in the Omnivocal Beta User Manual. Rather, they’re calling it vocal synthesis, and that is a pretty different thing.

In fact, Yamaha has had their Vocaloid vocal synthesis technology since at least the mid-2000s. I know because I wrote a review on Vocaloid-based products from Zero-G back in 2004:

They’ve made some advances since then: :slight_smile:

They do mention AI there, and it would be interesting to know how they’re using AI in modern-day Vocaloid. However, the point I am trying to make is that this is not simply generating a vocal from lyrics or a melody, but synthesizing a vocal from a melody and how the lyrics correspond to that melody. And that was already possible in Yamaha’s Vocaloid back in 2004, long before AI started making inroads in creating vocals.

My suspicion is that any AI use in Omnivocal and Vocaloid has more to do with training the software how to make transitions, logic for placing breaths, etc., as opposed to specifically emulating real singers from a massive database of real vocals.

As to the question being asked – i.e. our take on Omnivocal – at the forest level, I see two potential, maybe three, uses I may make of it:

The first is the potential for layering virtual singers with my own background vocals to get more of a group effect than just doing all the vocals myself gives. (Side note: I did use the Zero-G Vocaloid LOLA this way on one of my recordings years back, though I blended them extremely subtly with my own vocals in the released mixes of the song.)

The second, which is one I hadn’t thought of prior to actually trying out Omnivocal, is for placeholder lead vocals while I’m working up an arrangement of my instrumental tracks. What I’d done prior to Omnivocal was just use a vocal pad sound for this, but trying Omnivocal out on my current project has shown that it has advantages: I can enter the lyrics quickly to have those for reference, I don’t have to worry about expressiveness (as these vocals won’t be used on the actual recording), and Omnivocal sits in the tracks much more like my own vocal will later on, unlike vocal pads. Even if the pronunciation is off, it’s not a big deal since the vocal won’t be used beyond working up the arrangement. It could also be useful for vocal harmonies ahead of actually recording real harmonies.

The possible third would be if I’d like a female vocal on a song demo, be it for pitching directly to a “friendly” female artist (as in one open to a demo that isn’t polished on this front) or for providing a melody and lyric phrasing reference to a female demo singer.

I did not mention more “electronic”/“robotic” uses because, at least to date, I don’t tend to make the sort of music where that would be useful.

How good is Omnivocal now for my potential uses? I think it’s better than the video might suggest, but it’s not great.

In particular, entering lyrics can actually be done much more quickly than may be implied by the video, and you can sometimes get “good enough” on the pronunciations, then go back and address specific issues by editing phonemes only in the areas where there are problems. (Those issues aren’t a problem for the temp vocal for arrangements use, but would be for any exposed Omnivocal uses in final product.) Also, one key to making Omnivocal more expressive and believable is automating the controls, such as vibrato, air, power, etc. in ways that emulate what a real singer might do vocally. (I suppose it helps to be a real singer on this front.)

If you’d like to get an idea on my initial test case on this stuff, I posted a comment in another forum thread that includes notes on what I did and links to two piano/”vocal” demos, one with the male Omnivocal and the other with the female at:

The biggest issue I’ve found thus far is that there are just some English pronunciations that it doesn’t do well, even when tweaking phonemes (at least as far as I’ve been able to achieve to date) with the most obvious example being the Elmer Fudd-like pronunciation of the letter “R”. I don’t really think that will be a dealbreaker for my main two uses since the first use will be blended with real vocals and the second will not be heard in public.

The bigger question mark for my first use will be how easy it is to make expressive background vocals that match mine sufficiently for the layered context, phrasing, etc. But tools like VocAlign can probably help enough on the phrasing side, and maybe the expressiveness side will be close enough between the layering and any automation of the controls.

Hello,

I bought the update from CB14 Pro to CB15 Pro, but I have the problem that the download assistant doesn’t start. I’m waiting for a hint on this problem from Steinberg technical support.

So I’ve tried to download all the new stuff from the CB15 download page. But the link to the Omnivocal beta Win64 doesn’t work and ends with the error “access denied”, and no installation file is found in my download folder. So I’m not able to test the new Omnivocal. Can someone send me the Omnivocal installation file so I can test it? Help urgently needed. Many thanks in advance.

Best regards from Germany
Svens

My issue with it is that it’s too auto-mode-ish. It’s very difficult to “drive” it to the places you want it to go because it’s very busy trying to prevent something “wrong” from happening. It doesn’t recover from pitch changes as many singers do, and if you want to add a dip or rise in pitch, it feels like it’s countermanding what you do because it doesn’t fit in the profile of what it thinks is “good”. So it winds up feeling sterile. And the danger there is of that sterile feeling becoming as commonplace as hard-quantized drums - ultimately it short-changes the listener because users of the software are in a hurry and it’s close enough for them. And no, I don’t think of this as a sketching tool, because I know that a significant portion of its users won’t think of it that way either - and having seen this happen repeatedly over time, I don’t think of it as an instrument choice, but rather as an economic/social choice. Attention bedroom songwriters - singers like to sing. Befriend some.

Can you elaborate what role AI plays in Omnivocal?

Hey Rick! First of all, thank you for taking the time and writing this; it almost feels like an extra article on the topic, and I loved reading it.

Regarding the main points you mention: I think it’s good to distinguish between AI and machine learning. I suspect Omnivocal uses machine learning (a task-driven AI model that extracts vocal patterns from big data to find out how a singing voice works). I mention this only very briefly at the end of the video. I chose this angle because I think that, despite the marketing choice of Yamaha/Steinberg (I have a background in marketing and communication), public opinion regards Omnivocal as an AI vocal tool, and as such it competes with other tools in this category. This is also what I observe when I look at fellow YouTubers directly comparing it to tools that are branded as AI. Personally, I like that Steinberg didn’t go for popular terminology, and your comment that the technology has been around far longer than just now underlines that.

What I didn’t include in the video is how I think Steinberg should improve the tool in order to speed up the workflow. I think that ideally, in addition to how you input melody and text now, there should be an option to sing something directly into the tool (or drop a WAV file on it). If Omnivocal could then figure out the pronunciations for you, that could result in a speedier and more practical workflow.

I’m still not really convinced of its usefulness, but in my case I have no shortage of excellent vocalists who need only half a word to give me what I’m looking for. However, in the case of 1) bedroom producers who don’t have access to vocalists or 2) professionals who quickly want to create a choir of background singers, like you mentioned in your reply, I can imagine the value of a tool like this.

Thanks again for your detailed reply!

Hey Johnny! I suspect some sort of AI model is used in the background of this tool. This is generally known as machine learning, but is also called task-driven AI or big data analysis. The reason I suspect that some sort of machine learning is involved is that you have to sign off on Steinberg and Yamaha collecting your data before you start using the tool.

There are no separate installation files for Omnivocal as far as I know. Hopefully Steinberg clears this up in the next C15 patch.

Exactly! Music is about collaboration :slight_smile: .

(Btw, I checked out the examples on your ‘Omnivocal is fantastic’ thread. It’s a very good idea to use a song like that because it is so widely known and has been covered by a pretty vast number of stellar performers, so it gives a good benchmark.)

I cannot get it to run right; it bogs down my computer. I will wait until they get a stable version!

I can’t say I felt this in my limited uses to date – just the initial “Star-Spangled Banner” demo attempt I shared in the other thread and use on my current project as just a placeholder vocal for working up the instrumental side of the arrangement.

But, in the former case, I didn’t have any fixed ideas of what I wanted it to do, other than on English language pronunciation, where I had to work around some things and was unable to work around the “Elmer Fudd R’s”. I just wanted to see what it could do and how close it could come to believable (i.e. human-sounding) vocals, as well as how much work would be involved in achieving whatever I came up with. It actually took a fair amount of work in tweaking phonemes to try to get closer. But I just did the Omnivocal controls automation as live overdubs from my MIDI controller, so they were quick. I also did a live overdub of pitch wheel riding to add some of the sorts of slides and such I’d be likely to do, and that required a bit more editing afterward to refine it. But I think the whole project took on the order of 3.5 hours, including recording the backing piano track and mixing.

And, in the latter case, I really didn’t care how natural it sounded (since no one but me would be hearing it), so I was just entering the lyrics quickly against a MIDI melody track I’d already recorded, and only tweaking a few phonemes where its default pronunciation bugged me too much to hear over and over while working on the arrangement. :slight_smile:

Quite honestly, I don’t think it “thinks” at all. The “instrument” has certain things it’s programmed to do based on the text, MIDI, and controls and automation of those controls, and it just does those. You can potentially work with that stuff to try and get it closer to what you have in mind, but there’s no “intelligence” trying to work at cross purposes because it “thinks” it has “better ideas”.

An analogy might be having a master landscape painter trying to get masterful results out of a beginning art student by telling that student what to do. The student might get better results with the master’s intervention, but the results aren’t going to be on par with what the master would do.

Well, it’s a sketching tool if you use it that way, and it’s not for someone who decides they won’t use it that way. Other people’s perceptions don’t matter to any given person’s tools and techniques.

Well, I am a “bedroom songwriter (and producer and singer and …)”, and I ultimately sing on my own recordings (to date, 7 full-length albums, 2 EPs, and something like 70 singles, a fair portion of which overlap with tracks on the albums). I’m unlikely to ever use Omnivocal for a keeper lead vocal, but I suspect I’ll eventually use it for keeper background vocals to layer with my own background vocals when I want to get richer textures than having all the BGVs be mine (even with potential formant and/or other processing tweaks to vary the timbre) and possibly for doing some higher parts than I can sing (or make believable with just pitch and formant shifting).

And I was very pleasantly surprised to find how useful it was for adding a quick lead vocal as a placeholder while working on the instrumental side of the recording I’m working on right now. That is a workflow choice versus having to set up a mic and track a scratch vocal myself (which I’ve sometimes done in the past, but, more often have just used a vocal pad, with no lyrics and a very different tone). Not to mention that, if I change the lyrics over the course of working on the arrangement, I can just tweak those in Omnivocal instead of needing to set up a mic again (or live with hearing the wrong lyric over and over).

I agree, though, that this will also be an economic choice, just as using virtual instruments rather than hiring specialist musicians for the real-life instruments would be. And I’d suggest the word “workflow” instead of “social”. I’ve got plenty of musician and singer friends, but, even ignoring business considerations like work-for-hire agreements (mandatory if pitching the resultant recordings for sync – of course, that would also be an economic consideration), there would be workflow issues like scheduling (versus my “just doing it now”, especially in cases of working against a tight deadline), coaching them on what I want, making sure I’m equipped to record their results acceptably (in my small bedroom studio), etc.

It would be interesting to know (really more out of curiosity than anything else) how, if at all, they are using this sort of thing. The thing is, though, that Yamaha’s Vocaloid was doing much the same thing, just with a much less friendly interface (i.e. having to use a standalone editor for entering the lyrics and melody and either render the audio from there or hook it up to the DAW using Rewire) before machine learning was practical. Perhaps they’ve used it to improve Vocaloid over time, and those improvements may have benefited Omnivocal. But I’d be surprised if the sort of AI or machine learning (or AI-type analysis a la Ozone Master Assistant or Waves Curves Equator or Curves AQ) is actually going on on our computers when we are running Omnivocal.

Public opinion and facts/truth are often at odds with one another. :rofl: But you’re correct that some (many?) will view the products in the same light, and it could certainly compete with something like Synthesizer V there, though not (directly) with something like IK’s new ReSing or other products (whose names I’m forgetting at the moment) that need you to sing a vocal first, where the product (or service) then replaces your vocal with another singer’s voice.

While that could be useful to make Omnivocal better, I suspect it would ultimately make it a very different product – more like ReSing. That could be helpful for some workflow considerations (e.g. if using it for a track that will make it into a final recording, where it improved the pronunciation and phrasing compared to just editing phonemes and MIDI data directly).

However, it would dramatically slow down the scratch vocal use – for me at least – compared to just tracking the melody on a MIDI controller then adding lyrics in Omnivocal (and maybe correcting a few phonemes here and there if Omnivocal’s default translation of English words to phonemes bugged me too much). In particular, setting up a mic, tracking vocals (probably including multiple takes due to lyric screwups along the way), doing at least some basic cleanup that would be annoying to listen to over and over while working on the arrangement, etc. would take more time. And that’s not even including the consideration of making lyric changes between initial scratch vocal and getting the arrangement far enough to be ready to track keeper vocals – a quick edit with Omnivocal, but having to do the whole “set up a mic, track, etc.” thing again.

Usefulness of a tool to any given person will obviously be different, depending on their needs and workflow. I suspect my most frequent use will actually end up being for the scratch vocal while working on an arrangement case, which is something I hadn’t even thought about prior to actually trying Omnivocal. While I’ll also likely use it for layering background vocals with my own at times, that will likely depend mostly on music style and cases where I need (or at least want) more than just my own BGVs. (The choir-type thing is something I’ve only needed a small number of times to date.)

Ultimately, I am a bedroom producer, and, with a very small number of exceptions, I am only producing my own material (mostly original songs, though I’ve probably put out somewhere between 20 and 25 covers over the years). I have easy access to a vocalist (me), but I think there will be workflow benefits, even when I ultimately want only my own vocals on the final recordings, to using Omnivocal while working on arrangements. Perhaps it could also be useful during the songwriting stage, but I’m less convinced on that front as I don’t usually even work in a DAW when doing the writing.

I’m using colloquial language. This doesn’t do anything close to thinking, and neither does anything else the public has access to. But to be crystal clear - its algorithms preclude certain nuances in order to get a predictable result. I’m a trained singer and have edited vocals on a high level for many years, and so I might ask something of this kind of thing that a more casual user might not. But the truth is that this tool isn’t made for someone like me - it’s made to sound just good enough to get by in certain situations.

Where it matters is the effect it may have on music. The more often things are created to solve problems that don’t necessarily exist, the more problematic it becomes.

I’m going to stick with social. The music community is increasingly scattered and isolated. If you are satisfied with how you do things, that’s great - I’m not about convincing you otherwise. But there are other ways to look at things like this. I think that having a pool of people one can call and work with is important, because they will almost always bring something you wouldn’t have thought of. But I will also say that great background vocals are about blending in many ways, and since Omnivocal isn’t going to follow you, you have to follow it to some degree. Which means its limitations are deciding things for you, which seems less than ideal.

I love new tech and love seeing what it will do and what it will bring, but my favorite kind is tech that removes distance between the user and their ability to express, as opposed to just their ability to tell something to sort of express for them. Because of the limits of this tech, it doesn’t fall into that first category - more into the category of “here’s something I can’t do, don’t want to do or don’t want to learn to do, and a bit of tech that lets me appear to do it without having acquired any of the wisdom I would have while learning how to do the thing.” I don’t think there is anything unjust about not being able to sing - that’s just how it goes - and so I don’t necessarily think that heroic measures need to be taken to facilitate the simulation of singing. But if one is going to do that, then this should be more controllable than it is. We can disagree - no harm there - we all get to choose how we do things.

I’m not going to respond to most of this – it’s really a case of different strokes for different folks, and, if Omnivocal isn’t right for you in any capacity, so be it; there are also lots of tools that others may love that aren’t right for me, as well.

However, the notion of needing to follow Omnivocal because it’s “not going to follow (me)” just doesn’t fly (IMHO). If I can’t get it to do what I want it to do in some specific context, then I won’t use it – it’s as simple as that. But I wouldn’t in any way expect it to “follow me”. I’ll play it, edit it, automate its controls, process it like I would live vocals (or in other ways), including potentially using VocAlign to tighten its phrasing against my own vocals, and/or whatever suits me for my purposes in any context where it serves a useful purpose. And if the results still don’t measure up, I’ll find another way to get the best results I can in that context. In that sense, it’s really no different from any other virtual instruments and/or other tools I may use.

Well, no matter how much I might want to be able to do a believable female vocal part myself, and how hard I try to learn to do it, I’m destined to fail. :rofl: I did actually use a Vocaloid virtual singer, somewhere back in the first decade of the 2000s, to blend with my background vocals on one of my projects, and it worked in that specific context. I feel confident that the development of Yamaha’s vocal synthesis technology since then will allow me to make similar uses when I need that, and I’m at least cautiously optimistic that it may also facilitate some related uses, such as in the virtual choir scenario I could have used on one of my recordings from earlier this year. But I won’t know for certain until I have another project that needs that sort of thing.

In the decades I’ve been producing my own music (and occasionally producing music for others), I’ve learned how to do many things I hadn’t known how to do previously. That’s a never-ending process and one of the things that keeps making “new toys” exciting to try out and see if they may prove useful in the context of things I do and/or inspire new contexts.

Well, I do sing, and, with the exception of my duet project with Beverly Bremers (“Make Me Feel”) and the one case where I used a Vocaloid virtual singer to double my background vocals, I do all my vocals on my recordings. But I see this as a potentially useful tool for me (and my original response in this thread was only meant to provide my take on Omnivocal in that context, not suggest how anyone else might, or might not, find it useful), both for the background vocal doubling bits I’d been most interested in when I first read about Omnivocal and for the unexpected temp vocal while arranging instrumental tracks use I’m currently making that may actually end up being the way I’ll use it most frequently. (And, yes, I know I could just set up a mic and do a scratch vocal instead, but then what if I change the tempo, change the lyrics, decide to change the key, … between when starting on the arrangement and when I’m ready to track keeper vocals? That’s exactly the reason I’ve previously just used vocal pad sounds in that context, but they not only don’t do the lyrics but don’t sit in the mix as well as Omnivocal does and are even less expressive than Omnivocal.)

I find I’m agreeing with both perspectives: the tool (or a future version of it) being useful for BGVs, and maybe scratch vocals. However, in my case, even with scratch vocals the main thing I want to transmit is the emotion, and while Omnivocal is useful for transmitting words, melodies, and to some degree vocal techniques, I don’t see it conveying the core of what a performance should have. But then again, I have a mic connected to my system at all times.

For me, this is spot on. A while ago I watched a BBC interview with Jacob Collier where he stated, ‘I started off creating this beautiful garden of music all by myself, and then at one point I realised that I had to open up the fence and let others in and do their thing, which was scary as hell, but eventually resulted in this’ [referring to his new album].

My takeaway from this conversation is that new tools and innovations create new creative opportunities, but also cause new concerns and challenges. And since we cannot control what new tools become popular, as producers we need to be aware of how much the instruments we play affect our core values.

Apart from how you and I use it, this is a tool (among other vocal replacement tools) that directly affects our ability, as a music community, to have an impact with our music in a world where we too often choose what we want (stuff that is perfect) over what we need (stuff that is real), because it directly interferes with the main carrier of our message.

To be honest, I haven’t felt that my current scratch vocal use is all that devoid of emotion, despite my not having even automated any of the controls (e.g. power, air, etc.). Mind you, I’d played the temp melody track (actually a long time ago – I’m revisiting a recording I started in 2004 in SONAR, worked at on and off over a couple of years, but never finished for one reason or another) manually on a keyboard, as opposed to entering notes in a piano roll editor. Thus, I’d probably played the melody “emotionally” with whatever sound I’d been using as the placeholder. Perhaps Omnivocal responds to note-on velocity variations? Of course, it definitely responds to timing variations, so it’s not starting from some quantized note entry scenario.

But I’m not looking for a great vocal for this purpose, only something that fills a similar sonic space to what my vocal will eventually fill to make sure I’m leaving the right sort of room in the arrangement, and possibly helping with any “mix as I go” work I might do en route to finishing the arrangement. I also just slapped a CLA Vocals preset on Omnivocal to help in that sense. (I’ve never used CLA Vocals on a lead vocal in a final mix, but all-in-one plugins like that can be helpful as placeholders while working on other elements of an arrangement and doing a degree of mixing along the way.) And Omnivocal does that a lot better than the vocal pads (or “doo” patches) I’ve used in the past. Not to mention that having it “sing” the lyrics helps me keep track of where I am in the structure of the song.

I don’t generally have a mic set up in my bedroom-type studio. It would just get in the way (and collect dust) most of the time since most of my time, even on a recording with lots of vocals, will be spent on the arrangement, vocal comping and editing, and mixing. My actual vocal tracking time is likely to be under an hour for lead vocals and maybe 2-3 hours at most for BGVs, depending on how many parts I’m doing, the complexity of the arrangement, and if I’m experimenting with the parts as I go or already know what I want. This aspect might also be an area where Omnivocal could be useful – i.e. for working out the BGV parts ahead of actually tracking them (and maybe doing that while still using Omnivocal for a scratch lead part).

Agreed.

For me personally, I look at new tools in one or both of two ways:

  1. Does it help me do what I already do more efficiently?

  2. Could it help me improve the results I can achieve (in the context of work I already do or that I want to do but have not yet managed to pull off to my satisfaction) in the face of my current limitations (e.g. skill sets, financial resources, time, etc.)?

For Omnivocal, I think the scratch vocal use ends up satisfying the first criterion. If I also manage to use it in the background vocal arrangement stage (not talking here about whether I end up using it for BGVs in the final recordings), that may also help here (but only if it gets me to BGV arrangements more quickly than ad-libbing them when I get around to recording them).

At this point I am hoping it will help on the second criterion in cases where I want to give the impression of layering my own BGVs with other voices and/or doing believable parts in ranges I can’t sing well myself, but that remains to be determined. (It’s possible the recording I’m working on now will give me a chance to try that out, but I’m a long way from being ready to do BGVs at the moment.)

A hot mess, mostly due to problems between Score and piano roll editing. I took a lead sheet I created in 13, moved it over to C15 Score (a mess in itself, because the lyrics don’t come over), re-entered the lyrics in Score, and played it with Omni. There were errors, mostly due to note length issues. So I shifted to Key Edit, but the lyrics didn’t line up with the notes, so I fixed that, but when I opened Score some of the lyrics were missing or in the wrong place, and notes were not in the same place. I think Omni follows Key Edit, not Score, and problems with Key vs. Score seem to be the root of the issue, which spills over into the AI vocals performance.

I have a song that shifts from English to Cajun French and I shudder to think what would happen with that lead sheet production.

I think I’ll wait until the bugs are ironed out.

There have been other threads (I think on the Cubase forum) where it’s been clarified that the text fields in the Cubase Key Editor, which are where Omnivocal gets its lyrics from, and the lyrics in the Score Editor aren’t related (at least at present). So changes you make in one place won’t affect the other (and vice-versa).

If entering lyrics for Omnivocal from the first note in a phrase in Key Editor, where there isn’t a 1:1 correspondence between notes and lyric syllables (e.g. as in the case of melismas where one syllable spans multiple notes), there is a shorthand for entering the lyrics to make them spread correctly. To be honest, I haven’t tried that yet, but I think it is something like entering multiple hyphens, so maybe if you want the word “someone” to have its first syllable spread across two notes and its second spread across three notes it might be something like “some-one--” (not sure if that comes across with the forum’s translation of hyphens, but the idea was one hyphen after “some” and two hyphens after “one”). Someone mentioned something along those lines when I’d mentioned my own quick way of entering lyrics, which was basically either to stop at a word that needed special treatment or enter two (or three or …) of the same word, then go back and edit the phonemes later in the spots where the melismas were.

That’d probably be a challenge in general since Omnivocal wouldn’t understand the Cajun French, so you’d be down to directly entering phonemes, but, as it stands now, even if it did, the lead sheet and the Omnivocal lyric entry would need to be separate since they don’t tie together in the current implementation.

Well, sorta - if I have lyrics in both places that differ, Score sometimes shows them both, but Omni speaks the Key lyrics. Which makes it a PITA when I need a lead sheet for a real singer. The workaround is to make a separate track for the “real” lead sheet. Looking forward to when all three can work as a unit...

there is a shorthand for entering the lyrics to make them spread correctly.

Where can I find this? Omni sometimes mispronounces words, so it would be great to force the correct pronunciation. Most of the time, one syllable per note works, and it seems that hyphens don’t make a difference. Sometimes if I remove the phonemes, Omni will pronounce correctly. And sometimes it will autofill the removed information.

I figured French lyrics (mon dieu!) would be a challenge :slight_smile: . So where can I learn how to directly enter phonemes?

As far as Omni’s performance, yep, it’s beta. And don’t try to get the girl to sing the upper register - she sounds like Minnie Mouse up there. IOW Mariah Carey has nothing to worry about. I’d give it a score of 70%. Looking forward to the next update.