BTW, just listening back to these demos this morning, I noticed that the female version has the sine tone playing the melody mixed in with the actual female voice. That seems like a bug. I didn’t notice it when I was working in Cubase, but I was in a hurry when switching the voice yesterday because I was running late for a dinner break. The sine tone seems clear in the WAV file, though. (I think I did notice that last night, too, but I thought I was just imagining things or that there was something weird in the piano part’s transposition that gave that sound.)
I’d read another report in this forum where someone said they’d started with the male or female voice, entered the lyric, then switched voices, and it was still the gender they’d started with rather than the other. I did not see anything like that – the range was off for the female voice before doing the transposition, but it was clearly the female voice, just not in its ideal range.
That’s actually a pretty good effort. The male voice’s leading consonants are a bit mushy and both voices seem to have difficulty with ‘R’ as a second letter… bwoad, bwave etc.
I haven’t gone anywhere near this plugin yet but your results are better than I would have expected.
Yeah, those were definitely challenging aspects, some of which I did my best to work around, but often enough without results that satisfied.
Funnily enough, with the “r” part, where the line goes “perilous fight”, it sounded more like “perilous fright” – i.e. adding an “r” – to the degree I had to look back at the phonetics to make sure I hadn’t accidentally put one in there (nope).
The other thing I frequently noticed in the male version (which is the only one I was editing lyrics/phonemes on) was a challenge with trailing consonants, such as the “d” on “hailed”, especially when a word ended in “d” or “t”. It’s not as obvious in the female version.
I wonder on the “r” thing, though: A problem with “r” sounds is something I’ve noticed in some modern pop singers (e.g. Shawn Mendes on his big hit “Stitches” from about a decade ago especially comes to mind). Maybe they’re trying to imitate that in Omnivocal???
That’s because the product was made primarily for the Japanese market. Put some Japanese text in there and you’ll see it’s night and day in terms of realism. It also lacks some of the IPA (International Phonetic Alphabet) characters, so some words might need adjusting to sound more or less correct.
Yeah, I recognise the affectation in modern pop vocals. I’m not sure that Omnivocal has the swagger to pull it off though.
More Elmer Fudd than Shawn Mendes!
I find Omnivocal very rudimentary. I tried to use it in a tune just now, but I had to go back to Synthesizer V Studio 2, which is incomparably better. The English phonetics are very poor. Try “Deck the hall” (not a strange text, in my opinion) and you’ll get a ridiculous outcome.
AI will replace everything, including the public. In a few years you’ll have machines making music for other machines. Airplanes with virtual pilots will fly virtual passengers from A to B. And humans? They’ll deliver pizza to each other.
And wouldn’t that be copyright infringement? I’m seriously in doubt about uploading any mockup done with VST to Suno… On the other hand, I’m thinking that’s exactly what a lot of kids are doing right now.
Adobe is the leader. Adobe’s text-to-speech has emotion tagging (Omnivocal does not). Emotion tags allow a user to command the AI to make the vocal take on a specific emotion.
If I have a phrase and I need it to be sung back in anger, I need to be able to manipulate the emotion tags of words and phrases.
Adobe’s text-to-speech has two very powerful tools:
Emotion Tags
Timing of phrases (the ability to manipulate the timing of words)
Omnivocal needs to add these features in the future.
Then, as a user, I can take a phrase and decide to have it sung quickly at the beginning, slurred and slow at the end, and with an aggression that ends in a calm emotional delivery.
Omnivocal currently offers a user the ability to just get text into a sung vocal, but it lacks the manipulation tools to sculpt that vocal.