You may have seen the new “Cantai” app, currently in beta, which aims to sing your vocal lines from Dorico (among other programs) for you.
I have tried it out, so you don’t have to (yet).
In fairness I should make clear the following things:
It is in Beta
There is no API release for Dorico as such yet (they are concentrating on MuseScore first)
It’s actually incredibly impressive in parts
I have made precisely no effort whatsoever to make it easy for Cantai. No phonetic respellings, no rewriting of melismas, no nothing. Plug and play.
However, what it is mostly at the moment is hilarious.
The following video is my translation of an opera by Mozart (Bastien and Bastienne). For reference, the music proper is Dorico via Noteperformer, the sung bits are Cantai, and the dialogue is some godawful text-to-speech thing whose name escapes me, which makes Bastien sound like Trump and everyone horrifically American.*
You may draw your own conclusions!
*Being American isn’t horrific (as such), but the inability of AI to even consider any other accents is….
I’d like to mention that Mozart wrote this Singspiel eight years before the Declaration of Independence was signed, so Mozart sung in American English is something of an anachronism.
I’ll be watching with great interest to see how this product progresses. My church’s choir would really appreciate hearing lyrics in my practice tracks instead of vocal “aahs.” Depending on the price, I would purchase Cantai when it’s mature.
I agree with that sentiment, Joel, but I must admit I’m pretty alarmed at some of the mispronunciations in the OP’s sample. They’re going to have to get better than that before I’ll plunk any money down.
I’m amazed by two things:
1.) the apparent improvement in the actual singing engine / vocal synthesis over earlier demos
2.) how many hysterical mispronunciations there are. “I’m” seems particularly difficult.
I’ve purchased licenses for both Cantai & AceStudio and have been trying them both out.
BOTH have their strengths and weaknesses, and BOTH have issues. BOTH seem NOT to like .musicxml exported from Dorico. I have found that opening Dorico’s .musicxml exports in MuseScore, then re-exporting .musicxml from MuseScore, will often let both Cantai and AceStudio render them. Sometimes neither works, and I don’t know why .musicxml exported directly from Dorico fails.
For BOTH Cantai & AceStudio, sometimes the tempo will NOT export, and/or the tempo will fluctuate. MIDI exports of the same material always carry the tempo correctly.
AceStudio’s DAW-like interface allows MULTIPLE languages to be rendered at one time; it has fewer “Classical sounding” voices, but you can blend voices nicely.
BOTH show promise, and hopefully both will improve as development continues.
If you happen to run into “unsupported operand type(s) for *: ‘NoneType’ and ‘float’” in Cantai (web) when using musicxml from Dorico, this might be because of tempo markings.
Based on some hints, I had a look at the musicxml exported from MuseScore (which does work) and from Dorico (which does NOT work) in my case.
I “fixed” the Dorico musicxml export by changing the sound & direction parts from:
It seems that in Dorico the sound/tempo element is outside the direction-type, but in MuseScore it is inside it (I don’t know which the musicxml specification prefers).
With this manual change I could use a “problematic” musicxml file from Dorico with Cantai web rendering.
After your change, the parent is <direction> which is valid.
But I can’t see what the parent is before your correction; those lines are missing in your code snippet. If it were <measure> or <part>, that would be valid too.
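For anyone who would rather script this repair than edit files by hand, here is a rough sketch of the re-parenting using Python’s standard ElementTree. The input fragment is invented for illustration, and the assumption that the stray <sound> element sits beside its <direction> (rather than somewhere else) is mine — the original before/after snippet isn’t shown above, so Dorico’s actual output may differ.

```python
# Illustrative sketch: move a stray <sound> element so it becomes a
# child of the preceding <direction>, which the replies above note is
# a valid parent. The fragment below is invented; real Dorico output
# may place <sound> differently.
import xml.etree.ElementTree as ET

fragment = """<measure number="1">
  <direction placement="above">
    <direction-type>
      <words>Allegro</words>
    </direction-type>
  </direction>
  <sound tempo="120"/>
</measure>"""

measure = ET.fromstring(fragment)
direction = measure.find("direction")

# Detach every <sound> that is a direct child of <measure> and
# re-parent it under the <direction> element.
for sound in list(measure.findall("sound")):
    measure.remove(sound)
    direction.append(sound)

print(ET.tostring(measure, encoding="unicode"))
```

A real script would loop over every measure in the file and keep each `<sound>` with its own `<direction>`; this only shows the re-parenting mechanics on one measure.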
It’s better, but it has a long way to go still for solo voices and possibly even further to go for choirs (if it has any). It’s not clear to me whether the last 20% is going to be easy, difficult, or impossible.
I guess it might be possible to use it to try out how lyrics in different parts fit together even if the sound isn’t that great.
I write for barbershop ensembles (quartets and choruses) and am very interested in how Cantai will handle our conventions: we write on 2 staves, 2 parts per stave, with lyrics typically attached to the lower vocal line in the treble clef, and additional lyrics optionally attached to specific parts above or below the part (where they have features). I need to engage with the developers to see whether my case will work. I’ve seen choral music rendered, but always in very simple use-cases… I have hope, but I think the reality for my specific (niche) usage is still a long way off.
Well, depending on your expectations, and assuming that you are currently able to access the cantai.app rendering page, I wouldn’t lose all hope.
As it stands, the rendering fails badly on any remotely fast lyrics, is dodgy on melismas and solo male voices, and pronounces things more randomly than Joey Essex.
On the other hand, it’s reasonably competent at slow choral music, especially when you add enough voices to “average out” the more egregious technique errors of the solo voices. It’s fairly good at Latin, and can be bullied - with work - into pronouncing most things properly.
An advance look at how it will work in MuseScore (not yet released) suggests the ability to go through note by note and adjust pronunciation without altering the actual lyrics, among other improvements.
In other words, I would regard it as akin to an orchestral VST as opposed to Wallander Noteperformer. If you just switch over to a VST and hope, it will sound hilarious. You need work and competence to make an orchestral score sound amazing.
Noteperformer, on the other hand, will sound fairly good right out of the box. I don’t think Cantai will be anywhere close to that any time soon.
“Carmen’s the new rendering engine I’ve been developing for Cantai. It’s built on a diffusion-style generative framework that predicts vocal spectra and expressive dynamics over time, using pitch, phoneme, and score context as conditioning inputs. The system models phrasing, timbre shifts, and breath flow continuously rather than as discrete events, which gives it a much more natural response to musical structure. Here is the most recent test render.”
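To make the description above a little more concrete, here is a purely illustrative sketch of what a conditioned reverse-diffusion update over a spectral frame sequence looks like in general. Every name, shape, and the toy stand-in “model” below are invented for this sketch; nothing here reflects Carmen’s or Cantai’s actual implementation — it only shows the data flow of “predict spectra conditioned on pitch/phoneme/score context, refining iteratively.”

```python
# Toy sketch of a diffusion-style refinement loop for vocal spectra.
# All names and shapes are invented; the "denoiser" is random linear
# mixing standing in for a trained neural network.
import numpy as np

rng = np.random.default_rng(0)

N_FRAMES, N_BINS = 8, 16   # time frames x spectral bins
COND_DIM = 4               # size of the per-frame conditioning vector

def toy_denoiser(noisy, t, cond):
    """Stand-in for the learned network: predicts the noise to remove
    from the current spectra, given the conditioning (pitch, phoneme,
    score context would be encoded into `cond` in a real system)."""
    w = rng.standard_normal((COND_DIM, N_BINS)) * 0.01
    return noisy * 0.1 + cond @ w   # shape: (N_FRAMES, N_BINS)

def denoise_step(x_t, t, cond, beta=0.02):
    """One reverse-diffusion update: subtract predicted noise,
    rescaled by a (toy) noise schedule."""
    eps_hat = toy_denoiser(x_t, t, cond)
    return (x_t - beta * eps_hat) / np.sqrt(1.0 - beta)

# Invented conditioning: per-frame pitch + phoneme features.
cond = rng.standard_normal((N_FRAMES, COND_DIM))
x = rng.standard_normal((N_FRAMES, N_BINS))   # start from pure noise
for t in reversed(range(10)):                  # iterative refinement
    x = denoise_step(x, t, cond)
```

The point is only the shape of the computation: the conditioning is applied continuously at every refinement step, rather than as discrete events, which matches how the quote describes phrasing and breath flow being modelled.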