I am not sure whether this is already in the pipeline at Steinberg, but I could not find anything about it in the forum, so I am posting it here.
The ADR features are a welcome addition to Nuendo, and while I haven’t been able to use them yet, I appreciate that they are there and that Steinberg went out of its way to make ADR inside Nuendo easier and more streamlined.
1. AI Speech Recognition and Markers
Besides scoring a lot of short movies and TV shows, I also occasionally edit and cut dialogue. My workflow looks a bit different from what the ADR features have been designed for. Since all of our voice actors work off-site, often recording in their own studios and in different time zones, handling the recording inside a single Nuendo session would be difficult. So I usually end up with one raw recorded session that is 1 or 2 hours long and contains multiple takes.
The problem is that going through that session, picking out all the usable takes, creating markers and exporting them is very time-consuming.
What would make this process much easier is a feature that imports a script in various text formats, such as TXT, DOC, PDF or XML, applies AI speech recognition to spot all the takes in an audio track, matches them against the script text and automatically creates marker tracks from these passages. There could also be a panel that lets the user decide which parts of the audio are actually usable and which are not. For example, a director might read a line aloud for the actor, and the recognition could mistake it for part of the actor’s performance.
(By the way, it would be great if this kind of script import were possible in the ADR panel as well. Right now, I have to do a lot of copying and pasting to create the recording markers, which can be very tedious.)
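To make the script-matching idea a bit more concrete, here is a minimal Python sketch of just the matching step. It assumes a speech recognizer has already produced timestamped text segments; the `Segment` and `Marker` types, the `match_takes` function and the 0.6 threshold are all hypothetical and not anything Nuendo offers. It uses plain fuzzy string similarity from Python’s `difflib` to assign each segment to the closest script line, and sets aside anything below the threshold for manual review.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Segment:
    """Hypothetical speech-recognition result: start/end in seconds plus text."""
    start: float
    end: float
    text: str


@dataclass
class Marker:
    """Proposed cycle marker: script line id plus the time range of the take."""
    line_id: str
    start: float
    end: float
    score: float


def normalize(text: str) -> str:
    # Lowercase and turn punctuation into spaces so that "What is this for?!"
    # and "what is this for" compare as identical.
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(cleaned.split())


def match_takes(segments, script, threshold=0.6):
    """Assign each transcribed segment to the most similar script line.

    `script` maps line ids to line text. Segments whose best similarity is
    below `threshold` are returned separately for manual review.
    """
    markers, unmatched = [], []
    for seg in segments:
        seg_text = normalize(seg.text)
        best_id, best_score = None, 0.0
        for line_id, line_text in script.items():
            score = SequenceMatcher(None, seg_text, normalize(line_text)).ratio()
            if score > best_score:
                best_id, best_score = line_id, score
        if best_id is not None and best_score >= threshold:
            markers.append(Marker(best_id, seg.start, seg.end, round(best_score, 2)))
        else:
            unmatched.append(seg)
    return markers, unmatched


script = {
    "scene01_line_11": "What is this for?",
    "scene01_line_12": "I told you already.",
}
segments = [
    Segment(12.0, 13.5, "what is this for"),
    Segment(20.0, 21.2, "What is this for?!"),
    Segment(30.0, 34.0, "okay let's try that again from the top"),
    Segment(40.0, 41.5, "I told you already"),
]
markers, unmatched = match_takes(segments, script)
```

Note that a director reading a line aloud would still match the script perfectly well, which is exactly why a review panel for deciding what is usable would still be needed on top of the automatic matching.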
Another note: when exporting cycle markers, I came across the following issue. Sometimes I have several markers with the same text, e.g.:
scene01_line_11_what-is-this-for
scene01_line_11_what-is-this-for
scene01_line_11_what-is-this-for
scene01_line_11_what-is-this-for
I name my markers like this simply because I want to work fast: I copy and paste everything and rely on the export to add an incremental number to each filename.
That does work, but the filenames end up looking like this:
scene01_line_11_what-is-this-for-45.wav
scene01_line_11_what-is-this-for-46.wav
scene01_line_11_what-is-this-for-47.wav
scene01_line_11_what-is-this-for-48.wav
The incremental number Nuendo adds is global, so each subsequent set of markers continues the running number of the previous one. I often end up having to batch-rename these files, which adds even more time to an already time-consuming job.
What I would like is for every distinct marker text to get its own set of increments, starting with “01”, “02”, “03” and so on.
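Here is a small Python sketch of the numbering I mean, plus the batch-rename I currently end up doing. `per_name_filenames` and `renumber_exported` are hypothetical helper names, not part of Nuendo. The renaming function assumes the exported files follow the `name-NN.wav` pattern shown above and that the global running number reflects the export order; it only computes an old-to-new mapping and does not touch any files.

```python
import re
from collections import defaultdict

# Matches exported names such as "scene01_line_11_what-is-this-for-45.wav":
# everything before the last "-<digits>.<ext>" is treated as the marker name.
EXPORTED = re.compile(r"^(?P<name>.+)-(?P<num>\d+)\.(?P<ext>\w+)$")


def per_name_filenames(marker_names, ext="wav", width=2):
    """Give each distinct marker name its own counter: name-01, name-02, ...

    `marker_names` is expected in export (timeline) order.
    """
    counters = defaultdict(int)
    result = []
    for name in marker_names:
        counters[name] += 1
        result.append(f"{name}-{counters[name]:0{width}d}.{ext}")
    return result


def renumber_exported(filenames, width=2):
    """Map globally numbered exports to per-name numbering (old -> new)."""
    parsed = []
    for fn in filenames:
        m = EXPORTED.match(fn)
        if m is None:
            continue  # leave files that don't follow the pattern alone
        parsed.append((int(m["num"]), m["name"], m["ext"], fn))
    parsed.sort()  # restore export order via the global running number
    counters = defaultdict(int)
    mapping = {}
    for _, name, ext, old in parsed:
        counters[name] += 1
        mapping[old] = f"{name}-{counters[name]:0{width}d}.{ext}"
    return mapping
```

Applying the mapping to files on disk would then just be a loop calling `os.rename(old, new)` for each entry.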
2. AI Speech Enhancements
I know Nuendo already has a voice separator, and it can be really good in certain cases. In other situations, though, it simply fails. The same goes for other dialogue extraction tools, like the one in iZotope RX, which I also use regularly.
Some of the audio I receive is a bit messy (not every off-site voice actor has a perfectly treated recording studio). Reverb and background noise, such as a screeching chair, are especially big issues, and I’ve run into them many times. Sometimes we are also simply on a deadline, the director has a change of heart about the script or the actor at the very last minute, and we end up recording a new take in a hotel room, on a beach or wherever. It happens, and this applies to big productions as well.
Another tool I use is “Hush” for Mac (https://hushaudioapp.com/), which I suppose is very similar to Adobe’s speech enhancer tool (Enhance Speech v2).
It is an AI-based algorithm that uses a neural network trained on the human voice to basically “rebuild” the voice track, resulting in little or no background noise or other annoying sounds. The tool is by no means perfect and does not work in every situation; for example, it can fail with nonverbal sounds like screaming or mumbling. However, since Steinberg has previously implemented alternative time-stretching algorithms and settings, I see no reason not to offer a choice of speech extraction algorithms inside Nuendo as well.
I am not saying Nuendo needs to be “AI’d up” with nonsense features. These are actual things that could improve the workflow, as opposed to stuff that paints you an emoji, generates a derivative piece of music for you (because “it’s not fun to make music”? What the hell, SUNO CEO, what’s wrong with you?) or steals an actor’s likeness.
Looking at how well the recent updates for Cubase (and soon Nuendo) have been implemented, I completely trust Steinberg with this and am looking forward to future releases of Nuendo.