What audio tech is this?

In an audio snippet from a video, the words were replaced…
Same voice, same intonation, same ambience in background.

Which software can do this?
(Analyse and duplicate/create background ambience, voice print, intonation, words etc…?)
-ambience is obviously the easy part.

Vocaloid 2020?

Ambience isn’t “obviously easy” in my opinion. But if you want a reply that makes sense then link to the source so we can hear/see it first hand. To my knowledge there’s no single piece of software that does those things automatically. The way you describe it, or the way I understand it, it was probably done using PT or Nuendo (or similar) with just different dialog recordings and a good dialog editor (person).

The source is of no importance…

The software is obviously not available in every cornershop…

The example was probably done, as you suggest, by amazing editing…

Do you think that Vocaloid will advance that far?

If you want an answer to your question then it is of importance. By listening to the source one can draw conclusions. By looking at that screenshot of a waveform much less information can be gathered.

Depends on what software it was and what you mean by “cornershop”.

Something will. In our desire to make money we will try to replace any and all human labor with machines that don’t complain about working too many hours, not getting a raise, unsafe conditions etc. Our drive for profit has already begun to kill art and this will be a natural extension of it.

I will pm you the source…

interesting… hold on a sec…

So, that was actually very interesting considering the source and the topic. Very interesting.

I would say that it’s a re-edit done by the company for some reason, I’m guessing political reasons. What’s curious though is that there simply appears to be more ambient sound in the “first” version compared to the second.

When I look at the spectra of them using iZotope’s RX software I can see as well as hear that there are background sounds that are present in both versions. So the question is what the sources are of the first version’s additional background ambience. In other words it’s clear that the dialog was edited and that there were two sources of ambience. My guess is that the first version got more ambience to make it more dramatic, in other words part of the background didn’t originate with the footage but is from some other place and/or time.
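To illustrate the kind of comparison I mean, here is a minimal sketch in plain Python. It is not what RX does internally; the three sine tones are made-up stand-ins for the dialog, the background that is in both versions, and the extra background that is only in the first one. All names and values are assumptions for illustration.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags

def sine(freq_bin, n, amp=1.0):
    """A sine that completes exactly freq_bin cycles in n samples."""
    return [amp * math.sin(2 * math.pi * freq_bin * i / n) for i in range(n)]

def mix(*tracks):
    """Mixing is just adding the tracks sample by sample."""
    return [sum(vals) for vals in zip(*tracks)]

def active_bins(samples, threshold=0.05):
    """Set of frequency bins whose magnitude exceeds the threshold."""
    return {k for k, m in enumerate(dft_magnitudes(samples)) if m > threshold}

N = 64
dialog = sine(5, N)              # hypothetical stand-in for the speech
shared_bg = sine(12, N, 0.4)     # hypothetical background present in both versions
extra_bg = sine(20, N, 0.3)      # hypothetical background only in the first version

first_version = mix(dialog, shared_bg, extra_bg)
second_version = mix(dialog, shared_bg)

common = active_bins(first_version) & active_bins(second_version)
only_in_first = active_bins(first_version) - active_bins(second_version)
print("present in both:", sorted(common))
print("only in first:", sorted(only_in_first))
```

Real recordings are of course far messier than three clean tones, so in practice you compare spectrograms by eye and ear rather than by a hard threshold.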

As for what software could have done it, the answer is any decent editing software, from Pro Tools to Cubase/Nuendo to DP to Logic etc., or video editing software. What software did the job? Probably Avid Media Composer, Adobe Premiere or Final Cut Pro X, essentially whatever the people working at that company normally use.


PS: I don’t think we’ll see too much Vocaloid-type software for this type of use though, seeing that editing a source is less bad (but still bad) than straight up fabricating what the source said. So more “reputable” (clearly questionable) outlets won’t do it, but shady ones probably will.

Actually, to be clearer, from a workflow and technical perspective:

Source A: Dialog and background recorded together.
Source B: Only background sound recorded at a different time and/or place.

First version = A+B
Second = A, re-edited

That’s what it sounds like.

See, I actually think that such software exists… Somewhere… lol

Anywho… Great editing etc. nonetheless :p

Njoy your weekend:)

That software may exist, but I’m betting that the person actually spoke for a lot longer than either version, and that it was abbreviated in both cases. But the emphasis was changed by the choice of what was cut out. It’s fairly standard procedure in, for example, documentary, lifestyle and reality programming (I do a lot of dialog editing). So no special software is really needed.

Either way, you enjoy your weekend as well :slight_smile:

Nice to let us all in on it eh! :unamused:

I don’t think it’s a big deal, but if he doesn’t want to tell you… Maybe he has his reasons. It was just a news report that was edited.

Of course I will send you the link, dear northwoods…

Took me a while to recover from driftpunches microwave…

But as lydiot, err, Mattias said, it’s not a big deal… I’m just keeping certain topics away from here…

The important thing to me was whether anybody had knowledge of special speech software that could exchange/create words with the same intonation etc…

(Imagine Vocaloids that could convincingly sing as long-lost artists after analysis etc.) Of course this tech could also be used for malice, as with everything…

Have a look at

Only recently I extracted a snare drum from a pop song mix and (a lifesaver) shouts and noises from a classical concert recording.

I’d be interested in hearing this also. Can you please put a link here?

Thanks -

That’s pretty interesting. You may be able to use a restoration plugin to remove the ambience, print that, then add it back in with the original audio but phase-reversed; the result would then be solely the ambience. I wonder if you could then create some sort of impulse response from the ambience track/file.
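The phase-reversal idea is easy to show with numbers. This is only a sketch with made-up integer sample values, and it assumes the plugin returns perfectly cleaned dialog, which no real tool does. Inverting the cleaned track and summing it with the original cancels the dialog and leaves whatever the plugin removed.

```python
def invert(track):
    """Flip the polarity of every sample ("phase reverse")."""
    return [-x for x in track]

def mix(a, b):
    """Sum two equally long tracks sample by sample."""
    return [x + y for x, y in zip(a, b)]

# Hypothetical integer samples, like 16-bit PCM values.
dialog = [0, 1200, -800, 2500, -300, 900, -1500, 400]
ambience = [50, -40, 30, -60, 45, -35, 55, -20]
original = mix(dialog, ambience)

# ASSUMPTION: an ideal restoration pass that returns exactly the dialog.
cleaned = list(dialog)

# Original plus polarity-inverted cleaned track leaves only what was removed.
residual = mix(original, invert(cleaned))
print(residual)
```

In practice the two tracks have to be sample-aligned (any plugin latency compensated), and the residual also contains any bits of dialog the plugin took out by mistake, so it is rarely pure ambience.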

Alright, since most of you haven’t heard the source I’ll just reiterate what I said before: No special software was needed for what was done. It was a simple case of editing one source and, in one version, adding background sounds. That’s all it was, guaranteed. Nothing fancy.

As for extracting ambience that can be done with iZotope’s RX v4 and up. The issue is however that it’s designed for room tone rather than actual “live” ambience. In other words, if you record dialog in a room with a refrigerator that’s humming, and a ceiling fan that’s slowly spinning, then iZotope and other software can extract that which isn’t dialog and create a new file of only the “room tone”. But that’s very different from you recording dialog on a sidewalk where there are cars going by and people walking by while talking etc. That can’t be successfully extracted the same way.
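To put a rough number on the difference between room tone and “live” ambience: steady noise keeps about the same level from moment to moment, while a car passing by swells and fades. The sketch below is only an illustration with synthetic signals, not how RX or any other product works; the function names, frame length and signals are all assumptions.

```python
import math
import statistics

def frame_rms(samples, frame_len):
    """RMS level of each consecutive, non-overlapping frame."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def level_variation(samples, frame_len=100):
    """Standard deviation of the frame levels relative to their mean."""
    levels = frame_rms(samples, frame_len)
    return statistics.pstdev(levels) / statistics.mean(levels)

N = 1000
# Hypothetical refrigerator hum: a tone with constant amplitude.
hum = [0.2 * math.sin(2 * math.pi * i / 20) for i in range(N)]
# Hypothetical car pass-by: the same tone, swelling and then fading.
passby = [0.2 * math.sin(math.pi * i / N) * math.sin(2 * math.pi * i / 20)
          for i in range(N)]

print("hum variation:", level_variation(hum))
print("pass-by variation:", level_variation(passby))
```

The hum comes out with essentially no level variation and the pass-by with a lot. A tool that learns one fixed noise profile has a fair chance with the first kind of sound and very little with the second.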