The Visual Microphone: Passive Recovery of Sound from Video

Here!
Hmmm! :open_mouth:

That is amazing, actually; really the kind of stuff I’d like to be researching in my studies in the coming year!

Fascinating. I will put off that Neumann purchase because I already have an empty bag of chips! It does seem that the bag of chips is hearing something we are not hearing… an alien or something.

Amazing…
I wonder how it captures this.
https://www.youtube.com/watch?v=MM13AV1B48g

I call BS, just on a gut feeling.

Then, trying to think about it more quantitatively: the bag of chips/leaves will move in response to the sum of all the sounds in the room. Trying to pick out one “voice” from among all those sounds (radio in the background, phones ringing, air conditioner noise, air currents from the air conditioner moving the potato chip bag) … wouldn’t it be analogous to figuring out the tune of a song by inspecting the shape of the .wav file?

Further, they say they can detect frequencies several times higher than the frame rate of the video. Ready to be corrected here … doesn’t that contradict Nyquist? Wouldn’t it all be aliasing, with no way to extract the signal?
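A quick way to see the Nyquist concern: if each frame gives just one sample of the object’s motion, a tone above half the frame rate does not disappear, it folds down to a lower “alias” frequency. This sketch assumes a 60 fps camera and a 440 Hz test tone, both picked purely for illustration; `alias_frequency` and `dominant_frequency` are hypothetical helper names.

```python
import numpy as np

def alias_frequency(f_signal, f_sample):
    """Apparent frequency (Hz) of a pure tone after sampling at f_sample."""
    f = f_signal % f_sample        # sampling cannot tell f from f + k * f_sample
    return min(f, f_sample - f)    # ...nor from its mirror image below f_sample

def dominant_frequency(samples, f_sample):
    """Frequency (Hz) of the largest non-DC peak in the sampled spectrum."""
    spectrum = np.abs(np.fft.rfft(samples))
    spectrum[0] = 0.0  # ignore the DC term
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / f_sample)
    return float(freqs[np.argmax(spectrum)])

fps = 60.0                  # assumed frame rate: one brightness sample per frame
tone = 440.0                # assumed tone, far above the 30 Hz Nyquist limit
t = np.arange(600) / fps    # 10 s worth of frame timestamps
frames = np.sin(2 * np.pi * tone * t)

print(alias_frequency(tone, fps))       # 20.0
print(dominant_frequency(frames, fps))  # the spectrum peaks near 20 Hz, not 440 Hz
```

So with one sample per frame, the 440 Hz tone shows up as 20 Hz, and nothing in those samples alone tells you which of the candidate tones (20, 40, 80, 100, 140 Hz, …) it came from.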

Of course you can only extract the sum of all the sounds in the room, but this is experimental, so they can test it in silent environments.
As for the Nyquist frequency: yes, you are right, but with the rolling shutter in a consumer camera, you can take multiple time samples from the same frame, which increases the Nyquist frequency :wink:.

I’m not too convinced…

But,

What use would this have? (Apart from spy agency usage.)

The science bits were just off the cuff on my part; it’s just a gut feeling (not empirically supportable) that feeds my skepticism at this point!

But I’d be interested to know more about the part you wrote above which I bolded … “multiple samples from the same frame” in particular … wow!

Haven’t got the time to go into it right now, but read this:
http://en.wikipedia.org/wiki/Rolling_shutter

You’ll see that it captures different parts of the image at different times, so if you know the readout pattern, you have information from several time instances within the same frame.
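Building on that idea, here is a rough sketch of why the rolling shutter helps. It assumes a hypothetical sensor with 120 rows read out over 60% of each 1/60 s frame, and it makes the big simplification that every row directly measures the vibrating object (in a real video each row sees a different part of the scene, which is part of what makes the actual method harder). Because the row timestamps have a gap between frames, it uses a direct DFT over candidate frequencies instead of an FFT; the function names are made up for the example.

```python
import numpy as np

def rolling_shutter_times(n_frames, rows, fps, readout_fraction):
    """Capture time (s) of every sensor row of every frame, in capture order.

    Hypothetical sensor model: rows are read top to bottom at a fixed line
    delay, and the readout occupies `readout_fraction` of each frame period,
    leaving a gap before the next frame starts.
    """
    frame_period = 1.0 / fps
    line_delay = readout_fraction * frame_period / rows
    frame_starts = np.arange(n_frames) * frame_period
    row_offsets = np.arange(rows) * line_delay
    return (frame_starts[:, None] + row_offsets[None, :]).ravel()

def peak_frequency(times, samples, candidates):
    """Candidate frequency (Hz) with the largest direct-DFT magnitude.

    A direct DFT is used because the row timestamps are not evenly spaced,
    so a plain FFT does not apply.
    """
    centered = samples - samples.mean()
    power = [abs(np.sum(centered * np.exp(-2j * np.pi * f * times))) for f in candidates]
    return float(candidates[int(np.argmax(power))])

fps, rows, readout = 60.0, 120, 0.6    # assumed camera parameters
tone = 440.0                           # same tone that aliases at 60 samples/s
t = rolling_shutter_times(30, rows, fps, readout)
# Simplification: pretend every row directly reads the object's displacement
x = np.sin(2 * np.pi * tone * t)

row_rate = rows * fps / readout        # row samples per second within a frame
candidates = np.arange(1.0, 2000.0, 1.0)
print(row_rate)                          # about 12000 rows/s vs. 60 frames/s
print(peak_frequency(t, x, candidates))  # 440.0
```

Under these assumptions the rows give about 12,000 samples per second inside each frame, so a 440 Hz tone that would alias to 20 Hz at one sample per frame is picked out correctly. The gaps between frames still leave weaker spurious peaks spaced 60 Hz apart on either side of the true tone.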

Fascinating, thank you for the link, Strophoid!

So, to my feeble way of thinking, this technology essentially uses potato chip bags, plant leaves, and the like as lo-fi mechanical transducers of the sound pressure waves … does that sound right?
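One way to make the “lo-fi mechanical transducer” idea concrete is to model an object as a single driven, damped mass-spring system. This is a deliberate oversimplification (a real chip bag has many vibration modes), and the 300 Hz resonance and Q of 5 below are made-up values. The point is only that the object follows the driving frequency but with a strongly frequency-dependent gain, much like a cheap microphone with a lumpy frequency response.

```python
import numpy as np

def displacement_response(freqs_hz, f0_hz, q):
    """Steady-state displacement amplitude of a driven, damped oscillator.

    Models an object as one mass-spring-damper with resonance f0_hz and
    quality factor q. Values are per unit force per unit mass, so only
    their ratios are meaningful.
    """
    w = 2 * np.pi * np.asarray(freqs_hz, dtype=float)
    w0 = 2 * np.pi * f0_hz
    gamma = w0 / q  # damping rate implied by the quality factor
    return 1.0 / np.sqrt((w0 ** 2 - w ** 2) ** 2 + (gamma * w) ** 2)

freqs = np.array([100.0, 300.0, 1000.0, 3000.0])
resp = displacement_response(freqs, f0_hz=300.0, q=5.0)  # assumed object
relative_db = 20 * np.log10(resp / resp.max())
for f, db in zip(freqs, relative_db):
    print(f"{f:6.0f} Hz: {db:6.1f} dB")
```

With these made-up numbers the 100 Hz and 1 kHz components come out roughly 13 dB and 34 dB below the resonance, so a recovered signal would need equalizing before it sounded natural.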

I’m just thinking the coupling from sound pressure wave to object motion is going to be way too crude to “decode” meaningfully, i.e. the S/N ratio would be too low. I’m even wondering whether the examples were too good to be true!

But I’m keeping an open mind! (Maybe by mathematically combining the observed motion of multiple objects at once, the S/N ratio could be increased to a meaningful level …?).

Thanks again!
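The idea of combining several objects can be sketched under a strong assumption: every object moves with the same sound, but each carries its own independent measurement noise. Averaging N such observations keeps the signal and shrinks the noise power by a factor of N, i.e. about 10·log10(N) dB of S/N gain. The signal, noise level, and object count here are invented for illustration.

```python
import numpy as np

def snr_db(clean, noisy):
    """S/N of `noisy` relative to the known `clean` signal, in dB."""
    noise = noisy - clean
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
fs = 2000.0
t = np.arange(4000) / fs
clean = np.sin(2 * np.pi * 220 * t)  # the "sound" every object is assumed to follow

n_objects = 16
# Each object sees the same signal plus its own independent noise (an assumption)
observations = clean + rng.normal(scale=2.0, size=(n_objects, t.size))

single = snr_db(clean, observations[0])
combined = snr_db(clean, observations.mean(axis=0))
print(f"one object: {single:.1f} dB, average of {n_objects}: {combined:.1f} dB")
print(f"expected gain: {10 * np.log10(n_objects):.1f} dB")
```

In practice objects respond with different gains and phases, so they would need to be aligned or weighted before averaging, and noise that is common to all of them (say, a real air conditioner) would not average out.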

I’m not sure how limited this actually is; I ‘think’ objects will vibrate at the frequencies of the sound quite nicely.
Note that in the examples used, they had a very simple signal consisting of just a few sines at different frequencies. In silent surroundings, I reckon that with some smart gating in the frequency domain you can get a decent reproduction this way.
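A bare-bones version of that frequency-domain gating, assuming the signal really is just a few strong sines on top of broadband noise: take the FFT, zero every bin weaker than some fraction of the strongest one, and transform back. The tone frequencies, noise level, and 0.2 threshold are arbitrary choices for the example, and `spectral_gate` is a hypothetical helper.

```python
import numpy as np

def spectral_gate(x, keep_ratio=0.1):
    """Zero every FFT bin whose magnitude is below keep_ratio times the largest bin."""
    spectrum = np.fft.rfft(x)
    mag = np.abs(spectrum)
    spectrum[mag < keep_ratio * mag.max()] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def rms(e):
    """Root-mean-square value of an error signal."""
    return float(np.sqrt(np.mean(e ** 2)))

rng = np.random.default_rng(1)
fs = 8000.0
t = np.arange(8000) / fs                            # 1 s of audio
clean = sum(np.sin(2 * np.pi * f * t) for f in (300.0, 500.0, 800.0))
noisy = clean + rng.normal(scale=1.0, size=t.size)  # broadband noise
cleaned = spectral_gate(noisy, keep_ratio=0.2)

print(f"error before: {rms(noisy - clean):.3f}, after gating: {rms(cleaned - clean):.3f}")
```

This works well precisely because the test signal is so sparse in frequency; speech or music spreads its energy over many bins, and a hard gate like this would chew it up.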

This same technique has been used for quite some time by spy agencies. :wink:
I also remember a demonstration where they placed a tiny paper sticky dot on the window of a room where the blinds had been drawn. With the window acting as a large diaphragm, they were able to record its tiny vibrations by monitoring the dot and listen in on the conversation taking place inside.

HA! :sunglasses:

Lots of fun additions in this topic!

So this means we can reconstruct sounds from old silent movies… oh wait, too low a frame rate :slight_smile:

Great thought, peakae!

No problem, it can be easily done! :smiley:

  1. Integrate the response of multiple objects in the frame. Even with a slower frame rate, this should help improve the S/N by eliminating other nonsensical solutions to the equation that would arise from analyzing the motion of only one object. That, plus a little inter-peak reconstruction, and we’ll be able to hear every vowel and regional accent!

Then use a program with a noise-print feature to remove the unwanted stuff, and it should sound better than the original? Well, maybe not, haha!
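For the noise-print step, one classic technique is spectral subtraction: record a stretch of noise alone, average its magnitude spectrum, and subtract that “print” from each frame of the noisy recording while keeping the original phase. This is a minimal NumPy sketch with non-overlapping frames; the frame size, the spectral floor, and the synthetic noise are all assumptions, and it is not necessarily how any particular audio program implements its noise removal.

```python
import numpy as np

FRAME = 512  # samples per analysis frame (assumed value)

def frames_of(x):
    """Split x into non-overlapping FRAME-sample frames, dropping any ragged tail."""
    n = len(x) // FRAME
    return x[: n * FRAME].reshape(n, FRAME)

def noise_print(noise_only):
    """Average magnitude spectrum of a noise-only recording (the "noise print")."""
    return np.abs(np.fft.rfft(frames_of(noise_only), axis=1)).mean(axis=0)

def spectral_subtract(x, profile, floor=0.05):
    """Subtract the noise print from each frame's magnitude, keeping the original phase."""
    spec = np.fft.rfft(frames_of(x), axis=1)
    mag = np.abs(spec)
    # Clamp at a small fraction of the original magnitude instead of going negative
    new_mag = np.maximum(mag - profile, floor * mag)
    cleaned = new_mag * np.exp(1j * np.angle(spec))
    return np.fft.irfft(cleaned, n=FRAME, axis=1).ravel()

rng = np.random.default_rng(2)
fs = 8000.0
t = np.arange(16384) / fs
clean = np.sin(2 * np.pi * 437.5 * t)                     # tone centred on an FFT bin
noisy = clean + rng.normal(scale=0.5, size=t.size)
profile = noise_print(rng.normal(scale=0.5, size=16384))  # separate noise-only take

out = spectral_subtract(noisy, profile)
err_before = float(np.sqrt(np.mean((noisy - clean) ** 2)))
err_after = float(np.sqrt(np.mean((out - clean) ** 2)))
print(f"error before: {err_before:.3f}, after: {err_after:.3f}")
```

Real tools typically use overlapping windowed frames and smoother gain curves to avoid the warbly “musical noise” that plain subtraction leaves behind.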