Theoretical - Vocals And Whistling As Potential Tools For MIDI Expression

DaddyO · October 22, 2020, 6:01pm

Several threads here, as well as endless threads in DAW forums, have focused on the tools and tasks of musical expression for MIDI. Beyond doubt, getting MIDI to sound convincingly expressive requires a great deal of work, often with tedious attention to detail: articulation, dynamics, timbre/intensity, and so on. This is all in addition to the obvious musical requirements of pitch and note length.

I have always thought the best potential for a tool that would render musical expression as naturally as possible would be one based on the human voice or whistling as a source to be analyzed and converted into the data needed to implement expression. It would not be exhaustive or perfect, but if we are talking about a tool that most everyone can use, what better medium for combining all the elements of musical expression at one time than the human voice? Pitch. Duration. Volume. Timbre/Intensity. Dynamics. Articulation. All available as one continuous, musical data source. Producing musically expressive material is almost effortless.

Imagine if you could sing or whistle a line (depending on one’s innate talents in this area) and have an audio-to-MIDI tool that would translate what it hears into appropriate MIDI information, including assigned CC values. Tidying up and replacing translations that don’t quite work as desired might be needed, but oh what a savings in tedium.
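To make the pitch part of that translation concrete, here is a minimal sketch, not any existing product's method. It assumes some pitch detector (not shown) has already produced a frequency in Hz for a short slice of the sung or whistled audio. The function name `hz_to_midi`, the A4 = 440 Hz tuning, and the ±2 semitone bend range are all illustrative assumptions.

```python
import math

A4_HZ = 440.0     # assumed reference tuning
A4_NOTE = 69      # MIDI note number of A4
BEND_RANGE = 2.0  # assumed pitch-bend range of the target instrument, in semitones


def hz_to_midi(freq_hz):
    """Convert a detected frequency to (note, bend).

    note is the nearest MIDI note number (0..127).
    bend is a 14-bit pitch-bend value where 8192 means "no bend";
    it carries the part of the pitch that falls between two notes.
    """
    # Fractional MIDI pitch, e.g. 69.25 is a quarter semitone above A4.
    semitones = A4_NOTE + 12.0 * math.log2(freq_hz / A4_HZ)
    note = max(0, min(127, int(round(semitones))))

    # Whatever is left over after rounding becomes pitch bend.
    offset = semitones - note
    bend = 8192 + int(round(offset / BEND_RANGE * 8192))
    bend = max(0, min(16383, bend))
    return note, bend


if __name__ == "__main__":
    print(hz_to_midi(440.0))
    print(hz_to_midi(440.0 * 2 ** (0.25 / 12)))
```

Keeping the leftover pitch as bend rather than discarding it is what would let a slide or a slightly sharp attack survive the translation instead of being snapped to the nearest key.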

And consider how naturally humanized such a tool would be. What is more human than human? Right now our humanizations are either algorithmic or based on keyboard input. Those not proficient in keyboard input are severely handicapped. Myself, I can input notes, but I’m simply not wired for playing keyboard well enough for recording (although I do it from time to time, with predictable results).

I know there are existing tools for audio-to-MIDI conversion (the standard ones I’m aware of are Melodyne and Cubase VariAudio), but so far as I am aware most of those focus on pitch, duration, and perhaps volume. And they can require more effort to use than is justified by the limited results. None I am aware of tackles the full potential of the source material. I’m sure the task of creating such a tool would be very hard. For all I know the reason it hasn’t been done is that it’s impossible with current technology.

My guess is that, in addition to overall capability, such a tool would require the user to “train” the translator to correctly recognize what is intended.

Anyway, just throwing this out there.

Derrek · October 22, 2020, 10:04pm

Things may be different now, but did not Finale have something along these lines years ago and drop it because the results were not worthwhile for them?

alindsay55661 · October 22, 2020, 10:51pm

+1 for even a basic version of this. I’d love to be able to get pitch, velocity and duration from a vocal performance in Dorico.

mducharme · October 22, 2020, 10:54pm

What about this? https://vochlea.com/

DaddyO · October 22, 2020, 11:00pm

Interesting. At first glance this appears to be attempting the sort of thing I’m suggesting. I’ll have to look at it further. Thanks mducharme.

Rob_Tuley · October 22, 2020, 11:33pm

Did you watch the demo video before you said it was interesting? (Note the non-judgmental language!)

DaddyO · October 22, 2020, 11:38pm

Here’s the developer’s walkthrough video: Dubler Studio Kit 1: Full Walkthrough - YouTube

While he’s definitely using voice to MIDI, with some CC control built in, his focus in the video is drum kits and audio filters. Not my interest, but some of the building blocks seem to be there if someone were to develop it for orchestral music.

Also, he seems to be thinking of the voice as a “right now” command source (notes, pitch) rather than a developing performance source. I’m not sure I could adequately explain the difference; it’s only my instinct that tells me this. Vocals have amplitude and other measurable qualities that develop and change organically over time. I can definitely sing a rendition of a musical line with enough vocal information to make it theoretically possible to translate as an expressive strings line. I can get louder and softer, I can swell and diminish, I can raise the intensity level, and I can change the tempo in minute amounts on the fly. I can then record a harmonic line to go along with it. I can build a string section or a brass section doing this.
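As a rough illustration of treating the voice as a developing source, the sketch below turns the loudness of a recording into a stream of CC values over time. It is only a sketch under stated assumptions: the samples are assumed to be floats between -1 and 1, the helper names (`rms_envelope`, `smooth`, `to_cc`) are made up for this example, and sending the result to CC11 (Expression) is just one possible choice.

```python
import math


def rms_envelope(samples, frame_size):
    """Return one RMS loudness value per non-overlapping frame of samples."""
    env = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        env.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return env


def smooth(values, alpha=0.3):
    """One-pole smoothing so the controller curve does not jitter.

    alpha near 1 follows the input closely; alpha near 0 smooths heavily.
    """
    out = []
    prev = values[0] if values else 0.0
    for v in values:
        prev = prev + alpha * (v - prev)
        out.append(prev)
    return out


def to_cc(values, floor=0.0, ceiling=1.0):
    """Scale loudness values from floor..ceiling into MIDI CC values 0..127."""
    span = ceiling - floor
    cc = []
    for v in values:
        norm = max(0.0, min(1.0, (v - floor) / span))
        cc.append(int(round(norm * 127)))
    return cc


if __name__ == "__main__":
    # A made-up swell: silence, then medium, then full level.
    samples = [0.0] * 4 + [0.5] * 4 + [1.0] * 4
    envelope = smooth(rms_envelope(samples, frame_size=4), alpha=0.5)
    print(to_cc(envelope))
```

The `floor` and `ceiling` arguments are where a per-singer calibration could go: a quiet whistler and a loud singer would each map their own usable range onto the full 0..127 span.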

I know I’m talking about something that doesn’t exist. But it seems to me theoretically doable for someone with the skills, determination, and youthful energy. What this guy has done is wonderful for his purposes. Thanks for the tip.

DaddyO · October 22, 2020, 11:39pm

Funny, no, I was watching the video while you were typing your question. It’s 20 minutes long.