This all suddenly feels a lot more interesting. The current sampling tech takes MIDI snapshots and glues them back together. That imposes a lot of limitations down the line. Yes, we know this technology, and yes, it shouldn't disappear, but velocity sampling is essentially like taking rapid still pictures of something that is moving, then trying to paste those two-dimensional objects back together, often mocking up things like vibrato. Real-world things you bang (which are surprisingly versatile), things you blow (picture Coltrane, his mind, his lungs, his horn), and things you twang have nuances that only play out over time, that move in phrases.
A picture of a garden might fool you for a minute as a "real" garden, and even a short clip like Le Prince's "Roundhay Garden Scene" stutters into wobbly life and gives a "sort of" gist, but robbed of such glories, present MIDI sampling methods are cardboard cut-outs of real sounds: early Charlie Chaplin technology.
Sure, we can bury them in a mix, and some folk can get a fair resemblance if they stay within the rails, but the ear gets tired quickly and pretty soon it is exposed. AI brings the possibility of microphones with ears. AI can learn to listen. It can create sample banks which include motion and knit them together, not in a cut-and-paste way.
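To make the limitation concrete, here is a minimal, hypothetical sketch of how a conventional velocity-layer sampler picks a recording: the note is reduced to a single snapshot chosen by velocity, and everything that happens over time (vibrato, phrasing, the approach into the next note) has to be faked afterwards. The layer thresholds, file names and function are illustrative, not any particular sampler's API.

```python
# Hypothetical velocity-layer lookup: the "snapshot" approach described above.
# Each layer is one frozen recording; nothing here knows about the note before
# or after it, so phrasing has to be pasted on later (filters, fake vibrato, etc.).

VELOCITY_LAYERS = {
    # lower bound of MIDI velocity range -> pre-recorded sample (illustrative names)
    0:  "trumpet_C4_pp.wav",
    48: "trumpet_C4_mf.wav",
    96: "trumpet_C4_ff.wav",
}

def pick_sample(velocity: int) -> str:
    """Return the single static recording whose layer covers this velocity."""
    chosen = None
    for lower_bound, sample in sorted(VELOCITY_LAYERS.items()):
        if velocity >= lower_bound:
            chosen = sample
    return chosen

# Every note played at velocity 70 gets exactly the same file, however it is phrased.
print(pick_sample(70))  # -> "trumpet_C4_mf.wav"
```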
Yes, there is YouTube hyperbole here, and yes, it's not right yet, but the video below shows that DAWs can change radically.
One thing I can see as a barrier is input devices. If, for example, one examines the mechanics of orchestral instruments, one finds the transition from note to note differs widely. A "legato" on a trumpet means blowing a note and then, without starting a new breath or lip articulation, moving the valves until the target note is reached. Some legatos are possible and some are not, at least by these means. A saxophone has no legato of this kind, nor does an orchestral harp, while on a mouth organ it's almost unavoidable. I think AI can know this and craft phrases with that knowledge. To some extent, visual human interfaces can compensate for these things, and haptics can improve (real drum skins have billions of sounds), but it is like polishing your shoes to comb your hair.
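As a rough illustration of the kind of knowledge I mean, here is a hypothetical sketch of per-instrument transition rules that a phrase-generating system could consult before writing a legato line. The instrument entries, rule values and interval limits are made up for the example, not measured facts.

```python
# Hypothetical per-instrument legato rules a phrase generator could consult.
# All values are illustrative assumptions following the paragraph above.

LEGATO_RULES = {
    "trumpet":     {"has_legato": True,  "max_slur_semitones": 12},
    "saxophone":   {"has_legato": False, "max_slur_semitones": 0},
    "harp":        {"has_legato": False, "max_slur_semitones": 0},
    "mouth_organ": {"has_legato": True,  "max_slur_semitones": 4},
}

def legato_is_playable(instrument: str, from_pitch: int, to_pitch: int) -> bool:
    """Check whether a slurred transition between two MIDI pitches is plausible."""
    rules = LEGATO_RULES.get(instrument)
    if rules is None or not rules["has_legato"]:
        return False
    return abs(to_pitch - from_pitch) <= rules["max_slur_semitones"]

# A generator could reject or rewrite phrases that demand impossible slurs.
print(legato_is_playable("trumpet", 60, 67))    # True: within the assumed valve-slur range
print(legato_is_playable("saxophone", 60, 62))  # False: no trumpet-style legato assumed
```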
AI has a new tech called “context-aware stem generation”:
When you listen, I would advise ignoring the “lift music” qualities; AI will improve this. At the moment, this is like the Cubase-on-the-Atari moment.
Anyway, this is primitive, but it shows the possibilities:
