Having been down similar paths several times (I’ve literally designed shipping commercial real-time audio APIs), I can say that’s unfortunately not in spot 1, 2, or 3 for a real-time audio API.
VST 3 is actually pretty well put together. It leans slightly too heavily on the COM-like interfaces IMO, and the full separation that lets the GUI run on a different machine than the processing is probably overkill, though it’s powerful and likely useful for very high-end studio/location setups and, say, Yamaha control surfaces or whatever.
Unfortunately, none of the open source Linux plugin formats (JUCE, LADSPA, and so on) are solving the right problems. I looked at a previous version of CLAP, and it was a start (a slightly more experienced design than the Linux stuff) but not really a reason to change the world.
Also remember: there have also been Audio Units, and there’s the Ableton plugin API. Neither has really taken the world by storm (although AU has come the closest).