All this talk is about one specific benchmark, the DAWbench. The numbers there are pretty unequivocal, but the differences are most pronounced at buffer sizes of 64 samples or lower. That’s where ASIO on Windows really takes off and CoreAudio stumbles.
I’m not all that technical and don’t know about Mach kernels, but to me there appears to be a difference in design philosophy: ASIO is a single-client straight shooter, whereas CoreAudio offers system-wide multi-client versatility.
As an analogy, I’d like to think that CoreAudio sacrifices some straight-line speed for better cornering.
For me the trade-off is worth it: I like to listen to something on YouTube while playing along on a VI in Cubase. I can have twenty different applications open, all playing back and/or recording audio/video simultaneously, all running at a safe but fast 128-sample buffer. To me, not being able to do this on Windows is like having to repatch cables every time you want to use a different application.
But if you need to run more than 100 voices at a 64-sample buffer, ASIO on Windows is the way to go.
Another difference in design approach is that Apple’s Logic has a fixed, higher internal buffer, independent of the interface’s buffer setting, which works really well at ultra-low latencies in CoreAudio. The one time someone did the DAWbench on Logic it performed better than Cubase on Windows, even at 32 samples, while simultaneously screen-capturing the whole thing in QuickTime.
Cubase works fine on OS X, and I rarely run out of room with an Mbox3Pro at 128 samples, with a round-trip latency of about 9 ms.
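For anyone wondering how buffer size relates to those millisecond figures, here is a rough back-of-the-envelope sketch. It assumes a 44.1 kHz sample rate (my assumption, the rate isn't stated above) and the function names are just made up for illustration. One buffer's worth of audio lasts buffer size divided by sample rate, and a round trip needs at least one input buffer plus one output buffer. Whatever the interface reports on top of that would come from converters and driver safety buffers, which this sketch does not model.

```python
# Rough sketch: how long one audio buffer lasts, and the buffer-only
# part of the round-trip latency. Sample rate is an assumption (44.1 kHz).

def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Duration of one buffer in milliseconds."""
    return buffer_samples / sample_rate_hz * 1000.0


def buffer_only_round_trip_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """One input buffer plus one output buffer, ignoring converter
    and driver overhead (which real interfaces add on top)."""
    return 2 * buffer_latency_ms(buffer_samples, sample_rate_hz)


SAMPLE_RATE_HZ = 44_100  # assumed, not taken from the post

for size in (32, 64, 128):
    one_way = buffer_latency_ms(size, SAMPLE_RATE_HZ)
    round_trip = buffer_only_round_trip_ms(size, SAMPLE_RATE_HZ)
    print(f"{size:>3} samples: {one_way:.2f} ms per buffer, "
          f"{round_trip:.2f} ms round trip (buffers only)")
```

Under those assumptions, 128 samples works out to a bit under 3 ms per buffer, so the buffers alone account for a little under 6 ms of a round trip. The per-buffer figure is also the time the computer has to render every voice before the next buffer is due, which is why halving the buffer to 64 samples is so much harder on the system.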
Nonetheless, if you use Cubase and need ultimate performance, I think it’s safe to say that Windows is the better choice. If you want ultimate performance on Mac, use Logic.