Well, I have to say that while latency in Cubase on Mac may not be as good as on PC I can run most projects without problems at 128 samples. This is on my now 3 year old Mac Pro (see my signature below) and with mainly virtual instruments. Simpler projects also run at 64 samples and I can also play live single Vsti’s without drop-outs at 32. Occasionally I have to raise the buffer to 256 samples for larger projects but that’s ok for me.
I think performance depends very much on the drivers for your audio interface, for example running at a given number of samples doesn’t always produce the same latency in milliseconds when using different interfaces. So if you find performance at lower latencies unacceptable then it may also be your audio driver’s fault. I’m very satisfied with the Metric Halo drivers, they are absolutely stable and fast and provide an input latency of around 4 ms at 128 samples (as reported by Cubase).
To be honest I haven’t noticed much change after the so called optimizations in 5.5 in comparison to the versions before… but as performance was ok for me already before, it didn’t bother me much. I haven’t tested C6 very much until now because I have to finish a project in C5 first but performance seems to be the same as in 5.5. What I have noticed though was the change from Mac OS 10.5 to 10.6: projects that before needed 256 samples now run at 128.
Of course I would love to see further optimizations regarding multi-core and low latency performance but it might also be the fault of OS X itself that - according to some tests from various knowledgeable people I’ve read - can’t provide the same low latency performance as Windows. Logic does have a better low latency performance than Cubase on Mac but it kind of cheats… it works with two separate buffers, one for playback and one for recording and as a result only the tracks you’re recording to run with the low buffer you specify in your audio settings while the rest of the tracks run at a significantly higher buffer (usually 1024 samples but I think you can set it also to 512 and 2048).
So I don’t know if Steinberg can optimize it much further on their own, but of course I hope that they will manage to do it nevertheless 