I find it rather comforting that there is no major difference between Mac and Windows, because I always thought that the general code of the audio processing engine should be pretty much identical on both platforms, abstracted away from the specific threading APIs… So it makes sense to me.
I have done my own test, and had a look at Cubase threads in WinDbg…
On my system (AMD 7950X with 32 hardware threads), Cubase creates exactly 32 threads named “Audio Prefetch 0-31”:
228 Id: 4cdc.6378 Suspend: 1 Teb: 00000000`209d2000 Unfrozen "Audio Prefetch 0"
I assume those are the ASIOGuard threads, one for each logical core, maybe with a lower priority than the realtime threads, which are called “Audio Realtime 1-14”. Why 14? No idea. Maybe 16 physical cores minus two for other tasks.
Then there are two “Audio Prefetch Trigger” threads, and six “Audio Catchup 0-5”. No idea what they are good for.
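If you want to count the thread families yourself, a small script can tally them from a saved WinDbg thread listing. This is only a sketch: it assumes every named thread appears on a line like the one quoted above, with the name in straight double quotes at the end, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the quoted thread name at the end of a thread listing line.
THREAD_NAME = re.compile(r'"([^"]+)"\s*$')

def count_thread_groups(listing: str) -> Counter:
    """Count threads per name family, ignoring the trailing index."""
    groups = Counter()
    for line in listing.splitlines():
        match = THREAD_NAME.search(line)
        if match is None:
            continue  # unnamed thread or unrelated line
        # "Audio Prefetch 0" -> "Audio Prefetch"
        family = re.sub(r"\s*\d+$", "", match.group(1))
        groups[family] += 1
    return groups

# The single line quoted in this post; paste the full listing here instead.
sample = '228 Id: 4cdc.6378 Suspend: 1 Teb: 00000000`209d2000 Unfrozen "Audio Prefetch 0"'
print(count_thread_groups(sample))
```

Fed with the complete listing, this should report one entry per family (Audio Prefetch, Audio Realtime, Audio Prefetch Trigger, Audio Catchup) with its thread count.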
Test 1
I created a test project with 8 audio tracks going into 8 groups, loaded with the very CPU intensive IK Tape 440 plugin, one instance per group. No realtime usage, no control room.
You can roughly see where the plugin threads run. There are two or three more than the 8 groups, but that is to be expected. ASIO load is ~30%, a bit more than the highest core saturation, but OK, not far off.
Test 2
8 audio tracks going into 8 groups, two plugin instances each.
The number of loaded cores is roughly the same as in test 1, just with more usage.
ASIO load hasn’t even doubled. The most saturated core is at maybe 60-65%.
I think this confirms that Cubase, as we assumed, always processes the plugins on one channel serially in one thread, effectively calling the processing functions in a loop.
From a computing point of view, this is very efficient: there is very little overhead, and you benefit from cache and memory locality.
The disadvantage is that the other cores stay underutilized. But if you want to spread the plugins of one channel evenly across other threads, you potentially lose the cache and memory locality, and things get really complicated with thread and data synchronization, which definitely introduces overhead of its own. Maybe there are other ways that I don’t know, maybe it would be worth it for modern CPUs, I cannot say, I am not a developer.
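To make the idea concrete, here is a minimal sketch of that scheme: each channel's plugin chain runs as a plain loop on one thread, and only independent channels are handed to different threads. This is purely an illustration of the assumption, not Cubase's actual code. The `gain` "plugin" and all function names are invented, and Python threads will not show real multi-core scaling, so only the structure matters here.

```python
from concurrent.futures import ThreadPoolExecutor

def gain(factor):
    """Hypothetical stand-in for a plugin: scales every sample."""
    def process(buffer):
        return [sample * factor for sample in buffer]
    return process

def process_channel(buffer, plugins):
    # The plugins of one channel run strictly one after another on the
    # same thread, because each plugin needs the previous one's output.
    for plugin in plugins:
        buffer = plugin(buffer)
    return buffer

def process_block(channels, workers):
    # Independent channels are spread over the worker threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_channel, buf, chain)
                   for buf, chain in channels]
        return [future.result() for future in futures]

# 8 channels with two plugins each, similar in shape to test 2.
channels = [([1.0, 2.0], [gain(2.0), gain(0.5)]) for _ in range(8)]
out = process_block(channels, workers=8)
```

Adding a second plugin to a chain only makes the loop in `process_channel` longer. It does not create more work items for the pool, which would match the observation that the number of loaded cores stays about the same between test 1 and test 2.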
Test 3
16 audio tracks going into 16 groups with one plugin each.
I count ca. 19 rather loaded cores, three more than the groups. Probably whatever else Cubase needs to do, dunno. But this test shows that it scales fine in width.
ASIO load is ~75%, most loaded core around 65%, similar to test 2. Why ASIO load is higher here I cannot explain, maybe internal thread synchronization overhead.
Test 4
For fun: 32 tracks into 32 groups, one plugin.
This is completely overloading. All cores saturated to 90%+. Somewhat expected.
24 tracks into 24 groups works, with ASIO load ~90%, but playback OK.
Note that I didn’t really check for dropouts; this was more of a simple scaling test to see how plugins get distributed to cores. If you have more ASIOGuard tracks than prefetch threads, things will be different again. I assume some of you have done similar tests…
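As a rough way to think about that last case: if every channel costs about the same and channels are dealt out evenly, the busiest thread has to work through the rounded-up ratio of channels to threads, one channel after the other. The sketch below only does that arithmetic. Equal cost and even distribution are assumptions, and the function name is invented.

```python
import math

def channels_on_busiest_thread(num_channels: int, num_threads: int) -> int:
    """Worst-case number of channels one thread handles serially,
    assuming equal-cost channels dealt out evenly (an idealization)."""
    if num_threads <= 0:
        raise ValueError("need at least one thread")
    return math.ceil(num_channels / num_threads)

# 32 prefetch threads as on the system in this post.
for channels in (8, 16, 32, 33, 64):
    print(channels, "channels ->", channels_on_busiest_thread(channels, 32))
```

Under these assumptions, anything up to 32 channels leaves at most one channel per thread, while the 33rd channel already forces one thread to process two channels back to back.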