ASIO Guard and Threading issues are not just a Windows issue

Oh my, no! Threads and cores are absolutely, positively different things. If you’re puzzled by that important distinction, I can understand how you might be struggling to make sense of what you’re seeing in the Activity Monitor.

Occam’s Razor: while the explanation for what you’re seeing in the screen grab in your original post might be inefficient use of CPU, the simpler explanation is that you’re observing an affinity artifact.

You are deflecting with semantics!

It’s pretty common for the vast majority of the community to use the term “Threads” to mean “Logical Cores”, as much as that makes coders’ eyes twitch.

Intel and AMD use that exact term in that context, as did Apple for all the years they were using Intel!

I understood exactly what Marcus was referring to, as would most people reading.

The point Marcus is making, which you seem to be missing, is that not only are the cores not being utilised, but ASIOGuard has collapsed, leaving all of that available headroom on the table.

Call it what you will, the end result is the same: a premature accumulation and overrun of ASIOGuard while the vast majority of resources are left unused.

This was originally thought to be an issue on Windows only, with much discussion across numerous threads here recently, on the premise that macOS was not affected in the same manner.

That’s the point being discussed, eye on the ball and all that.

I have opened the 2026 DAWbench sessions to Public Beta, but I will be pretty strict on some criteria, and will need to know exact system specs to see if your system fills any gaps not already covered in the beta pool.

You can send me a PM over at my DAWbench FB page.

What he is calling “asio guard collapse” as I understand it is (please correct me if I’m wrong):

The cpu performance meter in Cubase is showing 100% and he’s getting dropouts, while the cores in the activity monitor are all showing much less than 100% utilization.

This observation should not come as a surprise. This does not necessarily indicate an inefficient use of cpu. As I described above, there is a simpler explanation: this is a thread affinity artifact. Note that this simpler explanation would account for why he does not see this issue in projects with simple bus routing and why he only sees it on a Mac with a large number of cores.

Is it possible his observations indicate a cpu utilization problem in Cubase? Yes, but there is an alternative explanation which is simpler, and thus more likely.

Correct.

Now let’s be clear: there is never a 100% correlation between the Steinberg performance metering and the TM/CPU metering, because the Steinberg meter isn’t a CPU meter. That’s pretty clear and understood. It’s also pretty rare that you will achieve close to 100% CPU saturation in DAW use, except in empirical saturation tests like my standard DAWbench DSP test, for example.

That’s not the point being made.

Sure, so what’s the solution? That is the question!

OK, I am all ears, and I am sure others are hanging on with bated breath.

If it’s a simple explanation, does that mean you are suggesting it’s a simple solution?

The numbers shown in the screenshot at the top of this thread may seem puzzling at first, but do not necessarily indicate a problem. In other words, it’s possible, even likely, that there is nothing here that needs to be solved.

As I mentioned above, those results are consistent with efficient use of cpu that fully utilizes the cpu as much as possible combined with a thread affinity situation where the threads are moving across cores. That would explain the 100% on the Cubase performance meter while all cores in the activity monitor are far below 100%.
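To make the arithmetic behind that claim concrete, here is a toy model in Python. It is only a sketch: the round-robin migration, the time-slice granularity, and the function name `per_core_utilization` are assumptions made up for illustration, not a description of how macOS or Cubase actually schedule threads. It shows a single thread that is busy in every time slice (so it has no headroom left) while each core, averaged over the window, looks mostly idle.

```python
def per_core_utilization(num_cores, num_slices, migrate=True):
    """Fraction of the window each core spent running ONE always-busy thread."""
    busy_slices = [0] * num_cores
    for slice_index in range(num_slices):
        # Assumption: the scheduler moves the thread to the next core every slice.
        core = slice_index % num_cores if migrate else 0
        busy_slices[core] += 1
    return [count / num_slices for count in busy_slices]


pinned = per_core_utilization(num_cores=4, num_slices=1000, migrate=False)
migrating = per_core_utilization(num_cores=4, num_slices=1000, migrate=True)
print(pinned)     # one core saturated, the rest idle
print(migrating)  # the same work smeared evenly across all cores
```

In this model the thread is equally saturated in both runs; only the per-core picture changes. With 32 cores the same thread would show up as roughly 3% per core.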

As a developer of a popular virtual instrument which makes extensive use of vectorization, parallelization, and threading, I’m pretty familiar with this subject and I can attest to the fact that thread policy is a complex subject which can sometimes produce results that are counter-intuitive.

Is it possible there is room for improvement in Cubase’s thread policy? Yes, of course. But the opportunities for efficient parallelization in a complex project are limited. It’s unrealistic to expect anywhere near 100% cpu utilization in anything except a trivial project.

See my post about threads above. Norbury_Brook is correct that cores == threads on Arm64 for that reply, for that specific use of the term “Threads”. But when you were talking about “Threads” above, you were using a different usage of the word.

It’s not that he’s confused about the underlying meanings. :slight_smile:

One thread that is maxed out which is moving across cores could give the results you’re seeing, right?

Don’t understand what you mean by that, though, in either sense of the term “Thread”. A CPU Thread that is maxed out isn’t going to move because by definition it’s limited to that CPU core. An application thread doesn’t have any concept of being maxed out, and can’t be run on multiple CPU cores/CPU Threads at the same time. It can be moved from one CPU Thread/Core to another, if the application thread configuration allows for that, but there’s a cost for that move.

Pete
Microsoft

Can you explain what you mean, here?

Pete
Microsoft

On macOS, threads are not necessarily tied to a core, and they can move to different cores, depending on the affinity policy in place. The OS will sometimes do that to make optimal use of the cores. That would explain the results shown in the screenshot. That would also explain why he saw different results on a 10 core Mac than a 32 core Mac. That would also explain why this issue is more pronounced with more complex bussing.

I’m merely observing that the screenshot is not conclusive evidence of a performance problem. Perhaps there is a problem here, but that screenshot alone doesn’t indicate there is one.

I think you need to get more acquainted with the discussion on the other threads to get a wider angle on what is being displayed and discussed.

So you are telling me that you find it 100% acceptable that, in the displayed real-world (RW) mix situation, where ASIOGuard is maxed while all of that available resource is left unused, there is nothing to fix or address?

Best you check out Reaper then, which, with the identical session environment ported across, can access and assign the vast majority of the remaining resources before the engine is overrun.

I won’t ask you to drop the veil here unless you want to, but I am interested to know which multithreaded synth you develop, as I’ll be more than happy to throw it into the mix in DAWbench.

PM me if you prefer.

Something is getting lost in the mix here. Who said anything about 100% CPU utilization in RW application?

I’ll repeat, you really do need to get more acquainted with the discussion leading up to this thread to get a better angle on what’s being shown and discussed.

Unless you get some hands-on experience, you are just being dismissive without understanding the underlying dynamic being discussed.

This seems to be the crux of the issue. As I’m sure you know, 100% utilization of all the cores can only be achieved with trivial projects. What is acceptable utilization? Obviously that depends on the bussing for the project in question.

As for a comparison to Reaper, that would be far more interesting than the screenshot. But, as I understand the answer to my question above about the M1 vs M3, this issue only occurs with an M3 Ultra, so it may not be easy to find someone who can try to replicate the results. However, doesn’t that also suggest this is a non-issue for the vast majority of Mac Cubase users? I’m not dismissing anything…I’m just trying to gauge the severity of the problem.

I’ve never hidden the fact in all my years on this forum that I’m the principal developer of Omnisphere.

Right. App Threads (we really should qualify what we’re saying here) can also be moved to different cores/CPU Threads on Windows when Windows is handling the thread scheduling.

On both operating systems, IIRC, an application can lock an application thread to a specific CPU core or a set of CPU cores. That would be CPU affinity for an app thread (or a process, if you use it at that level). I have no idea if that is happening here.

APIs (for Windows):

Beyond that, you have MMCSS (I just heard TAFKAT die a little inside when I mentioned that), which has its own scheduling characteristics. MMCSS threads are a limited kernel resource, and ideally should never exist in a count of > 1 per CPU Thread. Intent is for them to be acquired and returned via the Real-Time Work Queue APIs.

Pete
Microsoft

Hang on, you need to get off this whole 100% utilization angle; no one has presented or argued that.

The fact that ASIOGuard is prematurely being exhausted, leaving massive amounts of available resources unused, is not so easily dismissed just because end users configure groups or busses.

Oh, so you don’t actually have a high core Mac to even attempt to replicate what is being reported, and all that you have just offered up has no practical hands-on basis? You are dismissing all of this on theory?

Right, of course, let’s just sweep this under the rug as most Mac users won’t have access to the higher core chips. But you are forgetting that higher core systems on Windows are far more common and widely used, which is where this discussion was initially focused; it has only now filtered across because we finally have some first-hand experience displaying the same dynamic on Mac.

Of course you are dismissing this as a non-event. Tell that to Marcus, who just dumped a whole wad of $ on this monster Mac system to get away from what we believed was a Windows-based issue, only to find it’s happening on macOS as well.

I haven’t participated here for 15+ years, I wouldn’t know you from Adam, so I asked the question.

Right, so Omnisphere is fine running its dedicated MT routines while running within Cubendo, without colliding with the DAW’s respective threading routines. Is that what you are telling me?

That will be interesting if true, because there are more than a few instances of other VIs that do not play nice when using MT, and Steinberg specifically recommend disabling those routines, but I digress.

Ya just had to go there, didn’t ya !!! :rofl:

The crux of the matter here is: what exactly constitutes premature? Again, I’m merely observing that the screenshot presents no evidence that the exhaustion is premature. In my own tests, I don’t recall ever seeing anything that would suggest premature exhaustion. James Zhan’s performance tests show Cubase ranking highly.

However, yes, bussing of the project has a profound effect on the parallelization opportunities, and hence the opportunities to spread the load across threads, and hence the opportunities to make maximum use of cpu resources.
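A rough way to see this is to treat the routing as a dependency graph: a bus cannot be processed until everything feeding it is done, so the longest chain sets the minimum time per buffer no matter how many cores are available. The Python sketch below uses a made-up project (the channel names, the per-buffer costs in milliseconds, and the 8-core figure are all hypothetical) and does not model Cubase’s actual scheduler.

```python
from functools import lru_cache

# Hypothetical per-buffer processing cost of each channel, in milliseconds.
COST_MS = {
    "track1": 1.0, "track2": 1.0, "track3": 1.0, "track4": 1.0,
    "group_a": 2.0, "group_b": 2.0,
    "master": 3.0,
}
# Hypothetical routing: each channel feeds the channel named as its output.
OUTPUT_OF = {
    "track1": "group_a", "track2": "group_a",
    "track3": "group_b", "track4": "group_b",
    "group_a": "master", "group_b": "master",
    "master": None,
}


def critical_path_ms(cost, output_of):
    """Longest dependency chain: the fastest possible time per buffer."""
    @lru_cache(maxsize=None)
    def finish_time(node):
        inputs = [src for src, dst in output_of.items() if dst == node]
        return cost[node] + max((finish_time(src) for src in inputs), default=0.0)

    return max(finish_time(node) for node in cost)


total_work = sum(COST_MS.values())
critical = critical_path_ms(COST_MS, OUTPUT_OF)
cores = 8
# Average utilization can never exceed this, because the buffer
# cannot finish faster than the critical path.
utilization_bound = total_work / (cores * critical)
print(total_work, critical, round(utilization_bound, 3))
```

In this toy case the 8 cores can never average more than about 23% busy, even with perfect scheduling. Flattening the routing shortens the critical path and raises that bound.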

Please understand I am not dismissing anything. I am just trying to shed some light on the observations in the OP.

Thanks. As you might imagine, I’m already pretty familiar with those APIs :grinning_face:.

As someone who has first-hand experience working on these issues, I hope my explanations of threads and cores and affinity and parallelization have helped people gain a better understanding of this subject so they can draw their own conclusions about Cubase performance on MacOS.

We are going around in circles!

You have made no effort to acquaint yourself with the issue being reported and displayed. You are making all your assertions by looking at a screenshot with ASIOGuard pegged while 80%+ of the available resource is left on the table, without any practical first-hand experience, and, despite your claims to the contrary, you are dismissing it as a non-event.

You are kidding me, right? Those are empirical saturation tests which, as I have explained multiple times, will scale incrementally just fine, like my own DAWbench methodology, but they are irrelevant to RW application in instances where ASIOGuard accumulates and overruns prematurely.

Yes, I will keep repeating that!

Also, point me to his video testing and reporting on an M3 Ultra, I may have missed that one.

Round and round we go!

What light have you shed, exactly? You have zero understanding and/or practical experience of the behavior being reported and displayed, you do not have access to a larger core Mac, you have made no effort to even understand how the report shows ASIOGuard being prematurely exhausted, and you continue to dismiss this as normal behavior.

Go and share those views on the “Core Handling in coming updates for Windows 11” thread here and see how you go; I’ll grab some popcorn.

I think we are done here, peace and out!

Could it be that this is used for the “Prefetch xy” threads (as there are as many created as (logical) cores are available, at least on my system)? From a DSP standpoint it makes sense to keep those always on the same core, because cache etc… Although, I guess that modern schedulers are usually quite good anyway at trying to always schedule the same threads on the same cores (as long as there are enough resources available).

I am not dismissing anything or anyone. I am providing an explanation for an event that I acknowledge as very real. This explanation is supported by the facts I have laid out concerning thread affinity.

I have done my best to clear up the misapprehensions expressed earlier in this thread concerning threads, cores, affinity, and parallelism. This was done in a sincere effort to help people understand the issue so they can draw their own conclusions.

It sounds like you keep mixing up the meaning of the word “thread”. For greater clarity, it would be welcomed if you could specify if you’re referring to CPU threads or application threads because they are not the same thing.
