Groups and CPU Usage

Hey everyone,

I’m experiencing some confusion about grouping in Cubase and CPU usage.
When I group channels together in Cubase, I’ve noticed that the CPU load seems to increase, which is causing headaches, especially in larger projects.

For instance, when I render four channels into one and apply a processing chain, it seems less taxing on the CPU compared to grouping those same four channels together and applying the processing chain to the group channel.

Specifically, I’ve observed that if I create a group with heavy processing and route multiple channels to it, the CPU load rises almost as if each channel had its own instance of the plugins.
Shouldn’t the CPU load remain consistent, regardless of the number of channels routed to the group?

Additionally, I’ve noticed that certain VST instruments, like Superior Drummer, can get pretty unstable in these configurations.

Is this normal behaviour, or should I investigate what’s causing it?

So you’re using for example four audio tracks and the scenarios are:

  1. The audio tracks have their own plugins and all have their outputs set to main mix output.

  2. The audio tracks do not have their own plugins and their outputs go to a group track and the group track has the same plugins as each had in the previous example…


If you can, grab some screenshots that show the plugins used, the routing, and the performance meter in Cubase for each of the scenarios.

In my experience, what increases the realtime load in Cubase/Nuendo is making the signal chain longer or adding heavier processing. Parallel processing, i.e. ‘more tracks’, takes longer to increase the realtime load because the load can be spread over more CPU threads/cores. When I have tried this, just duplicating a track with plugins keeps the realtime load low for many duplicates, but lengthening the chain by outputting a track into a group (with the same plugins) into another group (with the same plugins) immediately increases the load.

There is more work going on in routing multiple tracks to a group track and processing in that group track than in rendering those multiple tracks to a single track, disabling the original tracks, and putting the same processing on that rendered audio track. In particular, rendering takes the extra mixing work (i.e. mixing the multiple tracks and feeding the result to the input of the group track) out of the equation. I wouldn’t think that work would be on par with having the group track’s plugins on each of the source tracks, though.

Of course, there would also be more disk load in the multiple-tracks-feeding-a-group scenario, which could come into play if disk I/O becomes a bottleneck. I wouldn’t think that would affect the CPU load much, though it could add to considerations about how much work can be done in the space of a sample buffer.
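On the “space of a sample buffer” point, the realtime budget can be put in numbers. A minimal sketch (generic audio math, not anything Cubase-specific): all processing for one buffer must finish before the next buffer is due, i.e. within `buffer_size / sample_rate` seconds.

```python
def buffer_budget_ms(buffer_size_samples: int, sample_rate_hz: int) -> float:
    """Wall-clock time available to process one audio buffer, in milliseconds.

    If the whole signal chain can't finish within this window,
    you get dropouts/crackles regardless of average CPU load.
    """
    return 1000.0 * buffer_size_samples / sample_rate_hz

print(buffer_budget_ms(512, 48000))  # ~10.67 ms of headroom per buffer
print(buffer_budget_ms(64, 48000))   # ~1.33 ms — much tighter at low latency
```

This is why the same project can play fine at a 512-sample buffer but crackle at 64: the total work is unchanged, but the deadline shrinks.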

The other thing that could affect CPU in the multiple tracks feeding a group channel is if those individual tracks have active inserts, automation, etc. However, my assumption here is that you’ve already taken those considerations out of the picture if they were a factor earlier and CPU is tight.

FWIW, one reason I sometimes render what had been feeding a group track to audio (without the group track processing) is so I can later freeze that rendered track with its plugins, thus lightening CPU usage. I reproduce the plugin chain from the original group on the new audio track, while disabling the original tracks and any plugins on the group. (My system is vintage 2014, so this tends to be necessary if I’m doing heavy processing on group tracks.)

Thank you for your response! Mattias, yes, the two scenarios are correct.
I’ll remember to grab a few screenshots next time this happens, but the idea that a series of plugins in a chain, versus parallel channels with a similar chain, loads the CPU threads differently is probably what’s happening.
I usually don’t build crazy group matryoshkas, except for a mixing group for the elements and a bus group for the overall mix (which doesn’t have much going on).
I’ll see if I can modify my workflow a little to take your suggestions into account and see if it goes better.

Rickpaul, yes, I think the long-standing “freezing groups” feature request will definitely help those of us rocking vintage setups.
Rendering is definitely the last resort when things get a little too hot for my system to handle, but sometimes these changes in CPU load come very much out of the blue while still in the production stage, and constantly rendering would slow things down!

Thank you both for your replies!

This topic can become very complex very fast, so allow me to make a simplified example: if you have 8 tracks routed to a group, each with an FX insert on it, and then move all 8 inserts to the group track, you’ll have a higher load.

The explanation is that in the first setup we have 8 tracks with one insert each, which get processed on their own threads; their outputs then go to the group, where the signals are summed. In the second setup, we have 8 inserts on the group, each having its input fed by the output of the previous processor. Here processing is serial instead of parallel and has to be done on one thread, unlike the first setup.
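A toy model of the two setups may help. This is my illustration with made-up per-insert costs, not anything measured from Cubase: the total CPU work is identical either way (8 insert runs per buffer), but the wall-clock time inside the buffer deadline is very different.

```python
# Assumption for illustration only: each insert costs 1 ms of DSP time per buffer.
INSERT_MS = 1.0

# Setup 1: 8 tracks with one insert each. The chains are independent, so
# they can run concurrently on separate threads; the wall time per buffer
# is roughly that of the slowest single chain.
parallel_wall = max([INSERT_MS] * 8)   # -> 1.0 ms

# Setup 2: the same 8 inserts stacked on the group. Each insert consumes the
# previous one's output, so they must run one after another on a single thread;
# the wall time is the sum of all stages.
serial_wall = sum([INSERT_MS] * 8)     # -> 8.0 ms

print(parallel_wall, serial_wall)
```

Against, say, a ~10.7 ms budget (512 samples at 48 kHz), the parallel setup leaves plenty of headroom while the serial one eats most of it, even though a CPU meter averaging over cores might show a similar figure for both.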

It gets much more complex with routing. To make another simplified example: if I have a guitar amp sim chain on a track with monitoring on (thus real-time), only that track will be processed with buffer/device latency instead of ASIO Guard latency. But if that track is routed to a group and to a send, the whole path must be processed with real-time latency. The example involves a monitored track just to make things more evident.


It would actually be a great feature to be able to visualize signal flow (for at least these two reasons):

In many real-world professional productions, projects can become large, with complex routing, very quickly. I’ve spent a lot of time trying to hunt down routing errors, so being able to visualize the full signal path for a track would be incredibly helpful to see if something is routed where it shouldn’t be. Yes, there are workarounds (I can select a group track and see what’s going into it for example), but when you have, say, lots of sends, this becomes tedious in a hurry.

The other reason is what was mentioned here: Such a visualization could also show latency and CPU impact of various routing choices, because, as @Fabio_B said, things can become very complex very fast, and having a tool to help us understand that complexity so we can make efficient choices in our projects would be very welcome!

Tagging this as a feature-request


All sensible points, @Timo00, thank you for the post.

We have been evaluating various ideas and solutions for a while now; the feature request is already in our system, and we’re certainly aware of how helpful this would be for quickly identifying potentially processing-intensive paths.


Awesome, it’s great to hear that you’re already on this! Looking forward to seeing this in a future update :slight_smile:


In the simplified example given, this serial-versus-parallel explanation makes a lot of sense. However, the example seems to me to be far from real life, even taking the simplification into account. In particular, if you have one insert each on the individual tracks, then remove those inserts and put them on the group to make 8 inserts, that is an entirely different processing chain than the original parallel case, even ignoring considerations like threshold settings on a submix versus threshold settings on the components of that submix.

A more practical, but still simplified, example: you have one identical insert on each of those 8 tracks, then remove those and put a single instance of that insert on the group (with the same settings for simplicity, but potentially adjusted for real-life level considerations). In this case, my inclination would be to expect the single insert on the group channel to help on the CPU front versus that same insert on each of 8 individual tracks. But the explanation here makes me wonder if that would be the case.

A more practical case for me would be something like 8 tracks (e.g. individual background vocals) having an identical set of inserts (probably with identical or nearly identical settings), then feeding a group channel with a different set of inserts that applies to the group (e.g. a background vocals submix). Of course, I’ll end up freezing the individual tracks fairly early in the mix process; the group inserts will likely be modified more over time as the mix progresses, but will still likely need to be frozen into a submix at some point.

An additional real-life complication, at least for me, is that any submix buses will also likely go to a main mix bus for mix-level processing, before finally going to the stereo out for mastering-level processing, so there are at least two serial stages after the parallel levels at the track and instrument-group submix levels.

Yes, it merely served the purpose of explaining a concept; it wasn’t meant to resemble real life. Same for the 8 identical processors, used for simplicity (8 parallel inserts vs. 8 serial inserts → the theoretical load is the same, the outcome is not).

Thank you Fabio, very interesting. Can you point out some dos and don’ts for preserving CPU headroom in large projects? I know this is highly individual.

Wow, what a mean question :joy:

I’m afraid it’s very difficult to come up with a meaningful answer. I would actually need a few days to cover even part of the cases.
If I manage to find some time in the next few days, I’ll collect the most common use cases that lead to performance issues.

It all very much depends on how one works: not only the layout, but also the kinds of processors used, and habits.


I have a different one, since we have your attention:

What’s the current status on developing the audio engine further as far as multi-threading behavior goes? My understanding is that some other DAWs do a clearly better job of distributing workloads across all available cores, and since we’re now looking at CPUs with a ton of cores, this seems pretty important.

I’d have to dig up recent examples if you need them, but I’m just wondering if this is still being actively worked on.

I can’t say much (nor do I even know all of it) about what’s in the pipeline, but the engine team is always working on these kinds of topics.
I’m aware of a specific AMD processor (and a second one to be confirmed) where Cubendo cannot distribute the load as evenly as on other processors, and that’s being tracked.
If, instead of hardware-related, you meant project-related distribution: if there are practical examples of the same project, all things being equal, where Cubase shows issues and other DAWs don’t, I’m personally always interested. If and when you have time, also feel free to PM.

Thank you!!!

That would be awesome, and much appreciated!