Apple Silicon Mac Studio - Render Performance - not optimal

I’m rendering my files on my Mac Studio Max (Apple Silicon). All native plugins, about 200 tracks.
The render speeds are very slow. For a 2 minute ambisonic experience I have to wait…2 minutes.
It is not set to ‘realtime’ but it is taking as long as the timeline.
Checking the activity monitor I see that my CPU is far from being maxed out.
Any hints at how this can be made to go a bit faster?



Latest version of Nuendo 12

Yeah, I posted the same issue some time ago.
In the end we came to the conclusion that it is good to have time for a coffee or something in between sessions. :neutral_face: :man_shrugging:

Yeah, I agree, but it is kinda weird that such a fast system goes unused… I’ll test ProTools this week to see if the same issue is there.


I was about to post about export times as well. I have a 3hr weekly show that I produce that needs to be exported into multiple parts using cycle markers. I started doing most of the denoising and cleanup work externally in RX which exports blazing fast.

But for the final master, with very few plugins (FF Pro-L, FF Pro-MB, UADx Neve 1073), it takes about 40mins+ to export. :pensive: I’ve heard that UAD plugins lead to longer export times, but even during regular VoiceOver work where I only use some Acon plugins, export is painfully slow: as mentioned above, close to realtime!

For these tests - are you running in native or Rosetta2 mode?

Overall Rosetta only adds a relatively small overhead all things considered, but it could be affecting certain things more than others.

For some reason, on the MacStudio Nuendo still defaults to Rosetta2. You need to jump through one manual step to make it run natively on M1. It all depends on your plugins. If you’re mostly native it should be fine.

I’m running it natively and I’m not testing… I’m just showing that the performance is average at best.
Rosetta would not run plugins natively btw.

I would quit all other running applications.

@noeqplease I’m not a programmer and do not know enough about CPU multithreading, but if I look at my Activity Monitor screenshot above, there is 1 actual application running: Firefox, which takes almost no CPU and memory. The other things are background drivers and a cloud service, but again they hardly take any hit on the processor and memory.
I will try it again with it shut down, but how much could it help? I see a CPU load of 22 % (user) and 7% (system) and most of that is N12.
The only reason I can imagine is that FF or a background service prohibits N12 from using more CPU power… which is still weird because the system is under minimal load (30%).

Again I will run some tests (today) but this just feels like something else is going on.

I see Nuendo, Firefox, Avid Link, Quicktime, Stream Deck, EuControl, and there are possibly more below.
I am from the “old” school of running only the DAW and what it needs, period. The memory cache is nearly full too, that might be part of your problems. Meaning several apps are likely storing stuff in RAM, which don’t need to exist.

So yes, try a reboot, and also quit any unnecessary background applications. Quit Avid Link, and that Stream Deck thing… is that needed in Nuendo? And Quicktime: if you are not running a video inside Nuendo, it does not need to be running, it’s a memory hog. Oh also, turn off Spotlight on your audio drives, that one is a bastard. It can cause all sorts of issues if it is trying to index an audio drive you are using. It’s easy to do via the Spotlight settings: just add any hard drives used only for audio to the exclusion list.

Cheers, let us know how it goes.

Hey thanks I feel a little bit ashamed that I didn’t think to reboot :slight_smile:
Streamdeck is a macro app that is interfacing with a Streamdeck 15 button keyboard.
I don’t need it to render stuff so I took that out (as well as any other thing that is unnecessary).
The result is marginal if not the same.
I rendered a 2:24 cycle marker, and the render took 1:43.
Here are some screenshots:
Cpu and memory after a fresh boot:

Cpu and memory during the bounce/render:

I don’t see a lot of difference in terms of usage with earlier.

On the topic of cache… I don’t see why it is a problem. IIRC swap is an issue… I had that on my MacMini intel 2018 with 8GB ram.

At the moment I’m rendering toward internal disk and that’s also where all the data is.
Also all plugin windows are open. Next post will be a test with those and on a separate SSD.

UPDATE: that test did not change a thing and was more or less the same duration and performance.
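To put a number on it, here is a quick back-of-the-envelope sketch of the speed-up for the figures quoted above (2:24 of material rendered in 1:43). The helper names `to_seconds` and `realtime_factor` are made up for this illustration and have nothing to do with Nuendo itself.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' string such as '2:24' to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def realtime_factor(material: str, render: str) -> float:
    """Seconds of audio rendered per second of wall-clock time."""
    return to_seconds(material) / to_seconds(render)


# Figures from the post: a 2:24 cycle marker that took 1:43 to render.
factor = realtime_factor("2:24", "1:43")
print(f"{factor:.2f}x realtime")  # prints "1.40x realtime"
```

So the offline render is only about 1.4 times faster than simply playing the timeline.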


I think the problem here is that Nuendo doesn’t use all available cpu cores. Only two I think. And if one of these cores has to work hard processing something like Acon DeVerberate then the export will run in almost real time :face_with_diagonal_mouth:

why would it only be using 2 cores? it is set to multi-processing…
“Multi processing distributes the processing load evenly to all available CPUs, allowing Nuendo to make full use of the combined power of the multiple processors.
Multi processing is activated by default. You can find the setting in the Studio Setup dialog (Audio System page).”

I can uncheck it and see but it would be really weird if it has a setting for multi core support and then does not use it…
Just checked, and it shuts down ASIO-Guard, so not comparable.

An option to allow multi-processing doesn’t by itself distribute a process across all available cores.

There are three ways multiple cores can be used:

  1. At the operating system level, the operating system itself and all the launched applications could use different cores so they’re not competing for one while leaving others idle. The OS will do this on its own. So having Firefox run at the same time isn’t that bad, as the OS likely keeps it on a different core than Nuendo. Unless the system is totally maxed out. But I have yet to max out my Mac Studio, it’s not that easy.

  2. Large applications like Nuendo often spawn multiple background processes to do certain tasks. These are literally separate processes and can easily be put on a different core by the operating system. That’s the category background renders and things like waveform generation fall into.

  3. The main process (or any background process) can decide to start multiple threads for a certain task. There’s no practical limit on how many threads a process can spawn and give work to do in parallel, with those threads potentially running on different cores.

The first caveat is that this is on a per-task basis. Each plugin algorithm (like a reverb), the main render engine that coordinates a render, and each individual track render that processes all the automation for volume, pan, etc. is a separate task. Just because you allow the application to multi-process doesn’t automatically make each task multi-threaded. A software developer actually has to think about how best to do it and write the code for it. As some applications and plugins go way back, that may not have been done for all of them. So only some tasks will take advantage of multi-threading, and there is no way to force them unless the developer set it up that way in the first place.

The second caveat is that not all tasks can benefit from multi-threading, as they may have steps that depend on each other. Or the overhead of managing multiple threads and recombining results may actually be bigger than any potential benefit. In an audio render, for example, you need to process plugin A before you can process plugin B, which comes after it in the signal chain. You can’t process them in parallel. But you can render track 1 and track 2 separately as you process the main bus, because they don’t share any common data.
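As a rough sketch of that dependency structure (not how Nuendo's engine is actually written): the plugins on one track run strictly in order, while separate tracks can be handed to a thread pool, and the bus sum has to wait for all of them. The `gain` and `clip` functions are toy stand-ins for plugins, invented for this example. Python threads also won't really speed up CPU-bound work like this, so take it as an illustration of what depends on what, not as a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


# Toy stand-ins for plugins: each takes a buffer and returns a new one.
def gain(buf, amount=0.5):
    return [s * amount for s in buf]


def clip(buf, limit=0.4):
    return [max(-limit, min(limit, s)) for s in buf]


def render_track(buf, chain):
    # Serial by necessity: each plugin needs the previous plugin's output.
    for plugin in chain:
        buf = plugin(buf)
    return buf


def render_mix(tracks, chain):
    # Tracks share no data, so they can be rendered independently.
    with ThreadPoolExecutor() as pool:
        rendered = list(pool.map(lambda t: render_track(t, chain), tracks))
    # The main bus sum can only happen once every track is done.
    return [sum(samples) for samples in zip(*rendered)]


tracks = [[1.0, -1.0, 0.2], [0.5, 0.5, 0.5]]
mix = render_mix(tracks, [gain, clip])
print(mix)
```

Swapping the order of the chain (`[clip, gain]` instead of `[gain, clip]`) gives a different result, which is exactly why the two plugins on one track cannot be computed side by side.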

There’s much more to that, but going into it won’t help the discussion. The main takeaway is that unless an application lends itself to massive multi-threading by its nature, and the programmers went through the effort of coding it for that, it won’t happen just because you said it’s OK by clicking ‘enable multi-processing’. Audio applications are actually harder to multi-thread than video applications, which can render each pixel independently.

Also there will be a big difference between some plugin suppliers and others, both in how well they’re done, and how much they’ve been optimized. By their nature some plugins are heavier than others. If you have a few heavy, low optimization plugins in your signal chain, your renders will most likely be very sluggish, regardless of core count.

One thing to watch out for on the MacStudio (and also newer Intel CPUs) is that not all cores are created equal. The M1/M2 chips have efficiency cores and high-performance cores. That is meant to allow your system to use less power and run cooler when it’s not busy with a heavy workload. It does mean, however, that the OS and the application need to understand that and schedule cores accordingly, or you could end up with an important task on a lower-performance core. I assume that Nuendo took care of that as part of their M1 port. But then one should never assume.

(I was a software engineer in a former career)


@allklier thanks for these insights!

So is it fair to say that it might be 1 (or more) plugin that may stall high render speeds? Would that explain why the CPU is not completely maxed out? RX10 is slamming my Mac when I dehum but is superfast. I had hoped to see the same performance (well, not exactly, since dehum is a single task) in Nuendo.

I knew about the efficiency cores etc. but my Mac Studio is running really cool, with no ventilation noise, next to me during a render. so that had me wondering about these cpu performance numbers.

On the topic of multithreading, I can remember my first few computers that were single-core CPUs (with a Turbo Boost button!!).
Not sure if Nuendo was around then (1989), but I would have hoped all possible processes would be multithread compatible by now.
But maybe Apple Silicon is a different beast and has ‘other’ multithreading dependencies?

Anyway, I will try disabling some plugins in a backup version of my project to see what it does for render times.

Yes, a single plugin can hold the show up. Eliminating them one by one would show which. Or just take all of them out at once and see if the issue is the core Nuendo app vs. your plugins before you go on a hunt.
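Here is a simplified model of why one heavy chain can leave most of the CPU idle. It assumes each track's plugin chain has to run on a single core and that the scheduling is otherwise ideal. All the numbers (10 cores, one chain costing 100 CPU-seconds, twenty costing 2 each) are made up for illustration and are not measurements from Nuendo.

```python
def estimate_render(track_costs, cores):
    """Lower bound on wall time and the resulting best-case CPU utilization.

    track_costs: CPU-seconds each track chain needs (hypothetical figures).
    Assumes one chain cannot be split across cores.
    """
    total = sum(track_costs)
    # The render can't finish before the slowest single chain does,
    # nor before the total work divided evenly over all cores.
    wall = max(max(track_costs), total / cores)
    utilization = total / (wall * cores)
    return wall, utilization


costs = [100.0] + [2.0] * 20  # one heavy chain, twenty light ones
wall, util = estimate_render(costs, cores=10)
print(f"wall time >= {wall:.0f} s, CPU utilization about {util:.0%}")
# prints "wall time >= 100 s, CPU utilization about 14%"
```

In this model, removing the one heavy chain drops the bound from 100 s to 4 s, and the remaining work could keep all ten cores busy. That would be consistent with a slow render showing low overall CPU load, though only isolating plugins in the real project can confirm it.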

I will say the MacStudio is a very powerful machine, and it may not be reasonable to expect it to be maxed out on CPU on an audio render, as much older systems may have been. Overall audio render tasks are much lighter than other modern loads such as Video post, CG, and AI. And audio renders don’t all lend themselves to heavy multi-threading.

That said, your render times may seem below where they could be, and it may be worthwhile to chase it down. But I suggest not to use CPU utilization as a key metric, but more just look at what’s going on in your project and isolate. Also look in odd corners like control room inserts, etc.

And I’ve found that in Nuendo you actually have to ‘power off’ a plugin in order to take it out of processing. If you just bypass it, it will still slow things down and take CPU cycles. Nuendo has a performance meter you can add to the bottom of the screen, which can help you see how Nuendo feels about its processing load.


In my experience setting the buffer size to the max value increases render performance. No idea why this is, but that’s how things work on my machine.


That’s the thing, I don’t think this has anything to do with plugins or background apps. RX10 can run my same heavy denoising chain with plugins like Deverberate, VoiceGate etc. super fast. Even max buffer size didn’t make much of a difference for me.

I hope Steinberg can look into this!

You’re referring to running them inside the RX10 editor app as a setup, rather than as plugins inside a DAW? That’s a different scenario. They may have more options there than as a VST plugin to optimize that (direct access to files, full control over flow - rather than a restrictive plugin API).

A better comparison might be to run your signal chain in Nuendo, and then also in ProTools or some other DAW. That would be the most meaningful side-by-side if you think that Nuendo itself is to blame for the slow renders.

Plugin architectures are convenient, but come with a lot of compromises. Also iZotope doesn’t have the best reputation for keeping their plugins optimized and up to date (again, that’s different from the standalone app). They broke RxConnect again with RX10 and despite a service release and knowing it’s a bug, didn’t fix it. There are more stories like that.

Yes, I am running this inside RX10 for batch processing. I also run Nuendo in Rosetta as I have some plugins which don’t run natively.

RX10 is running AU instead of VST. I haven’t scanned my VST plugins yet.

I don’t have ProTools, but I can test the plugin chain in Logic. Then again, Logic has always been much more CPU-efficient and faster. Only Reaper has come close to its performance.

I agree, RXConnect has been broken since RX8 or 9 I believe. Very unreliable. Luckily, I use very little RX for dialogue edit these days, but I do use it as a batch processor for other stuff currently.