Disappointed with multi-processor usage on new i9 18-core Intel PC

I just finished a brand new custom PC build for Nuendo based around an 18-core i9 Intel CPU (9980XE) running Windows 10 Pro with 64 GB of memory. The system uses NVMe M.2 SSDs for both the OS and data drives and is dedicated to Nuendo 10.1 and its (reportedly) improved multi-processor operation.

However, I loaded up an episode of a short film I finished recently in Nuendo 8.3.20 that was taxing my old i7 2600K at 100% (to the point where I had to disable several tracks in chunks in order to finish it up). Nothing crazy going on; just a lot of live plug-ins being used.

The first thing I noticed was that this project was now sitting right at 50% CPU usage on the performance meter with all tracks enabled. It was an improvement, to be sure, but not quite what I was expecting. A month ago, I tested this same episode on another Nuendo rig using a 6-core 3rd generation Intel CPU, and it, too, was sitting at 50% on that machine.

Hmmm.

I have optimized the machine as much as I can, including the removal of any speed-step settings in the BIOS, etc.

So, my question - can anyone from Steinberg speak to how the multi-processor architecture is set up for Nuendo? Because it seems to me there is a lot of room for improvement here. Unless there is something I’m missing with the configuration of this new CPU for Nuendo 10 (?).

Thank you,

  • Rodney

Audio is pretty hard to process in parallel with a high level of efficiency. For your new processor it will be the clock speed and the technology used in it that makes a difference.
I would have advised against a processor with a high number of cores and focused on clock speed instead.
The important thing for audio applications is single core speed and efficiency of the processor.

Maybe you could try to multiply the tracks and see how it scales. Suppose you’re seeing 50% and multiply track count by three, does it play?

I’m basically thinking that the load isn’t necessarily balanced the way we think when adding cores. If the 50% you’re seeing is the load on the most heavily taxed core, then that’s one thing, and total processing capacity should be higher than it seems…
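To put rough numbers on that idea, here is a minimal sketch. It assumes (this is not confirmed by Steinberg) that the performance meter follows the busiest core rather than the average over all cores. The per-core loads are made up purely for illustration.

```python
# Hypothetical per-core loads: fraction of each real-time buffer deadline used.
# Assumption: one core carries a long serial plug-in chain, the other 17 are lightly loaded.
core_loads = [0.50] + [0.08] * 17

busiest = max(core_loads)                    # what a "worst core" style meter would show
average = sum(core_loads) / len(core_loads)  # share of the whole CPU actually in use

print(f"busiest core: {busiest:.0%}")
print(f"average over {len(core_loads)} cores: {average:.1%}")
```

If the meter behaves like this, a reading of 50% says little about how much total headroom is left, which is why multiplying the track count is a useful experiment.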

???

Just saw Erik’s reply. I think we’re saying similar things. I will say that scaling seems very dependent on architecture. AMD did quite well in scaling actually.

I was expecting (hoping) for a better core distribution on a busy project like this with lots of plug-ins, happening automatically “under the hood” with a larger processor. I’ve read in some articles/forum threads that some of the multi-processing is set up specifically for FX channels, but maybe not for Group channels (?). I’m curious to hear about details like that within Nuendo’s architecture so that perhaps I can construct the project differently to utilize the processing power better.

Friends of mine who work in television talk about their “super sessions” with up to 24 episodes of a show’s season living in a single Pro Tools session. I don’t see how that would be possible with Nuendo if I’m already sitting at 50% CPU usage with a single 5-minute episode.

But try to max out your system though.

Multiple episodes will sit one after another in a project. That will require a lot of RAM but really won’t affect processing that much.
What are the actual specs of your machine?

Just to “visualise” why parallel processing for audio is difficult.

Your audio plays back on a track
Some dynamics on the track
Sends are sending signal to a reverb
Signal is sent to a group
Group has some processing
Group has a send to another reverb
Group is sent to a stem.
The reverbs are also summed to the stem
Stem is sent to a main mix
The main mix has a send to a stereo downmix

All of this has to be processed in series and in order. You can’t really divide it and share the load, because for every audio buffer each stage needs the finished output of the stage before it.
So what CAN be processed in parallel is things like more playback channels with identical processing, or several identical VSTis. But series processing of sound is series processing.

And you can’t mix a compressed signal until it has been processed. Everything also has to stay in sync all the time, regardless of what happens in the chain: a channel is processed in series, but it may also run in parallel with another track that has different processing, where the delay through the processing is longer or shorter than in the first channel.
It’s pretty darn complex to split processing effectively across different processors while keeping everything in sync.
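The routing listed above can be sketched as a small dependency graph. This is only an illustration: the node names follow my reading of that list, and the per-buffer costs in milliseconds are invented, not measured from Nuendo.

```python
from collections import defaultdict
from functools import lru_cache

# Each node lists the nodes whose output it needs first (my reading of the routing above).
deps = {
    "track":          [],
    "dynamics":       ["track"],
    "reverb_1":       ["dynamics"],   # fed by the track send
    "group":          ["dynamics"],
    "group_fx":       ["group"],
    "reverb_2":       ["group_fx"],   # fed by the group send
    "stem":           ["group_fx", "reverb_1", "reverb_2"],
    "main_mix":       ["stem"],
    "stereo_downmix": ["main_mix"],
}

# Hypothetical processing cost per audio buffer, in milliseconds.
cost_ms = {
    "track": 0.1, "dynamics": 0.4, "reverb_1": 0.8, "group": 0.1, "group_fx": 0.5,
    "reverb_2": 0.8, "stem": 0.1, "main_mix": 0.3, "stereo_downmix": 0.1,
}

@lru_cache(maxsize=None)
def stage(node):
    """Length of the longest chain of upstream nodes that must finish first."""
    return 1 + max((stage(d) for d in deps[node]), default=-1)

@lru_cache(maxsize=None)
def finish_ms(node):
    """Earliest finish time for a node if there were unlimited cores."""
    start = max((finish_ms(d) for d in deps[node]), default=0.0)
    return start + cost_ms[node]

# Nodes that share a stage are the only ones that could run at the same time.
stages = defaultdict(list)
for node in deps:
    stages[stage(node)].append(node)

for level in sorted(stages):
    print(level, stages[level])

total_ms = sum(cost_ms.values())               # everything on one core
critical_ms = max(finish_ms(n) for n in deps)  # longest serial path
print(f"one core: {total_ms:.1f} ms, unlimited cores: {critical_ms:.1f} ms")
```

With these made-up numbers, only the first reverb and the group share a stage, so extra cores shorten this one chain by roughly a third at best. The gain from more cores comes from running many independent chains like this side by side.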

I will politely disagree. The scenario you describe is not all-inclusive; it describes CPU performance in some, but not all, scenarios. Is the OP using a small number of high-CPU-demand plugins or a larger number of low-demand plugins? How many buses and how many discrete I/O channels is he running? Is his CPU load primarily made up of audio plugins or virtual instruments? What is his typical latency setting, which would be huge here? Hardware factors could also have an impact, depending on how their interrupts are running.
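On the latency question: the time available to process one buffer is the buffer size divided by the sample rate. A quick sketch, where the buffer sizes and the 48 kHz rate are example values rather than the OP's settings:

```python
def buffer_deadline_ms(buffer_size, sample_rate):
    """Time available to process one audio buffer, in milliseconds."""
    return buffer_size / sample_rate * 1000.0

SAMPLE_RATE = 48_000  # example value, not the OP's actual setting

for buffer_size in (64, 128, 256, 512, 1024):
    deadline = buffer_deadline_ms(buffer_size, SAMPLE_RATE)
    print(f"{buffer_size:5d} samples -> {deadline:6.2f} ms per buffer")
```

Every serial chain in the project has to finish inside that window, so a small buffer leaves far less slack for the slowest chain than a large one.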

In general, you are correct about the challenges of distributing real time audio across multiple threads but an 18 core Intel processor in the right setting will kick the crap out of the Intel 6 core he’s referring to that he tested with equal performance, regardless of clock speed or IPC efficiency.

And as far as Intel vs AMD. AMD does very well in some settings but the architecture Intel currently uses substantially outperforms AMD in real-time low-latency throughput. Which is ironic since it used to be the other way around, back in the day.

Can the OP post an image of the Windows CPU performance meter showing all 18 cores, which will be 36 threads, while running the project in question? And do you have the ability to monitor your CPU temperature accurately? If your CPU cooling is not performing as it should, the processor will throttle under load and performance will be hamstrung. That can happen on a brand new build. It’s not something people normally suspect, but I have seen it in person.

The last thing I would suggest is performing a standard torture test on your system that has nothing to do with Nuendo. A standard CPU performance test that loads the processor to 100% and evaluates its performance. Compare that to other systems with your same processor and see if all is well.

Just Google “CPU benchmark” and you should find everything you need. I would not jump to the conclusion that you wasted your money on 18 cores. I suspect there’s more to the picture than meets the eye here. I know I would be really bummed if I dropped the money you dropped on that system without it getting better results. I’m pretty confident there is a path to better results on your new system.

P.S. If you’re not using the highest-performing liquid cooling you can afford and overclocking, I would recommend doing so. EricG is correct that clock speed matters, and lower clock speed is the primary concession required to have a high core count. High-performance cooling with appropriate overclocking can offset that to a large degree.

Done correctly, it can be 100% stable.

I was just trying to make the point that multiple cores is not of much use if the power to run what HAS to be processed in a serial fashion just isn’t there.

I know it’s not actually a perfect representation of how this all works.
But say that the OP has an 18-core server processor with a 2.2 GHz clock speed and isn’t overclocking.
That would probably not be a great choice of processor for general DAW use, unless he does only external summing and no mixing in the DAW. But the same system might work well as a “slave” running dozens of Kontakt instances in parallel.
It all depends on use.
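The point about serial work can be put in numbers with Amdahl's law: speedup = 1 / (s + (1 - s) / n), where s is the fraction of the work that must run serially and n is the number of cores. The serial fractions below are illustrative guesses, not measurements of any DAW.

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on speedup when part of the work cannot be parallelised."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# Hypothetical serial fractions, chosen only to show the trend.
for s in (0.1, 0.3, 0.5):
    six = amdahl_speedup(s, 6)
    eighteen = amdahl_speedup(s, 18)
    print(f"serial {s:.0%}: 6 cores -> {six:.2f}x, 18 cores -> {eighteen:.2f}x")
```

The larger the serial share, the smaller the gap between 6 and 18 cores, and the more clock speed matters.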

No, that CPU runs by default at 3.0 GHz with maximum turbo at 4.4 GHz (all cores). You can overclock it to go higher than that, that is correct.