Slow performance of Nuendo under Windows (for many years now…)

Hi there,
Windows and Steinberg don't like each other very much, and that has been the case for a long time…
I run into the performance-degrades-over-time issue every day, and today I found a new performance problem:
All Nuendo has to do is glue several (short) wave clips together.
See the picture below.


Several tracks (around 70) are grouped in several folders below the selected one (red frames).
(The time intervals / connection points are all at the same positions in the sub-folders below.)
As you can see in the screenshot below, I have a high-performance machine with a lot of physical and logical cores.

Nevertheless, N14 mainly uses one core while the others are sleeping.
Whatever settings I try (hyperthreading, etc.), I don't get higher performance.
Of course, I don't know whether the SW structure inside Nuendo allows this task to be split across processors, but from a basic multitasking point of view it should be possible, I think.
Does anyone know a possible solution for assigning more cores to Nuendo? I'm working on Win11 (but I should have invested that money in a Mac…)

BTW: if I select the glue positions step by step (from left to right), I'm much faster than this Windows computer…

Yeah, that's truly an issue. I have it as well, regardless of which version of Nuendo, even on my fastest workstation (Ryzen 5950X, 128 GB RAM, NVMe SSDs, etc.).
Nuendo just slows down during extensive sample editing in my case.

1 Like

Though not scientific by any means, when I was doing my own tests of DAW performance, I found that track threading was more of a concern than which DAW used performance cores vs. efficiency cores. While P-core/E-core scheduling is certainly a consideration, I found that CPU usage in Logic, Reason, Cubase, and Reaper* all showed similar behavior when processing significant serial audio chains on a single track. I don't think it's a hard rule necessarily, but what came up in my research was a general "one-thread-per-track" paradigm necessitated by the nature of serial audio processing. Meaning, the fidelity requirements of the process dictate the use of a single thread. So even with powerful CPUs, proper "processing etiquette" is still required for individual tracks.
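To illustrate the idea, here's a rough sketch (nothing like any real DAW's engine; all names are made up): each track's effect chain has to run in order, because every stage consumes the previous stage's output, while independent tracks can be farmed out to separate threads.

```cpp
// Minimal sketch of the "one-thread-per-track" idea. Hypothetical names;
// this is not how Nuendo (or any DAW) is actually implemented.
#include <future>
#include <vector>

using Buffer = std::vector<float>;
using Effect = void (*)(Buffer&);

void gain(Buffer& b)   { for (float& s : b) s *= 0.5f; }  // toy effect 1
void invert(Buffer& b) { for (float& s : b) s = -s; }     // toy effect 2

struct Track {
    Buffer samples;
    std::vector<Effect> chain;  // serial chain: order matters
};

// Each stage feeds the next, so a single track is inherently sequential.
void processTrack(Track& t) {
    for (Effect fx : t.chain)
        fx(t.samples);
}

int main() {
    // 70 tracks, each with 1 s of audio at 48 kHz and a two-stage chain.
    std::vector<Track> tracks(70, Track{Buffer(48000, 1.0f), {gain, invert}});

    // Independent tracks are the natural unit of parallelism.
    std::vector<std::future<void>> jobs;
    for (Track& t : tracks)
        jobs.push_back(std::async(std::launch::async, processTrack, std::ref(t)));
    for (auto& j : jobs) j.get();
}
```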

My takeaway was that just having a beast of a machine didn't give me unlimited license to behave badly when engineering. I still do, of course, but at least I know what to expect.

Just thought I'd share that - there may be a processing workflow that optimizes CPU usage for you regardless of which DAW you're using.

*Footnote: I didn't go out of my way to thoroughly test Reaper. I own it, but I don't really like it, and I knew I wasn't going to use it in real life. As soon as I saw similar threading results in my made-up test, I quit.

1 Like

That's a fair point of view. I had been thinking along the same lines, but I haven't researched this performance topic myself, as I'm too busy with other issues, besides looking for workarounds for the many problems that are visible in a DAW but are in fact related to Windows…
In terms of sequential audio processing, I agree for a single track, even though a clever routine might handle it with different threads.
But the parallel tracks can definitely be handled by different threads…
I'm not at the system right now, but a simple comparison makes me wonder about the SW structure of Nuendo:
If you click the 10 cut points manually for gluing, it takes about 1 second per click. All 70 tracks are glued correctly within each track (sequential per track, parallel(?) across all 70 tracks).
If the 10 positions are selected together (as shown in my picture, with more than 10 positions), it doesn't take 10 seconds (I would expect it to be much faster, since the manual mouse clicks are no longer needed) but about 1 minute. So why does the automated process take more time than the manual workflow?
Therefore, I assume that the complex SW structure inside Nuendo needs further optimization in terms of CPU processing, besides the typical Windows-related issues.
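Purely as speculation from my side (I don't know Nuendo's internals; every name below is invented): one way a batch command could end up slower than single clicks is if each individual glue inside the batch triggers full per-operation bookkeeping (an undo snapshot, cache invalidation, a waveform redraw) over the whole event list. Then the cost scales with track size times the number of glue points, while a human clicking one point at a time hides that same cost behind the mouse clicks. A toy comparison:

```cpp
// Speculative sketch: per-glue bookkeeping vs. bookkeeping deferred to the
// end. None of these names correspond to real Nuendo internals.
#include <chrono>
#include <iostream>
#include <iterator>
#include <list>

struct Event { double start, length; };

// Build a track with n one-second events laid end to end.
std::list<Event> makeTrack(int n) {
    std::list<Event> t;
    for (int i = 0; i < n; ++i) t.push_back({double(i), 1.0});
    return t;
}

volatile double sink;  // keeps the scan from being optimized away

// Stand-in for hypothetical per-operation bookkeeping (undo snapshot,
// cache invalidation, redraw) that touches every event on the track.
void bookkeeping(const std::list<Event>& t) {
    double total = 0;
    for (const Event& e : t) total += e.length;
    sink = total;
}

// Merge the first two events into one, like "glue".
void glueFront(std::list<Event>& t) {
    auto first = t.begin();
    auto second = std::next(first);
    first->length += second->length;
    t.erase(second);
}

int main() {
    const int n = 2'000'000, glues = 10;

    auto a = makeTrack(n);
    auto t0 = std::chrono::steady_clock::now();
    for (int k = 0; k < glues; ++k) { glueFront(a); bookkeeping(a); }  // after every glue
    auto t1 = std::chrono::steady_clock::now();

    auto b = makeTrack(n);
    auto t2 = std::chrono::steady_clock::now();
    for (int k = 0; k < glues; ++k) glueFront(b);
    bookkeeping(b);                                                    // once at the end
    auto t3 = std::chrono::steady_clock::now();

    std::cout << "bookkeeping after every glue: "
              << std::chrono::duration<double>(t1 - t0).count() << " s\n"
              << "bookkeeping once at the end:  "
              << std::chrono::duration<double>(t3 - t2).count() << " s\n";
}
```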
BTW: I haven't measured the time exactly; it's a rough comparison of about 30 cuts (= 30 seconds of project time) between 30 manual clicks and one range-selection click… I didn't time how long I waited for the gluing to finish, but I could have picked up a cup of coffee in the meantime… :smiling_face_with_sunglasses:

1 Like

Definitely, and they are. My understanding is that the "single thread per track" model is an "industry standard" by-design adoption to ensure that real-time serial audio processing doesn't fork threads and cause other issues. Maybe a DSP engineer can chime in on the accuracy of that generalization.

I just got a new rig and will be testing all this again. This reminds me that I need to follow up with @Norbury_Brook to get the info on his benchmark testing :slight_smile:

Thanks.

1 Like

@Lucky7 you'd find the same behavior on macOS.

Your scenario shows that in this case single-core speed matters more than core count.

Your single-core speed is low: 3.0 GHz.

My 9950X, for example, is close to twice that.

M

All in all, it isn’t that simple in reality.
If you edit clips, there is some disk (or SSD) I/O involved too.

And every CPU generation needs new optimizations. The sheer number of available CPU flavours isn't always helpful for the developers.

I could understand the slowdown if I were editing the audio files on a hard disk, which would be way slower than an SSD, but not on an extremely fast NVMe SSD. Not even on a SATA SSD.
During editing there are no FX whatsoever. Nuendo just slows down after a while.
And that's bad. Really bad.

I think Steinberg really should focus on improving that for now instead of adding new features.
Well, that and bug fixing, of course.

What would help is some empirical data for users to examine and test. Saying "it just slows down after a while and that's really bad" is rather subjective - more importantly, we have no metric by which to quantify the impact. I don't think anyone on the dev team could look at the "it slows down" statement and be expected to "improve that."

Do you have system resource benchmarks you can cite with detailed use-case/reproduction steps that result in this behavior?
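For anyone who wants to attach numbers to "it slows down", here's a minimal sketch of the kind of measurement that would help, assuming you have some repeatable edit action you can trigger in a loop (doEditOperation below is just a placeholder for it). Log the duration of each iteration; if the numbers climb over a session, that's a quantifiable, reportable regression.

```cpp
// Minimal timing harness: run the same operation repeatedly and log each
// iteration's duration as CSV. 'doEditOperation' is a placeholder for
// whatever repeatable action exhibits the slowdown.
#include <chrono>
#include <iostream>

void doEditOperation() {
    // placeholder: the repeatable action whose duration you want to track
}

int main() {
    using clock = std::chrono::steady_clock;
    for (int i = 0; i < 1000; ++i) {
        auto t0 = clock::now();
        doEditOperation();
        auto t1 = clock::now();
        std::cout << i << ','
                  << std::chrono::duration<double, std::milli>(t1 - t0).count()
                  << '\n';  // CSV: iteration, milliseconds
    }
}
```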

1 Like

I think we're both on the same page, but I agree that a single thread per track might be necessary for real-time processing, to ensure no audio dropouts or pop noises.
Nevertheless, gluing cut tracks together is an offline process (or rather, not a real-time process) and is equivalent (just using different words) to copying two different string variables into one new string variable.
And even with your argument, it doesn't explain why manual gluing is faster than gluing a selected range… :face_with_monocle:
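To make the string analogy concrete, here's a minimal sketch of what I assume an offline glue boils down to (my assumption, not Nuendo's real implementation): joining two clips' sample buffers is essentially one bulk copy, which is exactly why the observed timing surprises me.

```cpp
// Sketch of "glue as concatenation": append clip B to clip A, like joining
// two strings. Hypothetical, not Nuendo's actual code.
#include <iostream>
#include <vector>

using Buffer = std::vector<float>;

Buffer glue(const Buffer& a, const Buffer& b) {
    Buffer out;
    out.reserve(a.size() + b.size());          // one allocation
    out.insert(out.end(), a.begin(), a.end()); // bulk copy of clip A
    out.insert(out.end(), b.begin(), b.end()); // bulk copy of clip B
    return out;
}

int main() {
    Buffer clipA(48000, 0.1f);  // 1 s of audio at 48 kHz
    Buffer clipB(48000, 0.2f);
    Buffer joined = glue(clipA, clipB);
    std::cout << joined.size() << " samples\n";  // 96000
}
```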

I built my machine around 3-4 years ago. At that time, the common understanding for Nuendo (including from Steinberg) was that more cores are better than too few. Nevertheless, I "shot over the fence": I should have selected a later CPU with a higher clock frequency instead of so many cores, which are now only sleeping.

One learns by experience.

Of course, the main impact comes from the OS side, which provides the basis for assigning cores to applications. On the other hand, there might be ways to improve such handling by structuring the SW in a better way.

I know that Steinberg is fully aware of this situation, but they are struggling with the OS empires, too. :unamused_face:

It should make no difference whether you select the gluing manually or via a selected area…
This is the key point of my claim.

As I wrote above, the engineers at Steinberg are fully aware of that, but even so, they are dependent on the OS empires…
This is just a personal opinion, because I'm neither an expert in the SW structure of Nuendo nor in OS programming:
The main problem is the garbage collection and the unnecessary background tasks that the OSes run (and that are impossible for a standard user like me to block), which disturb your real-time efficiency.
An example is background SW updates and re-writing registry entries several times with the same content. Also, every CPU works slightly differently, so a general optimization is definitely a big challenge.
The only solution so far for the PC slowdown (as opposed to the Nuendo slowdown) is to reboot the system at least every 3-4 hours, or after every memory-intensive workload…
Another example from my system:
Switch the machine on in the morning (9 am), start Nuendo, load a project, and leave the system alone (don't do anything). Return at 3 pm and use Nuendo. Often enough, I've had the experience that the whole system's performance had slowed down. After a reboot, the system / Nuendo worked fine again. Therefore, I'm not sure whether Steinberg can do anything about such system-related behaviour (slowing down over time), but they are talking with the OS guys to find basic technical solutions.
The most important issue for any DAW is real-time audio processing, avoiding pop noises and audio dropouts. Therefore, I hope that the OS empires will give such SW developers the ability to use and select system performance parameters more efficiently (and not have the OS decide: here's what you get, take it or leave it!).
With this comment I didn't intend to make a fuss about the "slowdown in performance"; I wanted to focus on the track-gluing issue, which should be a simple action, as it is just handling digital data.
And again, the key point is: why is manual handling faster than handling a selected range? :face_with_monocle:

It wasn't an "argument" per se; it was an identification of how both research and observation point to the general approach of single-threaded processing per track, and that there may be some workflow improvements one could apply to the degree you're seeing this yourself (which you are). I have no idea why "manual gluing" is faster than a "selected range." I could hazard a guess, but it wouldn't really matter. For me, I don't churn too much on intra-process comparisons because there's generally nothing I can do about them. I focus on getting the best overall OS for my needs with the best hardware I can to support my requirements, and the rest is what it is, really.

The reason I brought that up in the first place is that your i9 is actually an 18-core, not a 36-core. Since a DAW will generally use a single thread per track, you (again, generally) lose the benefit of hyper-threading for per-track processing. That's not a criticism of your hardware, just an identification that "you may need to change the way you process tracks if the current performance doesn't suit your needs." At some point the micro-analysis of these processes has diminishing returns (to me, anyway) unless I use it to better my workflow.
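For what it's worth, here's a small sketch of why the "36 cores" figure can mislead: the standard C++ query reports logical processors (hardware threads), which with Hyper-Threading is typically double the physical core count, and two logical processors sharing one physical core largely compete for the same execution units rather than doubling throughput for serial per-track work.

```cpp
// Reports logical processors only; with Hyper-Threading this is usually
// twice the physical core count, which matters for one-thread-per-track
// workloads.
#include <iostream>
#include <thread>

int main() {
    unsigned logical = std::thread::hardware_concurrency();
    std::cout << "Logical processors: " << logical << '\n';
    // The physical core count needs an OS-specific API (e.g.
    // GetLogicalProcessorInformation on Windows); the C++ standard
    // library does not expose it directly.
}
```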

To be clear, I was replying to the other poster, not you. You actually provided some detail, and that was great. That said, even in your case, the "slowing down over time" will still require more detailed information about which process is actually slower, what system resources are in use at the time, etc.


1 Like