Higher FFT Sizes Equals More Sluggish SL11 Performance

ctreitzell · August 21, 2024, 5:47pm

I’ve mostly been working at 3072 smp and below

I noticed today, significantly more sluggish performance the higher I set FFT size. I didn’t see another post about this. I didn’t find any posts about this when I searched.

My machine is 6 years old (i7 8700K/ 64GB/ GeForce 1050 Ti), so getting a bit long in the tooth.

I did update to 11.0.20 today, worked all day and then experienced sluggish performance
-long zoom in/out times
-long wait times for mutes and solos
-etc

So, I rolled back to 11.0.10 and find pretty much the same performance.

As I long as I keep FFT size below 4000, I get normal, quick performance for mouse scroll wheel zooming and everything else. Set it to over 3072 and it’s finger tappin’ time.

My current job has 27 layers…I made a few more today and maxed my machine…so pared it back. I can work comfortably at FFT in the 2000s.

st10ss · August 21, 2024, 5:53pm

FFT calculation is time-dependent, more points need more time to get calculated, not because it’s more to calculate, just because it uses other time frames related to the measurement points.
More accuracy in the lower frequency area leads to high calculation times.
Parallel processing can reduce the times.

Joey_Kapish · August 22, 2024, 2:58pm

I’ve made numerous posts about this(at least 10). That is odd that you didn’t find any (unless the moderators/administrators secretly shadowbanned me).

To add to what @st10ss mentioned, I believe the main issue here is that there’s also overlapping going on (which can be cpu/gpu intensive). The scenario of overlapping can be compared to the idea of AMD’s fidelity Super Resolution(FSR) where in order to scale a lower quality image up to a higher quality image they use algorithms that are optimized(keyword OPTIMIZED) to take a lower quality image and upscale it to a higher quality image(like 4K) to give off the illusion that you’re getting the same quality at a lower resolution. AMD claims (Nvidia claims this too) that “gpu acceleration is only possibly if you use our hardware and only our hardware only”(implying that gpu acceleration is only possible through hardware like gpu’s) but I believe that is a lie in order to fool consumers into buying their gpu’s. I believe AMD and Nvidia are using deep-learning and machine-learning and implementing that technology into their GPU’s (they both hint about it but never explicitly say it in their marketing) and given off the illusion that upsampling/upscaling is only possible through hardware acceleration. Nvidia does the same thing where they claim that their “CUDA technology is only possible through their hardware because it is entirely dependent on hardware” and I dont believe that, I believe that they’re just using that as a marketing ploy in order to fool people into buying their GPU’s.

However the main problem here is OPTIMIZATION. Spectralayers needs a overhaul of OPTIMIZATION done in order to make it more efficient. From the transformation process being optimized to be done in real-time, to selections being optimized(where you can have over 5000 selections within the same project and Spectralayers doesn’t become sluggish at all), to higher FFT sizes and resolutions and refinements, to unmixing levels being optimized to preview in real-time.

The only thing that I’m worried about though is that the development cycle of Spectralayers (these every once-a-year releases) are so slow that even if Spectralayers does get optimized it will become obsolete because everybody has already moved on. I’m noticing that a lot of people are buying these new ARM devices (I’m actually seeing people use them in my personal life because it’s more efficient and less expensive than Intel and AMD counterparts) and I’m afraid that an incredible amount of resources will go into optimization for intel/AMD chips (or maybe even for CUDA) only for it to be futile because no one uses those chips anymore… So this is a hint for the developer to start straying away from CUDA because I’ve encountered people(like college students) who say “why would I buy a Intel laptop for $1500 when I could buy a ARM laptop (that is just as capable as Intel) for $700”?

st10ss · August 22, 2024, 7:37pm

You don’t understand the problem.

take3 · August 22, 2024, 7:54pm

A higher FFT size does indeed mean more calculations, which leads to increased CPU time needed to calculate it. Having overlap increases the amount of calculations needed as well, depending on the amount of overlap. With this in mind it’s not surprising that a lower FFT size will give better performance, and it’s a good find to point out.

ctreitzell · August 23, 2024, 7:18am

Hey, I don’t know “the why”…I just was noting about “the what”

Thanks for the explanations…I didn’t know there was a difference in performance for the these FFT settings for simply navigating the interface.

Ben_H · January 23, 2025, 5:59am

I’ve also noticed this issue, where 2048 FFT is fine, but 4096 and above is painfully slow, even simply zooming, scrolling, or toggling layers. I have a reasonable PC (12-core Ryzen5900x, 32GB), however working 4096 FFT size is pratically unworkable. Now I’m aware of the increased computation load of larger FFTs (having worked my whole professional life as a DSP engineer), however, increased FFT-frame size should imply larger frames, and hence fewer overlaps, and fewer FFT’s overall. From experience working with STFT audio processing, increased FFT size usually only incurs a marginal overall CPU load. The bigger challenge working with larger FFT-size has always been increased latency, and smearing.

So I too am perplexed that there is such a MASSIVE performance hit simply going from 2048 to 4096. It would be great if the dev team could at least acknowledge this issue, and provide some explanation as to probable cause. My professional opinion - most likely an implementation bug in the code.

Sunnyman · March 1, 2025, 11:07am

O(n log(n))…

Yepp, looks like the code/data-handling could be improved somewhat.

Best example is here:

Joey_Kapish · March 2, 2025, 1:20pm

I’m going to have to do something I’ve never done which is backtrack. I agree and believe there is a lot of legacy code with spectralayers but I don’t believe the solution is better code (in the sense of completely re-writing certain code). I believe the solution here is A.I. acceleration and using A.I. to accelerate legacy code to run 10 (or even 100X more) times efficient on processors.

AMD recently introduced “frame generation”, and what all these technologies(FSR and upscaling) have in common is they are all using A.I. (so that the developers are left to develop and gamers can game). Because of the new AMD chip I am convinced that AMD is using some type of A.I. hardware acceleration (they’re just being secretive about it and not disclosing it) on these chips because there is no way a apu can achieve these sort of results without A.I. acceleration involved.

So, I believe the answer is A.I. acceleration.

st10ss · March 2, 2025, 1:41pm

Again, I believe you don’t understand the problem.
FFT needs a given time to calculate. More points, more precision, but also more time to calculate.

Thor.HOG · March 2, 2025, 2:48pm

Agreed. This forum in particular seems to have a high number of user-based inconsistencies and potential deficiencies in conceptual understanding attributed directly, and speciously, to “SB’s code.” Even the gentleman who reportedly spent his professional life as a DSP engineer presumed that higher FFT frames inferred fewer overlaps, even though FFT overlapping is process-dictated, and not an “artifact” of frame size. But I’m not a DSP engineer, so I could certainly be wrong. But I don’t think it requires a brain trust to identify that doubling the frame size isn’t “marginal” at all, and more than doubles actual CPU processing requirements.

But again, I’m not a dev writing their code, so I can’t say for sure. Who knows, maybe the “text iterations in Python” optimization example will lead to a breakthrough.

Sunnyman · March 2, 2025, 5:06pm

FFT is O(n log n), not O(n^2)

SL seems to have some inefficent algorithms or datahandling inside…, imho.

Since we don’t know about the internal programming, this remains a mystery only Steinberg can solve.

I wish for a much much better implementation of the VST-interface. The lag is not acceptable.

Joey_Kapish · March 2, 2025, 9:39pm

Actually the both of you are wrong and misinformed (or just far behind in how far A.I. has advanced).

I’ll explain in a simple concept. You can use A.I. to literally take a simple celeron processor and accelerate it to outdo a supercomputer. That’s how far A.I. has advanced.

st10ss · March 3, 2025, 10:04am

Making statements doesn’t explain anything.

ctreitzell · March 3, 2025, 12:20pm

28+ minutes!!! Oh my, if I can find the time…I got 10 min now…

Ben_H · March 11, 2025, 4:41am

FFT overlapping should not be “process-dictated”.
Audio STFT processing using overlapped FFT frames (phase vocoder) typically specifies a fixed percentage overlap between adjacent frames (50 or 75% for example), and not an absolute number of new samples per frame. A higher percentage (fewer new samples per frame overlap) generally gives audibly better results with fewer spectral artefacts, at the cost of higher CPU load. A window function is also optionally chosen based on this percentage to satisfy the Princen-Bradley constant-overlap-add constraint for perfect reconstruction (e.g. root-Hann, or Kaiser-Bessel Derived (KBD)).

Doubling the FFT-size implies doubling the number of new samples per FFT, halving the number of FFTs performed in total.
The CPU-load increase (order N.log(N)) is then offset by 50% fewer FFTs, resulting in only a marginal CPU-load increase.

I may well be completely wrong, and SLP11 could be employing a fixed number of samples per FFT frame overlap, hence a CPU-load that scales exponentially. I would be very surprised if this were the case. Either way, it would still not explain the extreme change from 2048 to 4096, which is anecdotally about 10 to 20 times slower in my experience!

Thor.HOG · March 11, 2025, 11:39am

Not sure I understand this - you’re saying overlapping shouldn’t be “process dictated,” but then say “the process typically specifies a fixed percentage.” Just so as we’re not getting snagged on definitions, “process dictated” and “process specified” mean the same thing (to me). Put differently, overlap is a defined process variable (even if most implementations specify 50/75/n). From an algorithmic perspective, one could use a 16k frame size and still process FFTs with a 75% overlap if stipulated. That’s all I was saying - the use of “infer” indicated (to me) that you were saying “overlap = frames / 2” from a process/algorithm perspective. If you 're saying that a logical use case for increasing frames means that typically one would reduce the overlap percentage, then that’s different; however, it still doesn’t change the fact that 4096 frames is still twice 2048, and that CPU power is more than doubled (generally) to process, and that the overlapping is a distinct, separate calculation.

I just don’t think this is true. It doesn’t “imply” that at all. But you’re the DSP engineer, not me, so if you are saying “in the real world this is exactly what is always done whether the user wants it or not” then I’ll have to take your word for it. “Deep research” from AI actually suggests that if you double the frame rate, you may have to INCREASE the overlap to maintain a “smooth analysis” (its words, not mine). So I don’t believe “overlap = frames / 2” is a postulate, or even a given. SB will have to tell you what their code does.

This was the first time I saw any quantification, though I still don’t know WHERE you see this manifesting itself, and under what circumstances. Without seeing exactly what you are doing and what processing you are applying, it’s impossible to tell. It could just be you need a supported GPU to better support your workflow if you are indeed going to work with larger frames. But all other things being equal, when you double frame size, I disagree that one should consider that a marginal increase of CPU, particularly when you’re processing twice the frames. But the CODE doesn’t change between frame size changes, so if you’re happy with the performance of 2048 frame sizes but not 4096 frame sizes, then l think it’s logical to take that as empirical evidence that the issue is the amount of data your working with and not what I consider the “knee jerk” reaction to “SB’s code sucks.” And to be clear, that’s the only reason I pointed any of this out, because it’s not factual.

Ben_H · March 14, 2025, 1:30pm

A fixed percentage “of the FFT-size”. 50% of 1024 FFT is 512 samples per update. 50% of 4096 is 2048 samples. This is not dictated by any software “process”, such as a periodic software interrupt (such as ASIO audio buffer rate), but is dictated by a fixed percentage of the FFT-size in consecutive frames of accumulated samples.

It’s certainly true that smaller sample-frames per overlap result in better reconstructed audio quality, however this comes with a major caveat: real-time audio DSP has to run in real time on a given CPU. There’s a practical lower-limit to how small you can make the FFT-update frame-size before running out of CPU horsepower. To your point, it’s feasible they’ve engineered SLP11 to use fixed small sample-frames for the sake of audio quality, regardless of FFT-size. We can only speculate.

Look, I’m not interested in convincing you of anything - I’ve simply added my observations and professional opinion to the consensus that there’s a VERY obvious issue, and the numbers simply don’t stack up going beyond 2048. A doubling in size (to 4096), according to O(N.log(N)), should be roughly 220%. That’s simply NOT what I’m seeing. Whether it’s specific to the FFT, or more broadly the application itself, who knows.

And lastly, for you to insinuate that I ever said “SB’s code sucks” is being deliberately disingenuous of you. I made the assertion that the observed behaviour suggested likely bugs in the code, which - if you’ve EVER written any DSP code in your life, instead of just playing keyboard warrior - is a perfectly reasonable conclusion, and not an accusation that the code “sucks”. I’m pretty sure the coders would not be as triggered by the suggestion of possible bugs in the code! Writing an application this sophisticated, and mathematically complex requires some serious DSP skill, so to get it right without ANY implementation-bugs would be impossible.

Thor.HOG · March 14, 2025, 1:59pm

It seems you’re conflating the number of overlapping frames processed after-the-fact with the number of samples within the frames. And not being rude - really, I sincerely mean that - but I think this has run its course in regard to any user being able to have further intelligent conversations in the absence of code references. Meaning, while interesting theory to continue discussing, it doesn’t matter in this context since we simply don’t know (and will never know) what WL’s code is.

I apologize for making it seem that I was quoting YOU in that post. I take full responsibility for that - it was not my intention at all, and I was quoting a different user who I have since “ignored” so I can’t/couldn’t go back and quote them again. I would be upset if someone did that to me, and I’m sorry. Please take this as my official apology for making it look like you said that. You did not, and I was referring to a different user. For what it’s worth, I will never, ever, under any circumstances, be “deliberately disingenuous” with you are any other human on the face of the planet. Ever. So, sorry again.

Regarding “keyboard-warrior,” while I think that was an unnecessary insult, I’ll attribute that to the aforementioned misunderstanding and consider it a reasonable, if inaccurate, response. That said, I already indicated that I do not write production, commercial DSP code. I contributed work in academia and continue to work with FFT code in Max MSP for use in patches I write in Ableton (for myself) but that’s the extent of it. None of that changes the algorithmic requirements and dependences for FFT processing though, and I consider my previous statements accurate.

Replying back to you after your reply in the middle of posts to the other user have really made this veer off into a different subject, particularly when we’re now attributing processing time increases with corresponding parameter increases to “bugs.”

What I will do is write a patch to explicitly test performance in this regard, though it will certainly be platform dependent and not exactly a globally applicable “scientific” resource. But it will be interesting.

Ben_H · March 14, 2025, 2:04pm

Thank you for the honesty, and apology. That is a rare thing these days.
In retrospect, I retract calling you a keyboard warrior, as that was specifically in reaction to the “code suck’s” comment.