WaveLab 11.1 does not use all resources available on Apple Silicon

Again, this all depends on the DSP algorithms being used. Certain DSP algorithms are faster on Apple Silicon, others in Rosetta.
WaveLab itself, independently of the plugins it uses, is always faster in Apple Silicon mode (with the exception of spectrogram displays).
To give more details: the vast majority of DSP processes (in almost all plugins) use an Intel DSP library that has matured over 20 years, sometimes with specific proprietary algorithms. It is not the Intel CPU that is faster than the M1, but the algorithms being used, which Intel provides only for Intel CPUs. In a few years, I expect the software DSP solutions on Apple Silicon to mature.

Therefore, saying that Apple Silicon beats Intel for DSP processing, or the inverse, should not be generalized today.

Ok, now I guess I get what you are saying: it is the DSP algorithm within a plug-in that is not fully optimized for Apple Silicon and still runs in a “semi-translated mode”, even though the plug-in shell runs in Apple Silicon mode. This would explain the effects that I see (with my selection of plug-ins) and that the fellow user doesn’t see (with his different selection of plug-ins).

I understand and appreciate that this whole issue is not the fault of WaveLab itself, but merely comes to light via WaveLab as the platform through which users experience the plug-ins. I fully trust that you have measured the core functionality of WaveLab itself to be faster in Apple Silicon mode. The problem is just that WaveLab’s features depend heavily on plug-ins; without plug-ins, WaveLab wouldn’t make much sense to use, for me.

Still, it would be beneficial if you tried to maximize the CPU resources available to plug-ins, since I’m afraid there will be lots of plug-ins in this “semi-translated mode”. As this experience shows, there are quite a few very useful plug-ins (e.g. Steinberg Restoration :wink:) that might cause users to run into resource constraints. I wouldn’t say that running two of these plug-ins in a processing chain is an extraordinary use case.

I think a good comparison would be to test the same plugins (be sure to use VST3) in REAPER vs. WaveLab.

I say REAPER because it is well regarded for its CPU efficiency and, unlike most mixing DAWs, it has Item FX, which are similar to WaveLab’s Clip Effects in the Audio Montage.

I don’t have time to test right now, but you could create a REAPER session and a WaveLab Audio Montage with the same layout, using some Item/Clip FX on each “song” in the project, plus maybe a couple of plugins on the Montage Output in WaveLab and the Master Fader output in REAPER. Then compare rendering times, and see which one can handle more real-time plugins during playback, especially heavy ones like iZotope or Soothe with resampling.

I just might add an idea:
One aspect that makes my use case of WaveLab different from others might be the quality of the files that I’m working on; they are all 24-bit with a 192 kHz sampling rate.

If I had the possibility to resample the files to 96 kHz for playback only, the workload on the plug-ins would surely be reduced. However, as I understand the recent implementation of resampling in WaveLab, it is applied after the effects plug-ins have processed the data. So, if there were a way to change the sample rate before the plug-ins are involved, for playback only, the CPU resource limitations (although still unchanged) wouldn’t affect my ability to listen to the music. During the final rendering of the file at the original sample rate of 192 kHz, the CPU limitations would only result in a prolonged working time but would not affect the quality of the result.
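To make the idea concrete, here is a minimal sketch of that signal flow, assuming Python with numpy/scipy/soundfile; the file name and the `apply_effects()` placeholder are hypothetical, and this is of course not WaveLab’s implementation:

```python
# Sketch of "resample before the effects, for playback only".
# vinyl_capture_192k.flac and apply_effects() are hypothetical placeholders.
import soundfile as sf
from scipy.signal import resample_poly

audio, sr = sf.read("vinyl_capture_192k.flac")  # 24-bit / 192 kHz source
assert sr == 192_000

# 192 kHz -> 96 kHz is an exact 2:1 ratio, so a polyphase resampler
# handles it cheaply; the effects then see half as many samples per second.
monitor = resample_poly(audio, up=1, down=2, axis=0)

# playback path: processed = apply_effects(monitor, samplerate=96_000)
# final render: still runs apply_effects(audio, samplerate=192_000)
```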

Just my thoughts, aiming at solutions instead of problems…

Yes, if you process audio files (not montages). In that case, insert the Resampler plugin at the top of the Master Section chain.


I don’t use audio montages, just plain audio files. So, including the Resampler plug-in to reduce the sample rate to 96 kHz is a suitable workaround for my playback needs. In a first test, I was able to play back my files with the two plug-ins (Steinberg as well as iZotope) at 96 kHz and still had 20% left in the green bar.

However, I must say that the other proposed DAW (REAPER, running in Apple Silicon mode) does indeed offer playback of the very same file at the original sample rate (192 kHz) with the same plug-ins and settings. So, it seems there is some room for optimization in WaveLab…

I know what PG is saying about DSP processing, and I know what the OP envisions as well. I hope one day there is a solution across all DAWs that helps us fully make use of our CPU cores. For example, RX and Premiere Pro make good use of CPU and GPU power. I’m on a Threadripper and love it with WL.

When processing in WL, I use my own little time hack: for, say, a 1-hour audio file, I split it up into six 10-minute files and apply the same plugin chain to each. Instead of waiting 15 minutes to process the 1-hour file, I can get this down to 5 minutes by processing the six small files, then rejoining them.

These are little ways to make further use of the cores; maybe it’s even an idea to build a module into WaveLab that does something like this (see the sketch below). From a visionary standpoint, it would be great to explore future “different” ways of thinking about better use of our cores, because they are only going to get bigger and faster, and our processing technology probably needs to evolve with them so we don’t bottleneck or stalemate.
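That split/process/rejoin hack could look roughly like this in script form; a sketch only, assuming Python with numpy/soundfile, where `apply_chain()` is a hypothetical stand-in for the real plugin chain (not a WaveLab API) and the chain is effectively stateless across the cut points:

```python
# Sketch of the manual time hack: split a long file into N pieces,
# process them on separate cores, then butt-join the results.
import numpy as np
import soundfile as sf
from concurrent.futures import ProcessPoolExecutor

def apply_chain(chunk: np.ndarray, sr: int) -> np.ndarray:
    return chunk  # hypothetical placeholder: run the real DSP here

def render_parallel(src: str, dst: str, n_chunks: int = 6) -> None:
    audio, sr = sf.read(src)
    chunks = np.array_split(audio, n_chunks, axis=0)
    with ProcessPoolExecutor(max_workers=n_chunks) as pool:
        done = pool.map(apply_chain, chunks, [sr] * n_chunks)
    sf.write(dst, np.concatenate(list(done), axis=0), sr)

if __name__ == "__main__":  # guard required for multiprocessing on macOS
    render_parallel("one_hour.flac", "one_hour_processed.flac")
```

As the replies below point out, a plain butt-join like this only works for processing without time-dependent state at the cut points.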

The differences in playback performance above also show up as differences in rendering performance. These are the times required for rendering a 16-minute FLAC file (a captured vinyl recording) of 24-bit stereo audio at a 192 kHz sampling rate. The task for all DAWs was to apply two CPU-heavy iZotope VST3 plug-ins to reduce the vinyl clicks. The output was written in the same format (24-bit, 192 kHz, FLAC). The result was essentially the same file, so there were no audio quality differences, just different render times.

WaveLab needs 20 min 22 sec for this task; average CPU usage was 105%.

REAPER needs 14 min 56 sec for this task (26% faster than WaveLab); average CPU usage was 110%.

Although not a fair comparison to the other two, since it uses a different implementation of the VST3 plug-ins’ functionality, it still provides the same result:
iZotope RX Audio Editor needs 3 min 42 sec for this task (82% faster than WaveLab); average CPU usage was 730% for one “plug-in” and 670% for the other.

It appears to me as if WaveLab and REAPER let the very same chain of two plug-ins operate in different ways, since WaveLab’s CPU load shows many more ups and downs compared to the other DAW, which maintains a very stable workload.

In my workflow on a Mac mini M1, I can use more oversampling plugins, but I can’t use all cores, for the same reasons Philippe explained. One advantage is that I can work on multiple projects and render all of them in parallel: when I use multiple Audio Montage projects, I can render them all together.

Thank you for sharing your experiences. It is interesting to learn about your approach of rendering multiple Audio Montage projects together. I haven’t thought in this direction so far.

I guess it all comes down to my use case / workflow, which combines pretty heavy files (24-bit, 192 kHz) with pretty heavy plugins (restoration) that might not be fully optimized for Apple Silicon yet. This combination might just be too much for the first generation of Apple Silicon and the first version of WaveLab running on these chips.

Some more usage of WaveLab reveals even stranger effects; of course, measured on the very same file, without any other application running in the background that could affect performance:

WaveLab 11.1 performs a “global analysis” in 14 sec running in Rosetta 2 mode but needs 43 sec in Apple Silicon mode. To localize errors (in the Correction tab), it takes 1 min 10 sec in Rosetta 2 mode but 2 min 40 sec in Apple Silicon mode.

So, these WaveLab-internal computations in Apple Silicon mode take two to three times as long as in Rosetta 2 mode!

I have double-checked this; it is not the other way around… To me, there seems to be something wrong here?!?

As I told you already, it all depends on the DSP algorithms. Some Intel proprietary algorithms are not yet matched in quality by what’s available in Apple Silicon.

I can also confirm that offline analysis of a file using Silicon mode takes roughly twice as long as Rosetta mode.

Something indeed seems off.

I would understand if this were a task somehow related to DSP algorithms. However, in this case I am using the “global analysis” / RMS loudness analysis of WaveLab. To make sure no effects plug-in could interfere, I removed all of them completely.

This computation takes 14 sec in Rosetta and 43 sec in Apple Silicon mode.

Based on this idea, I have been able to perform a render task on 4 files at the same time. This works pretty well, generates perfect renders without any glitches, and leads to an impressive workload on the CPU.

So, it appears to me that WaveLab and VST3 plug-ins can indeed run as parallel tasks, at least for offline rendering if not playback. However, it would just require some changes in the application to apply this approach to a single file: WaveLab would need to split one file into “chunks”, render them in parallel, and then re-combine the results into a single result file…


Yes, what you propose is possible, but there is a problem: how to retain the correlation between the “chunks”. In video or image editing, a single pixel is not correlated with the others, so you can process every pixel on any CPU core. In your idea, the track would be split into X chunks for X cores and all chunks processed in parallel, but the last sample of the first chunk is related to the first sample of the second chunk. Any process that involves modulation or reverberation may have this problem… This is what I think, but maybe the developers have a solution…
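That boundary problem is easy to demonstrate with a toy example. The sketch below (Python with numpy/scipy, purely illustrative and not any plugin’s actual DSP) uses a one-pole lowpass, whose output depends on the previous output sample, so cutting the file resets that state exactly at the junction:

```python
# Chunk-boundary problem with a stateful effect.
# One-pole lowpass: y[n] = 0.01*x[n] + 0.99*y[n-1]. A chunk processed
# independently starts from zero state instead of the carried-over state.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
x = rng.standard_normal(96_000)       # half a second of noise at 192 kHz
b, a = [0.01], [1.0, -0.99]

whole = lfilter(b, a, x)              # one pass over the entire signal
split = np.concatenate([lfilter(b, a, x[:48_000]),
                        lfilter(b, a, x[48_000:])])   # state lost mid-file

err = np.abs(whole - split)
print(err[:48_000].max())             # 0.0 -> first chunk is identical
print(err[48_000:48_200].max())       # large -> divergence right at the cut
```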

I agree with you that this is not an easy task. However, there are smart and creative people working on WaveLab, with PG1 at the forefront.

So, I just wanted to say that this doesn’t appear to be a limitation of WaveLab’s internal processing or of the way it uses VST3 plug-ins. A similar approach is already implemented in another application that performs offline rendering with all cores, so these problems seem to be solvable. And, I might add, modern CPUs appear to have reached a rather constant level of single-core performance, while the increase in overall CPU performance is largely derived from the growing core count.

Yes, this could work for some processes, but it is difficult to generalize this solution.
For example, the splits would need to overlap by several seconds, else some discontinuity would appear at the junctions (depending on the plugin). The X generated files would then need to be crossfaded/combined (hence you need a very fast drive, too).
Furthermore, there would need to be as many plugin instances as splits, and for memory-hungry plugins such as the iZotope ones, or those that rely on hardware DSP, this could be an issue.
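Here is a sketch of that overlap-and-crossfade joining step (mono numpy arrays, a simple linear fade; an assumption-laden illustration, not WaveLab’s method), where each already-processed piece shares `overlap` samples with its successor:

```python
# Join overlapped, already-processed pieces with a linear crossfade.
# Assumes mono numpy arrays; each piece overlaps the next by `overlap`
# samples. Illustrative sketch only.
import numpy as np

def crossfade_join(pieces: list[np.ndarray], overlap: int) -> np.ndarray:
    fade_in = np.linspace(0.0, 1.0, overlap)
    out = pieces[0]
    for nxt in pieces[1:]:
        # mix the shared region: fade the old tail out and the new head in
        mixed = out[-overlap:] * (1.0 - fade_in) + nxt[:overlap] * fade_in
        out = np.concatenate([out[:-overlap], mixed, nxt[overlap:]])
    return out
```

The overlap also gives a stateful plugin time to settle before the region that is actually kept, which is presumably why several seconds rather than a few samples are needed.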

Also, this method can’t work with lossy formats such as mp3 and aac.

So, yes, this is possible in principle, but not without potential glitches. Whether it is worth the effort, I don’t know.

As a side note, I think WaveLab 11.1 will get an update after the summer with performance improvements for certain processes (in the Global Analysis process, it is the True Peak detection that causes the performance loss at this time).


Thank you for these insights, Philippe!

I very much look forward to the improvements announced for the coming update of WaveLab 11.1!

You’ve got me with the hardware DSP issue: this is an obvious and hard limitation. However, system memory and storage speed shouldn’t be much of a concern today. In my test with 4 simultaneous renders, WaveLab used less than 3 GB of RAM, and SSDs are easily capable of 3-5 GB/sec transfer speeds.

I mean, you have implemented so many choices in the Multi File rendering dialogue (one task, … half the cores minus 2 :laughing:, … all cores). If there were the same choice for single-file rendering, a user could decide for himself how he wants WaveLab to perform, based on restrictions like hardware DSP or the .mp3 format. And there are so many trial-and-error runs in music production that I would guess the WaveLab user base will find the best setting for each use case…

There is a funny observation that I would like to share with you:
I have made a 1:1 comparison (same file, same processor-hungry iZotope VST3 plug-ins, same settings) of offline rendering between WaveLab and REAPER, in both operating modes (Rosetta 2 versus Apple Silicon), and found some interesting similarities. When operating in Rosetta 2, both applications were pretty much identical in offline rendering speed. It was only in Apple Silicon mode that REAPER was able to render much faster than WaveLab.

So, to me it seems like there is room for optimization in the way WaveLab lets VST3 plugins operate in Apple Silicon mode.
