Appalling ASIO performance

I use the following setup:

  • AMD Phenom II X4
  • 16 GB RAM
  • Cubase 7.5 64 Bit 7.5.30
  • RME Hammerfall Multiface, via a CardBus card in a PCI-to-CardBus adapter

Problem is:

  • CPU load is at, hm, 35% (which is a joke) and ASIO already stutters

Seriously, what is wrong? I only use 35% of the available number crunching resources and ASIO already freaks out? WHY? What magic is happening in ASIO that prevents Cubase from utilizing all 4 cores?

I tried something else now:

Same synthesizer (u-he ACE) on four parallel tracks, same stuff everywhere.

I can now load my CPU to about 68% before it crackles.

But why only 68%? I have a CPU with 100%… and I want to use all of that. What is happening to the wasted 32% of CPU power?

Is that % the ASIO performance meter or an OS measurement?

If it is the OS measurement, it might not include the peaks. Glitch-free audio depends entirely on peak CPU use, not the average, though forcing execution to spread across CPUs does help relieve the pressure on any single one.

I have noticed that Cubase tends to favour CPU 0 more than the others, though that may be the core the GUI thread is on. Of course, it only takes one CPU overloading to get glitches.

Sounds very reasonable, I never thought of it this way.

I always suspected the PCI → CardBus bridge I use to be the problem, but on the other hand, a notebook with a CardBus slot uses the same bridging, and people are really very satisfied with RME performance in notebook configurations.

I just wish Steinberg would go one step further and introduce automatic, transparent background freezing.

When we were doing our CD 10 years ago, I persuaded Max of FX-Max to make his FX-Freeze VSTi freezing plugin more generic by making it freeze whatever was fed into it, rather than only handling the VSTi on the track. That meant it could be inserted in any track or group and freeze EVERYTHING up to that point. Of course, being a plugin, it couldn’t disable ALL the tracks feeding a group, so that was still a manual operation.

Nonetheless, we couldn’t have done our CD on the then single-core AMD without it, especially as we were doing 96k. Unfortunately, he went out of business before he got to do a 64-bit version, otherwise I would be using it now. I haven’t found a replacement yet.

SB would be the best ones to properly do freezing at ANY level, as they have access to all of the other plugins to disable them as required. One critical issue is how to handle send sources and sidechains.

Even with such a facility, one has to use it wisely, otherwise one might overuse it to the point where one cannot unfreeze some parts because they would exceed the remaining CPU capacity.

Basically, freezing converts resource usage from CPU to disk, so the disks have to be able to handle it, though using an SSD for the project drive should be fine.
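
As a rough back-of-envelope (assuming frozen tracks are stored as 32-bit float stereo files at 96 kHz): 96,000 samples/s × 4 bytes × 2 channels ≈ 0.77 MB/s per frozen track, so even 50 frozen tracks stream at roughly 38 MB/s, which is well within what any modern SSD, or even a decent spinning disk, can sustain.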

Freezing is especially useful for VSTis when using higher sample rates than the samples are provided at (44.1k for EastWest), as the CPU no longer has to do on-the-fly sample rate conversion (SRC). Also, many VSTis provide FX, like reverb, whose CPU usage is likewise saved by freezing.

Right.

It would already help if Steinberg provided us with “bounce in place” together with “unfreezing part by part”, so we could play LEGO with frozen parts and change as little as possible.

Problem is, right now, if I use multiple VSTis to build a song, I can never hear everything in later stages of song part construction, because I have to mute / freeze lots and lots of tracks, which is really crippling.

I mean, ASIO Guard shows that Steinberg is AWARE of the problem, but I really wonder why they stopped after the first two baby steps instead of following through and saying “let’s remove as much realtime processing as possible by storing already calculated material in RAM or on the hard disks and only calculate as little as possible”.

Can you imagine turning a 6 core machine into a 6000 core machine? Automatic background freezing could do that.

Like any large scale software, SB is between the rock of ensuring program stability and the hard place of user flexibility and facilities.

Like providing a decent macro programming facility, universal freezing can easily lead to substantial user f-ups unless the usage scenarios and rules are well designed AND enforced.

There may be several parts to the puzzle that are dependent upon backend facilities in Cubase that are not up to the task as yet.

Several years ago, I suggested that it would be good if the display of channels in the mixer could be controlled by whether track folders were open or not. It would do for pixel usage what freezing would do for CPUs. However, we are only just seeing support for conditional displaying being incorporated into the mixers, which is a prerequisite for fully automating it.

Basically, the ‘fertile ground’ for full freezing may not be prepared enough as yet, but is probably on the agenda for progressive implementation over several releases.

I think it is an architectural problem and a question of “how clean is my code” (this is my field of expertise in professional life).

To provide such functionality, the software has to have a clean object-oriented design with a common base class for all the steps in audio processing, which could then (and only then!) be extended with a new path for transmitting the information “I have changed - everything from this point on needs to be recalculated”.
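
A minimal sketch of what I mean, with made-up names (this is obviously not Steinberg’s actual code): every step in the audio path derives from one common base class that knows its downstream sinks and can push a “recalculate from here on” notification along the same connections the audio takes.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch only - not Steinberg's actual architecture.
// Every step in the audio path (VSTi, insert, send, channel, group)
// derives from this one common base class.
class ProcessingNode {
public:
    virtual ~ProcessingNode() = default;

    // Wire the graph exactly like the audio paths are wired.
    void addSink(ProcessingNode* sink) { sinks_.push_back(sink); }

    // A change in this node invalidates its own frozen data and
    // everything downstream: "recalculate from this point on".
    void notifyChanged(std::uint64_t changeId) {
        if (changeId <= lastChangeId_)
            return;                        // this change was already seen here
        lastChangeId_ = changeId;
        frozenDataValid_ = false;          // frozen audio is now stale
        for (ProcessingNode* sink : sinks_)
            sink->notifyChanged(changeId); // propagate source -> sink
    }

    bool needsRecalculation() const { return !frozenDataValid_; }

private:
    std::vector<ProcessingNode*> sinks_;
    std::uint64_t lastChangeId_ = 0;
    bool frozenDataValid_ = false;
};
```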

And it needs a “handoff” mechanism, or at least needs to be able to support one quickly.

I have done a rudimentary study and analysis of “implementing a DAW with automatic background freezing” and encountered a few obstacles, but nothing that can’t be solved with a little bit of effort.

As you said, sidechains and send channels would complicate things a bit, but with the “change notify” mechanism it wouldn’t be that hard.

Let me explain a simple sidechain scenario, which can be scaled:

You have a VSTi which goes to channel 1 and an audio track on channel 2. Channel 1 output controls channel 2 compressor sidechain.

So:

Channel 1:
VSTi → Channel 1 + Sidechain of compressor in channel 2

Channel 2:
Audio playback → Compressor (with sidechain input) → Channel 2

Gives us the following “freeze points”:

a) Channel 1 VSTi
b) Channel 1 output (if more processing happens before that, otherwise it’s optimized away)
c) Possibly also the channel 1 sidechain send (if it’s modified, but pure level changes as in “send level” are irrelevant, because one multiplication per sample is virtually free CPU-wise)
d) Channel 2 audio file (may be optimized away, but processing may take place)
e) Channel 2 output post compressor
f) Channel 2 output (if necessary)

So, the “freeze points” are connected to each other in the same way their audio paths are connected - and if one “source” changes, the change is transmitted to its “sinks”, and the sinks then know that everything from that point on needs to be recalculated.
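
To make that concrete for the sidechain example, here is how those freeze points could be wired up using the base-class sketch from above (again purely illustrative, made-up names):

```cpp
// Continues the illustrative ProcessingNode sketch from above.
void buildSidechainExample() {
    ProcessingNode vsti;       // a) channel 1 VSTi
    ProcessingNode ch1Out;     // b) channel 1 output
    ProcessingNode ch1Send;    // c) channel 1 sidechain send
    ProcessingNode ch2Audio;   // d) channel 2 audio file
    ProcessingNode ch2Comp;    // e) channel 2 output post compressor
    ProcessingNode ch2Out;     // f) channel 2 output

    vsti.addSink(&ch1Out);        // VSTi -> channel 1
    vsti.addSink(&ch1Send);       // VSTi -> sidechain send
    ch1Send.addSink(&ch2Comp);    // sidechain send -> compressor key input
    ch2Audio.addSink(&ch2Comp);   // audio playback -> compressor
    ch2Comp.addSink(&ch2Out);     // compressor -> channel 2 output

    // Editing the VSTi part invalidates a), b), c), e) and f) -
    // but NOT d): the channel 2 audio file itself stays frozen.
    vsti.notifyChanged(/*changeId=*/1);
}
```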

And to solve the problem of accepting a change (“undirtying”), one can always use a unique “change transaction ID”, which could be something as simple as an ever-incrementing 64-bit number or, if the developers can’t do a “f(x) → static x++” for some reason, a timestamp.

Meaning:

  1. A “source” doesn’t actually send “I have changed” to its sinks, but rather “my change transaction ID is 1377; if yours is lower, recalculate yourself”.
  2. The compressor with sidechain of course has two sources from which audio data and change transaction IDs arrive - and if either one changes, it knows: “hey, I’m invalid, I need to go back online with my processing” (see the sketch below).
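
A rough sketch of that transaction-ID bookkeeping (made-up names again): one global, ever-incrementing 64-bit counter, and every freeze point simply compares IDs to decide whether its frozen audio is still valid.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative only. One global, ever-incrementing transaction counter -
// essentially the "static x++" mentioned above, made thread-safe.
std::uint64_t newChangeTransactionId() {
    static std::atomic<std::uint64_t> counter{0};
    return ++counter;
}

// A freeze point remembers the transaction ID its frozen audio was
// rendered with. A source doesn't just say "I have changed"; it says
// "my change transaction ID is 1377 - if yours is lower, recalculate".
struct FreezePoint {
    std::uint64_t renderedWithId = 0;

    void onSourceChanged(std::uint64_t sourceChangeId) {
        if (sourceChangeId > renderedWithId)
            goOnline();   // frozen data is stale: process live again and
                          // re-render the freeze in the background
    }

    void reRenderFinished(std::uint64_t idUsedForRender) {
        renderedWithId = idUsedForRender;   // the "undirtying" step
    }

    void goOnline() { /* switch the plugins on this path back on */ }
};
```

The compressor’s freeze point would simply get onSourceChanged() called once per source (audio input and sidechain input), so whichever incoming ID is higher decides whether it has to recalculate.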

And no, VST interface changes are not necessary; the plugins can stay as they are. It’s all done by switching plugins on and off intelligently and by using preroll and postroll cleverly, possibly going as far as recalculating the whole track on 1-2 cores when needed.
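
And to illustrate that last point, a host-side sketch of the on/off switching (made-up names, nothing to do with the real VST or Cubase API): each freeze point either streams its pre-rendered audio with the plugins switched off, or runs the live chain while a background render catches up.

```cpp
#include <cstddef>
#include <vector>

// Illustrative host-side sketch - the plugin interface stays untouched.
struct FreezePointProcessor {
    bool               frozenDataValid = false;
    std::vector<float> frozenBuffer;   // pre-rendered audio (RAM or disk)

    void process(float* out, std::size_t numSamples, std::size_t playPos) {
        if (frozenDataValid) {
            // Plugins on this path are switched OFF: just stream the
            // already calculated audio (preroll/postroll handled at edges).
            for (std::size_t i = 0; i < numSamples; ++i)
                out[i] = frozenBuffer[playPos + i];
        } else {
            // Plugins are switched ON: render live (possibly confined to
            // 1-2 cores) while a background render rebuilds the freeze.
            renderLive(out, numSamples, playPos);
        }
    }

    void renderLive(float* out, std::size_t numSamples, std::size_t /*playPos*/) {
        for (std::size_t i = 0; i < numSamples; ++i)
            out[i] = 0.0f;   // placeholder for running the live plugin chain
    }
};
```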