# Allow Cubase to export projects in 44.1 kHz at 96 kHz FOR REAL

Can you identify any of our popular digital VSTs that rely on this theorem about sine waves? It seems to me that the essence of the digital domain is that natural sounds are represented as points on a graph, not as some combination of sine waves. I don’t understand why you keep bringing up sine waves. That isn’t how digital processing works, is it?

This thread isn’t about the theory of waves in a theoretical mathematical sense. It is about whether or not there are PRACTICAL advantages to moving to a higher-precision environment when working with many tracks and many effects in the mix, particularly effects that add nuance (i.e. complexity) to the waveform. If you have some ideas about that, I’m sure everyone would be interested. I just don’t understand the relevance of a discussion about sine waves. Sorry if I am being dense, but I don’t see how your comments relate to the topic at all.

As long as the signal is bandlimited, then a complex waveform is represented just as well as a simple one. It really makes no difference. There’s no need to change the sampling rate just because the signal is complex.

However, there CAN be a benefit in having more bits or a higher sampling rate during mixing than for final mixdown.

This is because effects plugins are doing calculations with the audio data, and these can create aliasing effects (reflections of higher frequencies down to lower frequencies) as described above.
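A minimal sketch of that folding (plain Python, illustrative frequencies): a 7 kHz tone sampled at 8 kHz produces exactly the same sample stream as an inverted 1 kHz tone, which is what "reflection down to lower frequencies" means in practice.

```python
import math

fs = 8000              # sampling rate in Hz
f_high = 7000          # tone above Nyquist (fs/2 = 4000 Hz)
f_alias = fs - f_high  # 1000 Hz: where the tone folds down to

# Sample both tones at the same instants.
n = range(16)
high = [math.sin(2 * math.pi * f_high * i / fs) for i in n]
alias = [-math.sin(2 * math.pi * f_alias * i / fs) for i in n]

# The sample streams are identical: once sampled, the 7 kHz tone is
# indistinguishable from an (inverted) 1 kHz tone.
assert all(abs(a - b) < 1e-9 for a, b in zip(high, alias))
```

The same mechanism applies when a plugin's math *creates* content above Nyquist: those components land back in the audible band.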

Likewise, plugins often work at 64 bits because some calculations don’t work at lower precision, or create significant errors. And plugins can create random noise because of rounding errors; this is small, but in theory it can build up over long chains of plugins.
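A small sketch of that build-up (plain Python; the 16-bit quantize step and the 0.97 gain are illustrative stand-ins for each plugin's output stage):

```python
def quantize(x, bits=16):
    # Round onto a fixed grid, like storing a sample at a 16-bit word length.
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale

x = exact = 0.123
for _ in range(100):
    x = quantize(x * 0.97)   # each "plugin" applies gain, then rounds
    exact *= 0.97            # reference path kept at full double precision

err = abs(x - exact)
# Each rounding injects at most half a quantization step (2**-16 here),
# so after 100 stages the accumulated error is bounded but can be far
# larger than a single step.
assert err < 100 * 2 ** -16
```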

So to summarise, the benefit of higher sampling rates and word lengths is not to do with representing complex signals, but to improve the accuracy of calculations.


I would say any plugin that uses some form of FFT (Fast Fourier transform), so any Frequency Spectrum Analyzer, or tools that operate in the frequency domain instead of the time domain, like Oeksound Soothe or similar. And of course any Synthesizer that uses additive synthesis.

Any audio signal, digital or analog, can be represented in the time domain or the frequency domain, and those are interchangeable. It doesn’t matter how the signal is actually stored, as samples in a wav file or on a tape (an audio signal on tape isn’t stored as a “wave” either, but as magnetized particles). Try looking at, say, a 100 Hz triangle wave with an oscilloscope and a spectrum analyzer.

@jwatte wrote a very good comment on the possible practical benefits of working at higher SR a few posts above. They can be real, except not really in the way you think of it (it’s not to do with “nuance” or “complexity”, but with the prevention of aliasing and the effects of the anti-aliasing/reconstruction filters).
I would add that there are cases where a higher SR can actually be harmful to a signal. If you stack several nonlinear processors (distortion, compressors) that generate harmonics above our hearing range, those can create intermodulation distortion that in the worst case can be audible (intermodulation distortion is the kind of distortion that makes full barre chords on an electric guitar through a distortion pedal sound mushy and undefined). So it can actually be useful to insert a low-pass filter at 20–22 kHz after, say, a distortion plugin when working at 96 kHz.
All this depends very much on what signals you have and what processors you use. As with everything in life, there is no simple black and white answer.
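For the curious, the intermodulation products are easy to see on paper: squaring is about the simplest nonlinearity there is, and feeding it two tones produces components at the sum and difference frequencies. A quick numerical check of that trig identity (plain Python, illustrative frequencies):

```python
import math

f1, f2 = 3000.0, 3200.0   # two tones, like two strings of a chord

def tone(f, t):
    return math.cos(2 * math.pi * f * t)

for i in range(64):
    t = i / 96000
    x = tone(f1, t) + tone(f2, t)
    y = x * x  # crude nonlinearity (squaring), standing in for distortion
    # (cos a + cos b)^2 = 1 + 0.5*cos 2a + 0.5*cos 2b
    #                       + cos(a+b) + cos(a-b)
    # i.e. harmonics at 2*f1 and 2*f2, plus intermodulation products
    # at f1+f2 and f1-f2.
    expected = (1
                + 0.5 * tone(2 * f1, t)
                + 0.5 * tone(2 * f2, t)
                + tone(f1 + f2, t)
                + tone(f1 - f2, t))
    assert abs(y - expected) < 1e-9
```

The f1−f2 product lands well inside the audible band no matter how high the original tones sit, which is why stacked nonlinear processing can get mushy.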

I don’t think the decomposition of sound into a sum of sine waves is particularly relevant to this discussion, but for interest’s sake I’ll answer your question.

In fact conversion of audio into the sum of sine waves representation happens all the time. Any plugin that applies frequency-specific processing will need to do this. So every time you use an EQ or multiband compressor, this conversion is happening.


I agree that there can be benefits to working at a higher sample rate than 48kHz when processing audio. I think I made that clear before.

The reason why I talked about waveforms is pretty simple: you brought up how you think (parts of) digital audio works on a fundamental level, and you used that as an argument for higher sample rates when mixing/processing. If the basis for your argument is wrong but your request is fair, I’ll still comment on it, because if we just ignore that you’re wrong then you’ll misinform a bunch of people. They would walk away from this thread believing the same things you do about the fundamentals of digital audio, and they’d be wrong. That’s unnecessary. Hence the correction.
(and a lot of people have the same misconception as you btw.)

Every time you say things like (paraphrased) ‘complex waveforms can’t be accurately represented in digital which means there are errors which means we need a higher sample rate during processing’ part of that sentence is just incorrect, and it’s therefore corrected.

That’s why “we” didn’t let this go.

Sounds aren’t represented as points on a graph. That visualization of digital audio is a bit misleading; the signal is represented as a stream of numbers.

Again: It is the knowledge that complex waves can be considered a combination of sine waves that has allowed people to create a sampling theorem + technology to implement it that can capture and store analog signals in digital form. The complex waveforms aren’t literally broken down into components, it’s just that the processes going to and from digital ‘account for’ / ‘rely on’ the fact of what complex waveforms are. That’s why it works.

Here is the code for a 6 dB/octave low-pass filter:

```
state = (state * coeff0 + input) * gain0;
output = state;
```

This is a bog-standard first-order low-pass IIR filter, and it uses absolutely zero decomposition to any basis other than the time domain.
The math gives us the rules, and the math says that “time domain” and “frequency domain” are equivalent, and you can transform between the two without loss.

So, an implementation in the time domain (which is most of them) will have effects in the frequency domain (which is, indeed, what we’re discussing above), but it doesn’t have to go through the effort of actually shifting basis to the frequency domain. In fact, the main algorithm used to shift to the frequency domain is the FFT, which introduces significant latency into the signal processing, so any processing that’s very low latency (between 0 and 256 samples of latency, say) is almost certainly implemented in the time domain.
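To make that concrete, here is a sketch (plain Python; the 1 kHz cutoff and coefficient formula are illustrative choices) of the same one-pole low-pass run purely in the time domain. It never touches an FFT, yet it clearly has a frequency-domain effect: a tone below the cutoff passes nearly untouched while a tone well above it is strongly attenuated.

```python
import math

fs = 48000.0
fc = 1000.0  # illustrative cutoff frequency
# One-pole low-pass, same shape as the snippet above:
#   state = state * coeff + input * gain; output = state
coeff = math.exp(-2 * math.pi * fc / fs)
gain = 1 - coeff

def peak_after_filter(freq, seconds=0.05):
    """Run a sine of the given frequency through the filter and
    return the peak output level once the filter has settled."""
    state = 0.0
    peak = 0.0
    for i in range(int(fs * seconds)):
        x = math.sin(2 * math.pi * freq * i / fs)
        state = state * coeff + x * gain
        if i > fs * seconds / 2:  # skip the settling transient
            peak = max(peak, abs(state))
    return peak

low = peak_after_filter(100.0)     # well below cutoff: passes
high = peak_after_filter(10000.0)  # well above cutoff: attenuated
assert low > 0.9 and high < 0.2
```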


I think we are all learning some important things here, which can happen when people approach a question from different points of view.

If I understand the rudiments of the digital realm, mixing two waves is simply a matter of adding the two samples (conventionally shown on the Y axis) at every sample point on the X axis. That simplicity is what allows rather ordinary computers – even cell phones – to mix waves with no apparent latency.

Likewise, raising or lowering the gain is simply a matter of multiplying every sample by a fixed amount. But even something as trivial as that can lose a tiny bit of precision compared to the original sample (which itself was slightly imprecise). It stands to reason that if even simple operations like this happen several hundred times in the process of rendering the ultimate output wave, insignificant errors could gather to become significant. I believe I am just stating the obvious here. I think just about everybody here agrees on at least that much.
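That per-sample arithmetic, and the tiny rounding it can introduce, can be sketched in a few lines (plain Python, hypothetical sample values):

```python
# Mixing is per-sample addition; gain is per-sample multiplication.
a = 0.25               # hypothetical sample from track 1
b = 0.1                # hypothetical sample from track 2
mixed = a + b          # mixing the two tracks
scaled = mixed * 0.7   # applying a gain of 0.7

# Simulate storing the result at 16-bit precision:
stored = round(scaled * 32768) / 32768
error = abs(stored - scaled)
assert error <= 1 / 65536   # at most half of one 16-bit step
```

A single such error is inaudibly small; the open question in this thread is how much it matters when it happens at every stage of a long processing chain.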

Thanks for the correction - that’s very interesting!

What are these errors you speak about and that would relate to sampling frequency?

In the natural world, the sounds we hear are essentially infinite. Experienced bird-watchers can discern minute differences between the calls of different sub-species.

Moving into the digital realm, we don’t have some “perfect” method of representing those sounds without any losses. Instead the digital pioneers settled on a 2-dimensional sampling method we conventionally represent as a Y axis representing amplitude and the X axis representing time. The CD standard was set to 16 bit resolution on the Y axis and 44,100 samples per second on the X axis.

That has been considered very good, but not perfect. That is to say, with CD-level precision, there is a significant amount of error, but not enough to make a real difference to most users on most listening devices.

The whole point of this thread, I believe, is that in today’s DAW world, we have much more processing power available, leading people to apply more processes and more complex processes – because we can and we like the results. But each of these layers of processing can add to the cumulative errors. It is debatable how much of a problem this is, particularly considering that it is practical for just about any work to be done at 48K x 24 bits. Maybe that is enough for just about anything. I don’t know. That’s where I work and I don’t plan to change.

But the question is raised whether there would be a benefit if the DAW workflow allowed a final rendering to operate at higher resolution, even though the source material was recorded at a lower level. And those working entirely in synthesized sounds may have the option for the source material to be emitted at true 96K level.

I don’t know if there would be a real benefit. That’s the whole question here, I think.

As written several times here, yes, there can be a benefit in working at a higher sample rate, but (and I stand by that opinion) it depends very much on the source material and the processors you use. Whether that is beneficial for you is something only you can decide.
There have also been several examples why only having the final export in a higher SR while producing and monitoring the project in a lower SR isn’t a good idea - the main argument against it being that the end result isn’t necessarily what you were listening to.
Bottom line - if you prefer the sound of your instruments and effects in 96K, working in 96k from the beginning is the best idea.

100% agree with you.

Up-scaling sounds like a recipe for disaster to me. Especially to do with virtual instruments and plugins.

I am working at 48/24 and good old Cubase freaks out on me once in a while. And I have a very powerful PC. I dread to think what creating a project at 96K would be like…

Once in a while, if I record a vocal or an instrument and I need higher-quality recordings, I open a separate session, record at 96K (or higher), and then down-scale.

Are there other axes we need to worry about?

How many axes does the electrical signal coming out of the microphone have? Amplitude. That’s one. Over time… that’s two.

What else?

Considering that Jwatte has indicated a simple way to accomplish the rendering at higher resolution as:

it is probably worth somebody actually doing real tests to see what happens. My guess is that the “recipe for disaster” concerns are unfounded, but I could be wrong. But I also expect any benefits would be slight. No reason to speculate about it when we could simply do the tests.

First, a list of “which plugins sound different at 96 kHz on what kind of material” would actually be a pretty interesting list to read. If someone has a lot of plugins, a lot of demanding material, and a lot of time on their hands, I would love to see that list collated! I expect it to have several plugins on it, some of which may even be quite popular, but I’d also not expect it to be every plugin under the sun – the majority of plugins are probably OK.

Second:

This is a very common belief, and I used to share it in the past! With what we now understand about both physics and medicine, it is, however, wrong. And it’s wrong in a macroscopic, measurable way – we have instruments that can measure it.

The reason is: every biological system (including your ears) is ultimately a physical system (which reduces to mechanics, which reduces to chemistry). Unless you are a very strong believer in certain religious tenets, there is no magical infinitely-precise analog juju that can transmit essence at a level undetectable by instruments yet perceivable by your body.

Your ears do not have infinite precision. There are air movements that are small enough that your ears cannot perceive them, no matter how much you want them to. There’s a minimum activation threshold in the sound transmitting bones and ear drum, because of friction and stiffness inherent in the materials. There’s a minimum activation energy needed for your inner ear nerve cells, to overcome the minimal chemical hysteresis that prevents the ears from just constantly firing. Even the air molecules have some minimum inertia at a given temperature.

Add up all those errors. If we can capture and process a signal at a precision better than what those errors give you, then the capture, storage, and processing of that signal is perfect as far as an ear can determine, which is what matters. (Plato’s cave, the Matrix, the simulation hypothesis, and so on – post-modern epistemology teaches us that your reality is only that which you can perceive.)

Also, this thread should go into some FAQ about “common beliefs about audio and what we actually know in modern recording,” I think it’s a great summary all in all.

On another note, I still have those speaker wires for sale. Only \$1000/meter. A bargain!


I don’t fully understand what is being discussed here.

• Is it an “Offline Mode” quality button like there used to be on some virtual instruments of old? I understand they do it automatically nowadays. But the button wasn’t only about oversampling; they also switched samples to 24-bit when applicable, or revealed more samples for adjacent MIDI velocities.

• Is it a “Project Switcher” button where the project ups and goes from 44.1/16, 48/16, or 48/24 to 96/24 or even 192/32? If that’s the case, how do I know that A) my virtual instruments DO produce sound up there (above 20k – because they already oversample even at 44.1 or 48), and B) what benefit is there to material already recorded by my interface at 44.1/16? The information is there; surely I’m not getting NEW information by just converting to another sample rate?

• If there is a difference: how do I know what the end product is? Why bust my ears sculpting the sound at 48 when the high-quality mode will sound different? How will I know the consequences of my actions? Wouldn’t I resign myself to working directly at 96/192 and live with the performance hit, instead of playing a game of spot the differences between 48 and >96?

Personally, I work at 96k when I can, just for the better latency it offers. When I’m monitoring off another source (Totalmix, or the UI24’s outs), I don’t care about the sampling rate. In the Soundcraft UI’s case, it’s limited to 48/24, but it sounds as good as (even better at times than) my interface’s inputs at 96/24.

Hugely good point… I don’t recall virtual instruments being offered at rates as high as 96K, and certainly not 192K.

Yeap, what you record into the computer is what matters. If one records at 44.1, upscaling will not make the recording sound as though the source was recorded at 96K…

It seems to me something that is not being appreciated here is that the accuracy of the sampling (that is to say, how closely the series of digital points matches the natural wave) is a function of how much granularity is allowed in the mapping space. And a key point is that an increase in EITHER the X or Y axes will result in greater granularity. In other words, if you increase the samples on the X axis to 96,000 per second, that means there are more Y values taken per second, so they can more closely match the natural wave. It isn’t about “hearing” pitches at 48K. Nobody can hear those frequencies.

44.1K x 16 bits = about 3 billion potential data points per second
48K x 24 bits = about 800 billion potential data points per second
96K x 24 bits = about 1.6 trillion potential data points per second

The difference from 3 billion to 800 billion is big. I’m not sure doubling that would make much of a difference.

@cparmerlee Your argument throughout this thread, if I understand you correctly, is that 96kHz sampling rate produces a more accurate representation of sound in the human hearing range than 48kHz does.
But the thing is, it doesn’t. It’s not a matter of perception, it is a fact that can be mathematically proven.
The integer number 3 can be represented with only two bits. If we transform this 2-bit integer to a 64-bit floating point number, it is still 3.
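The same point in runnable form (plain Python): widening the representation changes nothing about the value it holds.

```python
# A value that fits in 2 bits...
x = 3
# ...converted to a 64-bit float and back again:
as_float = float(x)
assert as_float == 3.0
assert int(as_float) == 3

# The same holds for every 16-bit sample value: the wider format adds
# headroom for processing, not new information about the signal.
for sample in (-32768, -1, 0, 12345, 32767):
    assert int(float(sample)) == sample
```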


Man, you really need to listen to what people are saying.

Greater granularity = higher frequencies.

If you don’t need higher frequencies you don’t need greater granularity. Stop misleading people, please…
