Huge post about sampling frequency / bit-depth / SNR / dithering etc.

A “few” :smiley: words about sample rate, bit-depth, signal-to-noise ratio (SNR) etc.

Not trying to be any kind of utterly annoying scientist here, but some people don’t even know what these things are. And that’s OK! Who can know everything when they are born? For sound quality, let your ears decide; if it sounds good, there’s not a d*mn thing anyone can nag about. Everyone knows that Led Zeppelin’s IV or the Beatles’ Magical Mystery Tour weren’t done digitally :slight_smile: Anyways, I hope to shed some light on digital sound processing theory.

Anyway:

Sample rate / sampling frequency: When an analog-to-digital (AD) converter (everyone has one inside their sound card) receives an analog signal (analog: the signal’s strength is analogous to the volume/gain), it takes a certain number of AMPLITUDE samples of it per second. For a 44.1 kHz AD conversion the sound interface takes 44100 samples per second of the signal’s amplitude.

Bit-depth: The AD converter then transforms the analog amplitude value (which is the GAIN of the signal at that given sample point) into a digital value. With 16-bit depth the amplitude value has 65536 possible values, meaning 2 to the 16th power.

So theoretically a one-second-long mono sound signal of 16-bit depth and 44.1 kHz sampling frequency could be represented in [65536 to the power of 44100] ways. Someone wanna calculate that? :smiley:
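Well, the number itself is hopelessly big, but a couple of lines of Python (my own little sketch, not part of the original calculation) can at least tell how many digits it has:

```python
import math

# Number of distinct one-second mono signals at 16 bits / 44.1 kHz is 65536 ** 44100.
# The number is astronomically large, so just count its decimal digits:
# digits of 2^(16 * 44100) = floor(16 * 44100 * log10(2)) + 1
digits = math.floor(16 * 44100 * math.log10(2)) + 1
print(digits)  # 212407 digits - good luck writing that one out
```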

So 16 bits form 2 bytes. So it’s easy to calculate how much space a 16-bit, 44.1 kHz stereo signal takes from the hard drive: 2 (channels) * 44100 (samples per second) * 2 (bytes per sample) = 176400 bytes/second, that is some 176 kB. So one megabyte of memory is only enough for about 6 seconds of CD-quality audio. There is some overhead on top of this but I’m not aware of the details. Basically 1 minute of CD-quality audio should take about 10 MB of hard drive space.
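Here’s that storage arithmetic as a tiny Python sketch (just an illustration of the numbers above, ignoring any file-format overhead):

```python
channels = 2          # stereo
sample_rate = 44100   # samples per second
bytes_per_sample = 2  # 16 bits = 2 bytes

bytes_per_second = channels * sample_rate * bytes_per_sample
print(bytes_per_second)                       # 176400 bytes, about 176 kB per second
print(bytes_per_second * 60 / (1024 * 1024))  # about 10 MB per minute of CD-quality audio
```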


How is sound calculated from the bits? Well, digital data consists of single bits whose value can be either 0 or 1. So how is it possible to tell the signal’s amplitude with just 16 digits that are each either 0 or 1? This is how:

The first bit (the least significant one) is given to represent a decimal system value of 1. The next bit represents the value 2, the 3rd bit represents 4, the 4th bit represents 8, and so on, always multiplying the decimal value by 2 for each next bit.

So let’s make an 8-bit digital number for clarity: 11010010 (the bits are counted from the right, so the rightmost digit is that “first” bit). So what number does that represent in the decimal system? It’s 210.

Why?

OK, all the zeros are not counted, so the first bit (the rightmost one, being 0) represents the decimal value of 0. The second bit, however, is 1, so it represents the decimal value of [2 to the power of 1], which is 2. If the first bit had been 1, it would have represented the decimal value of [2 to the power of 0], which is 1.

OK, now we have one decimal value of 2. The next bit that has a value of 1 is the 5th bit, so it represents the decimal value of [2 to the power of 4], which is 16. The seventh bit (again a 1) represents the value of [2 to the power of 6], which is 64, and the 8th bit represents the value of [2 to the power of 7], which is 128.

Now we just have to sum these numbers, that is: 2 + 16 + 64 + 128 = 210.

For 16-bit numbers the corresponding decimal values of the 16 bits are: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384 and 32768. By summing different combinations of these you can make any number between 0 and 65535, thus having 65536 possible values altogether. And each number can be formed in exactly one way.
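If you want to play with this, here’s a minimal Python sketch (my illustration, not part of the original post) doing the same bit-by-bit sum:

```python
bits = "11010010"  # the rightmost digit is the least significant bit

value = 0
for position, bit in enumerate(reversed(bits)):
    if bit == "1":
        value += 2 ** position   # 1, 2, 4, 8, ... doubling for each next bit

print(value)         # 210 = 2 + 16 + 64 + 128
print(int(bits, 2))  # same result with Python's built-in conversion
```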


Logarithmic scale and decibels?

First: decibels. All gain boosts and reductions are presented with a logarithmic value in decibels, like -3dB or +20dB etc. What a logarithmic scale means is this: take for example the decimal values of the bits (1, 2, 4, 8, 16, 32, 64, etc.), where you always multiply the previous value by two. If you plotted them on a graph, the curve would very soon shoot so high that the scaling becomes a problem. If you take the logarithm of these numbers instead, those logarithmic values grow LINEARLY: LOG(1) = 0, LOG(2) = ~0.30, LOG(4) = ~0.60, LOG(8) = ~0.90, LOG(16) = ~1.20 and so on. So as you can see, the logarithms grow by a CONSTANT amount on every step.

The decibel is, however, calculated as follows:

dB = 20*LOG (Y/X)

where:

X = initial signal strength
Y = new signal strength
LOG = the decimal “10-based” logarithm (NOT the natural logarithm, which is based on Napier’s constant e = 2.71828…)

So by how many decibels do you have to boost the signal so that it comes out exactly 2 times stronger in amplitude?

OK, in that scenario Y/X equals 2, and the equation gives 20 * LOG(2) = 6dB. (6.0206dB to be exact.)
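As a quick sanity check, here’s the same formula as a couple of lines of Python (just a sketch, not anything from the original post):

```python
import math

def ratio_to_db(y, x):
    """Gain change in decibels for an amplitude ratio y/x."""
    return 20 * math.log10(y / x)

print(ratio_to_db(2, 1))  # ~6.02 dB for a doubling of amplitude
```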

So what if you boost it another 6dB? And another? And another? Then it is 2 * 2 * 2 * 2 = 16 times stronger than the initial signal, but the decibels only went from 0 to 24dB. OK, that 16-fold boost is nothing, since almost every one of you can hear very, very quiet sounds and the dynamic range of the human ear is well over 100 decibels. So if you wanted to show very quiet sounds on a LINEAR METER, you could hear, say, a song’s fade-in coming long before the linear meter even shows anything! :slight_smile: That’s why the meters are logarithmic, so that they will “notice” also the quietest sounds - like your ears do.

So what about two separate sounds with levels of 25dB and 100dB? How much stronger is the 100dB sound? Four times as loud?? Wrong. The difference is 75 decibels, so you calculate it with the earlier equation, but reversed:

dB = 20*LOG (Y/X)

--------->

Y/X = 10^(dB/20) = 10^(75/20) = 5623.

So the 75 decibel gain boost actually makes the sound 5623 times stronger!! Still, you can easily hear the 25dB sound. And the 100dB sound too, of course, but at that point it’s more like NOISE than a pleasant sound :slight_smile: So I hope you understood a bit about the logarithmic scale now…
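And the reverse direction as a sketch (again just my own illustration):

```python
import math

def db_to_ratio(db):
    """Amplitude ratio corresponding to a decibel difference."""
    return 10 ** (db / 20)

print(db_to_ratio(75))  # ~5623, the 75 dB difference above
print(db_to_ratio(24))  # ~15.8, i.e. the four 6 dB boosts give roughly the 16x from before
```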


So signal-to-noise ratio (SNR) and dithering? What are those then??

OK, as you might have noticed, there is only a limited number of values you can represent digitally (with 16 bits there are 65536, with 8 bits there are 256 (2 to the 8th) and with 24 bits there are 16777216 (2 to the 24th)). So what if the analog amplitude value falls between the two nearest digitally representable amplitude values? That’s where SNR and dithering come in.

Let’s take 8-bit depth as an example. You can represent 256 numerical values with it, OK? So what if the analog signal would require the AD converter to store a value of 174.5? Well, sorry Mister Analog-to-Digital Converter, but there’s no way you can do it. So the AD converter ROUNDS that analog value to the nearest digital value, and that rounding error always shows up as NOISE - which goes by the name of quantization noise.

So “how much” is that when recorded with 8 bit depth only?

OK, the shortest “distance” in amplitude between two 8-bit digital points is 1/256 = 0.00390625 = 0.390625% of the full dynamic range of the 8-bit signal. So that’s exactly the minimum amount of noise there is in the signal when the AD converter quantizes it. So how many decibels is that? Let’s calculate:

dB = 20*LOG (Y/X)

Here Y/X = 0.00390625, so the equation gives about -48.16 decibels.

So with an 8-bit recording the quantization noise is “only” about 48 decibels below the signal - and that is very well audible. Anyone can hear it in songs quantized with 8-bit depth! Note also that because the recorded signal basically never uses the whole dynamic range, the actual signal-to-noise ratio is even worse than that.

What about 16- and 24-bit depth then??? Here:

16 bits: dB = 20 * LOG (1/65536) ≈ -96dB

24 bits: dB = 20 * LOG (1/16777216) ≈ -144dB
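The same little formula gives the theoretical quantization-noise floor for any bit depth. Here’s a sketch (assuming an ideal converter and a full-scale signal):

```python
import math

def noise_floor_db(bits):
    """Theoretical quantization noise floor relative to full scale for an ideal converter."""
    return 20 * math.log10(1 / 2 ** bits)

for bits in (8, 16, 24):
    print(bits, round(noise_floor_db(bits), 2))
# 8  -48.16
# 16 -96.33
# 24 -144.49
```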

So even at its “lousiest”, with a 16-bit signal the quantization noise is AT MOST 96 decibels below the signal - and only if the signal uses the full scale. If you record with settings where the peak amplitude of the vocals, guitars etc. is about 50% of the dynamic range, then the signal uses about 6dB less of the range (remember: 2 times stronger is +6dB, so “2 times weaker” is -6dB), and the SNR drops from 96dB to 90dB (with 16-bit depth). Goes without saying that when 24-bit resolution is used, the quantization noise is simply irrelevant compared to the other sources of noise that are present.

However, as your DAW processes the signal, some additional quantization noise gets added to it. Luckily Cubase uses 32-bit processing internally, so it’s not a big deal. Still, the lower the bit depth you record with, the more noise there is - recorded in a high-end studio or not - so an extensive use of plugins MIGHT add some extra noise to the initially recorded material. So to get extra precision into the audio, it’s better to use (at least) 24-bit depth, especially when making acoustic material. But if your audio hardware doesn’t support it, then it’s of no use; you’ll get that noise to begin with anyway :slight_smile:

Remember that this theoretical maximum of SNR assumes a perfect input signal. If the input signal is already noisy (as is usually the case), the signal’s noise may be larger than the quantization noise. Real analog-to-digital converters also have other sources of noise that further decrease the SNR compared to the theoretical maximum from the idealized quantization noise, including the intentional addition of DITHER (see below).


What the f*ck is dithering then?

Well, you can either use a bigger bit depth for increased accuracy, or you can use dithering. Dither is an intentionally applied form of noise used to randomize the quantization error, and it is often one of the last stages of audio production for compact disc. I haven’t been able to hear any difference whether I used the sophisticated UV22HR algorithm in the mastering process or not. But I guess with less noisy music (than the Metal/Rock I do) it might have some significance.

So since CD-quality audio is “only” 16 bits deep, one ALWAYS has a noisy CD :smiley: Well, that’s exaggerating of course, but anyhow, the advanced dithering algorithms are used to minimize the effects of that unavoidable quantization noise by adding a kind of “white noise” (a random signal with a uniform content of different frequencies) that is practically impossible to hear underneath the actual signal.
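If you want to see what plain dithering does, here’s a minimal sketch (my own illustration with NumPy and simple TPDF dither - NOT the UV22HR algorithm) that quantizes a quiet sine to 8 bits with and without dither:

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr
signal = 0.25 * np.sin(2 * np.pi * 440 * t)   # a quiet 440 Hz sine, full scale = 1.0

step = 1 / 2 ** 7                              # 8-bit quantization step for a signal in [-1, 1]

# Plain quantization: the error is correlated with the signal and is heard as distortion.
plain = np.round(signal / step) * step

# TPDF dither: add triangular noise of about +/- one step before rounding; the error
# becomes uncorrelated with the signal, i.e. a steady, benign hiss instead of distortion.
dither = (np.random.rand(sr) - np.random.rand(sr)) * step
dithered = np.round((signal + dither) / step) * step

# The dithered version actually has a slightly HIGHER peak error - the win is that
# the error no longer follows the signal, which is what the ear objects to.
print(np.max(np.abs(plain - signal)), np.max(np.abs(dithered - signal)))
```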


Sampling frequency revisited

So how much information does the 44.1 kHz sampling frequency (SF from now on) get from the signal? Well, actually the highest-pitched signal that can be captured with a 44100 Hz SF is exactly half of it, namely 22050 Hz. Why half, you ask? :slight_smile:

Think about a sine wave, the purest type of sound there is, beautifully swinging from down to up to down to up over time. OK, how many measurement points per cycle do you need to capture the idea digitally that it’s actually a pure sine wave? The answer is two; you’d need to take one sample where the amplitude is at its MAX value and one where it is at its lowest. Joining those consecutive points gives you a triangle-looking waveform, which the playback side then smooths back into the original sine. Get it?

If the pitch of the sound to be recorded is higher than 22050 Hz - which is rare, to say the least - the AD converter can’t capture the actual higher pitch; instead, artificial lower-end frequencies are generated. Those aren’t “real” sounds at all: every frequency above that 22050 Hz limit gets mirrored (aliased) back below the limit as a false tone. That’s why AD converters run a steep low-pass (anti-aliasing) filter just below the limit before sampling. (The roughly 20-25 Hz high-pass filters used in mastering are a different thing; they remove subsonic rumble, not aliasing.)
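A tiny sketch of where such a false tone would land (my illustration, assuming the simple fold-around-the-limit model described above):

```python
def alias_frequency(f, sample_rate=44100):
    """Frequency where a tone above the sample_rate/2 limit shows up after sampling."""
    nyquist = sample_rate / 2
    f = f % sample_rate              # the spectrum repeats every sample_rate
    return f if f <= nyquist else sample_rate - f

print(alias_frequency(25000))        # 19100 - a 25 kHz tone folds back into the audible band
print(alias_frequency(44100 + 440))  # 440   - a tone just above the sample rate lands at 440 Hz
```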

OK, and why was it that you can only record sounds up to 22050 Hz with a 44.1 kHz SF? That’s because you need TWO samples for one whole wavelength, so 44100 Hz / 2 = 22050 Hz.

Some people think that a stereo signal is twice as precise as a mono signal since it takes twice the space on the hard drive. That’s not the case; a stereo signal is just two mono signals, one for the left and one for the right channel. Again, not trying to be annoying, but sincerely not everybody knows that. One of my friends even argued with me many years ago that “it’s better to make a STEREO signal out of the MONO signal since it’s STEREO and sounds better!”. What a bunch of crap. If you record it as mono and then make a stereo file out of it, it only takes 2 times more space and has got absolutely nothing to do with how it sounds. The DA converter of any sound card splits it to left and right anyhow - the exact same signal for both ears. But if you record a mono source as stereo, there might be some very minor differences depending on the sound card’s quality, but again: there’s no use doing so.



BUT there’s a catch (even if you can’t hear sounds above 20 kHz): if you intend or suspect that you’ll be doing time-stretching or pitch-shifting (maintaining the length), then it’s better to have a higher SF on, say, the vocals. That’s because the Fourier-based algorithms can then calculate the frequency content more precisely, which gives a more precise outcome.


OK, my head will explode in a minute if I don’t stop, and you must be suffering the same. Anyhow, I wanted to write these things down. Maybe there’s some use for them for someone, somewhere, sometime… :slight_smile:

BUT answering the thread’s subject line: since CD quality is by standard 44.1 kHz, I’d wager it would be better to use a 44.1 kHz SF initially. Of course 96 kHz would be better; even if the downmixed material gets “flattened” to 44.1 kHz anyway, the result can’t be any worse (with 96 kHz, I mean). The difference between 48 kHz and 44.1 kHz is so little that the “horizontal dithering” (the resampling artifacts) that comes along with the SF conversion would ruin it anyway, if no state-of-the-art converters are used…

As a conclusion: let your ears decide, and the bottom line is: it’s GOOD SONGS that count.


~Tommy~


Who gives a sh1t…make some music :laughing:

Wha’? Yew qualified Docter?

:mrgreen:

Just kidding. Great reminder for me.