I need more coffee, but as far as I recall the theory is that we need to limit the bandwidth of the signal we wish to capture to below half the sample rate, and as long as we do that we can correctly capture that information without problems. The important part is that this is about capturing a signal, i.e. converting from analog to digital.
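To put a number on why the band limit matters at capture time, here is a minimal sketch of where a tone ends up if it is not filtered out before sampling. The function name `alias_frequency` and the example frequencies are my own choices for illustration, and it assumes an ideal sampler with no anti-aliasing filter in front of it.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure tone at f_hz after ideal sampling at fs_hz.

    Anything above fs_hz / 2 folds back into the range 0 .. fs_hz / 2.
    """
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


# A tone below half the sample rate is captured at its own frequency.
print(alias_frequency(10_000, 44_100))
# A 30 kHz tone is above 22.05 kHz, so it folds back into the audible band.
print(alias_frequency(30_000, 44_100))
```

With these inputs the 10 kHz tone stays at 10 kHz, while the 30 kHz tone shows up at 14.1 kHz, which is why the filtering has to happen before the converter rather than after.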
However, we're still going to be left with problems restoring that signal using the digital-to-analog process unless we also bandwidth limit the output using another lowpass filter.
My guess then would be that there is different filtering going on during different processes. For example, your unaltered sample was captured one way, which of course won't change, but during playback there is some filtering going on at some point in the signal chain. This has to be true because it's part of the procedure of reconstructing an analog waveform. However, if you sample rate convert then again you'll have to be a bit careful with what you do, and if you apply a low pass filter during that process, that filter might not be the same as the other filters.
In other words I would expect a 44.1kHz filter to be steeper than a 48kHz filter, because it has less room between the top of the audible band and half the sample rate. As a matter of fact a lot of people argue that any audible differences when you switch from 44.1kHz to a higher sample rate are not because of the higher sample rate but because of the filter slope being different. So we should expect filtering to be different, possibly also during sample rate conversion done "offline".
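As a back-of-the-envelope check on the "steeper" claim, this sketch compares how much room each sample rate leaves for the filter to roll off. The 20 kHz passband edge is my assumption (the usual figure quoted for the top of the audible band), and `transition_band_hz` is just a name I made up for the example; real converters and resamplers may place their filter edges differently.

```python
def transition_band_hz(fs_hz, passband_edge_hz=20_000):
    """Width in Hz between the passband edge and half the sample rate.

    passband_edge_hz is an assumed value for illustration, not a
    property of any particular converter.
    """
    return fs_hz / 2 - passband_edge_hz


for fs in (44_100, 48_000):
    print(fs, transition_band_hz(fs))
```

Under that assumption the 44.1kHz filter has about 2 kHz to get from "pass everything" to "stop everything", and the 48kHz filter has 4 kHz, so roughly twice the room and a correspondingly gentler slope.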
To understand why there would be any "content above 22kHz" we just have to think about it differently. It isn't about what we capture, it is about what happens when we reconstruct the waveform.
"Stair steps" is not what happens in real life in digital, but it provides us with a good visualization of some of the (solved) problems with digital. A sinewave is smooth. If you start stacking odd harmonics on top of the fundamental sinewave then the waveform looks different. If you have an infinite number of odd harmonics of that sinewave, each at 1/n of the fundamental's amplitude (where n is the harmonic number), then you'll have a perfect squarewave. Conversely, it means that "sharp edges = high frequency content".
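The harmonic stacking can be sketched in a few lines. This uses the standard Fourier series of a square wave, where the n-th odd harmonic is scaled by 1/n; the function name and the choice of evaluation point are mine, purely for illustration.

```python
import math


def square_partial_sum(t, f0, n_harmonics):
    """Sum of the first n_harmonics odd harmonics of a square wave.

    The n-th harmonic (n = 1, 3, 5, ...) is weighted by 1/n, and the
    4/pi factor scales the result so the ideal square wave swings +/-1.
    """
    total = 0.0
    for k in range(n_harmonics):
        n = 2 * k + 1
        total += math.sin(2 * math.pi * n * f0 * t) / n
    return 4 / math.pi * total


# Evaluate at a quarter of the period, in the middle of the flat top.
for count in (1, 5, 50, 2000):
    print(count, square_partial_sum(0.25, 1.0, count))
```

With a single harmonic the value is just the sine peak scaled by 4/pi. As more harmonics are added it settles toward 1, the flat top of the square wave, and the edges get steeper, which is the "sharp edges = high frequency content" idea run in the other direction.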
So now ask yourself what happens when you have sample 1 @ -10dBFS and sample 2 @ -15dBFS. If you visualize that change there is no smoothing, because this is digital. There is an instantaneous "visual" change in "direction" at sample 1, so it points toward sample 2 in a straight line. It won't be a 90 degree angle, but it will be an instant change. And instant change = sharp edge = high frequency content.
So that is the "content" above 22kHz. It wasn't present in the signal we captured, it's just a "quirk" of the system. How do we get rid of it? We just filter it with a low pass filter and the problem is solved.
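If you want to see that "quirk" in numbers, here is a rough sketch using the stair-step picture as a model. It assumes an idealized zero-order hold (each sample simply held for four output points), which is a simplification for illustration and not how any specific DAC works; `dft_magnitudes` is a small helper written for this example.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitude of each bin of a plain (slow) discrete Fourier transform."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


# One cycle of a sine captured with 8 samples, i.e. a tone at fs / 8.
samples = [math.sin(2 * math.pi * t / 8) for t in range(8)]

# Stair-step model: hold every sample for 4 points (a 4x finer time grid).
held = [s for s in samples for _ in range(4)]

mags = dft_magnitudes(held)
# On the 32-point grid, bin 1 is the original tone and bin 8 is the
# original sample rate, so bin 7 sits at fs - fs/8, above fs / 2.
print("tone  (bin 1):", mags[1])
print("image (bin 7):", mags[7])
print("bin 3        :", mags[3])
```

The bin for the original tone is the largest, but there is a clearly nonzero component at bin 7 that was never in the captured signal, while a bin like 3 stays at essentially zero. That component above half the sample rate is what the output low pass filter is there to remove.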