Unmix Noisy Speech & Voice DeNoise limit output frequency to ~17kHz

Hi everyone,

I’ve noticed that when using Unmix Noisy Speech or Voice DeNoise on 48kHz audio files, the output appears to have a cutoff frequency around 17kHz, even though the original source contains frequencies up to 24kHz (Nyquist for 48kHz).

Steps to reproduce:

  1. Import a 48kHz WAV file containing clean studio-recorded speech with full frequency content up to ~24kHz

  2. Apply Unmix Noisy Speech module (or Voice DeNoise)

  3. Observe the resulting layer in the spectrogram

Expected behavior: The processed layer should retain frequencies up to 24kHz (or at least preserve the upper harmonics present in the original speech).

Actual behavior: The output is cut off at approximately 17kHz, as visible in the spectrogram. The original file clearly shows content extending to ~24kHz, but after processing, the high-frequency information is missing. This issue occurs with both Unmix Noisy Speech and Voice DeNoise.
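For what it's worth, the cutoff can be measured objectively rather than eyeballed on the spectrogram. A minimal sketch, assuming Python with NumPy/SciPy; synthetic 48kHz noise with a simulated 17kHz brick-wall stands in for the exported speech layer:

```python
import numpy as np
from scipy import signal

sr = 48_000
rng = np.random.default_rng(0)
x = rng.standard_normal(sr * 2)            # 2 s of full-band white noise

# Mimic the observed behaviour: brick-wall everything above ~17 kHz
# (stand-in for the processed speech layer exported from the module)
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), 1 / sr)
y = np.fft.irfft(np.where(freqs > 17_000, 0.0, X), len(x))

# Long-term average spectrum; take the cutoff as the highest frequency
# still within 40 dB of the spectral peak
f, pxx = signal.welch(y, fs=sr, nperseg=4096)
db = 10 * np.log10(pxx / pxx.max() + 1e-12)
cutoff_hz = f[np.nonzero(db > -40.0)[0][-1]]
print(f"estimated cutoff ~ {cutoff_hz:.0f} Hz")
```

Loading the actual exported layer with `scipy.io.wavfile.read` in place of the synthetic signal would give the real figure.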

Screenshots:

  • Image 1: Original 48kHz file showing frequency content up to ~24kHz

  • Image 2: After applying Unmix Noisy Speech – Speech layer shows cutoff at ~17kHz

System info:

  • SpectraLayers Pro 12

  • macOS 15.7.1

  • Source file: 48kHz WAV, mono, clean studio recording

Is this a known limitation of these speech-related algorithms, or could this be a bug? I’d appreciate any insights or workarounds. Thanks!

This is a nice catch, but regarding possible workarounds…. Why? heheh!

Not that it shouldn’t preserve those frequencies, for the sake of strict flawlessness, but what’s up there that’s useful to you? Genuine curiosity. Are you by any chance pitching these voices down an octave or something and want those frequencies in for the effect?

Because in my use cases treating dialogue, never have I ever needed any information above 17kHz in the first place, even less so in dialogue that needed noise reduction.


Um, if I remember correctly, Robin had set the LPF to 6K at the launch of SLP12(!) (wrong — it was 12K). There is a thread about this somewhere on this forum. After user outcry, Robin increased the LPF to something like 11kHz or thereabouts… so if you are getting an LPF of 17K, then possibly Robin has adjusted it further?

Personally, my upper limit is 12K… so yes, I can see and edit out unwanteds (like telephonic or HF interference), yet I personally cannot hear up there… hence why no one is hiring me :stuck_out_tongue_winking_eye:

Like @henrique_staino says, in my work with IV/dialog, all the NR work is at 8K and lower, unless there’s a broadband blast or RF dropout that is visible and high energy.

Most of the heavy lifting in my work is NR down in the LF range.


OK, found it

I was incorrect — the cutoff was 12K.

Here’s the thread:

Thread: (SLP 12 Cuts Off At 12K)


Not to say it’s semantics, but the information is still -there-; it’s just on the noise layer… and some mixers I work with would be adamant that’s exactly where it belongs :rofl:. I personally don’t feel it’s a critical voice region for intelligibility/clarity, and if I’m ever messing around in that region it’s usually to get rid of costume-jewelry ringing.
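And since it’s all still there, in principle you can put it back: assuming the module really does apply a clean brick-wall at a fixed split frequency, a complementary FFT split restores the full spectrum from the original. A sketch in Python with synthetic noise standing in for exported WAVs; the 17kHz split point is an assumption based on the observation above:

```python
import numpy as np

sr = 48_000
split_hz = 17_000                                # assumed split frequency
rng = np.random.default_rng(1)
original = rng.standard_normal(sr * 2)           # stand-in for the source WAV

# Simulate what the module appears to do: zero everything above the split
F = np.fft.rfft(original)
freqs = np.fft.rfftfreq(len(original), 1 / sr)
speech_layer = np.fft.irfft(np.where(freqs > split_hz, 0.0, F), len(original))

# Workaround: take everything above the split point from the ORIGINAL
# (not the noise layer, which may contain actual noise) and add it back
high_band = np.fft.irfft(np.where(freqs > split_hz, F, 0.0), len(original))
restored = speech_layer + high_band

print(np.allclose(restored, original))           # complementary split -> True
```

In practice you don’t even need code: high-pass a copy of the original at the split frequency with a linear-phase EQ and sum it with the speech layer.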

2 Likes

Thanks everyone for the insights! I appreciate the practical perspective.

@henrique_staino For me, that’s where the “air” lives — it adds a sense of openness and presence to the voice. I know it’s subtle and not essential for intelligibility, but when working with high-quality recordings, I prefer to keep it intact and make that decision myself rather than have the tool remove it automatically.

@ctreitzell Thanks for finding that old thread! It’s interesting to see this has been discussed before. It seems like Robin has been gradually increasing the cutoff based on feedback!

@dustinharris Good point about the info still being on the noise layer. However, my concern is that if I want to do further processing on the speech layer, that high-frequency content is already separated out, even when the original speech was clean.

Anyway, my point is about consistency - if the tool is designed to separate speech from noise, it would be nice if it didn’t make assumptions about what frequencies are useful.

I tested this with a clean studio recording, so ideally all frequency content would be recognized as speech. It looks like the model was trained to perform separation only below a certain frequency, and simply assigns everything above that range to the noise layer.

It would be great to have an option to preserve the full spectrum, or at least have this behavior documented somewhere. But either way, now I know what to expect and can plan my workflow accordingly. Thanks again!


I think that’s fair, and I think I’m with you now. Sometimes I do EQ just the voice layer so I’m not bringing BG noise up at the same time, and yeah, in the top end where I’m reaching for clarity, sometimes voice content that should be EQed as well remains on the noise layer, unaffected. It hasn’t been a show-stopper for me, but in terms of best theoretical practices, I think you’re right.


I had presumed it was a number-crunching scenario.

SLP12’s unmix is using more recent AI models said to yield better results, yet they take longer to process. Robin expected outcry from unmix-music-focused users before the launch of SL12 (as can be seen all over this forum), and possibly Robin limited UNS so as not to increase the processing time (too much) for the dialog-focused users… that is a guess which I dreamed up in my tiny brain cell… I’m probably completely wrong :slight_smile:

Sounds plausible… it would be cool if there were different speed/quality options in a dropdown menu in the module so users could pick their favourite compromise…

UNS is still pretty fast… certainly plenty quick enough for my uses…
yet from your other posts, it sounds like it’s not fast enough for you :stuck_out_tongue_winking_eye:

hahahaha no, for UNS I’ll always take max quality… then make up for the processing time with the milliseconds I save with key commands :rofl:


Could we have control over the filter?
