"Preserve Formant" option produces a lot of artifacts

Amberfields · August 8, 2020, 5:41pm

I used the transform tool to pitch up/down a vocal sample and used the “Preserve Formant” option. Even when done in small amounts it creates an unnatural sound and introduces a lot of artifacts. The sample itself was recorded professionally so the quality of the sample was good. Am I doing something wrong? What is the intended use case for this formant option?

Robin_Lobel · August 8, 2020, 6:41pm

There’s an issue with Preserve Formant that is fixed in the patch coming Wednesday.

Amberfields · August 15, 2020, 2:51pm

The patch fixed the problem with the artifacts, which is great. However, I have a question regarding the use of the formant mode. When I pitch a vocal sample up or down, the result is highly dependent on the parameters in the display panel, such as FFT size. Are the settings in the display panel used for the calculation as well as for the visuals? What would be the optimal settings for pitching vocals to get a result similar to Melodyne or the Radius algorithm in RX7, which has more clarity than my attempts with SL7?

Robin_Lobel · August 15, 2020, 3:16pm

Indeed, that’s one of the key concepts of SL: what you see is what you get. Every tool, every process depends on the spectral parameters. The more clarity you set for a given problem, the better your results will be.
I don’t have specific recommendations for vocals though; it depends on the pitch and sample rate, for instance.

Amberfields · August 15, 2020, 3:58pm

Thanks for your quick response and explanation! Normally I don’t change these settings because I get good results with the default values. Maybe it’s a noob question, but if I get better results with more clarity (higher FFT sizes, I guess), what are the lower sizes for? If I set the FFT size in my vocal example to 256, the results are quite bad. So I’m just curious: why would one use lower values if the results are better with higher values? Could you please give me an example? I would like to understand this better. Thanks for your help.

Howl · August 16, 2020, 8:38am

i can only give one example: a click, only visible in the spectrogram, a “big” click, a glitch from recording within Voltage Modular.
i tried de-click, but it didn’t remove the click completely. then i set the FFT size to 512 and de-click worked perfectly for this “click” (it was a vertical line in the spectrogram, so broadband; i can’t check it now, i’m not in my studio).

why did this work? i can “feel” it, so to say: with a higher FFT size it will interpret some of the glitch as sound, i think…
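One way to picture why a smaller FFT size can help with a click: every analysis frame that contains the click is contaminated by it, so longer frames spread the click over more time. The sketch below only counts frame positions. It is not SpectraLayers' De-click algorithm, and the 48 kHz sample rate, the hop of a quarter frame and the function names are assumptions for illustration.

```python
def frames_touching_click(signal_len, click_index, fft_size, hop):
    """Return (start, end) sample ranges of the analysis frames that contain the click."""
    frames = []
    for start in range(0, signal_len - fft_size + 1, hop):
        if start <= click_index < start + fft_size:
            frames.append((start, start + fft_size))
    return frames


def smear_ms(frames, sample_rate):
    """Time span, in milliseconds, covered by all frames that contain the click."""
    if not frames:
        return 0.0
    first_start = frames[0][0]
    last_end = frames[-1][1]
    return 1000.0 * (last_end - first_start) / sample_rate


SAMPLE_RATE = 48000      # assumed sample rate
SIGNAL_LEN = 48000       # one second of audio
CLICK_INDEX = 24000      # a single-sample click in the middle

for fft_size in (512, 2048, 8192):
    hop = fft_size // 4  # assumed 75% overlap between frames
    frames = frames_touching_click(SIGNAL_LEN, CLICK_INDEX, fft_size, hop)
    print(fft_size, len(frames), round(smear_ms(frames, SAMPLE_RATE), 1))
```

The number of frames that see the click stays the same, but the stretch of time they cover grows with the FFT size, which fits the intuition that a large FFT size mixes the click with the surrounding sound.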

yes, this is a very good question. i change the FFT size when needed. with experience you will notice the difference, but a full understanding, even though i know what FFT is, is still out of my league…

Amberfields · August 16, 2020, 7:00pm

@Howl: Thanks a lot for your insights and the example.

I tested a few things and would never have guessed that the FFT size could have such an impact on the results. But I still don’t know which processes are affected in which way. With Unmix Stems it doesn’t seem to do much, but with Unmix Components it’s night and day. Reverb Reduction also depends heavily on FFT size, whereas the Heal process doesn’t seem to be influenced that much. It is quite a trial-and-error thing sometimes. My vocal example needs a size of exactly 640 samples; anything above or below didn’t sound as good. I don’t know if all users of SpectraLayers are aware of how much this can change the quality of the results.
It was an eye-opener for me on how to refine my results when repairing audio, and it would be nice if people would share their experience of when to use different FFT sizes.

Howl · August 17, 2020, 7:08pm

i hope the manual gets updated to say how much the processing features depend on FFT size (and perhaps other settings…).
FFT size has influence, i already experienced that, but indeed unmix stems seems not to be influenced (or it just worked with the default FFT size).

perhaps it is always trial and error. but i think more direction is needed.

what is the real influence of FFT size?

so many things depend on it. FFT size is the number of samples (or bins) taken, and it affects “time”, or maybe not within spectralayers?

with melda plugins, some of them, you must also work with FFT size, but that is in real time. the FFT size greatly influences the results. but there it is explained (yes, melda explains! many help menus, for many menus…)

unluckily we can’t tag robin here, but perhaps a PM can do wonders. because i had that file where de-click didn’t work, to fast forward: robin will add info about de-click and FFT size.

but it seems that, even if you know the influence, it will always be source dependent, of course. still, a little guidance perhaps makes the workflow somewhat faster.

and in a way, although it is not so automatic, you have more control over what the result will be. this is the power of spectralayers: you have to do more, but the results can be “personal”.

so i agree with the question: are users aware of this central process of spectralayers?
and can there be insight? why does a low FFT size, a low bin “resolution”, work better in one situation (perhaps as i stated with de-click), and an FFT with a high bin rate (high bin rate??) better in another situation??

and per process?

EDIT:

i always read articles, but don’t remember a lot after some weeks… about synthesis, filters, additive (which is synthesis…), about FFT. but the result is that i can use a certain, more complicated plugin better. and sometimes i have to refresh; there is too much stuff, i can have almost every approach…

i am not an expert. there are many articles about FFT, but mainly, i think, for additive synthesis or other stuff. and the FFT analysis within spectralayers has a few algorithms (window functions); blackman-harris is the default, the name you mostly see.

math isn’t my thing, i’m not a dsp developer, but by experience and reading articles (or YT) i learn more. i can even follow more in-depth stuff, but really digest it?? haha

FFT is the resolution of the sample taken per time, in a way spectral slices???

Robin_Lobel · August 17, 2020, 8:03pm

Sorry I don’t have much time to answer all messages, but here’s a quick clarification:

-The following Processes don’t depend on spectral parameters (such as FFT Size): Generate, Amplitude, Clip Repair, Voice Denoiser, VST3 Effects, as well as Layer > Unmix Stems
-All the other Processes depend on spectral parameters, as well as Layer > Unmix Components

-The following Tools depend on spectral parameters: Magic Wand Selection, Frequency/Harmonics Selection, Transient Selection
-All the other Tools don’t depend on spectral parameters

The documentation will be updated to emphasize the importance of those parameters for the given Processes and Tools.

Howl · August 18, 2020, 7:56am

thanks for the answers. i hope the manual will be less spartan and will reflect the influence of the named parameters on processes and tools (the selection tools you mentioned are the ones that are sensitive to FFT size by design…).
i hope you can also give some direction in the manual on why a lower or a higher setting can be more useful, or better: what the advantages and disadvantages of lower and of higher settings are, and why. a small explanation. it will help a lot of users and will make the program more accessible.
of course, with experience you will get it, but it is nice to know why.

thanks, already!

Robin_Lobel · August 18, 2020, 10:06am

Here are a few more details about the FFT Size parameter: I usually explain it as the equivalent of focus in photography.
Imagine a lens with a wide aperture: you can’t get every depth in focus at once, you have to choose your subject.
The same goes for the FFT Size: it’s a focus control which, instead of choosing a certain depth, chooses a certain time/frequency balance.

The smaller the FFT Size, the more details you’ll get with time-centric events (such as transient sounds) but the blurrier the tones.
The larger the FFT Size, the more details you’ll get with frequency-centric events (such as static tones) but the blurrier the transients.
However spectrograms are not just transients and static tones, there’s a wide variety of frequency shapes in a recording.
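To put rough numbers on this trade-off, here is a minimal sketch that prints the spacing between frequency bins and the length of one analysis frame for a few FFT sizes. The 44.1 kHz sample rate and the function names are assumptions for illustration, and this is plain FFT arithmetic, not a description of how SpectraLayers works internally.

```python
def bin_spacing_hz(sample_rate, fft_size):
    """Distance between neighbouring frequency bins, in Hz."""
    return sample_rate / fft_size


def frame_length_ms(sample_rate, fft_size):
    """Duration of one analysis frame, in milliseconds."""
    return 1000.0 * fft_size / sample_rate


SAMPLE_RATE = 44100  # assumed; use the sample rate of your own file

print("FFT size  bin spacing (Hz)  frame length (ms)")
for fft_size in (256, 640, 2048, 8192):
    spacing = bin_spacing_hz(SAMPLE_RATE, fft_size)
    length = frame_length_ms(SAMPLE_RATE, fft_size)
    print(f"{fft_size:8d}  {spacing:16.2f}  {length:17.2f}")
```

Bin spacing times frame length is the same for every FFT size, so any gain in frequency detail is paid for with an equal loss in time detail, and the other way round.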

See the spectrogram of a voice, for instance: it’s a lot of frequencies stacked on top of each other, which are not just straight lines but wobbling lines. This means they are not purely horizontal or purely vertical, but a mix of both, and they vary over time. So sometimes you need to focus on the horizontal parts, sometimes on angled parts, sometimes on vertical parts, sometimes on curves, etc.
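The stacked frequencies of a voice can be imitated with two steady tones to see when a given FFT size can still tell them apart. This is only a sketch under stated assumptions: the tones at 440 Hz and 550 Hz stand in for neighbouring harmonics of a hypothetical 110 Hz voice, the 8 kHz sample rate is chosen to keep the slow pure-Python DFT small, the Hann window is an illustrative choice, and none of it reflects SpectraLayers' actual processing.

```python
import cmath
import math


def hann_spectrum(fft_size, sample_rate, freqs_hz):
    """Magnitude spectrum (bins 0..fft_size/2) of a Hann-windowed sum of cosines."""
    centre = fft_size / 2
    frame = []
    for n in range(fft_size):
        t = (n - centre) / sample_rate  # time measured from the frame centre
        sample = sum(math.cos(2 * math.pi * f * t) for f in freqs_hz)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / fft_size)
        frame.append(sample * window)
    mags = []
    for k in range(fft_size // 2 + 1):
        # Plain DFT sum; slow, but fine for these small sizes.
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / fft_size)
                  for n, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def count_peaks(mags, rel_threshold=0.5):
    """Count local maxima that reach rel_threshold times the largest magnitude."""
    floor = rel_threshold * max(mags)
    return sum(1 for k in range(1, len(mags) - 1)
               if mags[k] > mags[k - 1] and mags[k] > mags[k + 1] and mags[k] >= floor)


SAMPLE_RATE = 8000          # assumed, kept low so the plain DFT stays fast
HARMONICS = (440.0, 550.0)  # hypothetical 4th and 5th harmonics of a 110 Hz voice

for fft_size in (64, 512):
    peaks = count_peaks(hann_spectrum(fft_size, SAMPLE_RATE, HARMONICS))
    print(fft_size, "->", peaks, "peak(s)")
```

Under these assumptions the 64-sample frame shows the two harmonics as one merged peak, while the 512-sample frame separates them. A real voice wobbles, so the best size is a compromise between separating the harmonics and following their movement.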

A picture being worth a thousand words, here’s the same example under 3 different FFT Sizes (see the parameter at the top right of the screen):

Howl · August 18, 2020, 1:09pm

Robin Lobel:

A picture being worth a thousand words, here’s the same example under 3 different FFT Size (see the parameter at the top right of the screen):
https://i.imgur.com/y47vR5T.png
https://i.imgur.com/PEah3Pe.png
https://i.imgur.com/EjTkm5D.png

thanks! while i was writing the other post, the word “transient” came up in my mind, but i didn’t know it for sure… and the word “time”. and yes, there is more to it. you explain it well!

i will print this… perhaps worth a special paragraph (with the pictures) in the manual!
or print this, save this: i am dutch, i am cheap…

very valuable information for working with spectralayers!