Adding a text-to-speech (TTS) feature to SLP would be extremely popular. Open-source code is already published, as shown in the video below:
Ummm, text to speech... me love a good text to speech, but is it what SpectraLayers is about?
I have been observing a lot of users requesting Voice/Speech features (such as A.I. voice cloning) within SpectraLayers. I'm not totally against it; however, because it seems like only one developer is working on SpectraLayers, that might make the development cycle extremely slow. I think the text-to-speech and A.I. voice cloning ideas are great, and I'd like to be able to record someone's voice and manipulate it into a voice like Mariah Carey's (with the correct formants and the correct articulation in the voice print profile), but implementing that within SpectraLayers would probably take a development team rather than just one developer.
The best way I can reiterate what I said above (so others can understand) is to think of Serum and Steve Duda. Steve Duda is a talented developer, but he didn't build Serum alone: he outsourced what was outside his expertise to a mathematician (so that low-level DSP code would be much more efficient), and that allowed him to focus on other important features. With SpectraLayers (and I can't speak for Steinberg; I'm only going off my observations and assumptions), it seems like only one developer is working on the application, and it doesn't seem like Steinberg has any incentive to invest in SpectraLayers for features like A.I. voice cloning. I would like to see voice features like voice cloning, more features that use A.I. to do various things with the voice, and a feature that would allow you to type a sentence and have A.I. sing back what you typed (in Mariah Carey's voice), but for that to become a reality Steinberg would have to decide whether those are features they want to invest in. I can't fathom one developer doing all that by himself, and even if it were possible, I'd imagine it would take years to implement.
If you are correct and there is only one SLP developer, I would agree. There are plenty of features and fixes that current users would prefer over a TTS feature. But SLP already does half the job, allowing one to edit audio, clean up noise and transcribe the result. Given the commercial potential of TTS (as evidenced by the plethora of websites offering TTS for a fee), I think it would be a good investment for Steinberg. I haven't written any code since machine language, so I don't know the magnitude of the job. But the code is already written; it just needs a user-friendly interface.
The concept of the number of developers and development teams has completely blurred in recent years. There are strategic alliances per project and even per task, between teams and between collaborators, with specific licenses. You may set up a Discord, a GitHub, a Slack channel or another platform, and collaborations may grow depending on what is requested and uploaded.
For instance, Monsieur Lobel himself set up, years ago, an open community effort called PyTorch, with tools and frameworks to build libraries and support AI. One of its endeavors is PyTorch Edge.
This is just one example. Together with his previous (ongoing?) scientific research, and let's not forget the alliances/agreements he first set up with Sony Creative more than a decade ago, then with Magix and later with Steinberg, it all constitutes a complex web of inputs for SpectraLayers development.
Therefore, saying "one developer" does not reflect what is really behind this development.
Yeah, I agree, but features like "A.I. voice cloning" are extremely specific (and overall outside of spectral editing). It's not just frameworks, toolkits and libraries; you're talking about a whole area of expertise in machine learning/deep learning and neural networks. Then, on top of that, new research needs to be done. For example, a lot of the A.I. voice clone songs I hear with celebrity voices have a common, noticeable problem with voice articulation: the vowels and consonants sound accurate, but the articulation (how that individual pronounces each syllable) is way off. Then again, something like A.I. voice cloning can become a problem where people abuse it (not to mention all the proposed regulations surrounding A.I.).
True, I don't know everything on the development side, but I'm pretty sure it's going to take more than an alliance, libraries or GitHub code to implement something like A.I. voice cloning.
I think @Gregory_McCollum has made a fair request and raised an interesting topic. I'd be keen to read the developers' thoughts on the matter.
Crucially, the video posted shows TTS to be perfectly possible right now, using several open-source tools. It was quite an insight. The voice model "training" part seemed pretty "friction-free".
The task of removing background noise in this case could be done quite separately and won't always be needed. Though that is one of SL's specialities, of course…
But I can imagine it is quite a lot of effort to build TTS capability similar to the open-source tools right inside a commercial app sold on the open market.
BTW - I have no expert/programming/software knowledge either way; am just another end user.
There are a lot of AI features considered for the next version. TTS is indeed one of them.
It's hard to say what features will make it into SL11 at this point, but this one will certainly be considered during development.
@unmixing said…
A.I. Voice cloning*. I would like to see voice features like voice cloning and would like to see more features that uses A.I. to do various things with the voice and would like to see a feature that would allow you to type a sentence and have A.I. sing back what you typed(in Mariah Carey's…
WELL…
I'm not Yamaha, not Steinberg, not Mariah, but I have historically been a plaintiff in lawsuits in similar territory, and I'll offer my opinion.
In the US, via case law/statute over the past few decades, decisions have come down that one can't… for example… copyright a snare drum hit, etc.
There've been cases that have pronounced… you can't copyright your voice.
However… and this would, I think, be very important to a commercial software product maker or a bunch of GitHub guys or similar (who I believe are being sued)… it IS possible for a celebrity with a distinctive voice to trademark their voice… which can carry very sharp teeth in a court of law… in the US.
Now… if Mariah Carey does indeed have a registered trademark… and…
JimsFantasticCelebrityAiVoiceClone software goes on sale for $39.95 somewhere on the planet… and contains a Mariah preset… or a preset cleverly named Moriah or MarAI, and it sounds pretty much like the real Mariah after 16-year-old BobGarage types text into the program… creates a song that sounds like Mariah singing and puts the file up on YouTube for free…
IF there is a trademark… guess who the LA attorneys are gonna go after? The maker of the software. The enabler.
Me… I wouldn't want any remote prospect of being involved with creating the product.
I'm pretty sure it's this type of thing that buried csp back at the turn of the century, although I've never been sure.
All that being said, the cork is out of the AI bottle, so to speak. I just wouldn't want to be the one nailed with a lawsuit.
On the other hand, you could technically add a "match voice" process (with a fairly simple input and output) without violating any copyrights or trademarks. For example, imagine a fairly simple process where you input 20 Mariah Carey acapellas, and an A.I. process rounds up all 20 and matches them to another recorded vocal, matching the timbre, the tone and the voice profile (like the idea of "match EQ" and the "de-bleed" process, where you input your sources and it outputs the results). It wouldn't work for completely cloning another voice, because acapellas contain both lead and background vocals: the sum of those 20 acapellas would round up the background vocals along with the lead vocals, and the timbre/tone would sound artificial. So technically it would be very difficult to abuse.
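To make the "match EQ" comparison concrete, here is a rough Python sketch of the simplest version of that idea. It is only an illustration under stated assumptions, not how SpectraLayers (or any product) actually works: the function names are hypothetical, it uses a naive single-frame DFT on equal-length clips, and a real match EQ would analyze short overlapping frames and smooth the gain curve instead of copying the spectrum bin for bin. It averages the magnitude spectra of the reference clips, then rescales each frequency bin of the target to that average while keeping the target's own phase.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for a toy example."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); keeps only the real part, assuming a real-valued signal."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def average_magnitude(clips):
    """Average magnitude spectrum of several equal-length reference clips."""
    spectra = [[abs(c) for c in dft(clip)] for clip in clips]
    return [sum(bins) / len(spectra) for bins in zip(*spectra)]


def match_profile(target, references, eps=1e-9):
    """Hypothetical 'match voice' step: give the target the references' average
    magnitude spectrum while keeping the target's own phase in every bin.
    The target and all references must have the same length."""
    ref_mag = average_magnitude(references)
    spectrum = dft(target)
    matched = [c * (r / (abs(c) + eps)) for c, r in zip(spectrum, ref_mag)]
    return idft(matched)


# Toy usage: a sine "target" matched to two louder sine "references".
n = 8
target = [math.sin(2 * math.pi * t / n) for t in range(n)]
refs = [[a * s for s in target] for a in (2.0, 4.0)]
out = match_profile(target, refs)  # roughly 3x the target (average of 2x and 4x)
```

Because the reference profile is a plain average, everything present in the references (backing vocals included) ends up in the target's spectrum, which is the smearing problem described above.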
I gotta reiterate… go look at the current lawsuits.
Last I looked, there were three or four, but today I notice it's spiraled into dozens.
Anyone running a recorded celebrity voice through software in an attempt to recreate an AI match… is fair game for a lawsuit… including… the fact that one "uses" a recording of, let's say, the Beach Boys… in order to "analyze"… demix for analysis… etc…
Suddenly, you're screwed on a bunch of legal fronts before you even get to the AI analysis… specifically… you grabbed a specific "recording" to begin the analysis… guess what?
The "recording" is covered under SR copyright registration. You're not allowed to source it. Busted!
Now… let's say one uses a field audio recorder to manually record Beyoncé… or Taylor Swift… as they're standing in an airport, talking on their cellphone… and you manage to get a clean recording of Taylor Swift speaking the words "hey, I'm in Dallas"… and you go back to your cave and run that through software to somehow… somehow… create a "Taylor Swift" preset that has turned her speaking voice into pitches, assigned to every word in the English language, complete with nuances… just from her speaking "hey, I'm in Dallas".
Hmmm… that may work… "see, judge, I made my AI from my own recording of Taylor Swift talking at the airport on her cellphone"
Which isn't covered by an SR copyright.
The judge says, "OK, what was the date you made the recording?"
You say the date.
Bam… Taylor's massive lawyers hit you with "invasion of privacy"… it's illegal to record someone on the phone
Let's go back… let's say you lie and say you didn't source the AI from a recording of the celebrity… guess what?
You'll have to "prove" you didn't
And youâll lose that one
Sounds like all this is getting out there in far-fetched land?
Believe me, these things can go on until they bleed you dry financially.
Rights-holders can get very very very angry and legally vindictive.
Go on… check out the mushrooming AI cases… RIAA, ABKCO, Universal Music Group… on and on and on!!! They're suing outfits left and right. Even I was surprised how many new cases there are.
This is gonna be wayyyyy bigger than the Napster fiasco!!
I would not dare touch this voice-clone stuff as a software developer!
excellent.
You are certainly correct about the explosion of lawsuits around generative AI. However, I would be interested if you know of any lawsuits filed against a software developer for the task the software was designed to accomplish. Just wondering.
It doesn't necessarily have to be "voice cloning" per se, but rather a "tonal match" or "timbre match" process (where you can take any source and morph its characteristics into another). For example, in Serum you can morph two waveforms into each other in real time (both spectrally and at the waveform DSP level). It would be interesting to take a voice of someone singing and morph it into the characteristics (the timbre/tonal aspects) of a sawtooth-wave synthesizer.
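One simple way to picture "morphing spectrally" is to blend the harmonic amplitudes of two single-cycle waveforms and rebuild the result additively. The sketch below is an illustration under that assumption only (the function names are hypothetical, and it is not how Serum or SpectraLayers actually implement morphing): it measures each cycle's harmonics with a naive DFT, crossfades the amplitudes, and resynthesizes with plain sines, ignoring phase.

```python
import cmath
import math


def harmonic_amplitudes(cycle):
    """Amplitudes of harmonics 1 .. n/2 - 1 of one single-cycle waveform (naive DFT)."""
    n = len(cycle)
    amps = []
    for k in range(1, n // 2):
        bin_k = sum(cycle[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        amps.append(2 * abs(bin_k) / n)
    return amps


def spectral_morph(cycle_a, cycle_b, amount):
    """Blend the harmonic amplitudes of two equal-length single cycles
    (amount 0.0 -> A, 1.0 -> B) and rebuild the cycle additively.
    Phase is discarded: every harmonic is resynthesized as a plain sine."""
    n = len(cycle_a)
    amps = [(1 - amount) * a + amount * b
            for a, b in zip(harmonic_amplitudes(cycle_a), harmonic_amplitudes(cycle_b))]
    return [sum(amp * math.sin(2 * math.pi * (k + 1) * t / n) for k, amp in enumerate(amps))
            for t in range(n)]


# Toy usage: morph a sine cycle halfway toward a sawtooth cycle.
n = 64
sine = [math.sin(2 * math.pi * t / n) for t in range(n)]
saw = [2 * t / n - 1 for t in range(n)]   # naive (non-band-limited) sawtooth cycle
halfway = spectral_morph(sine, saw, 0.5)  # half sine, half sawtooth harmonic profile
```

A plain time-domain crossfade would blend the waveform shapes directly; blending per-harmonic amplitudes is what makes this a "spectral" morph, and dropping phase keeps the sketch short at the cost of changing the exact waveform shape.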
The bigger picture is a feature that can morph not only voices into other voices but anything into anything (for example a voice into a synthesizer, or vice versa). The casting/molding feature is an excellent example of a concept like this being brought to life.
Voice cloning (although it wouldn't necessarily be the bigger picture) is just one aspect of it. Imagine the ability to take someone's voice and morph its characteristics into a synthesizer.
@unmixing I don't have an opinion on what happens if you yourself capture a recording of Mariah Carey's "Hero", demix it and turn the main vocal into a morphed Hammond B3 instead… or a tuba.
If you get a worldwide hit, do you intend to do interviews blabbering "yeah, I made that tuba from a Mariah Carey vocal"?
My red flags were from your proposal…
"would like to see a feature that would allow you to type a sentence and have A.I. sing back what you typed(in Mariah Carey's voice), but for that to become a reality Steinberg would have to make a decision and decide if those are features they want to invest in"
I understand. The reason you're thinking red flag is because you're only looking at it from one perspective, whereas I'm looking at it from the perspective of sound design. For example, there are dozens of videos on YouTube demonstrating the casting and molding feature, and some YouTubers use that feature for sound design.
Plus (like I said), the A.I. vocal songs already on YouTube have numerous artifacts, and people can tell those vocals are artificial (mainly because the algorithms are combining lead vocals along with background vocals).
Try otter.ai. It's dedicated speech-to-text (transcription) and the best I've found, but it still makes plenty of mistakes. I vote for SpectraLayers to continue its development path. Right now I'd rate the product as promising, probably enough for me to buy a license. I do a lot of voice cleanups and am particularly interested in unmixing.
Tortoise is good also.
Thank you. Always looking for new apps.