
TF2 Spy | Dennis Bateman_e16-GPT_e15-SoVITS

Description
TLDR: the SoVITS model is e15 and the GPT model is e16. GPT is DPO trained.

notes (for people who wanna learn more about my experience with GPT-SoVITS):

dataset size is 21 minutes
batch size 2 on GPT and SoVITS
SoVITS learning rate 0.4
15 epochs SoVITS took around 40 minutes
16 epochs GPT, 27 minutes, with DPO, saving freq 2

at first i trained GPT to 20 epochs but the last epoch saved was e18, prob cause my saving freq was 3, so be sure to look out for that. the SoVITS model was fine, most likely since i trained it to 15 epochs and not 20. so it prob has smth to do with whether [epoch count] mod [saving freq] is 0 or not (mod being % in python). in my case i trained to 20 epochs with a saving freq of 3, and 20 % 3 is 2, aka non 0, so thats BAD :3 just make sure the mod is always 0 so that the last epoch you train actually gets saved.

i didnt like the model when i first trained without DPO so i retrained the GPT model with DPO. it also went from using 6GB VRAM to using 8GB, and this time it took 35 minutes.

also apparently in SOME cases the default pretrained GPT model is better, often where new words are said or the punctuation is different i'd say? but im not really sure
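
A quick way to check the saving-frequency issue before starting a run is to compute which epoch would be the last one saved. This is only a sketch of the guess in the notes above: it assumes checkpoints are written only on epochs that divide evenly by the saving frequency and that nothing extra is saved at the end of training. The function names are made up for illustration and are not part of GPT-SoVITS.

```python
def last_saved_epoch(total_epochs: int, save_every: int) -> int:
    """Highest epoch <= total_epochs that is a multiple of save_every.

    Assumes (as guessed in the notes) that a checkpoint is written only
    when epoch % save_every == 0. Returns 0 if no epoch qualifies.
    """
    if total_epochs < 0 or save_every <= 0:
        raise ValueError("total_epochs must be >= 0 and save_every must be > 0")
    return total_epochs - (total_epochs % save_every)


def unsaved_epochs(total_epochs: int, save_every: int) -> int:
    """Number of trailing epochs that are trained but never saved."""
    return total_epochs % save_every


if __name__ == "__main__":
    # (epoch count, saving freq) pairs taken from the notes above
    for total, freq in [(20, 3), (16, 2)]:
        print(
            f"{total} epochs, saving freq {freq}: "
            f"last saved = e{last_saved_epoch(total, freq)}, "
            f"unsaved = {unsaved_epochs(total, freq)}"
        )
```

With 20 epochs and a saving freq of 3 this gives e18 as the last saved epoch, which matches what happened in the first GPT run. With 16 epochs and a saving freq of 2 nothing is lost.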
Comments

wall texting in question

it actually got the accent pretty well ngl
yap
joke
but still yap

Bro you don’t have to write a fking article

also when it says "Two, I shall" it somehow connects the two words

idfc i was going to explain my experience i put a TLDR and those who care can read the rest

since if someone wanted to do this on their own they'd be able to benefit from a good explanation
he isnt wrong about SD3 never going to come out

true

spy only speaks facts

for SD 3 i had to say `ess dee three` instead
its been "so long" since any updates

eh like 3 days of radio silence and then a new sudden update

but fyi the researchers who started Stable Diffusion at Stability AI resigned so expect slower development
Ik its "Forever" for the people who expect them to release it immediately

well they are teasing the people either way so there's more impatience on top of that
which sucks kinda

yep
by the time its out I bet RVC V3 would be out for a week or 2

nah i expect RVC to be fully abandoned by then cause RVC v3 pretrains didnt make much improvements either way

and RVC is slowly dying
Oh man that sucks

hence the server is dying with it same for help chats

ye but well people move from one thing to another thing when they get bored

me when only 20 models every day while old AI hub had like 400 every day or even more? idfk i cant remember
I feel like sovits models are easier and require less effort to train, you dont need to clean the audio nearly as much
I meant GPT SoVITS

oh ye

thats for sure its easier to train as well

like in terms of being faster
much faster
I wonder if its possible to fix the GPT part

wdym
Make it quicker to train

GPT is already super fast..

well it depends on the dataset size and if DPO or not but still
How do you disable DPO

in my tests it was always faster than SoVITS training

its a check mark in the webui by default its disabled

RVC is dying? shit

welp gotta wait for someone to make a successor to it
Oh ok

GPT-SoVITS imo but well its TTS but well for me TTS is just easier and better and more useful

it doesnt sound that great to me. not as versatile as RVC
It sucks but I think we should still make RVC models

i mean dont get me wrong it's better than tacotron

here "enable DPO" https://i.postimg.cc/8zvTJBcV/image.png
Oh yeah that is disabled

ye by default its that

i went from talknet, to sovits, to rvc v1, to v2

it gets the VRAM usage higher by like 2GB iirc

idfk where it'll go next

give it some time i'm sure a new AI will make RVC v2 look bad in comparison lol

RVC is more like for singing tbh GPT-SoVITS has higher potential i think

how so
All that retraining everyone's gonna have to do

how is it better than a singing AI

its just faster and also does better with accent

no no i mean speaking

TTS speaking on GPT-SoVITS and RVC for singing
And inflection and prosody

for my use cases TTS is better

lemme google

oh reasonable
Yeah

imo GPT-SoVITS with RVC can be even better honestly

i should try it one time but too lazy rn
Oh yeah that works very well i tried it and it helped more

ye i'd expect it so cause like GPT-SoVITS can carry the accent and RVC can carry the "voice" of it
GPT Sovits>Elevenlabs for sure

ya

I think Elevenlabs is still more realistic but GPT Sovits is more versatile
Yeah its more realistic but GPT SoVITS is free and open source and you can fine tune

ElevenLabs sometimes doesn't get the voice right when you AI clone it, even though its higher quality. GPT SoVITS can create the most accurate voices through TTS, surpassing Tacotron2 and other TTS AI software.
I tried doing Mario's voice through it with 90 percent style exaggeration and it doesn't sound as good as SoVITS

From what I heard from my friend, Mario was the most difficult voice to make accurately with AI software, and SoVITS pretty much solves it, because his pitch/tone goes up and down while most voices stay closer to their pitch range.

I don't think RVC will die per se imo, but it is slowly fading away in the models section, though its still popular regardless. I think its especially useful as a voice changer and a singing tool.
RVC is still good for covers and somewhat voice conversion

and even for other sounds and sfxs like drums
Oh yeah that too
I'm probably still going to make RVC models

I'm still gonna make them because after all I still have a huge list of models I've got to train. I just wish I could get a new GPU or a way to train using my computer's CPU.

because I'm able to run RVC GUI with RMVPE using my CPU (thanks to my friend modding it) and RVC Realtime using my GPU or CPU I think, and I haven't bumped into any problems at all. I only have an old GPU from 2017 with 2GB or so of VRAM.

who actually uses TT2

Thats one of the worst tts voice cloning i've ever seen

Uberduck, Fakeyou and the ai streams really

but they don't know that GPT SoVITS exists

at least I find the strokes funny

echelon knows

i used to make models for uberduck

good times
Did you use tacotron?

yep
I bet that had to get lots of data

i've made pretty decent models off of less than 30 seconds
like a ton and probably took forever to train
Oh wow I didnt think tacotron could do that, I thought you had to have a lot of data

having a lotta data is good too

havent touched tacotron in quite a while
I mean hours and hours

eh like 3 hours usually

or less
Oh that isnt as bad as I thought

I attempted to make my first TT2 model through uberduck but that went so bad. I tried again but failed that time too. Now my friend makes the TT2 models for me instead.

i have yet to train with GPT Sovits but i'll def give it a go in the near future

but we made them for a collaborative ai stream project and so far its going pretty well.

and yes it would take over 9-12 hours to train each of these models which he locally trained on his Linux computer or his main idk

does weights.gg not have GPT sovits support yet?

idk

no

yeah ig i'll wait until it does to make GPT Sovits models