Description
Batch size 14, RMVPE, 3-minute dataset ripped from Half-Life 1, 40k pretrain. The infers are vocals ripped from Half-Life 2. The ripped files were 11 kHz, so I had to upscale them with RX Spectral Recovery, then clean them using Envelope and by manually deleting artifacts.
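For context, here is a rough sketch of how you could sanity-check how band-limited a ripped clip actually is before deciding how much spectral recovery it needs. This is just an illustration, not my actual RX workflow; the file name and the -60 dB floor are made-up placeholders.

import numpy as np
import librosa

def effective_bandwidth(path, floor_db=-60.0):
    # Load at the file's native sample rate and average the magnitude spectrum.
    y, sr = librosa.load(path, sr=None)
    spec = np.abs(librosa.stft(y, n_fft=4096))
    mean_db = librosa.amplitude_to_db(spec.mean(axis=1), ref=np.max)
    freqs = librosa.fft_frequencies(sr=sr, n_fft=4096)
    # Highest frequency bin that still carries energy above the floor.
    above = freqs[mean_db > floor_db]
    return sr, (above.max() if above.size else 0.0)

sr, bw = effective_bandwidth("scientist_line_01.wav")  # hypothetical file name
print(f"sample rate: {sr} Hz, real energy up to ~{bw:.0f} Hz")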
Comments

another 1000-epoch model

well, the 1000-epoch one, as i said, is the better one


overtraining doesn't exist :3

why do you think that?

i mean, you said why, but i'm really interested in this theory

because i have proof, and Raven said so, and so did FDG and Felt, all three smart people

not a theory, it's a fact, they tested it

i'll test it too

sure

for now there's a new idea that 48k is better if the dataset is good enough, and if not it just boosts ringing and noise

but well, it's not done training

so this only works with good datasets, correct?

yes

like practically no noise

i spent 3 hours on the 3-minute dataset for this model

oh

I feel like there's no difference between 40k and 48k tbh

there is, the pretrains are different

Well, audio-wise

Like if I render audio in 40k, it will sound no different

Also good model once again. 10/10

well, there's no audible difference between 48k and 40k, yeah, 48k being better is basically placebo

also is 40k faster for training?

48khz has better quality, but it only works with similarly high-quality datasets. Like if, for example, i take this audio and put it into 48khz training, the model will come out glitchy and robotic, especially on sibilants and breathing, because it will try to guess frequencies that don't exist in the dataset; but if i put the same dataset into 40khz training, then it'll be normal, without any glitches or weird artifacts. From my testing i came to this conclusion: it's better to use 32khz training if your dataset is below that or only slightly above it (20khz-34khz), 40khz training for 34khz-42khz, and 48khz training for 42khz-50khz.
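Those brackets map straight onto RVC's three target rates, so just to illustrate the rule of thumb (the choose_target_sr helper is made up for this example, it's not part of RVC):

def choose_target_sr(dataset_sr_hz: int) -> int:
    # Pick an RVC training rate from the dataset's sample rate,
    # following the brackets described above (illustrative only).
    if dataset_sr_hz <= 34_000:
        return 32_000  # ~20-34 kHz sources -> 32k training
    if dataset_sr_hz <= 42_000:
        return 40_000  # ~34-42 kHz sources -> 40k training
    return 48_000      # ~42-50 kHz sources -> 48k training

print(choose_target_sr(22_050))  # -> 32000
print(choose_target_sr(44_100))  # -> 48000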

right, so it's not a matter of the training datasets for the 40k and 48k pretrains

thanks for your information <:matsuripray:1159685390156967936>

no problem. And talking about overtraining: overtraining is almost not real, but it is real. i noticed that some of my models start to have bad robotic breathing and a hard time pronouncing S, Ch, Sh sounds without glitches at higher epoch counts, even though they sounded completely fine at lower epoch counts. So i guess that's the answer to the claim that overtraining doesn't exist: it exists, but only for sibilants and breathing, which some people don't even notice sometimes.

i mean, yes, overtraining of course is real, since RVC is just a GAN, but i meant that in most cases it practically doesn't exist, considering longer training makes the voice model even better (if the dataset is good enough, of course). In my case overtraining won't be an issue, since the audios are already clean and high quality, and as for the sibilants and breathing, you can easily remove them with post-processing in RX and such, so in my opinion it's better to just use the longer-trained model and then remove artifacts if needed.
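If you don't have RX, a crude de-esser along these lines can also tame the sibilant artifacts. This is only a rough sketch under assumed settings: the 5-9 kHz band, the threshold, and the file names are all guesses, not a tested recipe.

import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfiltfilt

def simple_deesser(path_in, path_out, band=(5000, 9000),
                   threshold=0.05, reduction=0.5, frame=1024):
    # Crude static de-esser: attenuate the sibilance band in frames
    # where its RMS energy exceeds a threshold.
    y, sr = sf.read(path_in)
    if y.ndim > 1:
        y = y.mean(axis=1)        # fold to mono for simplicity
    sos = butter(4, band, btype="bandpass", fs=sr, output="sos")
    sib = sosfiltfilt(sos, y)     # isolate the sibilance band (zero-phase)
    rest = y - sib                # roughly everything else
    out = np.copy(y)
    for start in range(0, len(y), frame):
        seg = slice(start, start + frame)
        if np.sqrt(np.mean(sib[seg] ** 2)) > threshold:
            out[seg] = rest[seg] + sib[seg] * reduction
    sf.write(path_out, out, sr)

simple_deesser("model_output.wav", "model_output_deessed.wav")  # hypothetical files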
3-minute dataset? There's like 25 minutes of data for the scientist, counting the expansions

quality over quantity


i spent like 5 hours cleaning it

prob more idfk