Half Life Scientist

EnglishFictionalRVC V2

lusbert_

1 year ago

Description

Batch size 14, RMVPE, 3-minute dataset ripped from Half-Life 1, 40k pretrain. The infers are vocals ripped from Half-Life 2. The ripped files were 11 kHz, so I had to upscale them using RX Spectral Recovery, then cleaned them using Envelope and by manually deleting artifacts.

Comments

resizable

1 year ago

another 1000 epoch model

lusbert_

1 year ago

well 1000 epoch as i said is the better one

lusbert_

1 year ago

https://cdn.discordapp.com/emojis/669514488374099969.gif?size=48&name=HibikiShrug%7E2&quality=lossless

lusbert_

1 year ago

overtraining doesn't exist :3

lusbert_

1 year ago

https://cdn.discordapp.com/attachments/1159265300915687534/1164515857951629342/801z12.jpg?ex=65437ee8&is=653109e8&hm=3659acfd8c9a9681fa4b9115f6fde7e9ece93107ab75f73cbbb92718119d724a&

resizable

1 year ago

why do you think that

resizable

1 year ago

i mean you said why but im really interested in this theory

lusbert_

1 year ago

because i have proof, and Raven said so, and so did FDG and Felt, all three smart people

lusbert_

1 year ago

not a theory, it's a fact, they tested it

resizable

1 year ago

i'll test it too

lusbert_

1 year ago

sure

lusbert_

1 year ago

for now there's a new idea that 48k is better if the dataset is good enough, and if not it just boosts ringing and noise

lusbert_

1 year ago

but well its not done training

resizable

1 year ago

so this only works with good datasets, correct?

lusbert_

1 year ago

yes

lusbert_

1 year ago

like no noise practically

lusbert_

1 year ago

i spent 3 hours on the 3 minute dataset for this model

resizable

1 year ago

I feel like there's no difference between 40k and 48k tbh

lusbert_

1 year ago

there is, the pretrains are different

resizable

1 year ago

Well audio wise

resizable

1 year ago

Like if I render audio in 40k, it will sound no different

resizable

1 year ago

Also good model once again. 10/10

lusbert_

1 year ago

well there is no audible difference between 48k and 40k, yeah, 48k being better is basically placebo

resizable

1 year ago

also is 40k faster for training?

SimplCup

1 year ago

48khz has better quality, but it only works with equally high quality datasets. For example, if i take this audio and put it in 48khz training, the model will come out glitchy and robotic, especially with sibilants and breathing, because it will try to guess frequencies that don't exist in the dataset. But if i put the same dataset in 40khz training, it'll be normal, without any glitches or weird artifacts in the sounds. From my testing i came to this conclusion: it's better to use 32khz training if your dataset is lower than that or slightly higher (20khz-34khz), 40khz training for 34khz-42khz, and 48khz training for 42khz-50khz.
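A minimal sketch of the rule of thumb above, in Python. It assumes the ranges refer to the dataset's sample rate in kHz. The function name `pick_training_rate`, the choice to send the shared boundaries (34 and 42) to the higher bucket, and the handling of rates above 50 kHz are illustrative assumptions, not part of any RVC tool.

```python
def pick_training_rate(dataset_khz: float) -> str:
    """Map a dataset's sample rate (in kHz) to a training rate.

    Follows the ranges from the comment above:
      below 34 kHz  -> 32k training
      34 to 42 kHz  -> 40k training
      42 kHz and up -> 48k training
    Boundary handling and the open upper end are assumptions.
    """
    if dataset_khz <= 0:
        raise ValueError("sample rate must be positive")
    if dataset_khz < 34:
        return "32k"
    if dataset_khz < 42:
        return "40k"
    return "48k"


if __name__ == "__main__":
    # A few common source rates, in kHz
    for rate in (11.025, 22.05, 36.0, 44.1, 48.0):
        print(f"{rate} kHz dataset -> {pick_training_rate(rate)} training")
```

By this rule, an 11 kHz rip that has not been upscaled would land in the 32k bucket, while 44.1 kHz and 48 kHz material would both go to 48k.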

lusbert_

1 year ago

riight, so it's not a matter of the training datasets for the 40k and 48k pretrains

lusbert_

1 year ago

thanks for the information :matsuripray:

SimplCup

1 year ago

no problem. also, talking about overtraining: overtraining is kinda almost not real, but it is real. i noticed that with a higher number of epochs some of my models start to have bad robotic breathing and a hard time pronouncing S, Ch and Sh sounds without glitches, but they sounded completely fine at a lower number of epochs. so i guess that's the answer to the statement that overtraining doesn't exist: it exists, but only for sibilants and breathing, which some people don't even notice.

lusbert_

1 year ago

i mean yes, overtraining is of course real since RVC is just a GAN, but i meant that in most cases it practically doesn't exist, considering that longer training makes the voice model even better (if the dataset is good enough, of course). in my case overtraining won't be an issue since the audios are already clean and high quality. as for the sibilants and breathing, you can easily remove those artifacts with post-processing in RX and such, so in my opinion it's better to just use the longer-trained model and then remove artifacts if needed