Half Life Scientist

English, Fictional, RVC V2
lusbert_
1 year ago
👀 474 | 👍 11 | 🪄 56

Description

Batch size 14, RMVPE, 3-minute dataset ripped from Half-Life 1, 40k pretrain. The inference samples are vocals ripped from Half-Life 2. The ripped files were 11 kHz, so I had to upscale them using RX Spectral Recovery, then cleaned them using Envelope and by manually deleting artifacts.
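Why 11 kHz rips need more than plain resampling comes down to the Nyquist limit: audio sampled at a rate fs can only hold frequencies up to fs/2. The sketch below computes the band that a 40k model can represent but the source never captured. It assumes "11kHz" means 11,025 Hz (the description only says 11 kHz) and that the 40k pretrain runs at 40,000 Hz; the function names are hypothetical. A plain resampler leaves that band empty, which is roughly the gap a spectral-recovery step tries to fill.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a signal sampled at sample_rate_hz can represent."""
    return sample_rate_hz / 2


def missing_band_hz(source_rate_hz: float, target_rate_hz: float):
    """Return (low, high): the band the target rate can hold but the source never captured.

    Resampling alone leaves this band empty. Returns None when the source
    already covers everything the target rate can represent.
    """
    low = nyquist_hz(source_rate_hz)
    high = nyquist_hz(target_rate_hz)
    return (low, high) if high > low else None


# Assumption: "11kHz" rips taken as 11,025 Hz; the 40k pretrain taken as 40,000 Hz.
band = missing_band_hz(11_025, 40_000)
print(band)  # (5512.5, 20000.0)
```

So everything between about 5.5 kHz and 20 kHz, where much of the sibilance and breath noise lives, would be absent from a straight resample.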

Comments

resizable
1 year ago

another 1000-epoch model

lusbert_
1 year ago

well, as i said, the 1000-epoch one is the better one

lusbert_
1 year ago

overtraining doesn't exist :3

resizable
1 year ago

why do you think that?

resizable
1 year ago

i mean, you said why, but i'm really interested in this theory

lusbert_
1 year ago

because i have proof, and Raven said so, and so did FDG and Felt, all three smart people

lusbert_
1 year ago

not a theory, it's a fact, they tested it

resizable
1 year ago

i'll test it too

lusbert_
1 year ago

sure

lusbert_
1 year ago

for now there's a new idea that 48k is better if the dataset is good enough, and if it isn't, 48k just boosts ringing and noise

lusbert_
1 year ago

but well, it's not done training

resizable
1 year ago

so this only works with good datasets, correct?

lusbert_
1 year ago

yes

lusbert_
1 year ago

like practically no noise

lusbert_
1 year ago

i spent 3 hours on the 3 minute dataset for this model

resizable
1 year ago

oh

resizable
1 year ago

I feel like there's no difference between 40k and 48k tbh

lusbert_
1 year ago

there is, the pretrains are different

resizable
1 year ago

Well, audio-wise

resizable
1 year ago

Like if I render audio in 40k, it will sound no different

resizable
1 year ago

Also good model once again. 10/10

lusbert_
1 year ago

well, there is no audible difference between 48k and 40k, yeah. 48k being better is basically placebo

resizable
1 year ago

also, is 40k faster for training?

SimplCup
1 year ago

48 kHz has better quality, but it only works with equally high-quality datasets. For example, if I take this audio and put it into 48 kHz training, the model will come out glitchy and robotic, especially on sibilants and breathing, because it will try to guess frequencies that don't exist in the dataset. If I put the same dataset into 40 kHz training, it'll come out normal, without any glitches or weird artifacts. From my testing I came to the conclusion that it's better to use 32 kHz training if your dataset is below that or slightly above it (20 kHz to 34 kHz), 40 kHz training for 34 kHz to 42 kHz, and 48 kHz training for 42 kHz to 50 kHz.
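The rule of thumb in that comment can be written as a small lookup. This is a sketch of the ranges as stated there, not a feature of RVC: the function name is hypothetical, putting each boundary (34 kHz, 42 kHz) in the lower bucket is an assumption because the stated ranges touch at the edges, and capping anything above 50 kHz at 48k is also an assumption, since 48k is the highest of the three training rates mentioned.

```python
def pick_training_rate_khz(dataset_khz: float) -> int:
    """Suggest a training rate (32, 40 or 48 kHz) from a dataset's sample rate in kHz.

    Ranges follow the comment: up to 34 kHz -> 32k, 34-42 kHz -> 40k,
    42-50 kHz -> 48k. Boundary handling and the cap above 50 kHz are assumptions.
    """
    if dataset_khz <= 0:
        raise ValueError("dataset_khz must be positive")
    if dataset_khz <= 34:
        return 32
    if dataset_khz <= 42:
        return 40
    # 42-50 kHz per the comment; higher rates are also mapped to 48k (assumption).
    return 48


for rate in (11.025, 22.05, 40, 44.1, 48):
    print(rate, "->", pick_training_rate_khz(rate))
```

With these ranges, an un-upscaled 11 kHz rip falls in the 32k bucket.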

lusbert_
1 year ago

riight, so it's not a matter of the training datasets for the 40k and 48k pretrains

lusbert_
1 year ago

thanks for your information :matsuripray:

SimplCup
1 year ago

No problem. And about overtraining: it's almost not real, but it is real. I noticed that some of my models start to have bad, robotic breathing and a hard time pronouncing S, Ch and Sh sounds without glitches at higher epoch counts, while they sounded completely fine at lower epoch counts. So I guess that's the answer to the claim that overtraining doesn't exist: it exists, but only for sibilants and breathing, which some people don't even notice.

lusbert_
1 year ago

i mean yes, overtraining of course is real since RVC is just a GAN, but i meant that in most cases it practically doesn't exist, since longer training makes the voice model even better (if the dataset is good enough, of course). in my case overtraining won't be a problem since the audio is already clean and high quality, and as for the sibilants and breathing, you can easily clean those up in post-processing with RX and such. so in my opinion it's better to just use the longer-trained model and then remove artifacts if needed

Big Mitch
1 year ago

3-minute dataset? There's like 25 minutes of data for the scientist, counting the expansions

lusbert_
1 year ago

quality over quantity

lusbert_
1 year ago

i spent like 5 hours cleaning it

lusbert_
1 year ago

prob more idfk

Samples

1. Singing, Male, English
2. Singing, Female, English
3. Singing (Dry), Female, English
4. Singing (High), Female, English
5. Singing 2, Male, English
6. Singing (Dry), Male, English
7. Singing (Dry, High), Male, English
