sleepygirl 32k snowieV3

RVC
Sheila fredes
1 year ago
👀 430 | 👍 4 | 🪄 101

Description

hi! 9 min dataset, 32k sample rate, 200 epochs, snowieV3 pretrain, rmvpe. no index cause i never use that, honestly

Comments

mrm0dz
1 year ago

egirl voice inflation

mares_
1 year ago

I'm not gonna lie blyuv, your last 3 models have sounded like the exact same person lol

Sheila fredes
1 year ago

🤣 🤣 🤣

mrm0dz
1 year ago

sounds like merged voices to me

Sheila fredes
1 year ago

yeahhh

mrm0dz
1 year ago

i have so many female voices at this point

mrm0dz
1 year ago

i can just merge them and make infinite

mrm0dz
1 year ago

i just don't post them bc people don't credit anymore

mrm0dz
1 year ago

or try reselling them

Sheila fredes
1 year ago

i merge different ones, idk why they sound the same in the end

Sheila fredes
1 year ago

but i merge them and then train them in rvc

mrm0dz
1 year ago

if you merge similar voices, they'll sound kinda the same

Sheila fredes
1 year ago

rvc has too many limits honestly, i think it's cause of that

mares_
1 year ago

if you don't have a merge folder with hundreds of different merges, are you really a realtime enjoyer?

mares_
1 year ago

let's be honest here

Sheila fredes
1 year ago

this is 3 models i merged, and then i remade it in rvc again, using an inference of the merge as a new dataset

Sheila fredes
1 year ago

but i think rvc has so many pitch and tone limits that they end up sounding the same

mares_
1 year ago

ah that's why

mares_
1 year ago

you should just post the merge

mares_
1 year ago

instead of making a new model of the inference of the merge

mares_
1 year ago

cuz it sounds worse that way

Sheila fredes
1 year ago

yeah i know

Sheila fredes
1 year ago

but i'm not giving away, like, good models or merges xD

Sheila fredes
1 year ago

so i just post what i recycle

Sheila fredes
1 year ago

like this one

mares_
1 year ago

ah, fair enough

Sheila fredes
1 year ago

i heard they're gonna release a new "good" pretrain

mares_
1 year ago

But like, what kinda voices are you merging? Like 3 egirl models together, or are you merging different kinda voices?

Sheila fredes
1 year ago

this one is 3 different vtubers

mares_
1 year ago

yeahhhh

mares_
1 year ago

that explains it

Sheila fredes
1 year ago

it sounds good for singing at least owo

mares_
1 year ago

merging gives the best dividends when you're merging voices that are pretty distinct from one another, but then again I am assuming what kinda vtubers you're using as data xD

Sheila fredes
1 year ago

haha to be honest i just pick random good quality vtuber voices

Sheila fredes
1 year ago

don't even know the names, i just listen to the sound of the voices lol

mares_
1 year ago

lmao

mares_
1 year ago

based

Sheila fredes
1 year ago

and pick 3 or 4, make a dataset, make a model and merge them

Sheila fredes
1 year ago

the lazy part is just cleaning the audio

mares_
1 year ago

yeah it's pretty tedious

mares_
1 year ago

Titan, but it's still just a general pretrain, so it's not going to be huge or anything. ~~I am finetuning a pretrain for English female voices in particular~~

mares_
1 year ago

like idk how it's going to turn out, but my hope is that specializing in that kind of data gives better results for it.

Sheila fredes
1 year ago

rvc makes me feel so frustrated ;-; i hope they release a new tech soon

mares_
1 year ago

what's wrong with it?

Sheila fredes
1 year ago

i mean it's good, but once you get used to it, you realize it has a lot of limitations

mrm0dz
1 year ago

play around with diff-svc

Sheila fredes
1 year ago

and not good quality parts

mrm0dz
1 year ago

way better for singing

mrm0dz
1 year ago

and overall vocal range

Sheila fredes
1 year ago

tbh i just use it for realtime

Sheila fredes
1 year ago

i don't care about the singing or inference lol

mrm0dz
1 year ago

realtime has its limitations

mrm0dz
1 year ago

since rvc models are trained on voice

mrm0dz
1 year ago

any non-voice sounds will fuck up

mrm0dz
1 year ago

like coughing, laughing, sneezing and more

mares_
1 year ago

what limitations are you talking about in terms of realtime?

mrm0dz
1 year ago

prob those i mentioned

Sheila fredes
1 year ago

the pitch and tone aren't the same as in the dataset

mrm0dz
1 year ago

also tone change

mrm0dz
1 year ago

like whispering

mares_
1 year ago

Well, I can manage laughing with realtime, but not, like, full-on belly laughs

mares_
1 year ago

more like giggles

Sheila fredes
1 year ago

when you train a model it doesn't capture all the sounds of the voice, even if you boost all the dynamics

mrm0dz
1 year ago

don't expect the model to translate emotions unless you talk and act girly

mrm0dz
1 year ago

some guys use it talking like a boy

mrm0dz
1 year ago

it becomes just a guy with a girl voice

Sheila fredes
1 year ago

i know, i'm not saying it's bad, but it's not the same as the dataset

Sheila fredes
1 year ago

it's never the same as the dataset

mrm0dz
1 year ago

of course

mrm0dz
1 year ago

it won't ever be 1:1

mares_
1 year ago

yeah, that is a problem

Sheila fredes
1 year ago

yeah, that's why i say it has limitations

mares_
1 year ago

we call that... a tomboy

mrm0dz
1 year ago

i wouldn't say it's a technology limitation

mrm0dz
1 year ago

it does what it's supposed to do

mrm0dz
1 year ago

it doesn't convert a TTS to a human voice

mrm0dz
1 year ago

nor someone with no emotion to something with emotion

mares_
1 year ago

But like, I only use merges anyways so I don't really care about it lining up with the source audio 100%

Sheila fredes
1 year ago

yeah, i'm not talking about the emotion part

mares_
1 year ago

as long as it sounds like I want it to after the fact

Sheila fredes
1 year ago

but yes, the pitch, tone and quality

Sheila fredes
1 year ago

it's never the same as the dataset

Sheila fredes
1 year ago

like if you clone your own voice

Sheila fredes
1 year ago

it won't sound the same

Sheila fredes
1 year ago

xD

Sheila fredes
1 year ago

similar and accurate yes

Sheila fredes
1 year ago

but not the same

mares_
1 year ago

rvc gives you its best guess after you train

mares_
1 year ago

basically lol

mares_
1 year ago

and to get back to this, the solution is not to act like something you're not, but to sculpt a voice towards your particular speaking style 🤔

mares_
1 year ago

by merging different types of speakers

mares_
1 year ago

into a ratio that just vibes with your speech patterns
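The merging described here amounts to blending several trained models in chosen proportions. Below is a minimal sketch of that idea, assuming each model is represented as a dict mapping parameter names to flat lists of floats with matching names and lengths. Real RVC checkpoints store PyTorch tensors, and RVC's own merge tool may differ in its details; `merge_models` is a hypothetical helper, not an RVC function.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def merge_models(models: Sequence[Weights], ratios: Sequence[float]) -> Weights:
    """Blend models parameter by parameter using ratios normalized to sum to 1."""
    if not models or len(models) != len(ratios):
        raise ValueError("need at least one model and exactly one ratio per model")
    total = sum(ratios)
    if total <= 0:
        raise ValueError("ratios must sum to a positive number")
    # e.g. a 2:1:1 ratio becomes blend weights 0.5, 0.25, 0.25
    weights = [r / total for r in ratios]

    merged: Weights = {}
    for name, values in models[0].items():
        # Every model must have the same parameter with the same size.
        for m in models:
            if len(m.get(name, [])) != len(values):
                raise ValueError(f"parameter {name!r} does not match across models")
        # Weighted sum of this parameter across all models, element by element.
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models))
            for i in range(len(values))
        ]
    return merged


if __name__ == "__main__":
    a = {"layer": [1.0, 0.0]}
    b = {"layer": [0.0, 1.0]}
    c = {"layer": [0.0, 0.0]}
    print(merge_models([a, b, c], [2, 1, 1]))  # {'layer': [0.5, 0.25]}
```

Because the blend is plain arithmetic on weights that already exist, trying a new ratio takes seconds instead of a new training run.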

mares_
1 year ago

~~unless you think every woman is a bubbly valley girl lmao~~

Sheila fredes
1 year ago

yeah, i mean all i'm saying is that even if you clone your own voice it won't sound the same. not the talking part or the voice acting, i mean the pitch and tone would never be the same. that's the limitation i'm referring to

mares_
1 year ago

~~I was talking to MrModz~~

mares_
1 year ago

I get what'cha mean.

Sheila fredes
1 year ago

101 comments damn XD

razeristaken
1 year ago

You should try training all the datasets together instead of merging

razeristaken
1 year ago

gives better results imo

mares_
1 year ago

it'd be super tedious to test different ratios of voices that way
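Training one model on all the datasets together still involves a voice ratio; it is just fixed when the dataset is built, by how much audio from each speaker goes in. A minimal sketch, assuming the clips are already cleaned and their durations known; `pick_clips`, the file names and the greedy selection rule are hypothetical choices, not part of RVC. Every new ratio still means a full retrain, which is the tedium being pointed out here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical clip record: (file name, duration in seconds).
Clip = Tuple[str, float]


def pick_clips(
    speakers: Dict[str, Sequence[Clip]],
    ratios: Dict[str, float],
    total_seconds: float,
) -> Dict[str, List[str]]:
    """Greedily pick each speaker's clips until they fill that speaker's share."""
    total_ratio = sum(ratios.get(name, 0.0) for name in speakers)
    if total_ratio <= 0:
        raise ValueError("ratios must sum to a positive number")
    chosen: Dict[str, List[str]] = {}
    for name, clips in speakers.items():
        # Seconds of audio this speaker should contribute.
        budget = total_seconds * ratios.get(name, 0.0) / total_ratio
        picked: List[str] = []
        used = 0.0
        for file_name, duration in clips:
            if used + duration > budget:
                continue  # this clip would overshoot the share; try the next one
            picked.append(file_name)
            used += duration
        chosen[name] = picked
    return chosen


if __name__ == "__main__":
    speakers = {
        "voice_a": [("a1.wav", 60.0), ("a2.wav", 60.0), ("a3.wav", 60.0)],
        "voice_b": [("b1.wav", 60.0), ("b2.wav", 60.0)],
    }
    # 2:1 ratio over 3 minutes -> 120 s of voice_a, 60 s of voice_b
    print(pick_clips(speakers, {"voice_a": 2, "voice_b": 1}, 180.0))
```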

razeristaken
1 year ago

yea

razeristaken
1 year ago

but better sounding results

mares_
1 year ago

example?

razeristaken
1 year ago

that's a model combining 3 voices in the training phase

Sheila fredes
1 year ago

i tried that too, but idk why, i get better results with merging. maybe it's my voice or idk

Sheila fredes
1 year ago

on inference it sounds really good, but realtime is another story

razeristaken
1 year ago

well do the voices you combine sound similar?

Sheila fredes
1 year ago

i don't remember, but i know i did that experiment xD

mares_
1 year ago

yeah I don't think it's better than merges 100%

mares_
1 year ago

Like you can prob get similar results with both methods

mares_
1 year ago

but it's like way more of a pain in the ass having to train a new model each time

razeristaken
1 year ago

i agree, but just from what i've done, training them together has gotten me better results

mares_
1 year ago

oh yeah blyuv, have you messed around with using plugins on your models to make them sound more realistic?

Sheila fredes
1 year ago

like what plugins?

mares_
1 year ago

I see you in the voice chat here a lot, so I know you use realtime quite a bit, but idk if you're interested in making it sound more like a real mic

Sheila fredes
1 year ago

i tried to use some vst

Sheila fredes
1 year ago

but idk why, i prefer more raw audio xD

mares_
1 year ago

Equalization on the voice to quiet the robotic parts of the voice and make other parts louder, convolution on your model to imitate the sound of a bad mic, and like, bit crushers to make it sound less unnaturally clear
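To make the bit crusher and convolution ideas concrete, here is what those two effects do at their core. This is a minimal sketch, not the plugins used in the thread: it assumes mono samples as floats in [-1.0, 1.0], and `bitcrush`, `convolve` and the tiny impulse response in the usage part are hypothetical. A real "bad mic" convolution would use an impulse response recorded from an actual mic or room, and convolution plugins usually use FFT-based convolution to stay fast enough for realtime.

```python
from typing import List, Sequence


def bitcrush(samples: Sequence[float], bits: int) -> List[float]:
    """Reduce resolution by snapping samples to a coarse grid (mid-tread quantizer)."""
    if bits < 1:
        raise ValueError("bits must be >= 1")
    # 2**(bits - 1) quantization steps on each side of zero.
    levels = 2 ** (bits - 1)
    out: List[float] = []
    for s in samples:
        s = max(-1.0, min(1.0, s))              # clip to the valid range first
        out.append(round(s * levels) / levels)  # snap to the nearest step
    return out


def convolve(signal: Sequence[float], impulse: Sequence[float]) -> List[float]:
    """Direct full convolution: every input sample triggers a scaled copy of the impulse."""
    if not signal or not impulse:
        return []
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out


if __name__ == "__main__":
    voice = [0.0, 0.3, -0.3, 0.9]
    mic_ir = [1.0, 0.0, 0.25]  # made-up impulse response: direct sound plus one faint echo
    print(convolve(bitcrush(voice, 4), mic_ir))
```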

mares_
1 year ago

yeah that's fair enough, I just use them on the model while I'm playing games or whatever

razeristaken
1 year ago

How would i go about doing this?

mares_
1 year ago

makes it so no one can tell it's AI

mares_
1 year ago

Well, first you need something like Elgato Wave Link or Voicemeeter to apply plugins to your virtual cable

razeristaken
1 year ago

I have Voicemeeter

Sheila fredes
1 year ago

also mares, do you know how to change the sample rate in w-okada on the client side?

Sheila fredes
1 year ago

i have windows set to 44khz, my mic at 44khz and everything at 44khz, but when i output audio in okada it's 48k
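The thread doesn't settle how to change w-okada's client-side output rate. If a recording does come out at 48 kHz and you need the standard 44.1 kHz, converting it afterwards is a resampling job. The sketch below shows the integer ratio involved (147/160) and a deliberately naive linear-interpolation resampler. `resample_ratio` and `resample_linear` are hypothetical helpers; because the naive version has no anti-aliasing filter, real audio is better served by a polyphase resampler such as SciPy's `scipy.signal.resample_poly(x, 147, 160)`.

```python
from math import gcd
from typing import List, Sequence, Tuple


def resample_ratio(src_rate: int, dst_rate: int) -> Tuple[int, int]:
    """Smallest (up, down) integers with src_rate * up / down == dst_rate."""
    g = gcd(src_rate, dst_rate)
    return dst_rate // g, src_rate // g


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out: List[float] = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate  # where output sample k falls in the source
        i = int(pos)
        frac = pos - i
        # Past the last source sample, hold the final value.
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


if __name__ == "__main__":
    print(resample_ratio(48000, 44100))  # (147, 160)
```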

mares_
1 year ago

idk, I don't use okada

mares_
1 year ago

mine outputs at 44khz just fine with go_realtime_gui.bat

mares_
1 year ago

Well you get all these plugins then

Hyperus18/RegalHyperus
1 year ago

I usually save models from others under "Model (by Author)"

mares_
1 year ago

and for the EQ what you really wanna do is basically soften the spectrum on the sides here, since a lot of the unnatural clearness on voices I have noticed comes from like, 50khz to 200khz.
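"Softening the spectrum on the sides" means rolling off both the low and the high end of the voice. A minimal sketch of that using first-order high-pass and low-pass filters, which give a gentle slope of about 6 dB per octave, is below. It assumes mono float samples; the function names and any cutoff values are hypothetical and would be tuned by ear, and real EQ plugins use steeper, parametric filters. Note that no filter can act above half the sample rate (the Nyquist frequency), so the cutoffs are checked against it.

```python
import math
from typing import List, Sequence


def _rc(cutoff_hz: float) -> float:
    """RC time constant (seconds) of a first-order filter at cutoff_hz."""
    return 1.0 / (2.0 * math.pi * cutoff_hz)


def lowpass(x: Sequence[float], cutoff_hz: float, rate: int) -> List[float]:
    """First-order low-pass: rolls off content above cutoff_hz."""
    dt = 1.0 / rate
    alpha = dt / (_rc(cutoff_hz) + dt)
    out: List[float] = []
    y = 0.0
    for s in x:
        y += alpha * (s - y)  # move a fraction of the way toward the input
        out.append(y)
    return out


def highpass(x: Sequence[float], cutoff_hz: float, rate: int) -> List[float]:
    """First-order high-pass: rolls off content below cutoff_hz."""
    dt = 1.0 / rate
    rc = _rc(cutoff_hz)
    alpha = rc / (rc + dt)
    out: List[float] = []
    prev_x = prev_y = 0.0
    for s in x:
        y = alpha * (prev_y + s - prev_x)  # passes changes, lets steady offsets decay
        out.append(y)
        prev_x, prev_y = s, y
    return out


def soften_edges(x: Sequence[float], low_cut_hz: float, high_cut_hz: float,
                 rate: int) -> List[float]:
    """Tame both ends of the spectrum: high-pass at low_cut_hz, low-pass at high_cut_hz."""
    if not 0 < low_cut_hz < high_cut_hz < rate / 2:
        raise ValueError("need 0 < low_cut_hz < high_cut_hz < rate / 2")
    return lowpass(highpass(x, low_cut_hz, rate), high_cut_hz, rate)
```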

razeristaken
1 year ago

do i need to install something other than Voicemeeter? bc there's no plugins menu that i can find. unless i'm dumb

mares_
1 year ago

I think you need Voicemeeter Banana

razeristaken
1 year ago

i do have that

mares_
1 year ago

But if you have an Elgato mic, just use Wave Link imo

mares_
1 year ago

Yeah I don't use that, so idk. I just know other people have used plugins on it.

razeristaken
1 year ago

i don't

mares_
1 year ago

But yeah here's an example of what it sounds like with all of those plugins on a voice.

mares_
1 year ago

I usually don't have the bit crushing from Cymatics Origin on it, but I had that on too just for example's sake.

mares_
1 year ago

Like, most voice chat apps compress your voice enough that you prob don't need the bitcrushing with a good voice

Leo_Frixi
1 year ago

Tbh you should make some guides about how to clean datasets.

mares_
1 year ago

Huh?

mares_
1 year ago

I don't have any great secrets, I only train models on data that already has zero background noise lol

Leo_Frixi
1 year ago

I mean, an updated guide about dataset cleanup and plugin usage for cleanup.

Leo_Frixi
1 year ago

🐢 👍

Leo_Frixi
1 year ago

teamongus

mares_
1 year ago

And specifically, these plugins were for realtime, but you're right, people could probably apply some of them to inferenced audio

Samples

1. Singing: Male, English
2. Singing: Female, English
3. Singing (Dry): Female, English
4. Singing (High): Female, English
5. Singing 2: Male, English
6. Singing (Dry): Male, English
7. Singing (Dry, High): Male, English
