Automate Voice Dataset Creation Using Whisper AI

1,500 views

Thorsten-Voice

Comments: 33

@CodeByCradle
3 months ago
The video is exactly what I need :) Thank you so much!
@ThorstenMueller
3 months ago
Happy to hear that, you're welcome 😊.
@bwheldale
3 months ago
This is nice and saves a lot of time. Even so, while most audio segments are nicely split, some are not. E.g., some may be cut off at the end. In such cases, it may be beneficial to implement a secondary process specifically designed to identify and eliminate those 'rejects' from the dataset. I'm finding it's a delicate balance of adjusting the parameters to a particular dataset, but rejects may still occur and are best removed if possible for improved training.
@ThorstenMueller
2 months ago
Thanks for your feedback 😊. Yes, it takes a bit of trial and error to adjust the parameters so that sentences are split correctly. In most cases it will still require manual checking and adjustment afterwards. But it's way better than doing the whole process manually 🙃.
@bwheldale
2 months ago
Thanks for explaining. It's definitely way better than manual. Also, I don't know if my idea of removing rejects was wise, since in some cases those chunks may form part of a sentence. I'm still trying to understand the how and why.
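Editor's note: the "secondary process" discussed in this thread could be sketched as below. This is only a rough heuristic, not the method used in the video: it assumes an LJSpeech-style, pipe-separated metadata file (`id|transcription|normalized`) and flags rows whose transcription does not end with sentence punctuation, which may hint at a segment that was cut off. The function name `find_suspect_rows` and the sample clip IDs are hypothetical. Because a flagged chunk may still be a legitimate part of a sentence, the sketch only lists candidates for manual review and deletes nothing.

```python
import csv
import io

# Endings that suggest the transcription finishes a sentence (an assumption;
# adjust for the language and punctuation style of your dataset).
TERMINAL_PUNCTUATION = (".", "!", "?", '."', '!"', '?"')


def find_suspect_rows(metadata_file):
    """Return (file_id, text) pairs whose text does not end a sentence."""
    suspects = []
    # LJSpeech metadata is pipe-separated and does not use CSV quoting.
    reader = csv.reader(metadata_file, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        file_id, text = row[0], row[1].strip()
        if not text.endswith(TERMINAL_PUNCTUATION):
            suspects.append((file_id, text))
    return suspects


# Hypothetical sample data standing in for a real metadata.csv.
sample = io.StringIO(
    "clip_0001|This sentence is complete.|this sentence is complete.\n"
    "clip_0002|This one was cut off at the|this one was cut off at the\n"
)
suspects = find_suspect_rows(sample)
for file_id, text in suspects:
    print(f"review {file_id}: {text!r}")
```

For a real dataset, the same function could be called with `open("metadata.csv", encoding="utf-8", newline="")` in place of the in-memory sample.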
@werneroi
3 months ago
Thank you so much for the video! It helps so much to automate the process and saves lots of time. As you are running on a Mac... Do you have any video planned on how to use the dataset on a Mac to create the voice as well? Or any updated tutorials on how to use it in an updated Google Colab or Lightning Studio (this would be amazing, as Google Colab is a pain in the butt these days :). )
@ThorstenMueller
20 days ago
Thanks for your comment 😊. I mostly use Linux to train a TTS voice model on a voice dataset. I did not know about Lightning Studio, but it looks promising at first glance. Thanks for pointing it out 👍.
@Wissens-Lounge
2 months ago
Thx for sharing. Like
@hikmetemre6837
3 months ago
Cheers, that is a brilliant video! I have a question: could I prepare a singing voice dataset this way too?
@ThorstenMueller
2 months ago
Thanks for your kind feedback 😊. A singing voice dataset sounds like an interesting use case 👍. But I don't have any experience with that (yet).
@aneerpa8384
3 months ago
Informative ❤
@DrFukuro
3 months ago
A follow-up video on what exactly can be done with the generated dataset, and how to go about it, would be great. If that already exists somewhere, please link it.
@ThorstenMueller
2 months ago
With your own LJSpeech voice dataset you can clone your voice, either with Coqui TTS or (preferably) with Piper TTS. * Coqui: kzitem.info/news/bejne/lY-KnouQjZGpZ20 * Piper: kzitem.info/news/bejne/w5Wty5OgppNjZWU
@bwheldale
3 months ago
Also, I'm curious about the recently added "3rd column with cleaned/lowered text". What do you have planned for it?
@ThorstenMueller
2 months ago
According to the original LJSpeech dataset (keithito.com/LJ-Speech-Dataset/), the 3rd column is "Normalized Transcription", which is required by some TTS projects. Normally you would replace strings like "mr." with "mister" and "2" with "two". I just made it lowercase and am thinking about how I can integrate text cleaners that work for multiple languages.
@bwheldale
2 months ago
Ahh, I see. I recall different datasets having, e.g., "2" in one case and "two" in another. Interesting, much appreciated.
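Editor's note: the normalization described in this thread (lowercasing, "mr." to "mister", "2" to "two") can be illustrated with a small sketch. It is not the cleaner used by the tool in the video; the lookup tables and the function name `normalize` are made up for illustration, they cover English only, and only standalone single-digit tokens are expanded. Real TTS projects usually ship their own language-specific text cleaners.

```python
# Tiny, English-only lookup tables (illustrative assumptions, far from complete).
ABBREVIATIONS = {"mr.": "mister", "mrs.": "misses", "dr.": "doctor"}
DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def normalize(text):
    """Lowercase the text and expand known abbreviations and single digits."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token in DIGITS:
            words.append(DIGITS[token])
        else:
            words.append(token)  # leave everything else untouched
    return " ".join(words)


raw = "Mr. Smith bought 2 apples."
normalized = normalize(raw)
# One LJSpeech-style metadata row: id | transcription | normalized transcription
print(f"clip_0001|{raw}|{normalized}")
```

Multi-digit numbers, dates, and currency are deliberately left alone here; handling them correctly depends on the language.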
@amaarboss2115
3 months ago
really I love your surprised❤
@ThorstenMueller
3 months ago
Thank you, since I am not a native English speaker, I am sometimes surprised at what I say 😅.
@VulcanOnWheels
2 months ago
5:16 This size is good enough for me.
@ThorstenMueller
2 months ago
Thanks for your feedback on the font size / scale 😊
@RaminAssadollahi
13 days ago
So in principle, I can record German and English sentences since Whisper will recognise both. How does Piper handle two languages at once? Will it be able to learn German and English phonetics together?
@ThorstenMueller
10 days ago
For Whisper: yes. For Piper: IMHO this will not work perfectly right now. Since everyday German uses lots of English words, switching the phoneme language is important, but IMHO this does not work perfectly out of the box. Maybe you could preprocess the text before running TTS.
@sonnyad
3 months ago
Cool! Does it work with languages other than English?
@ThorstenMueller
3 months ago
Yes, Whisper automatically detects the spoken language. It works for all languages supported by Whisper. I tried it with German too and it really worked very well 😊.
@sonnyad
3 months ago
@ThorstenMueller ok thanks for the info
@BatoolKassem-i7d
1 month ago
Does it work on recordings in languages other than English? Like Arabic, for example?
@ThorstenMueller
27 days ago
Hi, this should work for all languages that are supported by Whisper STT.
@capitalcleaning
2 months ago
Please focus and zoom in on the area that you are talking about. That will not be fancy. Thank you.
@vickyrajeev9821
2 months ago
Thanks, can I run it on CPU? I don't have a GPU.
@ThorstenMueller
2 months ago
Yes, that's possible. It's just slower than with a GPU.
@oleksandr5700
2 months ago
Hi, but how can I enable CUDA usage?
@ThorstenMueller
2 months ago
IMHO CUDA should be automatically detected and used by Whisper. Is it installed in your (venv) environment?
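Editor's note: a quick way to check what the CPU/GPU answers in these threads mean for a given machine is sketched below. It assumes PyTorch (which the openai-whisper package depends on) is installed in the same virtual environment; if it is not, the sketch falls back to "cpu". The helper name `choose_device` is hypothetical, and this is a diagnostic sketch rather than how the script from the video selects its device.

```python
def choose_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # installed as a dependency of openai-whisper
    except ImportError:
        return "cpu"  # no PyTorch in this environment, so no CUDA either
    return "cuda" if torch.cuda.is_available() else "cpu"


device = choose_device()
print(f"Whisper would run on: {device}")

# With openai-whisper installed, the device can also be passed explicitly:
# import whisper
# model = whisper.load_model("base", device=device)
```

If this prints "cpu" on a machine with an NVIDIA GPU, a CPU-only PyTorch build in the venv is a likely cause.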