AudioGen: Textually Guided Audio Generation | Text To Audio

❤️ Become The AI Epiphany Patreon ❤️
/ theaiepiphany
👨‍👩‍👧‍👦 Join our Discord community 👨‍👩‍👧‍👦
/ discord
In this video I do a deep dive into the recent "AudioGen: Textually Guided Audio Generation" paper, which introduced text-guided audio synthesis.
In a nutshell, it's the VQ-VAE/GAN idea applied to the audio modality.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
✅ Paper: felixkreuk.github.io/text2aud...
✅ Site: felixkreuk.github.io/text2aud...
✅ 3B1B on Fourier transform: • But what is the Fourie...
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⌚️ Timetable:
00:00 Intro
01:17 Why is text-to-audio hard?
02:51 Comparison with VQ-GAN
05:15 Comparison with SoundStream
06:20 AudioGen overview
09:10 Deep dive: audio representation, LSTM
14:05 Losses explained
17:40 Complex-valued STFTs
21:57 Audio Language Modeling
23:37 Multi-stream audio inputs
25:32 Data and augmentations
29:05 Results
35:28 Outro
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 BECOME A PATRON OF THE AI EPIPHANY ❤️
If these videos, GitHub projects, and blogs help you,
consider helping me out by supporting me on Patreon!
The AI Epiphany - / theaiepiphany
One-time donation - www.paypal.com/paypalme/theai...
Huge thank you to these AI Epiphany patrons:
Eli Mahler
Petar Veličković
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💼 LinkedIn - / aleksagordic
🐦 Twitter - / gordic_aleksa
👨‍👩‍👧‍👦 Discord - / discord
📺 YouTube - / theaiepiphany
📚 Medium - / gordicaleksa
💻 GitHub - github.com/gordicaleksa
📢 AI Newsletter - aiepiphany.substack.com/
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#audiogen #audiosynthesis #multimodal
