GitHub code: github.com/explainingai-code/VQVAE-Pytorch. Note: the code at line 65 (@11:31 in the video) is wrong. It has a typo; it should actually be codebook_loss = torch.mean((quant_out - quant_input.detach())**2). The repo has the correct version.
@vikramsandu6054
3 months ago
Loved every bit of it. The amount of effort you put in to explain these complex concepts in a simple manner is NEXT LEVEL. This has become my favourite Deep Learning Channel. THANKS A LOT!! keep up the amazing work.
@Explaining-AI
3 months ago
Thank you for the continuous encouragement and appreciation Vikram. It means a lot! Will keep trying my best to put out videos that are worthy of this.
@drannoc9812
4 months ago
Thank you, the visuals really helped me understand, especially for the backpropagation part !
@Explaining-AI
4 months ago
Happy that the video was some help to you :)
@badermuteb1012
9 months ago
How did you code the visualization? Thank you for the tutorial. This is by far the best on YouTube. Keep it up, please!
@Explaining-AI
9 months ago
Thank you! The visualization is not something I came up with myself; I saw it in a different video (link in description) and thought it would be much better to explain with that kind of visualization. This is roughly how I implemented it:
-> Set the latent dimension to 2 and the codebook size to 3.
-> Bound the VQVAE encoder outputs to a range using some activation at the final layer of the encoder, say -1 to 1 or 0 to 1.
-> Map each latent dimension to a color intensity value. So maybe the x axis is the green component (0-1 mapped to 0-255), the y axis is the red component, and blue is always 255. Then color each point as (R, G, B) -> (encoded_dimension_1_value*255, encoded_dimension_2_value*255, 255).
-> Train the VQVAE, then get the codebook vectors of the trained model and the encoder outputs for an image.
-> The points on the plot are the encoder outputs for each cell of the encoder output feature map, and e1, e2, e3 are the codebook vectors.
-> Generate the quantized image using this mapping.
I hope this gives some clarity on the implementation part.
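For anyone curious, the coloring step described above can be sketched roughly like this. This is my own illustrative sketch, not code from the repo; latent_to_rgb is a hypothetical helper, and the channel assignment follows the (R, G, B) formula stated above:

```python
import numpy as np

def latent_to_rgb(encoded):
    # encoded: (N, 2) encoder outputs, assumed bounded to [0, 1]
    # by the encoder's final activation.
    encoded = np.asarray(encoded, dtype=np.float64)
    r = (encoded[:, 0] * 255).astype(np.uint8)      # dimension 1 -> red
    g = (encoded[:, 1] * 255).astype(np.uint8)      # dimension 2 -> green
    b = np.full(len(encoded), 255, dtype=np.uint8)  # blue fixed at 255
    return np.stack([r, g, b], axis=1)

# One color per cell of the encoder output feature map.
points = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.25]]
print(latent_to_rgb(points))
```

Plotting these colors for both the encoder outputs and the codebook vectors then gives the kind of picture shown in the video.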
@danieltsao4005
3 months ago
The code in line 65 is wrong. It should be codebook_loss = torch.mean((quant_out - quant_input.detach())**2)
@Explaining-AI
3 months ago
Yes indeed. It's correct in the repo - github.com/explainingai-code/VQVAE-Pytorch/blob/main/run_simple_vqvae.py#L65 - but the video version has a typo: instead of torch.mean((quant_out - quant_input.detach())**2), it's incorrectly implemented as torch.mean((quant_out - quant_input.detach()**2)).
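To make the difference concrete, here is a minimal sketch of the two parenthesizations side by side (tensor shapes and values are arbitrary stand-ins; only the detach placement matters):

```python
import torch

quant_input = torch.randn(4, 2, requires_grad=True)  # encoder outputs
quant_out = torch.randn(4, 2, requires_grad=True)    # chosen codebook vectors

# Correct: square the difference; gradients flow only to the codebook side.
codebook_loss = torch.mean((quant_out - quant_input.detach()) ** 2)

# Typo from the video: squares quant_input instead of the difference,
# so it is no longer a squared-error term at all (and can go negative).
typo_loss = torch.mean(quant_out - quant_input.detach() ** 2)

codebook_loss.backward()
print(quant_input.grad)  # None: the encoder gets no gradient from this term
print(quant_out.grad is not None)
```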
@PanicGiraffe
11 months ago
This video fuckin' rips.
@phaZZi6461
9 hours ago
Isn't VQ-VAE closer to regular autoencoders than to variational autoencoders? Why is it named VQ-"V"AE?
@mehdizahedi2810
A month ago
the best explanation of VQ-VAE, thanks.
@Explaining-AI
A month ago
You are most welcome :)
@IgorAherne
3 months ago
Thank you so much for taking the time to make this beautiful lesson! It is very well made, and made the whole concept clear
@Explaining-AI
3 months ago
Thank you! Really happy that you found the video helpful.
@Omsip123
A month ago
Your videos are so helpful. They are well explained, concise... I can't find the words to describe them. Unfortunately you do not get the millions of subscribers you deserve, but I hope it rewards you to know that your videos are top quality (I have watched over 100h of AI videos) and very helpful for the learning community.
@Explaining-AI
A month ago
Thank you so much for your kind words :) Subs will come when they come; right now I'm just happy to do my best to create videos that help people understand things a little bit better.
@foppili
4 months ago
Great explanation, covering theory and implementation. Nice visualisations. Thanks!
@Explaining-AI
4 months ago
Thank You!
@yanlu914
2 months ago
Hi, very helpful video! I want to ask what the colors in the quantization output mean. From what I understand, the quantization output has 2 channels (because the codebook embedding dimension is 2). Each pixel in the quantization output corresponds to one of three embeddings in the codebook, so does the color come from the combination of the 2 channels?
@Explaining-AI
2 months ago
Thanks! For the visualizations, I bound the quantization output between 0 and 1 (using an activation at the end of the encoder). Then for colors, I fix 2 of the dimensions as red and green, and get the red and green components of a point's color as encoded_dimension_value*255. The blue component I always fix at 255. If you are interested in the exact details of the visualizations, I have described them here - kzitem.info/news/bejne/kpB-4HWFrqaUoaw&lc=UgyASt6J38hMkqfdd3R4AaABAg.9yyad6YzcP49yyqxBiZkQZ
@yanlu914
2 months ago
@@Explaining-AI Very clean explanation! Thank you!
@amirnasser7768
2 months ago
Thanks so much for the informative video. I always used to ask myself what happened to the KL term 😅. BTW, have you thought about using a Gaussian prior instead of a uniform one? The prior over the real data seems more likely to be Gaussian, so my gut feeling is that the uniform prior may not be the best choice.
@Explaining-AI
2 months ago
You are most welcome :) The paper uses the uniform prior, which also simplifies the math by getting rid of the KL term completely. I haven't experimented with any other prior myself, but I am sure we could replace it with another one (we would just have to add the KL term corresponding to this new choice). Having said that, because VQVAE has a discrete latent space, I am not sure how exactly you would use a Gaussian prior and what the KL divergence term would evaluate to, given that q(z) is a one-hot vector. If possible, can you elaborate a bit on that?
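Just to illustrate why the KL term drops out with the uniform prior: with K codebook entries and a deterministic one-hot posterior q(z), the divergence is the constant log K regardless of which entry is chosen. A small sketch of that arithmetic (my own illustration, not from the repo):

```python
import math

def kl_one_hot_vs_uniform(k, chosen_index):
    # KL(q || p) = sum_i q_i * log(q_i / p_i). With q one-hot, only the
    # q_i = 1 term contributes (0 * log 0 is taken as 0), giving
    # log(1 / (1/k)) = log k, independent of chosen_index.
    p = 1.0 / k
    return math.log(1.0 / p)

# Same constant for every codebook index, so the term has zero gradient
# and can be dropped from the training loss.
assert kl_one_hot_vs_uniform(512, 0) == kl_one_hot_vs_uniform(512, 511) == math.log(512)
```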
@amirnasser7768
2 months ago
@@Explaining-AI I think you are right; it is not clear how one would use a Gaussian as the prior, and it is easier to use the uniform one. Maybe one way is to add an additional KL loss that minimizes the distance between each codebook embedding and another embedding initialized from a Gaussian distribution with mean 0 and variance 1.
@amirjodeiry7136
8 months ago
Thank you for providing insightful perspectives on this topic. I appreciate your unique perspective and the effort you've put into providing valuable information, rather than simply copying from the paper. Keep up the great work!
@Explaining-AI
8 months ago
Thank you for saying that!
@leleogere
11 months ago
Very clear explanation! Thanks for the implementation + the vizualisation of the codebook!
@Explaining-AI
11 months ago
Thank you
@linhnhut2134
6 months ago
Thanks a lot for your video. Can you explain in more detail the line: quant_out = quant_input + (quant_out - quant_input).detach()? Why not just quant_out = quant_out.detach()?
@Explaining-AI
6 months ago
Hello @linhnhut2134, what we want is for the gradients at quant_out to be used as if they were also the gradients for quant_input, kind of like copy-pasting gradients. So in the forward pass we want quant_out = quant_out, but in the backward pass we want quant_out = quant_input. The operation "quant_out = quant_input + (quant_out - quant_input).detach()" achieves exactly that distinction between the forward and backward passes, whereas quant_out.detach() would stop gradients entirely and the encoder would receive no training signal.
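The difference shows up clearly in a tiny example (the tensor values here are arbitrary stand-ins for one encoder output and its nearest codebook vector):

```python
import torch

encoder_out = torch.tensor([0.4, 0.7], requires_grad=True)  # quant_input stand-in
codebook_vec = torch.tensor([0.5, 0.5])                     # quant_out stand-in

# Straight-through trick: the forward value is the quantized vector, but the
# autograd graph only contains encoder_out, so gradients "copy through" to it.
st = encoder_out + (codebook_vec - encoder_out).detach()
assert torch.equal(st, codebook_vec)  # forward pass: quantized value

st.sum().backward()
print(encoder_out.grad)  # tensor([1., 1.]): gradients reach the encoder

# With plain codebook_vec.detach() there would be no path back to
# encoder_out at all, so the encoder could not be trained.
```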
@scotth.hawley1560
8 months ago
Really nice. Thanks for posting. At 9:52, why are you using nn.Embedding instead of nn.Parameter(torch.randn((3,2)))? I don't understand where the Embedding comes from.
@Explaining-AI
8 months ago
Thank you! Actually they are both the same. nn.Embedding just uses nn.Parameter with normal initialization anyway: github.com/pytorch/pytorch/blob/d947b9d50011ebd75db2e90d86644a19c4fe6234/torch/nn/modules/sparse.py#L143 So nn.Embedding just creates a wrapper, in the form of a lookup table for the embeddings of a fixed dictionary and size, on top of nn.Parameter. Hope it helps.
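The equivalence is easy to check (a small sketch of my own; resetting the seed just makes both draw the same random values):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(3, 2)  # 3 codebook vectors of dimension 2

torch.manual_seed(0)
param = nn.Parameter(torch.randn(3, 2))

# Same initialization: nn.Embedding's weight is an nn.Parameter
# filled from a standard normal, just like torch.randn.
assert torch.equal(emb.weight, param)

# What the wrapper adds is index lookup: emb(idx) == emb.weight[idx]
idx = torch.tensor([2, 0, 2])
assert torch.equal(emb(idx), emb.weight[idx])
```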
@eddieberman4942
2 months ago
Really useful for a project Im working on, thanks!
@Explaining-AI
2 months ago
Happy that it was of help to you :)
@inceptor1992
8 months ago
Dude your videos are absolutely amazing! Thank you!!!
@Explaining-AI
8 months ago
Thanks a lot 🙂
@joegriffith1683
4 months ago
Brilliant video, thanks so much!
@Explaining-AI
4 months ago
You're very welcome!
@JiaqiLiu-m2k
10 months ago
Thanks for the explanation, this is very helpful!!!
@Explaining-AI
10 months ago
Thank you! I'm glad that it ended up helping in any way.
@PrajwalSingh15
9 months ago
Amazing explanation with easy-to-follow animations.
@Explaining-AI
9 months ago
Thank you
@jakula8643
A year ago
link to code please
@Explaining-AI
A year ago
Hi @jakula8643, as a result of working on the implementation for the next video, I ended up modifying and making the VQVAE code a bit messy. I will clean it up and push it here github.com/explainingai-code/VQVAE-Pytorch in a couple of days' time. Apologies for missing this; I will let you know as soon as I do that.
Comments: 46