Regarding the training part for the VQGAN at 24:24, from what I understand the following is happening:

1. VQGAN grads are zeroed; grads are then propagated over the Discriminator (because of g_loss) and over the VQGAN for the rest of the losses (and the g_loss). retain_graph=True is added in order to keep the previously computed forward-pass values, since otherwise calling backward again on the same losses would raise an error.
2. Discriminator grads are zeroed to remove what was previously added by g_loss.backward(), and another backward call is done on gan_loss to propagate the grads for the proper loss function (d_loss_fake and d_loss_real).
3. The optimizers are called one after another to update the weights with the values accumulated in the leaf tensors' .grad property.

A possible error at step 2 might have led to the bad reconstruction seen a few minutes later in the video. The gan_loss.backward() call propagates the following:
- d_loss_real, which was computed by applying the Discriminator to the real images;
- d_loss_fake, which was computed using the disc_fake images *generated* by the VQGAN. Here is where the issue might lie.

The disc_fake images were obtained by a forward pass through the VQGAN model. As a result, the computational graph retains these forward values, and when gan_loss.backward() is called, d_loss_fake will be propagated over both the Discriminator and the VQGAN. In turn, this adjusts the VQGAN's weights to also minimize the Discriminator's loss, which amounts to "generate images such that the Discriminator can easily tell they are fake".

Two factors might explain why the VQGAN is still able to reconstruct the images, albeit not very well, despite this perturbing loss propagation:
- the reconstruction loss is still present;
- the discriminator is turned off until the threshold is hit, and only after that does the perturbation come into play.

A solution would be to:
- (less optimal) use two tensors for the fake images: disc_fake_1 = decoded_images and disc_fake_2 = decoded_images.detach(), where the detached one will not propagate grads through the VQGAN. Pass them both through the Discriminator; disc_fake_1 will be used in g_loss to update the VQGAN, and disc_fake_2 will be used in gan_loss to update the Discriminator (see the sketch right after this comment).
- (better, as only a single pass of the fake images through the Discriminator is required) before calling gan_loss.backward(), use self.vqgan.requires_grad_(False) => this disables the accumulation of gradients in the VQGAN, so only the Discriminator will receive values in its .grad property. After the backward() call, reactivate the grads with self.vqgan.requires_grad_(True).

I am a beginner in the field, so I might be wrong in both my understanding and explanation.

Source:
- pytorch.org/docs/stable/notes/autograd.html#setting-requires-grad
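A minimal sketch of the first (detach-based) fix described above. The variable names, the 3-tuple returned by the VQGAN, and the hinge-loss form are assumptions about what the video's training loop roughly looks like, not the repo's exact code:

import torch
import torch.nn.functional as F

# Assumed names: vqgan and discriminator are the two models, imgs is the batch.
# decoded_images.detach() cuts the graph, so gan_loss only produces gradients
# for the discriminator, never for the VQGAN.
decoded_images, _, q_loss = vqgan(imgs)

disc_real = discriminator(imgs)
disc_fake_for_g = discriminator(decoded_images)           # grads flow into the VQGAN
disc_fake_for_d = discriminator(decoded_images.detach())  # grads stop at the discriminator

g_loss = -torch.mean(disc_fake_for_g)                     # part of the VQGAN loss
d_loss_real = torch.mean(F.relu(1.0 - disc_real))         # hinge loss, real side
d_loss_fake = torch.mean(F.relu(1.0 + disc_fake_for_d))   # hinge loss, fake side
gan_loss = 0.5 * (d_loss_real + d_loss_fake)              # updates only the discriminator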
@csoRoBeRt
9 months ago
I was having the same confusion about the grad propagation, then I saw your answer!
@NirDodge
9 months ago
I think that this is not necessary, since opt_disc.step() should only modify the discriminator parameters (and opt_vq.step() should only modify the vqgan parameters).
@csoRoBeRt
9 months ago
But d_loss_fake.backward() should add grads to the layers of the generator. Not sure whether opt_vq.step() would take both parts of the grad to update or not @@NirDodge
@NirDodge
9 months ago
@@csoRoBeRt Right, I see... While opt_disc.step() would not affect the VQGAN weights, it looks like self.vqgan.requires_grad_(False) is needed so that gan_loss.backward() will not accumulate gradients on the VQGAN, which would affect the update in opt_vq.step().
@JJJYmmm
6 months ago
Great! But in solution 1, I think just changing line 56 from 'disc_fake = self.discriminator(decoded_images)' to 'disc_fake = self.discriminator(decoded_images.detach())' is enough.
@NirDodge
9 months ago
@outliier 26:34 I suspect that the colors are off due to decoded_images.add(1).mul(0.5) in the visualization, which maps the colors from [-1, 1] to [0, 1], but is only applied to the decoded images and not the original images for some reason.
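If so, a one-line fix would be to apply the same mapping to both tensors before saving. A hedged sketch, assuming the visualization uses torchvision's save_image and the variable names from the video:

import torch
import torchvision

# Map BOTH the originals and the reconstructions from [-1, 1] to [0, 1]
# before visualization, instead of only the decoded images.
both = torch.cat((images.add(1).mul(0.5), decoded_images.add(1).mul(0.5)))
torchvision.utils.save_image(both, "results.jpg", nrow=4)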
@erwann_millon
2 years ago
great video
@davita6379
2 years ago
I should have learned PyTorch
@sefadogantr
1 year ago
Hello. First of all, many thanks for the video and source files. I want to develop a Midjourney-like system to improve myself, and I would like to ask a few questions for your guidance. With the process shown in the video, we redraw an existing image. When we build a Midjourney-like system, at what stage will this be useful for us? I have seen projects written with VQGAN and CLIP on Colab, but I want to write a system myself. What would you recommend? Which systems do you think I should use? Another question: I remember tutorials being faster with TensorFlow. Would you suggest using TensorFlow instead of PyTorch? Thank you.
@outliier
1 year ago
Hey there, first of all I would recommend using PyTorch. There is a much greater community out there in the generative field that is using PyTorch. Second of all, a VQGAN usually represents the first stage, which compresses the data and removes redundancies. You would then need a model which learns in this compressed space. That's what the transformer in the second stage is doing. I don't know exactly how Midjourney is doing it, but for example Stable Diffusion uses the same approach of first learning a VQGAN and then learning a diffusion model in the latent space. So usually text-to-image tasks are done using transformers or diffusion models. You can watch my videos on diffusion models, maybe train them, and eventually combine them with a VQGAN, which gives you latent diffusion (the method that Stable Diffusion is using). Let me know if you have further questions.
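A very rough sketch of that two-stage pipeline. Every name here (encode_to_indices, transformer, etc.) is hypothetical, not the actual API from the video's repo:

import torch
import torch.nn.functional as F

# Stage 1: a trained, frozen VQGAN compresses images into discrete codebook indices.
# Stage 2: a transformer learns an autoregressive prior over those indices.
with torch.no_grad():
    codes = vqgan.encode_to_indices(images)     # hypothetical: (B, h*w) index tensor

logits = transformer(codes[:, :-1])             # predict the next code at each position
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                       codes[:, 1:].reshape(-1))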
@vinc6966
1 month ago
Dude, you are making a YT video, not a class presentation, you have all the time in the world to take your time and explain each module step by step. Especially since your implementation has quite a few bugs… But overall, you did a decent job.
@outliier
1 month ago
@@vinc6966 :(
@csoRoBeRt
9 months ago
Marvelous implementation. It's much clearer than looking into the original code.
@947973
1 year ago
Excellent video. Thank you very much for making it.
@decreer4567
7 months ago
Hey, I looked through your codebook. VQ-VAEs perform a one-hot encoding. Is that something from the paper, or just something you personally included? Nice video.
@tiln8455
2 years ago
Nice one 👍🏼👍🏼
@Paul-wk7rp
2 years ago
Cool
@fatihhaslak962
1 year ago
Hello. Thank you for this tutorial. Can you add VQGAN+CLIP, please?
@fatihhaslak962
1 year ago
How do I add CLIP to this code?
@hassenzaayra5419
1 year ago
Hello, I trained my model and had a good result. I will load an image and use the trained model (the .pt file) to see the reconstructed image.
@hassenzaayra5419
1 year ago
@Outlier can you help me test a VQGAN model?
@miumiu5224
1 year ago
Hi, your video is great! I can't find a second one as good as yours. I have a question I would like to ask: how do I add conditions in the form of pictures when training the second-stage transformer?
@outliier
1 year ago
Hey, thank you! I answered your question on GitHub.
@ganeshb8683
9 months ago
Thank you for the great explanation! Out of curiosity, what is the purpose of implementing the blocks (GroupNorm) as a separate class instead of using the predefined class in torch (torch.nn.GroupNorm)?
@loko818r
1 year ago
It is great code, but do you or does anybody have the link to download the flowers dataset?
@outliier
1 year ago
Thanks. Just search for the Oxford Flowers dataset and you should find it.
@flieskao9161
1 year ago
This guy is really awesome; nobody on Bilibili explains it half as well as you do.
@outliier
1 year ago
Thank you
@hassenzaayra5419
1 year ago
Hello. Thank you so much for the video and source files. Can you please add the test code, or help me create it?
@outliier
1 year ago
What kind of test code are you talking about? All the code is on GitHub. Did you see that?
@xiaolongye-y4g
1 year ago
Really great, very detailed.
@神楽坂雫月
1 year ago
Awesome! Thanks for the video!
@uladzimirtumanau6240
1 year ago
Hi! Do you have a profile on kaggle?
@outliier
1 year ago
No I don’t :c
@helplearncenter198
2 years ago
thank you
@mkamp
2 years ago
Very nice. Thanks for the video. Quick question: at 10:30, why do you use the expanded version and not just (a-b)**2?
@Kostarion1
2 years ago
I was wondering the same thing. I can only suppose that it is needed to avoid losing too much precision when calculating the square of the difference, since the (a-b) values can be very small.
@MrXboy3x
2 years ago
z_flat.shape = [1024, 256] and embed.shape = [1024, 256]. When you do (a-b)**2 you get shape [1024, 256], which means each of the 1024 latent vectors is only compared elementwise with its corresponding code vector. However, if you use the long form (a**2 + b**2 - 2ab) you get [1024, 1024], i.e. for each of the 1024 latent vectors you get the distance to all 1024 code vectors (because of the dot-product operation). You might think the first version is still fine because it still yields nearby features, but that's not entirely correct: if only those entries are selected, backpropagation will only optimize them. Try it yourself: in __init__ add self.l2 = nn.MSELoss(reduction='none'), and in forward add d = self.l2(z_flattened, self.embedding.weight).
@rikki146
1 year ago
@@MrXboy3x how about (a-b)*(a-b)^T
@rikki146
1 year ago
Never mind, I was being stupid. However, there does indeed exist a way to do it more elegantly:

import torch

embedding = torch.rand((256, 512))  # embedding_size, latent_dim
z_flattened = torch.randn((10, 512, 16, 16)).view(2560, 512)  # N*h*w, channel
diff = z_flattened.repeat(256, 1, 1) - embedding[:, None, :]
diff_squared = torch.sum(diff ** 2, dim=2)
min_index = diff_squared.argmin(dim=0)
@kalisticmodiani2613
1 year ago
@@MrXboy3x This still does not make sense to me, because mathematically the function f(a, b) that gives the loss from the inputs a and b is the same no matter how you decompose it. So the gradients on the inputs, or the minimum index, should be the same... Am I missing something? I suppose you could argue that one is more numerically stable than the other, but I heard the (a-b)^2 version is the more numerically stable one...
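The two versions are not the same function here, which may resolve the confusion above: elementwise (a-b)**2 compares row i of z only with row i of the codebook, while the expanded form yields the all-pairs distance matrix needed for the nearest-code lookup. A small self-contained example, using the shapes from this thread:

import torch

z_flattened = torch.randn(1024, 256)  # h*w latent vectors, each of dim 256
codebook = torch.randn(1024, 256)     # 1024 codes, each of dim 256

# (a - b)**2 elementwise: only row-to-row differences, shape (1024, 256)
elementwise = (z_flattened - codebook) ** 2

# a**2 + b**2 - 2ab: all-pairs squared distances via a matmul, shape (1024, 1024)
d = (z_flattened ** 2).sum(dim=1, keepdim=True) \
    + (codebook ** 2).sum(dim=1) \
    - 2 * z_flattened @ codebook.t()
min_indices = d.argmin(dim=1)         # nearest codebook entry for every latent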
@batuhanbayraktar337
2 years ago
Hello, I trained for 500 epochs, but I only want to use the 500th-epoch model to generate 5000 images. How can I do that?
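A heavily hypothetical sketch of what that could look like; the checkpoint path and the sample_indices/decode_from_indices names are assumptions, not the repo's actual API:

import torch

# Load the epoch-500 weights once, then generate in batches until 5000 images.
vqgan.load_state_dict(torch.load("vqgan_epoch_500.pt"))
vqgan.eval()

with torch.no_grad():
    for b in range(100):                            # 100 batches x 50 = 5000 images
        indices = sample_indices(batch_size=50)     # hypothetical: stage-2 sampler
        fakes = vqgan.decode_from_indices(indices)  # hypothetical decoder call
        # save `fakes` to disk here, e.g. with torchvision.utils.save_image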
Comments: 81