You have a huge talent for explaining things! Thank you for your time and energy. What is the reason that the synthetic models perform better than normal backpropagation? Intuitively they should perform worse on the training dataset, because we just use approximations of the gradients.
@Chhillee
6 years ago
Marat Kopytjuk I suspect that synthetic gradients act as a sort of regularizer. It's kind of been a weird thing for a while that some forms of regularization sometimes speed up training.
@LinkSF1
6 years ago
Thanks for the video! Huge fan of your channel and presentation style. I hope you don't mind a small comment regarding one part of your video. You state that truncated BP involves cutting the RNN and performing BP at a fixed point. This isn't fully correct, as truncated BP usually involves BP-ing a fixed number of time steps k back from ANY time point t (i.e. BP for k time steps from time t). TensorFlow, however, performs the style that you refer to (in the interest of computational efficiency). You can find more details in this post: r2rt.com/styles-of-truncated-backpropagation.html
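To make the distinction concrete, here is a rough sketch (the helper names are made up for illustration) of which time steps each truncation style backpropagates through, for a sequence of length T and truncation length k:

```python
def per_step_windows(T, k):
    # "True" truncated BPTT: at EVERY time step t, backpropagate
    # through the previous k steps (t-k+1 .. t).
    return [list(range(max(0, t - k + 1), t + 1)) for t in range(T)]

def chunked_windows(T, k):
    # TensorFlow-style truncation: split the sequence into consecutive
    # length-k chunks and backpropagate only within each chunk.
    return [list(range(s, min(s + k, T))) for s in range(0, T, k)]

# With T=6 and k=3, the per-step style produces an (overlapping)
# window ending at every t, while the chunked style produces just
# two disjoint windows.
print(per_step_windows(6, 3))
print(chunked_windows(6, 3))
```

The chunked style runs far fewer backward passes, which is the computational-efficiency trade-off mentioned in the comment.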
@animeshkarnewar3
6 years ago
Video on Synthetic Gradients! You did it. Thank you for taking my suggestion. I haven't seen it yet. But I am excited to watch. I am sure it'll be very insightful.
@PhongNguyen-zz1ei
6 years ago
My god! I just woke up this morning and felt amazing watching your video in bed.
@nateamus3920
6 years ago
Months and months of work, study, trial and error...condensed into less than 30 minutes of concise instruction. I've found the value of your book to be the same. Once again, Mr. Géron, incredible work!!!
@mouduge
6 years ago
Nateamus Thanks a lot! :)
@bingeltube
6 years ago
Just watched your video a 2nd time! Thank you very much for putting this great video and supporting information together, Aurelien! Well done!
@seanpedersen829
6 years ago
Really impressed by the quality of your videos! I am also amazed at how good your English sounds, given that your mother tongue is French. Keep doing what you do, and may I suggest you provide people with a way to support you financially so you can keep this going.
@AurelienGeron
6 years ago
Thanks a lot Sean, I'm really glad you enjoy my videos. My mother tongue is indeed French, but I lived in English speaking countries for a total of 12 years (Nigeria, New Zealand, U.S. and Canada). If people want to support my work, the best option is to buy or recommend my book. Thanks again! :)
@Shady9
1 year ago
Thank you so much for this thorough and very clear explanation of a complex subject.
@alzeNL
2 years ago
Thank you for this great video, a great accompaniment to your book, which I am working through.
@Alex-gc2vo
5 years ago
Would it not be better to also provide the labels as input to the synthetic gradient models? It seems like expecting a model to predict deltas without even knowing the final target is, in a way, just a more complex version of gradient descent with momentum.
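(This is in fact what the paper's cDNI variant does: the synthetic-gradient model is conditioned on the label. A toy sketch, with illustrative names and shapes that are not from the paper's code:)

```python
import numpy as np

def synthetic_grad_cdni(M, h, y_onehot):
    # cDNI-style prediction: the synthetic-gradient model sees both
    # the layer's activations h and the one-hot label, so it doesn't
    # have to guess the final target.
    return M @ np.concatenate([h, y_onehot])

h = np.ones(4)                       # activations of some layer
y = np.array([0.0, 1.0, 0.0])        # one-hot label (3 classes)
M = np.zeros((4, 4 + 3))             # maps [h; y] -> estimated dL/dh
g_hat = synthetic_grad_cdni(M, h, y)
print(g_hat.shape)                   # one entry per activation
```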
@Arecatail
6 years ago
Thanks for posting these videos. They are incredibly insightful. The book is great too.
@lgsoftwares7093
6 years ago
Thanks for the best explanation. When is the new book coming?
@ronnywing9049
6 years ago
Absolutely incredible work here, sir. Thank you so much for your efforts
@greendatadialog
6 years ago
Great job man! I’m sharing it with the Data science community here in Hong Kong!
@VijayKumar-fv6dx
6 years ago
Thanks for your video... really a great explanation... not much needed beyond that... Also, Annie's painting is superb... Happy New Year to you as well!
@RelatedGiraffe
6 years ago
Great video! Very well explained. Synthetic gradients become really useful when you can't afford to store all activations in memory (which is necessary for regular backpropagation), such as for really long time chains in recurrent neural networks. But what are the benefits of using synthetic inputs? I see that they are described in the original paper, "Decoupled Neural Interfaces using Synthetic Gradients", but I don't see them mention any reason for using them. Or do you think they explored the possibility of using them more out of curiosity, to see whether it is possible at all?
@jamespack161
6 years ago
Aurélien, thank you for putting together this video. This talk is the best and most approachable explanation of synthetic gradients I have seen or read. Nice job!
@PaulHobbs23
6 years ago
Thanks for making this video, this is a very interesting result! Does the paper explain why you would want to use synthetic inputs in addition to cDNI? It seems like cDNI already gives you the ability to train a model in a fully parallelized way.
@Vladeeer
6 years ago
What a great way to end 2017!
@kozzuli
6 years ago
Best explanation seen so far. Thanks a lot! Looking forward to your next video.
@bingeltube
6 years ago
Highly recommended.
@Skythedragon
6 years ago
Great explanation, you just got a new subscriber!
@fanyixiao7235
6 years ago
Very well explained!! Thank you for your efforts :)
@BadriNathJK
6 years ago
You have a great voice. Keep up the channel. Bring more videos!
@rantaoca491
6 years ago
Very well explained! Please do more videos like this :)
@ahmedadly
6 years ago
Wonderful explanation, the best so far for CapsNet.
@thangbom4742
5 years ago
Brilliant idea. Is it implemented in any framework?
@bobsalita3417
6 years ago
Excellent clarity of thought.
@aa-xn5hc
6 years ago
Thank you! This is fantastic content.
@mohamadyakteen8710
6 years ago
Excellent video, I'm glad I found your channel; I've just subscribed. Is there a PDF version of your book that we can buy online?
@AurelienGeron
6 years ago
Thanks Mohamad! There's a PDF version available on ebooks.com. Here's the link: goo.gl/d9ZV3t
@mohamadyakteen8710
6 years ago
Thank you Aurélien, I've got the book. I can't wait to reach Chapter 12, because I started learning CUDA libraries a few months ago. Judging from an overall perspective, the book is highly recommended. Great job!
@abhiwins123
6 years ago
Your book is as amazing as your explanation 👍
@AurelienGeron
6 years ago
Thanks Abhijith, I'm very glad you like both! :)
@dherbemontvictor5188
6 years ago
Hello Aurelien! Thank you for this video and all the past (and, I hope, future!) vids! I'm surely missing something, because I don't understand why this is faster, and how you can distribute this computation. I understand that you create an estimator of the gradient that you use to update the parameters. But since, to compute the gradient estimator for layer i, you have to use the estimator of layer i+1 (to calculate the distance between the two), somehow you have to wait until every layer j, with i < j, has computed its gradient estimator before computing yours (as every layer waits for the estimator of the layer after it). I am missing something, I think, so I hope you can help me with that! Merci beaucoup! Victor
@Chhillee
6 years ago
D'herbemont Victor So, my understanding is that it isn't strictly faster, but it's parallelizable. Just as there is still a forward lock even with DNIs, you still have to wait for the next layer to compute its gradients before the synthetic gradient layer can update. Remember that the synthetic layer's outputs are an approximation of what the gradient at layer n+1 would be, and that's what you need to compute the gradient for layer n.
@SuperBlablou
6 years ago
No, you estimate using only the output of the i-th layer and possibly the true label of the sample. You will, as you say, need feedback from the (i+1)-th layer later, to compute the true gradient and compare it with the estimated one so as to improve your estimator. The point is that you didn't have to wait until the full forward pass was over to update layer i, so you can directly start computing the next sample's output in layer i.
@dherbemontvictor5188
6 years ago
Thank you Aloïs for your answer! So, in a certain way, the calculation of the gradient estimator and the forward pass are not made synchronously? By synchronously, I mean that on one hand you can do the forward pass and update the weights with your estimator, and on the other hand you calculate the loss and update the different gradient estimators for each layer or group of layers?
@SuperBlablou
6 years ago
Yes, you got it :)
@dherbemontvictor5188
6 years ago
Excellent! Thank you for your time, Aloïs!
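The mechanism discussed in this thread can be sketched in a few lines. This is a toy illustration (a linear layer, a linear synthetic-gradient model, and made-up names; not the paper's implementation): layer i updates immediately from the predicted gradient, and the predictor itself is trained later, once the true gradient arrives from layer i+1.

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(4, 4))   # weights of layer i
M = np.zeros((4, 4))          # synthetic-gradient model, starts at zero
lr, m_lr = 0.01, 0.01

x = rng.normal(size=4)
h = W @ x                     # forward pass through layer i

# 1) Predict dL/dh from h alone -- no waiting on the layers above.
g_hat = M @ h

# 2) Layer i updates immediately with the estimated gradient.
W -= lr * np.outer(g_hat, x)

# 3) Later, when the TRUE gradient arrives from layer i+1, the
#    synthetic model is trained to match it (simple L2 regression).
g_true = rng.normal(size=4)   # stand-in for the real backprop signal
M -= m_lr * np.outer(M @ h - g_true, h)
```

In the real setup, steps 1–2 and step 3 run asynchronously, which is the decoupling the thread is asking about: after this single step, M's prediction for h already points in the direction of the true gradient.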
@alibaheri4614
6 years ago
Thanks. Is there any way to access the slides presented in this video?
@AurelienGeron
6 years ago
Sure, here are the slides: www.slideshare.net/aureliengeron/synthetic-gradients-tutorial
@alibaheri4614
6 years ago
Thanks, great.
@taksirhasan3551
6 years ago
May I know which tools were used for the figures?
@mouduge
6 years ago
Taksir Hasan I just use Google Slides (same as for my book). I wish they had a shortcut to toggle Help > Snap to Guides, but apart from that it's pretty easy to use.
Comments: 52