CLIP: Connecting Text and Images

24,342 views

This video explains how CLIP from OpenAI transforms Image Classification into a Text-Image similarity matching task. This is done with Contrastive Training and Zero-Shot Pattern-Exploiting Training. Thanks for watching!
Paper Links:
CLIP (Blog Post): openai.com/blog/clip/
VirTex: arxiv.org/pdf/2006.06666.pdf
ConVIRT: arxiv.org/pdf/2010.00747.pdf
Pattern-Exploiting Training: arxiv.org/pdf/2001.07676.pdf
Vision Transformer (Blog Post, Nice Animation): ai.googleblog.com/2020/12/tra...
Thanks for watching! Please Subscribe!
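
To make the contrastive training idea from the description concrete, here is a minimal sketch in plain Python. It assumes a symmetric cross-entropy loss over the cosine similarities of a batch of paired image and text embeddings. The function names, the toy 2-D embeddings, and the temperature value of 0.07 are illustrative assumptions, not code or settings taken from the video or from OpenAI's implementation.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    # Numerically stable -log(softmax(logits)[target]).
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Assumption: image_embs[i] and text_embs[i] come from the same pair.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by temperature.
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
              for img in imgs]
    # Each image should pick out its own text (rows) ...
    loss_images = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # ... and each text should pick out its own image (columns).
    loss_texts = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                     for j in range(n)) / n
    return (loss_images + loss_texts) / 2

# Toy embeddings (hypothetical): two images and their two captions.
images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

matched_loss = contrastive_loss(images, matched_texts)
swapped_loss = contrastive_loss(images, swapped_texts)
```

In this toy setup the correctly paired batch gets a loss near zero, while the batch with swapped captions gets a much larger one, which is the signal that pulls matching image and text embeddings together during training.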

Comments: 13

@bakr0x
3 years ago
OpenAI is doing some amazing work, Love it. Also, great analysis man 👍🏽
@Stwinky
3 years ago
Really impressive results from OpenAI and nice review! I wish my lab had that much compute power :')
@amrahmed2009
3 years ago
Amazing work from OpenAI and very nice review from Henry :-)
@samirelzein1095
1 year ago
great job!
@kornellewychan
3 years ago
good shit
@subhanbasha8813
3 years ago
Hello Henry, is it recommended to code all the machine learning algorithms from scratch so that I can learn the math behind them, or should I just understand them and start coding?
@connorshorten6311
3 years ago
I would recommend just learning the math. I don't think you need to bother with coding it from scratch, but I know there are some great tutorials out there if you are interested in that. However, I think you'll be fine just going right to PyTorch or Keras/TensorFlow if you already have a sense of what's going on. Good luck with your studies!
@ZettaZone
3 years ago
It depends on what you want to do next and what you want to work on. Are you going to create or develop low-level algorithms and their implementations? Should a programmer learn assembly before learning Python? If you are going to program edge devices with limited hardware resources, this is a good idea. But if you are going to be programming high-level stuff (e.g., web development), then definitely not.
@anantakusumap2459
3 years ago
Is it also good to clone a GitHub repo and run its training code for production? Sometimes I have the mindset that it's better to understand and code from scratch. For example, Neural Architecture Search is hard to implement from scratch, as is creating a new model like the ViT model or a transformer. If not, do you have any recommended high-level NAS or AutoML libraries?
@sathvikudupa1668
3 years ago
Specifically, if you want to be a researcher, coding it out and being able to train it will help you a lot in understanding a research field. Some concepts may look easy enough after a few readings, but actually being able to make a network converge and get results similar to the reported ones may be tough (e.g., flow models). Also, many a time you may want to use an algorithm, but the official code may not be public or could be in a different framework or an older version, and the unofficial code is very confusing to modify or may not work, so you'll have to do it on your own. (I'm facing these issues with a very popular algorithm.)
@lusterdog9694
1 year ago
I'm not knowledgeable in this field, so some of the technical aspects get away from me; please correct me if I'm wrong. The images are scraped from the internet. Those images come with text attached, known as alt text. That text is used to identify the contents of the image during the initial training of the model. Then zero-shot is just recognizing patterns and assigning text to or categorizing the image. That's probably not entirely right, but my main question is this: if a scraped image has inaccurate text attached, for example a picture of a dog with the annotation "truck driving down a hill," will this result in inaccurate training? Or can CLIP identify through zero-shot that the image is that of a dog and thus assign it a new text pair based on previous training? The text that comes with the image when it is scraped is the key factor for accurate training. The model doesn't inherently know the difference between a dog and a truck; it has to learn that through the image-text pairs, and it's possible to train a model to think that a dog is a truck and vice versa.
@imranq9241
2 years ago
Why call it a zero-shot model? Isn't the downstream task basically the same as a test set?
@muhammadnaufil5237
1 year ago
Zero-shot, because it can now do image classification on labels it was not trained on.
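
A rough sketch of what zero-shot classification as text-image matching could look like: each candidate label is turned into a text prompt, embedded, and compared with the image embedding by cosine similarity. Everything here is a stand-in. The `toy_embed_text` lookup table and the 3-D vectors are hypothetical placeholders for a real text and image encoder, and the "a photo of a {}" template is just one common prompt choice.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_emb, labels, embed_text, template="a photo of a {}"):
    # Turn every label into a prompt, score it against the image, return the best label.
    prompts = [template.format(label) for label in labels]
    scores = [cosine(image_emb, embed_text(p)) for p in prompts]
    best = max(range(len(labels)), key=scores.__getitem__)
    return labels[best], scores

# Hypothetical stand-in for a trained text encoder: a fixed lookup table.
TOY_TEXT_EMBEDDINGS = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a truck": [0.0, 0.2, 0.9],
    "a photo of a cat": [0.7, 0.6, 0.1],
}

def toy_embed_text(prompt):
    return TOY_TEXT_EMBEDDINGS[prompt]

# Hypothetical image embedding for a dog picture, as an image encoder might produce.
dog_image_emb = [1.0, 0.0, 0.1]

label, scores = zero_shot_classify(dog_image_emb, ["dog", "truck", "cat"], toy_embed_text)
```

The label set is only supplied at prediction time, so swapping in a different list of labels needs no retraining in this setup; only the prompts change.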