GLIP unifies object detection and phrase grounding for pre-training. The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representations semantically rich. In our experiments, we pre-train GLIP on 27M grounding data, including 3M human-annotated and 24M web-crawled image-text pairs. The learned representations demonstrate strong zero-shot and few-shot transferability to various object-level recognition tasks.
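To make the unification concrete, here is a minimal conceptual sketch (not the actual GLIP API or model) of the core idea: detection is recast as grounding by scoring the alignment between candidate region features and token embeddings of a text prompt, so any phrase in the prompt can act as a "class". All names and the toy random features below are illustrative assumptions.

```python
import numpy as np

# Toy sketch of word-region alignment, the idea behind GLIP's
# unified detection/grounding head. NOT the real GLIP code:
# features here are random vectors for illustration only.

rng = np.random.default_rng(0)

def normalize(x):
    # L2-normalize feature vectors along the last axis.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# 4 candidate region features and 3 phrase embeddings (hypothetical).
regions = normalize(rng.normal(size=(4, 8)))
phrases = normalize(rng.normal(size=(3, 8)))

# Alignment scores: dot product between every region and every
# phrase, analogous to GLIP's region-word alignment matrix.
scores = regions @ phrases.T          # shape (4, 3)

# Each region is "grounded" to its best-matching phrase; a real
# model would threshold these scores to produce detections.
best_phrase = scores.argmax(axis=1)
print(scores.shape, best_phrase.shape)
```

In the real model, the region features come from a visual backbone and the phrase embeddings from a language encoder, and both are trained jointly so that matched region-phrase pairs score highly.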
Github : github.com/microsoft/GLIP
Notebook Link : github.com/karndeepsingh/self...
Connect with me on :
1. LinkedIn: / karndeepsingh
2. Telegram Group: telegram.me/datascienceclubac...
3. Github: www.github.com/karndeepsingh
Detect Object by Text Prompting using GLIP | Object Detection | GLIP | Karndeep Singh