Thank you very much for this series, and the overall amazing content, genuinely appreciated!
@Explaining-AI
6 months ago
Thank you so much for this comment :)
@yuuno__
6 months ago
Will you cover a MAMBA implementation later? I think there's currently no video with a clear explanation. It would be very nice if you did one.
@Explaining-AI
6 months ago
Hello, I do plan to cover it, but it won't be part of this series. I have 3-4 topics that I intend to cover first, and after that I will do a video on Mamba.
@goneshivachandhra7470
3 months ago
Is DETR covered in this series?
@Explaining-AI
3 months ago
Yes, it will cover DETR as well. After Faster R-CNN, I plan to do YOLO/SSD/FPN, and then I will get into DETR.
@rrrfaa
1 month ago
Will you also do a video on EfficientDet?
@khadimhussain6155
6 months ago
Can you also explain the PyTorch code for R-CNN?
@Explaining-AI
6 months ago
Hello, I will soon be doing a video on the implementation of Faster R-CNN, in which I will cover the PyTorch code as well.
@KetanBansode-n8w
1 month ago
Can you please do a video on YOLO object detection and a code implementation from scratch?
@Explaining-AI
1 month ago
Hello, yes, that's the video I am working on right now. I will first do a YOLOv1 explanation and implementation video, and then follow up later with the other YOLO versions.
@ArpitAnand-yd7tr
6 months ago
Great video as always. I appreciate the way you logically break down the reasons for architectural choices and smoothly transition to successive steps. Eagerly waiting for the next video in the series! Just wondering if you intend to cover MobileNetV2 and EfficientNetV2 in this series?
@Explaining-AI
6 months ago
Thank you so much for that! Actually, those two won't be covered in this one. I plan to do a separate series on popular backbone architectures like VGG/Inception/ResNet/MobileNet/EfficientNet/DarkNet/Swin etc., so I will cover them there.
@bugbountyhunter9203
2 months ago
Great video, but the background music is a bit distracting, imo.
@Explaining-AI
2 months ago
Thank you for this feedback. I will take care of this in future videos of the series.
@cryes9774
3 months ago
I think your object detection series is awesome, but you should not put in the background music :D
@Explaining-AI
3 months ago
Thank you for this feedback. I assume the background music becomes a distraction. Is that right? Do you think reducing the background sound would work, or would you prefer not having it at all?
@asutoshrath3648
2 months ago
@@Explaining-AI The background is fine, I guess.
@Explaining-AI
2 months ago
@@asutoshrath3648 Thank you for this input.
@sauravns1224
1 month ago
Gotta tell you, man, amazing content and presentation. Also, the background music is very soothing 🍃. Waiting for the YOLO series!
@Explaining-AI
1 month ago
Thank you! Working on the YOLO video right now. Agree on the background music; I too find it calming.
@wolfpack7330
5 months ago
Very well done
@Explaining-AI
5 months ago
Thank You!
@anshumansinha5874
2 months ago
Hi, at @14:39 you said that if our images have 2 classes then the network would have 3 outputs. But how would you know that all the images have only these 2 classes, and these 2 classes only? Or is this network made only for a specific set of images that contain cars and persons as the two distinct objects?
@anshumansinha5874
2 months ago
Is the selective search mechanism fine-tuned for a specific set of images? (E.g., 1. (Person, Car), 2. (Bird, Buildings, Lights), etc.) But wouldn't that need a different network for each different set?
@Explaining-AI
2 months ago
Hello @@anshumansinha5874, the number of categories is predefined, and the network is trained to detect only those predefined categories. So the hypothetical example I was mentioning refers to some dataset that has annotations only for person and car; after training you end up with a network that, given an image, can detect cars or persons (only these two objects) in it. This model will ignore any other categories, say buildings or birds, in the image and will simply predict regions containing such objects as background.

Regarding the selective search question: it is neither trained nor fine-tuned. It's a proposal generation algorithm that latches onto cues like the presence of edges, changes of texture, etc., to divide the image into different regions and bound those regions within bounding boxes to give us region proposals. So it does not really depend on your dataset or the kind of categories you have in your images.
@anshumansinha5874
2 months ago
@Explaining-AI Perfect, thanks a lot for the answers. I had one follow-up question. From my understanding, we train K binary SVMs after we have fine-tuned the CNN backbone with the multi-class classification objective. I'm a bit confused about what the SVM will label as positive. Will it only give a positive label for a perfect ground-truth input image (input image = a ground-truth bounding box adjusted to the 227x227 input dimension), i.e., an IOU of 1.0? What happens to the instances that lie between an IOU of 1.0 and 0.3? What does the SVM classify them as? Lastly, if the SVM only gives a positive to the input image with IOU = 1.0, would it not be better to correct the images for localisation error as soon as we get the region proposals? That is, have a trained bounding-box regressor (as it's already being trained separately) and then pass the corrected image on to the CNN+SVM model for training/predictions? I'm a bit confused because at @26:12 you mentioned that if selective search performs badly and doesn't give any proposal with IOU = 1.0, then our predicted region will be that proposal itself. However, since the SVM only gives a positive result for IOU = 1.0, this should not be the case.
@Explaining-AI
2 months ago
@@anshumansinha5874 On what the SVM will label as positive: during training, the SVM gets the following data points for each class (let's say car). Positive: ALL ground-truth boxes for the car class. Negative: selective search region proposals with < 0.3 IOU against the ground-truth boxes that belong to the car class. All the rest are ignored. The SVM then learns, in the 4096-dimensional feature space, a boundary that separates these positively and negatively labelled data points. So during inference, even regions that do not exactly capture the object (IOU = 1) but capture a large enough part of it (say IOU = 0.8) will still be predicted to be on the positive side of the decision boundary.

On whether it would be better to correct the images for localisation error as soon as we get the region proposals: there are two parts to this. First, the SVM gives a score, so during inference, even if a region proposal is not a perfect box containing the car (but contains a large enough part of it), it will still return a higher score for 'car' than for background. The second part is about modifying regions prior to feeding them to the SVM. That is theoretically correct, but rather than first modifying the proposals (because then you would have to feed ALL 2000 proposals to the bbox regression layers), the authors instead get the SVM score, get the newly predicted box, and then try to rescore again (feed it again to the SVM) using the newly predicted box. However, that doesn't lead to any benefits. From the paper: "In principle, we could iterate this procedure (i.e., re-score the newly predicted bounding box, and then predict a new bounding box from it, and so on). However, we found that iterating does not improve results."
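To make the labeling scheme above concrete, here is a minimal Python sketch of how SVM training labels could be assigned for one class. The function names (`iou`, `svm_labels`) and the (x1, y1, x2, y2) box format are illustrative assumptions, not code from the video; the thresholds follow the scheme described above (ground-truth boxes are positives, proposals with IOU < 0.3 against every ground-truth box are negatives, everything in between is ignored).

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def svm_labels(gt_boxes, proposals, neg_thresh=0.3):
    """Assign SVM training labels for one class:
    - positives: all ground-truth boxes for the class,
    - negatives: proposals below neg_thresh IOU with every ground-truth box,
    - everything in between is ignored for SVM training."""
    positives = list(gt_boxes)
    negatives = [p for p in proposals
                 if all(iou(p, g) < neg_thresh for g in gt_boxes)]
    return positives, negatives

gt = [(10, 10, 50, 50)]
props = [(12, 12, 48, 48),   # IOU ~0.81 -> neither positive nor negative (ignored)
         (60, 60, 90, 90)]   # IOU 0.0   -> negative
pos, neg = svm_labels(gt, props)
print(pos, neg)  # -> [(10, 10, 50, 50)] [(60, 60, 90, 90)]
```

Note how the first proposal, despite overlapping the object well, contributes nothing to SVM training; at inference time, though, the learned margin would still score such a region on the positive side, as discussed above.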
@anshumansinha5874
2 months ago
@@Explaining-AI 1. SVM: Oh, okay, I get it. I think you're talking about the SVM margin, which can help the model include some samples with a slightly lower IOU as well. Do you think this is one of the advantages of using a margin-based method like SVM? (Honestly, I'm not able to recall any other method with a max-margin/hinge loss.) I mean, they could've used any other binary classifier as well. 2. It makes sense now that I've got the margin concept of SVM. Thanks for the help, and great videos.
Comments: 30