OCR Using Microsoft's Florence-2 Vision Model on Free Google Colab

In this video, I demonstrate how to run Florence-2, Microsoft's recently released foundational vision model, in a free Google Colab workspace on a T4 GPU. I use Optical Character Recognition (OCR) as the primary use case to showcase the model's capabilities.
You'll learn:
1. An introduction to the Florence-2 Vision Model
2. Loading and configuring the Florence-2 model
3. Implementing an OCR task with this model
4. Evaluating the performance and results of OCR with Florence-2
Code Link - colab.research.google.com/dri...
Florence-2 Model - huggingface.co/microsoft/Flor...
#florence2 #vision #multimodal #multimodalai #llm #microsoftai #googlecolab #ocr #machinelearning #ai #tutorial #freeresources #attention #objectdetection #segmentation


Comments: 13

@jinanlionbridge4521
12 days ago
Thanks for sharing! very useful
@Steven_249
7 days ago
wow... you are super smart..... especially when you change the code for OCR REGION....! Amazing !!!
@theailearner1857
7 days ago
Glad it helped!
@despo13
19 days ago
Thanks
@sudabadri7051
9 days ago
Good video
@seanthibert5961
9 days ago
Any luck with making use of the raw OCR results? I find the raw OCR picks up more text than ocr_with_region.
@trinityblood5622
9 days ago
Any luck fine-tuning the OCR part on a custom dataset in a language other than English?
@theailearner1857
8 days ago
Haven't tried yet, but I will try to make a video on fine-tuning.
@ai_enthusiastic_
17 days ago
How much RAM does it need to run on a CPU?
@theailearner1857
17 days ago
In full precision, it would need approximately 10-11 GB of RAM for inference. If you are not able to run it on a CPU, you can try a quantized model.
@NimeshV-nf6uz
18 days ago
Can I run this on a CPU?
@theailearner1857
18 days ago
Yes, you can. Change the "device_map" argument to "cpu", and make sure not to move the input tensors to "cuda".
@NimeshV-nf6uz
17 days ago
@theailearner1857 thanks 🤜🤛
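
The CPU advice in this thread could look roughly like the sketch below. It is not the code from the video: the checkpoint name `microsoft/Florence-2-large`, the `<OCR>` task prompt, the generation settings, and the helper names `move_inputs` and `run_ocr` are assumptions based on the public model card, and passing `device_map` may require the `accelerate` package depending on the installed `transformers` version.

```python
from typing import Any, Dict

# Assumed checkpoint; the model link in the description is truncated.
MODEL_ID = "microsoft/Florence-2-large"


def move_inputs(inputs: Any, device: str) -> Dict[str, Any]:
    """Move every tensor-like value in a processor output to `device`."""
    return {
        key: (value.to(device) if hasattr(value, "to") else value)
        for key, value in inputs.items()
    }


def run_ocr(image_path: str, device: str = "cpu", model_id: str = MODEL_ID) -> Any:
    """Run the OCR task on one image, keeping model and inputs on `device`."""
    # Heavy dependencies are imported here so the helper above can be used
    # without them installed.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    # The same `device` value is used for the model and for the inputs,
    # so nothing ends up on "cuda" when running on a CPU.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, device_map=device
    )
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    task = "<OCR>"  # assumed task prompt for plain OCR
    inputs = move_inputs(
        processor(text=task, images=image, return_tensors="pt"), device
    )

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        text, task=task, image_size=(image.width, image.height)
    )
```

Calling `run_ocr("sample.png")` (a hypothetical file name) would then run entirely on the CPU; passing `device="cuda"` switches both the model and the inputs together, which avoids the mismatch the reply warns about.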
