Just came here after seeing your post on LinkedIn, as I follow you there. Going to try this on the weekend!
@airoundtable
A month ago
I hope you enjoy the content!
@divye.ruhela
3 days ago
Great video! Subbed! Can you direct me to the resources for how one could train llava to add new classes to it? For instance, teach it to recognize and describe traditional battle poses or describe dishes with their traditional names, etc.?
@airoundtable
2 days ago
Thanks. From a technical standpoint, what you want to do is very similar to what I did in the video. I also explained there how you need to prepare your data for that scenario, and there is a notebook that gives you hints for data preparation. From there, it is just a matter of passing the right data to the model. You have access to everything you need with this video and the project in my GitHub repository.
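For anyone wondering what "the right data" roughly looks like: a minimal sketch of one record in the conversation-style JSON format used by the LLaVA repo. The id, image path, and pose label here are invented placeholders, not real dataset entries.

```python
import json

# One hypothetical training record in LLaVA's conversation format.
# The "image" path and the answer text are made-up placeholders.
record = {
    "id": "sample-0001",
    "image": "poses/sample-0001.jpg",
    "conversations": [
        # "<image>\n" marks where the image is injected into the prompt
        {"from": "human", "value": "<image>\nWhat pose is shown in this image?"},
        {"from": "gpt", "value": "This is a traditional battle pose."},
    ],
}

# A training file is a JSON list of such records.
print(json.dumps([record], indent=2))
```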
@raminguyen7940
7 days ago
I am currently working with the LLaVA-v1.6 Mistral 7B model. I have my own image dataset, but the images are stored in array format. I would appreciate some guidance on how to convert these images into a suitable input for the model. Below is the code I am using:
prompt = "What are the things I should be cautious about when I visit this place? What should I bring with me?"
max_output_token = 500
prompt = f"[INST] {prompt} [/INST]"
inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=max_output_token)
response = processor.decode(output[0], skip_special_tokens=True)
pprint(response)
@airoundtable
6 days ago
I responded to you on LinkedIn
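For anyone else hitting the same issue: Hugging Face processors accept PIL images, so the usual fix is to convert each array to a PIL Image first. This is a minimal sketch assuming the arrays are H x W x 3 uint8 (values 0-255); the dummy array below stands in for one entry of the dataset, and `processor` refers to the same object as in the question.

```python
import numpy as np
from PIL import Image

# Hypothetical stand-in for one image from the dataset,
# assuming shape (H, W, 3) with uint8 values in 0-255.
array_image = np.random.randint(0, 256, size=(336, 336, 3), dtype=np.uint8)

# Convert the raw array into a PIL Image the processor can consume.
image = Image.fromarray(array_image)

# Then pass it exactly as in the original snippet, e.g.:
# inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")
print(image.size, image.mode)
```

If the arrays are floats in 0-1, scale and cast first, e.g. `(arr * 255).astype("uint8")`, before calling `Image.fromarray`.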
@MuhammadAdnan-tq3fx
A month ago
Thanks for this informative video. I have a question: how can we perform distributed model training on multiple GPUs? In this video, the training is performed on a single 80 GB GPU. For example, if we want to perform the training on multiple GPUs (say, two 48 GB GPUs), what should we do?
@airoundtable
28 days ago
The concept is called model sharding, where the architecture is distributed over multiple GPUs. I haven't done it with LLaVA, but to understand it you can have a look at this PyTorch blog post: pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/ In PyTorch, the class that does this is called `FullyShardedDataParallel`. You can find more info about it here: pytorch.org/docs/stable/fsdp.html
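The idea above can be sketched as follows. This is a minimal FSDP wrap of a generic `nn.Module`, not LLaVA specifically; it assumes a multi-GPU host and is meant to be launched with `torchrun`, e.g. `torchrun --nproc_per_node=2 fsdp_sketch.py`.

```python
# fsdp_sketch.py -- run with: torchrun --nproc_per_node=2 fsdp_sketch.py
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK for each process
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # A toy model standing in for the real architecture
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # so each GPU only holds a slice of the full model at a time.
    model = FSDP(model.cuda())
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    optim.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

For a real 7B model you would normally add an `auto_wrap_policy` so each transformer layer becomes its own FSDP unit; the PyTorch FSDP docs linked above cover that.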
@PareshPawar-y5w
28 days ago
What do you suggest for making a Python GUI app: tkinter? Or do you prefer another toolkit? Do you have any video on it? Thank you in advance!!! Big fan of your teaching!!!
@airoundtable
26 days ago
Thanks! I haven't used tkinter and I don't have any videos about it on the channel.
Comments: 12