The saying "a picture is worth a thousand words" encapsulates the immense potential of visual data. Yet most retrieval-augmented generation (RAG) applications rely on text alone. This session applies RAG to multimodal use cases, focusing on multimodal embeddings for retrieval and on attributed question answering. We'll begin with a high-level architecture and quickly dive into a practical demo. Attendees will learn to create powerful LLM-based workflows and embed them in existing applications.
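The core mechanics the session describes can be sketched in a few lines: embed documents and queries into a shared vector space, retrieve by similarity, and pass the hits to the model with source IDs so answers can be attributed. The sketch below is illustrative only; `toy_embed` is a hypothetical stand-in for a real multimodal embedding model (which would map text and images into the same space), and the corpus entries are invented examples.

```python
import math

# Hypothetical stand-in for a multimodal embedding model.
# A real model embeds text AND images into one shared vector space;
# here a bag-of-letters vector illustrates the retrieval mechanics only.
def toy_embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Each chunk keeps a source id so the final answer can cite it.
corpus = [
    {"id": "slide-12", "text": "Architecture diagram of the RAG pipeline"},
    {"id": "doc-3", "text": "Gemini supports text, image, and video input"},
]
index = [(item, toy_embed(item["text"])) for item in corpus]

def retrieve(query: str, k: int = 1) -> list[dict]:
    q = toy_embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [item for item, _ in ranked[:k]]

# Attributed prompt: quote each retrieved chunk tagged with its source id,
# then instruct the LLM to cite those ids in its answer.
hits = retrieve("Which inputs does Gemini accept?")
context = "\n".join(f"[{h['id']}] {h['text']}" for h in hits)
prompt = f"Answer using only this context and cite sources:\n{context}"
print(prompt)
```

In a production system the toy pieces would be swapped for a real embedding model and a vector database, but the retrieve-then-attribute shape stays the same.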
Speakers: Shilpa Kancharla, Jeff Nelson
Resources:
Try Gemini in Vertex AI → goo.gle/3Vttolh
Watch more:
Check out all the AI videos at Google I/O 2024 → goo.gle/io24-a...
Check out all the Cloud videos at Google I/O 2024 → goo.gle/io24-c...
Subscribe to Google Developers → goo.gle/develo...
#GoogleIO
Event: Google I/O 2024
How to build Multimodal Retrieval-Augmented Generation (RAG) with Gemini