In this video, we'll learn how to build Large Language Model (LLM) + Retrieval Augmented Generation (RAG) pipelines using open-source models from Hugging Face deployed on AWS SageMaker. We use the MiniLM sentence transformer, together with Pinecone, to power the pipeline's semantic search component.
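The semantic search component boils down to comparing a query embedding against stored document embeddings by cosine similarity. Here is a minimal, self-contained sketch of that idea using toy 3-d vectors in place of real 384-d MiniLM embeddings, and a plain Python list in place of a Pinecone index (names like `semantic_search` are illustrative, not from the video's code):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the vectors divided by
    # the product of their magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, index, top_k=2):
    # index: list of (doc_id, vector) pairs; in the video this role
    # is played by a Pinecone index holding MiniLM embeddings.
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy "embeddings" standing in for real MiniLM vectors.
index = [
    ("faq-1", [0.9, 0.1, 0.0]),
    ("faq-2", [0.0, 1.0, 0.1]),
    ("faq-3", [0.7, 0.2, 0.1]),
]
print(semantic_search([1.0, 0.0, 0.0], index, top_k=2))
```

In the real pipeline, the query and documents are embedded with the same MiniLM model, and Pinecone performs this nearest-neighbour scoring at scale.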
📌 Code:
github.com/pinecone-io/exampl...
📕 Article:
www.pinecone.io/learn/sagemak...
🌲 Subscribe for Latest Articles and Videos:
www.pinecone.io/newsletter-si...
👋🏼 AI Consulting:
aurelio.ai
👾 Discord:
/ discord
Twitter: / jamescalam
LinkedIn: / jamescalam
00:00 Open Source LLMs on AWS SageMaker
00:27 Open Source RAG Pipeline
04:25 Deploying Hugging Face LLM on SageMaker
08:33 LLM Responses with Context
10:39 Why Retrieval Augmented Generation
11:50 Deploying our MiniLM Embedding Model
14:34 Creating the Context Embeddings
19:49 Downloading the SageMaker FAQs Dataset
20:23 Creating the Pinecone Vector Index
24:51 Making Queries in Pinecone
25:58 Implementing Retrieval Augmented Generation
30:00 Deleting our Running Instances
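The "Implementing Retrieval Augmented Generation" step above ultimately means stitching the retrieved passages into the LLM prompt before calling the endpoint. A minimal sketch of that prompt construction (a hypothetical template, not the exact one used in the video):

```python
def build_rag_prompt(question, contexts):
    # Join the retrieved passages into a context block, then ask the
    # LLM to answer the question using that information.
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Example: contexts as they might come back from a Pinecone query.
contexts = [
    "SageMaker lets you deploy Hugging Face models as real-time endpoints.",
    "Endpoints are billed while running, so delete them when finished.",
]
print(build_rag_prompt(
    "How do I serve a Hugging Face model on AWS?", contexts))
```

The resulting string is what gets sent to the deployed LLM endpoint; without the context block, the model would have to answer from its parametric knowledge alone, which is the gap RAG closes.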
#artificialintelligence #nlp #aws #opensource #chatbot
Hugging Face LLMs with SageMaker + RAG with Pinecone