In this video you will learn about popular offline metrics (evaluation measures) like Recall@K, Mean Reciprocal Rank (MRR), Mean Average Precision@K (MAP@K), and Normalized Discounted Cumulative Gain (NDCG@K). We will also demonstrate how each of these metrics can be implemented in Python.
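As a minimal sketch of the four metrics named above (function names and example values are my own, not taken from the video or its notebooks):

```python
import math

def recall_at_k(relevant, retrieved, k):
    """Fraction of all relevant items that appear in the top-k retrieved results."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mrr(all_relevant, all_retrieved):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit per query."""
    total = 0.0
    for relevant, retrieved in zip(all_relevant, all_retrieved):
        for rank, item in enumerate(retrieved, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(all_retrieved)

def average_precision_at_k(relevant, retrieved, k):
    """AP@K: precision@i averaged over the ranks i where a relevant item appears.
    MAP@K is simply the mean of this value over all queries."""
    hits, score = 0, 0.0
    for i, item in enumerate(retrieved[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

def ndcg_at_k(relevance_scores, k):
    """NDCG@K: DCG of the predicted ordering divided by DCG of the ideal ordering."""
    def dcg(scores):
        # log2(i + 2) because ranks are 1-based: rank 1 -> log2(2) = 1
        return sum(s / math.log2(i + 2) for i, s in enumerate(scores[:k]))
    ideal = dcg(sorted(relevance_scores, reverse=True))
    return dcg(relevance_scores) / ideal if ideal > 0 else 0.0
```

For example, with relevant items `[1, 2]` and ranked results `[2, 3, 1, 4]`, `recall_at_k(..., k=2)` returns 0.5, since only one of the two relevant items appears in the top two positions.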
Evaluation of information retrieval (IR) systems is critical to making well-informed design decisions. From search to recommendations, evaluation measures are paramount to understanding what does and does not work in retrieval.
Many big tech companies attribute much of their success to well-built IR systems. One of Amazon's earliest iterations of the technology was reportedly driving more than 35% of their sales, and Google attributes 70% of YouTube views to its IR recommender systems.
IR systems power some of the greatest companies in the world, and behind every successful IR system is a set of evaluation measures.
🌲 Pinecone article:
www.pinecone.io/learn/offline...
🔗 Code notebooks:
github.com/pinecone-io/exampl...
🤖 70% Discount on the NLP With Transformers in Python course:
bit.ly/3DFvvY5
🎉 Subscribe for Article and Video Updates!
/ subscribe
/ membership
👾 Discord:
/ discord
00:00 Intro
00:51 Offline Metrics
02:38 Dataset and Retrieval 101
06:08 Recall@K
07:57 Recall@K in Python
09:03 Disadvantages of Recall@K
10:21 MRR
13:32 MRR in Python
14:18 MAP@K
18:17 MAP@K in Python
19:27 NDCG@K
29:26 Pros and Cons of NDCG@K
29:48 Final Thoughts
Evaluation Measures for Search and Recommender Systems