Sentence Transformers (www.sbert.net/) is one of the most popular Language AI/NLP tools. Tens of thousands of users rely on it to build systems for text classification, neural/semantic search, text clustering, and other language AI tasks.
In this conversation, Nils Reimers, the creator of Sentence-BERT, talks about:
- An introduction to the package and the Large Language Models provided in it
- Lessons learned from the open-source development of such a popular package
- His research collaborations on how to evaluate embeddings through works like MTEB: Massive Text Embedding Benchmark and BEIR
Bio: Nils Reimers is currently the Director and Principal Scientist of Machine Learning at Cohere. Previously, he authored several well-known research papers, including Sentence-BERT, and created the popular sentence-transformers library. He also worked as a Research Scientist at HuggingFace, (co-)founded several web companies, and worked as an AI consultant in investment banking, media, and IoT.
Join the Cohere Discord: / discord
Discussion thread for this episode (feel free to ask questions):
/ discord
===
Contents
Introduction (0:00)
Nils Intro (2:19)
Neural search (2:55)
Dense Bi-encoders (6:26)
Contrastive training (8:16)
Why we need embedding benchmarks (10:07)
The predictive power of benchmarks declines over time (14:28)
Benchmarking Information Retrieval with BEIR (19:58)
MTEB: Massive Text Embedding Benchmark (29:07)
SetFit (34:05)
Multilingual search and embeddings (40:52)
Cross-lingual search benefits and drawbacks (46:27)
Lessons from developing open source software (50:18)
The benefits and challenges of maintaining a popular open source library (54:21)
===
Resources:
Bonjour. مرحبا. Guten tag. Hola. Cohere's Multilingual Text Understanding Model is Now Available: txt.cohere.ai/multilingual/
Sentence Transformers: www.sbert.net/
SBERT Paper: arxiv.org/abs/1908.10084
MTEB: Massive Text Embedding Benchmark: arxiv.org/abs/2210.07316
BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models: openreview.net/forum?id=wCu6T...
SetFit - Efficient Few-shot Learning with Sentence Transformers: github.com/huggingface/setfit
Sentence Transformers and Embedding Evaluation - Nils Reimers - Talking Language AI Ep#3