Automating Tests for your RAG Chatbot or Other Generative Tool by Abigail Haddad
Visit rstats.ai for information on upcoming conferences.
Abstract: Building a Retrieval Augmented Generation (RAG) chatbot that answers questions about a specific set of documents is straightforward. But how do you tell if it's working? Automated evaluation of generative tools for specific use cases is tricky, but it's also important if you want to easily compare performance across different underlying LLMs, system prompts, temperatures, or other parameters -- or just make sure you're not breaking something when you push your code. In this talk, I'll discuss why this kind of evaluation is challenging and review a few options for the kinds of assessments you can create, including using an LLM to evaluate your LLM-based tool. We'll then look at several ways to write automated LLM-led evaluations, including with a library that lets you create complex grading rubrics for your tests with very little code.
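As a rough illustration of the LLM-as-judge idea mentioned in the abstract (not the library or code from the talk), here is a minimal Python sketch that asks a judge model to grade one chatbot answer against a simple pass/fail rubric. It assumes the openai package and an OPENAI_API_KEY in the environment; the function name, inputs, and model choice are hypothetical examples.

# Minimal sketch of an LLM-led evaluation: a judge model grades one
# RAG chatbot answer against a pass/fail rubric.
from openai import OpenAI

client = OpenAI()

def grade_answer(question: str, reference_answer: str, chatbot_answer: str) -> str:
    """Return PASS or FAIL for one chatbot answer, as judged by an LLM."""
    rubric = (
        "You are grading a RAG chatbot. Reply with exactly PASS if the "
        "chatbot's answer is factually consistent with the reference answer "
        "and addresses the question; otherwise reply with exactly FAIL."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example judge model; swap in whatever you use
        messages=[
            {"role": "system", "content": rubric},
            {"role": "user", "content": (
                f"Question: {question}\n"
                f"Reference answer: {reference_answer}\n"
                f"Chatbot answer: {chatbot_answer}"
            )},
        ],
        temperature=0,  # keep grading as deterministic as possible
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    # Run as part of an automated test suite so a FAIL flags a regression
    # when a prompt, temperature, or model change degrades answers.
    verdict = grade_answer(
        "What year was the policy adopted?",
        "The policy was adopted in 2019.",
        "It was adopted in 2019.",
    )
    print(verdict)  # expect PASS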
Bio: Abigail Haddad is a data scientist working on automating LLM evaluations. Previously, she did research and data science for the Department of Defense, including at the RAND Corporation and as a Department of the Army civilian. Her hobbies include analyzing federal job listings and co-organizing Data Science DC. She blogs at The Present of Coding.
Twitter: @abbystat
Presented at the 2024 New York R Conference (May 16, 2024)
Hosted by Lander Analytics (landeranalytic...)