Are you using generated text in any of your work? If so, best practice calls for evaluating its quality.
In this video we test two metrics, summarization and hallucination, on examples from two open-source datasets hosted on Hugging Face.
-- Watch live at www.twitch.tv/...
Timestamps
00:00 Text generation evaluation 101
03:13 deepeval intro
06:00 Make .env work (meh)
18:30 Example usage
20:00 Summarization metric on XSUM dataset
39:50 Hallucination metric on SQuAD2 dataset
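One chapter above is the usual fight with getting a .env file to load (the standard tool for this is python-dotenv's load_dotenv(); deepeval's LLM-backed metrics typically expect an API key such as OPENAI_API_KEY in the environment). As a minimal stdlib-only sketch, assuming simple KEY=VALUE lines:

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: put KEY=VALUE lines into os.environ.

    Skips blank lines and # comments; strips surrounding quotes from values.
    Existing environment variables are not overwritten.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))
```

This is a sketch, not a replacement for python-dotenv, which also handles export prefixes, multiline values, and variable interpolation.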
Evaluating deepeval framework for LLM output evaluation