Check out my website here! leaderboard.bycloud.ai/
In this video, I go through and explain the benchmarks used by Chatbot Arena & the Open LLM Leaderboard. These are general-purpose benchmarks for text-based LLMs, so coding benchmarks like HumanEval are not covered here. Let me know which other benchmarks you'd like me to explain in the future!
[Chatbot Arena] huggingface.co/spaces/lmsys/c...
[Open LLM Leaderboard] huggingface.co/spaces/Hugging...
[MMLU] huggingface.co/datasets/cais/...
[ARC] huggingface.co/datasets/ai2_arc
[Winogrande] huggingface.co/datasets/winog...
[TruthfulQA] huggingface.co/datasets/truth...
[GSM8K] huggingface.co/datasets/gsm8k
[MT-Bench] huggingface.co/datasets/Huggi...
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Profile & Banner Art] / pygm7
[Video Editor] Silas
0:00 Intro
0:57 MMLU
1:41 ARC
2:10 HellaSwag
2:57 Winogrande
3:27 TruthfulQA
3:52 GSM8K
4:26 MT-Bench
5:05 Outro
7 Popular LLM Benchmarks Explained [OpenLLM Leaderboard & Chatbot Arena]