How are LLMs evaluated?
00:00 - Introduction and motivation for looking at LLM benchmarks
00:38 - HumanEval benchmark for code synthesis
02:27 - Exploring the HumanEval dataset
03:24 - MMLU (Massive Multitask Language Understanding) benchmark
04:37 - Exploring the MMLU dataset
05:58 - BIG-bench meta-benchmark with 200+ tasks
06:50 - Exploring a logical reasoning task in BIG-bench
08:13 - BIG-Bench Hard, a subset of tasks that remain challenging for LLMs
08:46 - Example tasks from BIG-Bench Hard
10:21 - Wrap up and other notable benchmarks not covered
github.com/openai/human-eval
github.com/google/BIG-bench
github.com/suzgunmirac/BIG-Be...
github.com/hendrycks/test (MMLU)
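
If you want to poke at these benchmarks yourself, here are a few minimal sketches (not from the video). First, HumanEval: each of its 164 problems is a Python function signature plus docstring, graded by unit tests. This sketch assumes the Hugging Face `datasets` mirror (`openai_humaneval`); the openai/human-eval repo linked above also ships the data as JSONL with its own loader.

```python
# Sketch: peek at HumanEval via the Hugging Face mirror (an assumption;
# the openai/human-eval repo distributes the same data as JSONL).
from datasets import load_dataset

humaneval = load_dataset("openai_humaneval", split="test")
print(len(humaneval))  # 164 problems

ex = humaneval[0]
print(ex["task_id"])             # e.g. "HumanEval/0"
print(ex["prompt"])              # signature + docstring the model must complete
print(ex["canonical_solution"])  # reference implementation
print(ex["entry_point"])         # function name the unit tests call
```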
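MMLU is 4-way multiple choice across 57 subjects. A sketch assuming the `cais/mmlu` mirror on the Hugging Face Hub; the hendrycks/test repo linked above distributes the same questions as CSV files.

```python
# Sketch: sample one MMLU question. The dataset id and field names
# assume the cais/mmlu mirror; verify against the hendrycks/test CSVs.
from datasets import load_dataset

mmlu = load_dataset("cais/mmlu", "abstract_algebra", split="test")
ex = mmlu[0]
print(ex["question"])
for letter, choice in zip("ABCD", ex["choices"]):
    print(f"  ({letter}) {choice}")
print("answer:", "ABCD"[ex["answer"]])  # `answer` is an index, 0-3
```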
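BIG-Bench Hard tasks are plain JSON files of input/target pairs, so they are easy to read directly. This sketch assumes the truncated link above points to suzgunmirac/BIG-Bench-Hard and that its bbh/<task>.json files hold an "examples" list; check both before relying on it.

```python
# Sketch: fetch one BIG-Bench Hard task straight from GitHub. The repo
# path and JSON layout ({"examples": [{"input", "target"}, ...]}) are
# assumptions based on the truncated link above.
import json
import urllib.request

URL = ("https://raw.githubusercontent.com/suzgunmirac/"
       "BIG-Bench-Hard/main/bbh/boolean_expressions.json")
with urllib.request.urlopen(URL) as resp:
    task = json.load(resp)

ex = task["examples"][0]
print(ex["input"])   # a Boolean expression to evaluate
print(ex["target"])  # "True" or "False"
```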
vivekhaldar.com
x.com/vivekhaldar