Turn Videos Into Blog Posts With AI! - GPT-4, Whisper-1, and Embedding Model Approach

A Challenger Approaches! Andrej Karpathy’s recent GPT Tokenizer video came with a fun request: automatically create a companion guide/blog post from the already-made video.
In this video, I go over my approach and how I used LLMs like GPT-4-Turbo, audio models like Whisper-1, and embedding models from OpenAI with vector databases to fully automate this process.
Github Page for Code & MD File: github.com/ALucek/companion-g...
@AndrejKarpathy Tokenizer Video: • Let's build the GPT To...
Andrej’s Tweet: / 1760740503614836917
Semantic Chunking: python.langchain.com/docs/mod...
Chapters:
00:00 - Intro & Context
01:46 - My Solution!
03:48 - Process Overview
04:43 - Downloading & Chunking Audio
07:23 - Transcribing with Whisper-1 & Post Processing
09:57 - Semantic Chunking Transcript
12:08 - LLM Prompting & Setup
16:04 - Initial LLM Output Overview
16:45 - Embedding Transcript for Similarity Search
17:46 - Searching for & Inserting Links + Pictures
20:12 - Main Script Overview
21:15 - Cost, Time, & Token Consumption
21:49 - Revisiting the Markdown Document
22:54 - Limitations & Drawbacks of My Approach
26:00 - Outro

Comments: 10

@tvinay8758
2 months ago
This is very well done. Would you plan to do a video combining the BPE algorithm and Andrej's transformer training into a single training algorithm, demoed in a video like this?
@TheHistoryCode125
2 months ago
This video showcases an impressive attempt at automatically generating a companion guide for Andrej Karpathy's lengthy video on tokenization. The process involved several steps: downloading the video and audio, chunking the audio for transcription with timestamps using Whisper, further chunking the transcript semantically, generating markdown summaries with placeholders for images and hyperlinks using GPT-4, and finally, employing similarity search to replace those placeholders with relevant media from the video. While the final output is commendable, the video glosses over some limitations. The reliance on sequential processing of text chunks leads to inconsistencies and potential loss of context. Additionally, the accuracy of the similarity search for linking relevant media could be improved. Despite these shortcomings, the video demonstrates a promising approach to automatically generating companion guides for lengthy videos, offering a more skimmable, searchable, and linkable format for viewers.
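The similarity-search step mentioned in the comment above could look roughly like the sketch below. It is only an illustration, not the code from the video: the toy three-dimensional vectors stand in for real embedding-model output, and `cosine_similarity` and `best_timestamp` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_timestamp(query_vec, chunks):
    """Return the start time (seconds) of the most similar transcript chunk.

    chunks is a list of (start_seconds, embedding_vector) pairs.
    """
    best = max(chunks, key=lambda c: cosine_similarity(query_vec, c[1]))
    return best[0]


# Toy data (assumption): real vectors would come from an embedding model.
chunks = [
    (0, [1.0, 0.0, 0.0]),
    (95, [0.0, 1.0, 0.0]),
    (310, [0.6, 0.8, 0.0]),
]
placeholder_vec = [0.1, 0.9, 0.0]  # embedding of a placeholder's description

print(best_timestamp(placeholder_vec, chunks))
```

The returned start time can then be used to grab a video frame or build a timestamped link for the placeholder. A vector database performs the same nearest-neighbour lookup at scale.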
@IceMetalPunk
2 months ago
Regarding the issue with the audio chunk files being bigger than you expected: remember, encodings are not a static 1-to-1 with the length of the audio (except possibly for WAV files). Your chunk_duration calculation assumes every interval of, say, 5 seconds will be saved in the same number of bytes, but it won't, because of compression and other encoding factors. That's why 50% of the audio interval ended up encoded in more than 50% of the filesize (plus the fact that every file will have its own headers and other metadata, making them a little bigger on that front as well). Still a cool project! Is there a reason you used Whisper-1 instead of Whisper-3?
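One way to act on the point above is to size chunks from the file's average bytes per second and leave headroom for compression variance and per-file headers. The sketch below is a rough illustration under assumptions: the function names, the 25 MB limit, and the 0.8 safety factor are example choices, not values taken from the video.

```python
def chunk_duration_seconds(file_size_bytes, total_duration_s,
                           max_chunk_bytes, safety=0.8):
    """Estimate a chunk length (seconds) that should stay under max_chunk_bytes.

    Uses the file's average bytes per second, then shrinks the result by a
    safety factor because compressed audio is not uniformly sized and every
    exported chunk carries its own headers and metadata.
    """
    avg_bytes_per_second = file_size_bytes / total_duration_s
    return (max_chunk_bytes * safety) / avg_bytes_per_second


def chunk_boundaries(total_duration_s, chunk_s):
    """Split [0, total_duration_s) into consecutive (start, end) intervals."""
    bounds = []
    start = 0.0
    while start < total_duration_s:
        end = min(start + chunk_s, total_duration_s)
        bounds.append((start, end))
        start = end
    return bounds


# Example (assumed numbers): a 100 MB file, 8000 s long, 25 MB per-chunk limit.
duration = chunk_duration_seconds(100_000_000, 8000, 25_000_000)
print(round(duration))                    # estimated seconds per chunk
print(len(chunk_boundaries(8000, 1600)))  # number of chunks at 1600 s each
```

Even with the margin, checking each exported file's actual size before upload is the safer route, since the estimate is only an average.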
@AdamLucek
2 months ago
Ah, phenomenal, thanks for the further explanation! Audio manipulation was brand new to me when making this (and still is lol), so I appreciate the analysis. The latest Whisper model is pointed to in the API as "whisper-1", so I just stuck with calling it that, but it should still be using the latest version (v2-large) with that!
@IceMetalPunk
2 months ago
@AdamLucek Ah, I see! That's weird they'd call it whisper-1 if it's actually v2 😂 It's also weird that they've released the v3 models open-source, but don't offer them through the API 🤔 I wonder, though, if you'd get any benefit in transcription accuracy by running a v3 model outside of the OpenAI API?
@AdamLucek
2 months ago
@IceMetalPunk Sounds like good follow-up experimentation. The currently available model is pretty good all around for what I need. I would, however, like to see some more interesting things with it, like what AssemblyAI is doing around speaker diarization and audio intelligence.
@tzenmatteo
2 months ago
nice one adam
@rafidkhan1036
2 months ago
How about using the yt-transcript module to do the transcription?
@AdamLucek
2 months ago
That's certainly an option, although I found that YouTube's auto-generated captions tend to be less accurate than Whisper's transcription: they have more errors in word recognition/spelling, no punctuation, and a hard time picking up on nuances like pauses and breaks. Having these things makes the language model output a little more accurate! And as an aside, I wanted to get more familiar with using Whisper through this too!
@mandeadhungry
2 months ago
can't hear every word you're saying
