DOCETL | ETL for unstructured data

In this recording, I explore DOCETL, an open-source package for declarative data processing powered by LLMs. It reminds me of the Hadoop days, when I used to write complex Java programs with custom input and output formats to find the schema in unstructured data. The approach looks similar, but is more powerful with Gen AI.
I have modified the code a little to add a YouTube transcript parser to the pipeline. The revised code is in this repo:
github.com/raj...
Code used in the video:
_________________________
Extracting the transcript from the YouTube video:
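import json
from youtube_transcript_api import YouTubeTranscriptApi

# Fetch the transcript segments for the video; each segment is a dict with a "text" field
segments = YouTubeTranscriptApi.get_transcript("dG9zjKpRmdY")

# Concatenate the segment texts into one string
transcript = ""
for segment in segments:
    transcript = transcript + " " + segment["text"]
print(transcript)

# Strip apostrophes and write the transcript as JSON for the pipeline to read
json_content = {"transcript": transcript.replace("'", "")}
with open("transcript.json", "w") as f:
    f.write(json.dumps(json_content))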
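The joining step can be tried offline before calling the API. The sketch below is a minimal illustration, not the code from the video: `segments_to_record` and `sample_segments` are hypothetical names, and the sample data only mimics the segment shape (dicts with `text`, `start`, `duration`) that the transcript call is assumed to return.

```python
import json


def segments_to_record(segments):
    """Join transcript segments into a single record with a 'transcript' field."""
    # Strip each piece so the joined text has exactly one space between segments
    text = " ".join(segment["text"].strip() for segment in segments)
    return {"transcript": text}


# Hypothetical sample data mimicking the assumed segment shape
sample_segments = [
    {"text": "welcome to the video", "start": 0.0, "duration": 2.1},
    {"text": "today we look at DocETL", "start": 2.1, "duration": 3.4},
]

record = segments_to_record(sample_segments)
print(json.dumps(record))
```

Unlike the script above, this sketch keeps apostrophes, since `json.dumps` handles quoting on its own; stripping them may still be useful if they cause trouble further down the pipeline.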
And here is the pipeline_2.yaml for the data processing:
datasets:
  audio_transcripts:
    path: transcript.json
    type: file

default_model: gpt-4o-mini

operations:
  - name: extract_topics
    type: map
    output:
      schema:
        topics: list[str]
    prompt: |
      Analyze the following transcript:
      {{ input.transcript }}
      Extract and list all key topics mentioned in the transcript.
      If no topics are mentioned, return an empty list.

pipeline:
  steps:
    - name: analyze_video
      input: audio_transcripts
      operations:
        - extract_topics
  output:
    type: file
    path: audio_topics.json
    intermediate_dir: intermediate_results
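Once the pipeline has run (typically with something like `docetl run pipeline_2.yaml`), the topics can be read back from the output file. The sketch below assumes that audio_topics.json holds a JSON list of records, each carrying the `topics` list declared in the output schema; that layout is an assumption, and `collect_topics` is a hypothetical helper. To stay runnable without an LLM call, it writes a small made-up sample file first.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def collect_topics(path):
    """Count topics across all records in a pipeline output file.

    Assumes the file is a JSON list of dicts, each with an optional
    'topics' list of strings (an assumption about the output layout).
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    counts = Counter()
    for record in records:
        for topic in record.get("topics", []):
            # Normalise case and whitespace so near-duplicates merge
            counts[topic.strip().lower()] += 1
    return counts


# Made-up sample standing in for a real audio_topics.json
sample = [
    {"transcript": "...", "topics": ["DocETL", "LLM pipelines"]},
    {"transcript": "...", "topics": ["docetl"]},
]

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "audio_topics.json"
    out_path.write_text(json.dumps(sample), encoding="utf-8")
    counts = collect_topics(out_path)

print(counts.most_common())
```

For a real run, point `collect_topics` at the `path` set in the pipeline's output section.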
Reference: ucbepic.github...
