LLM Quantization: GPTQ - AutoGPTQ
llama.cpp - ggml.c - GGUF - C++
Comparison with HF Transformers 4-bit quantization.
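The core idea shared by these 4-bit schemes (GPTQ, GGUF quants, HF 4-bit loading) can be sketched in plain Python: each block of weights gets one floating-point scale, and the weights themselves are stored as 4-bit integers. This is an illustrative simplification, not the exact GPTQ or GGUF on-disk format:

```python
# Simplified 4-bit block quantization, in the spirit of GPTQ/GGUF-style
# schemes (illustration only; real formats differ in layout and rounding).

def quantize_block(weights, block_size=32):
    """Quantize a list of floats to 4-bit ints with one scale per block."""
    blocks = []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        # map the largest magnitude in the block to +/-7; guard against all-zero
        scale = max(abs(w) for w in block) / 7 or 1.0
        q = [max(-8, min(7, round(w / scale))) for w in block]
        blocks.append((scale, q))
    return blocks

def dequantize(blocks):
    """Reconstruct approximate floats from (scale, int4-list) blocks."""
    return [scale * v for scale, q in blocks for v in q]

weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.44, -0.09, 0.18]
restored = dequantize(quantize_block(weights, block_size=8))
# round-trip error per weight is bounded by half a quantization step (scale / 2)
```

Storing 4 bits per weight plus a small per-block scale is what cuts the memory footprint to roughly a quarter of fp16, at the cost of this bounded rounding error.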
Download web UI wrappers for your heavily quantized LLM to your local machine (Windows, Linux, macOS).
LLMs on Apple hardware with an M1, M2, or M3 chip.
Run inference of your LLMs on your local machine with heavy quantization applied.
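To see why heavy quantization matters for local inference, a back-of-envelope memory estimate helps; the 7B parameter count below is an assumption for a Llama-2-7B-class model, and real usage adds KV cache, activations, and runtime overhead on top:

```python
# Rough memory needed just for the weights of a 7B-parameter model
# (illustrative arithmetic; KV cache and activations come on top).

def weight_memory_gb(n_params, bits_per_weight):
    """Bytes for the weights alone, converted to gibibytes."""
    return n_params * bits_per_weight / 8 / 1024**3

n = 7_000_000_000            # assumed Llama-2-7B-class parameter count
fp16 = weight_memory_gb(n, 16)   # ~13 GiB: too large for most consumer GPUs
q4 = weight_memory_gb(n, 4.5)    # ~3.7 GiB: 4-bit weights plus scale overhead
```

At roughly 4.5 effective bits per weight (4-bit values plus per-block scales), a 7B model fits comfortably in the RAM of a typical laptop or an 8 GB GPU, which is exactly the setup this tutorial targets.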
Plus: 8 web UIs for GPTQ, llama.cpp, AutoGPTQ, ExLlama, or GGUF.
koboldcpp
oobabooga text-generation-webui
ctransformers
lmstudio.ai/
github.com/mar...
github.com/gge...
github.com/rus...
huggingface.co...
github.com/Pan...
cloud.google.c...
huggingface.co...
h2o.ai/platfor...
#quantization
#ai
#webui
New Tutorial on LLM Quantization w/ QLoRA, GPTQ and Llamacpp, LLama 2