Low-Rank Adaptation of Large Language Models Part 2: Simple Fine-tuning with LoRA

21,137 views

In this video, I go over a simple implementation of LoRA for fine-tuning BLOOM 3b on the SQuADv2 dataset for extractive question answering!
LoRA learns low-rank matrix decompositions to slash the costs of training huge language models. It adapts only low-rank factors instead of entire weight matrices, achieving major memory and performance wins.
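A rough sketch of the arithmetic behind that claim, assuming a single 2560 x 2560 weight matrix and a LoRA rank of 8 (both numbers are illustrative assumptions, not figures taken from the video). Instead of training every entry of the matrix, LoRA trains two thin factors whose product has the same shape:

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    full: every entry of the d_out x d_in matrix is trained.
    lora: only B (d_out x rank) and A (rank x d_in) are trained.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Illustrative sizes only (assumed, not read from any model config).
full, lora = lora_param_counts(d_out=2560, d_in=2560, rank=8)
print(f"full fine-tuning: {full:,} trainable values")
print(f"LoRA (rank 8):    {lora:,} trainable values")
print(f"fraction trained: {lora / full:.4%}")
```

The saving grows with the matrix size, because the LoRA count scales with `d_out + d_in` rather than `d_out * d_in`.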
🔗 LoRA Paper: arxiv.org/pdf/2106.09685.pdf
🔗 Intrinsic Dimensionality Paper: arxiv.org/abs/2012.13255
🔗 Colab: colab.research.google.com/dri...
About me:
Follow me on LinkedIn: / csalexiuk
Check out what I'm working on: getox.ai/


Comments: 81

@gnostikas
1 year ago
You seem to be the kind of ai expert that I am trying to become. Very impressive.
@LordKelvinThomson
1 year ago
At least as good as, and at times better than, every other equivalent tutorial on the subject at this time.
@chrisalexiuk
1 year ago
Thanks so much for the kind words!
@waynesletcher7470
8 months ago
Oh, Wise Wizard, I bow before your might. Please, continue to guide me.
@andriihorbokon2015
1 year ago
Great video! So much passion, love it.
@datasciencetoday7127
1 year ago
Mind blown into 3 billion pieces
@user-hf3fu2xt2j
1 year ago
Ok, you got me absolutely amused by the results. Also, thanks for showing that there's a LoRA library out there: I tried to do it on my own
@murwanashyakashaffyassouma3014
1 year ago
Hi Chris! Thanks for the course. I want to learn more. May God bless you 🤲
@danraviv7393
1 year ago
Thanks for the video, it was very useful and clear
@afifaniks
9 months ago
Very intuitive! I didn't even yawn throughout the whole video lol. Keep up the good work! :)
@MasterBrain182
10 months ago
Astonishing content Man 🚀
@television9233
1 year ago
Very cool. Hugging Face has done so much of the heavy lifting for us; they are actually amazing. Also, when I first heard about LoRA, I thought the implementation was complicated (utilizing some efficient SVD or other numerical methods to achieve the decomposition of the full weight update matrix). Turns out it literally just starts with the two smaller matrices and backprop does all the work lol
@chrisalexiuk
1 year ago
Backprop coming to the rescue again!
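To illustrate the point made in this thread, here is a minimal pure-Python sketch of how the two small matrices start out. It follows the initialization described in the LoRA paper (B starts at zero, A starts as small random values); the tiny dimensions and the helper names `matmul` and `add` are assumptions made only for this example. No SVD is involved: the update B·A starts at exactly zero, and training then moves A and B by ordinary backprop.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


random.seed(0)
d, k, r = 4, 3, 2  # toy sizes: W is d x k, the adapter has rank r

W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen base weight
B = [[0.0] * r for _ in range(d)]                                  # d x r, starts at zero
A = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # r x k, small random

delta = matmul(B, A)      # the low-rank update, d x k
W_effective = add(W, delta)

# Because B is all zeros, the adapted weight equals the base weight at step 0.
print(W_effective == W)
```

Starting from a zero update means the fine-tuned model initially behaves exactly like the base model; only the gradient steps on A and B change it.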
@DreamsAPI
1 year ago
Subscribed and Thumbs up, appreciate the videos.
@tech-talks-with-ali
1 year ago
WoW! You are amazing man!
@ENJI84
1 year ago
Amazing set of videos! Can you please update on the model that is doing text-to-SQL that you've mentioned? This is very important to me :)
@nothing_is_real_0000
1 year ago
Hi Chris! Really thank you so much for such a detailed tutorial. Loved every bit of it. In the time of big corporations trying to monopolise the technology, people like you give hope and knowledge to so many others! Really appreciate it. You've made the lora tutorial easy to understand. Just had a question. I guess you have answered it in someway already, but just wanted to confirm. GPT-2 is somewhat old, so does this method apply to GPT-2 also? I mean can we use GPT-2 model instead of Bloom?
@chrisalexiuk
1 year ago
You can use LoRA with anything that has weight matrices!
@nothing_is_real_0000
1 year ago
Thank you!!!
@yasinyaqoobi
5 months ago
Great video. Wish you showed the comparison against the base model. Just to clarify, we are not able to use the LORA model generated from model A with a different base model?
@akashdeepsoni
10 months ago
Thanks for explaining the implementation in such an easy way. I wanted to play around with this, so I used the free-tier Google Colab with TU-GPU and the smaller "bigscience/bloom-1b7" model. The inference method make_inference(context, question) is giving me the error below. Is this because I am using the free-tier GPU, even though training and all the previous steps executed without any issues? It would be great if you could shed some light on this! Error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
@honglu679
5 days ago
Thx for great video! so what is the better way to teach a model new knowledge, if FT is somehow only good for structure? thx much!
@ryanbthiesant2307
1 year ago
Who, what, where, why and when. I am grateful for your video. Please can you give a use case. And start your videos with the end in mind.
@chrisalexiuk
1 year ago
Absolutely! I'll seek to do that going forward!
@ryanbthiesant2307
1 year ago
@@chrisalexiuk thanks for not taking offence. I have ASD and ADHD. Super hard to focus without an idea of what you are making and what problem you are trying to solve. apologies for the directness.
@PCPTMCROSBY
10 months ago
I'm trying to get some people interested in product development and modification, but they have a requirement that material can't leave the building. That means no internet: everything has to be done on our machines in house, and we can't share it with Colab or anybody. It would be nice if you did more shows on the subject of keeping complete control of material, because there are so many people who are just scared to death of breaches.
@sarabolouki
8 months ago
Thank you for the great tutorial! How do we set that we only want to fine-tune query_key_value and the rest of the weights are frozen?
@chrisalexiuk
8 months ago
By using the adapter method, you don't need to worry about that! The base model will remain frozen - and you will not train any model layers.
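A conceptual sketch of what this reply describes, written in plain Python rather than against any real library. The layer names and shapes below are illustrative assumptions, and `add_adapters` is a hypothetical helper, not a function from PEFT or Transformers. Every base weight is marked frozen, and small trainable A and B factors are attached only to layers whose name contains the chosen target string:

```python
def add_adapters(shapes, target, rank):
    """Build a plan of which tensors train (hypothetical helper, for illustration).

    shapes: {layer_name: (d_out, d_in)} for the base model's weight matrices.
    target: substring selecting which layers receive a LoRA adapter.
    """
    plan = {}
    for name, (d_out, d_in) in shapes.items():
        # Base weights are always frozen, whether or not they get an adapter.
        plan[name] = {"numel": d_out * d_in, "trainable": False}
        if target in name:
            plan[name + ".lora_A"] = {"numel": rank * d_in, "trainable": True}
            plan[name + ".lora_B"] = {"numel": d_out * rank, "trainable": True}
    return plan


# Assumed shapes for one transformer block; not read from a real checkpoint.
shapes = {
    "h.0.self_attention.query_key_value": (7680, 2560),
    "h.0.self_attention.dense": (2560, 2560),
    "h.0.mlp.dense_h_to_4h": (10240, 2560),
    "h.0.mlp.dense_4h_to_h": (2560, 10240),
}

plan = add_adapters(shapes, target="query_key_value", rank=8)
trainable = sum(p["numel"] for p in plan.values() if p["trainable"])
frozen = sum(p["numel"] for p in plan.values() if not p["trainable"])
print(f"trainable: {trainable:,}  frozen: {frozen:,}")
```

In this sketch the only tensors that receive gradients are the ones with `lora_` in their name, which is the sense in which the base model stays frozen.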
@user-su1lr7by7w
1 year ago
Amazing work. Can you put up something similar for fine-tuning the MPT-7B model? I switched the model to MPT-7B but I keep getting this error during training: "TypeError: forward() got an unexpected keyword argument 'inputs_embeds'". I am scratching my head but can't seem to figure out what went wrong.
@chrisalexiuk
1 year ago
Sure, I can try and do this!
@vita24733
1 year ago
Hi Chris, about the block of code with `model.gradient_checkpointing_enable()`, which increases the stability of the model: have you made any previous videos where I can learn about this? If not, are there any resources you would recommend to understand it?
@chrisalexiuk
1 year ago
Basically, you can think of it this way: As we need to represent tinier and tinier numbers - we need more and more exponents. There are a number of layers which tend toward very tiny numbers, and if we let those layers stay in 4bit/8bit it might have some unintended side-effects. So, we let those layers stay in full precision so as to not encounter those nasty side-effects!
@vita24733
1 year ago
@@chrisalexiuk ohhh ok understood. This was by far the clearest explanation abt this. Thank you!
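The point about tiny numbers in the reply above can be seen with Python's standard `struct` module, which can round-trip a value through IEEE 754 half precision (format `e`) and single precision (format `f`). This is only a sketch of the underflow effect; the value `1e-8` is an arbitrary assumption chosen because it is smaller than anything half precision can represent, and `roundtrip` is a helper defined just for this example:

```python
import struct


def roundtrip(value, fmt):
    """Store a Python float in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


tiny = 1e-8  # arbitrary small value, e.g. a very small activation or gradient

as_half = roundtrip(tiny, "<e")    # 16-bit float
as_single = roundtrip(tiny, "<f")  # 32-bit float

print("half precision:  ", as_half)
print("single precision:", as_single)
```

The half-precision copy collapses to zero while the single-precision copy keeps a usable value, which is why numerically sensitive layers are often left in a wider format.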
@maxlgemeinderat9202
7 months ago
Great video! What would be different if I download the model locally instead of on Colab? Which lines change in the code?
@chrisalexiuk
7 months ago
You should be able to largely recreate this process locally - but you would need to `pip install` a few more dependencies. You can find which by looking at what the colab environment has installed - or using a tool like pipreqs!
@alexandria6097
6 months ago
Do you know how much GPU RAM the meta-llama/Llama-2-70b-chat model would take to fine-tune?
@user-wr4yl7tx3w
1 year ago
what is meant by causal language model? I assume it has nothing to do with the separate field of Causal AI.
@chrisalexiuk
1 year ago
A causal language model is a model that predicts the next token in the series. It only looks at tokens on the "left" or "backward" and cannot see future tokens. It's confusing because, as you noted, it has nothing to do with Causal AI.
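A small sketch of the "only looks left" rule described in this reply. The function name `causal_mask` is an assumption for illustration; it builds the lower-triangular pattern in which position `i` may attend to position `j` only when `j <= i`:

```python
def causal_mask(n):
    """Return an n x n grid where entry [i][j] is True if token i may see token j."""
    return [[j <= i for j in range(n)] for i in range(n)]


for row in causal_mask(4):
    # 'x' marks a visible token, '.' marks a hidden (future) token.
    print("".join("x" if visible else "." for visible in row))
```

Each row gains one more visible position than the row before it, so the prediction for the next token never depends on tokens that come later.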
@yasinyaqoobi
5 months ago
Attempting to run the notebook but I keep getting ValueError: Attempting to unscale FP16 gradients. Tried different Colab envs but no luck.
@datasciencetoday7127
1 year ago
hi chris can you make a video on this or give me some pointers? scaling with langchain, how to have multiple sessions with LLM, meaning how to have a server with the LLM and serve to multiple people concurrently. What will be the system requirements to run such a setup. I believe we will be needing kubernetes for the scaling
@chrisalexiuk
1 year ago
You'll definitely need some kind of load balancing/resource balancing. I'll go over some more granular tips/tricks in a video!
@Neuralbench
1 year ago
Hey Chris, awesome video! Thank you for it. Can you please help me out here? I am using your notebook, but when I run model.push_to_hub, adapter_config.json and adapter_model.bin are not uploaded to Hugging Face. Instead I only see: 1. generation_config.json 2. pytorch_model.bin 3. config.json. What am I doing wrong here?
@Neuralbench
1 year ago
I figured out the problem: it was the line model = model.merge_and_unload() after the training.
@chrisalexiuk
1 year ago
Yes! Sorry, Adil! We are only pushing the actual *LoRA* weights to the hub - and merging the model back will mean that the entire model will be pushed to hub. Great troubleshooting!
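A toy sketch of why merging changes what gets saved, using plain Python lists instead of real model weights. All values, the `scale` factor, and the helper `matmul` are assumptions chosen for illustration. Merging folds the low-rank product into the base matrix, so afterwards there is one full-size matrix and no separate adapter left to upload:

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight, 2 x 2
B = [[1.0], [0.5]]            # trained LoRA factor, 2 x 1
A = [[0.25, -0.5]]            # trained LoRA factor, 1 x 2
scale = 2.0                   # assumed scaling factor applied to the update

update = matmul(B, A)
merged = [[w + scale * u for w, u in zip(w_row, u_row)]
          for w_row, u_row in zip(W, update)]

adapter_values = sum(len(r) for r in A) + sum(len(r) for r in B)
merged_values = sum(len(r) for r in merged)
print("adapter only:", adapter_values, "values")
print("merged model:", merged_values, "values, same shape as W")
print(merged)
```

Keeping the adapter unmerged means only the small A and B factors need to be stored, while the merged result is as large as the original weight.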
@omercelebi2012
1 year ago
Thanks for sharing this tutorial. I get 'IndexError: list index out of range' when reading from the hub. I just copied and pasted the code, and it happens at the 6th progress bar. Any solution? Model: bloom-1b7
@chrisalexiuk
1 year ago
Could you share with me your notebook so I can determine what the issue is?
@mchaney2003
11 months ago
What are the ways you mentioned to more efficiently teach a model new knowledge rather than new structures?
@chrisalexiuk
11 months ago
You'd be looking at something like continued pre-training. I perhaps misspoke by saying "more efficient", I meant to convey that LoRA might not be the best solution for domain-shifting a model - and so there are more *effective* ways to domain-shift.
@sagardesai1253
11 months ago
Informative video. Can you suggest a GPU compute resource? My aim is to implement what I learned, and I would like to know the cheapest possible option.
@chrisalexiuk
11 months ago
Lambda Labs has great prices right now, otherwise Colab Pro is an affordable and flexible option.
@Robo-fg3pq
5 months ago
Getting "ValueError: Attempting to unscale FP16 gradients." when running the cell with trainer.train(). Any idea?
@shashankjainm5009
1 month ago
I'm getting the same error with "bloom-1b7" too. Did your problem get resolved?
@shaw5698
1 year ago
Sir, is it possible to share the Colab notebook? For extractive QA, how would we evaluate and compare with other models? For example, how would we implement EM and F1 and compare against BERT or other LLMs?
@chrisalexiuk
1 year ago
Yes, sorry, I will be sure to update the description with the Notebook used in the video!
@shaw5698
1 year ago
@@chrisalexiuk Thank you, it will be very much appreciated.
@prospersteph
1 year ago
@@chrisalexiuk we will appreciate it
@chrisalexiuk
1 year ago
colab.research.google.com/drive/1GzHdbIarvnRee_Ix9bdhx1a1v0_G_eqo?usp=sharing
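On the EM and F1 question raised at the top of this thread, here is a minimal sketch of the two metrics as they are commonly computed for SQuAD-style extractive QA: lowercase, strip punctuation and articles, then compare strings (EM) or token overlap (F1). It is not code from the video or the notebook, the function names are assumptions, and treating two empty answers as a match is an assumption made here for unanswerable SQuADv2 questions.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Empty vs. empty counts as a match; empty vs. non-empty scores zero.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower!"))
print(f1_score("in Paris France", "Paris"))
```

Averaging these two scores over a held-out set gives numbers that can be compared across models, as long as every model's answers go through the same normalization.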
@davidromero1373
7 months ago
Hi, a question: can we use LoRA just to reduce the size of a model and run inference, or do we always have to train it?
@chrisalexiuk
7 months ago
LoRA will not reduce the size of the model during inference. It actually adds a very small amount extra - the memory savings come from the reduced number of optimizer states during training.
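A back-of-the-envelope sketch of the optimizer-state point in this reply, assuming an Adam-style optimizer that keeps two running moments per trainable parameter in 32-bit floats. The parameter counts are assumptions: roughly 3 billion for full fine-tuning (the nominal size of BLOOM 3b) and a hypothetical 2.5 million for the LoRA adapter.

```python
def adam_state_bytes(trainable_params, bytes_per_value=4):
    """Memory for Adam's two moment estimates, one pair per trainable parameter."""
    return 2 * trainable_params * bytes_per_value


full_ft = adam_state_bytes(3_000_000_000)  # assumed: every weight is trainable
lora_ft = adam_state_bytes(2_500_000)      # hypothetical adapter size

print(f"full fine-tuning optimizer state: {full_ft / 1e9:.1f} GB")
print(f"LoRA optimizer state:             {lora_ft / 1e6:.1f} MB")
```

The base weights still have to sit in memory in both cases, which is why the model does not get smaller at inference time.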
@kartikpodugu
8 months ago
Amazing. I tried this on my desktop, which has an NVIDIA GeForce 3060, and I was able to run only 6 steps. On Windows I wasn't able to run it at all, as I am facing some issues with the bitsandbytes library. Also, I used bloom-1b7. After doing the whole exercise, I see that the generated output doesn't stop after CONTEXT, QUESTION and ANSWER; it keeps generating text that includes EXAMPLE and so on. Though the notebook adds bitsandbytes at the start using "import bitsandbytes as bnb", bnb is not used anywhere. So I thought commenting that line out would make my script work on Windows, but no: even without the line, the script that I wrote mimicking your Colab notebook didn't work on Windows. Can you tell me how the notebook depends on bitsandbytes?
@chrisalexiuk
8 months ago
Bitsandbytes is leveraged behind the scenes through the HuggingFace library.
@98f5
7 months ago
any chance you can make an example of fine tuning code llama like this
@chrisalexiuk
7 months ago
I might, yes!
@98f5
7 months ago
@chrisalexiuk It'd be greatly appreciated. There are almost no implementation docs or examples around for using LoRA 😀
@chrism315
1 year ago
The notebook linked doesn't match the one used in the video. Is the notebook in the video available somewhere? Thanks, great video!
@chrisalexiuk
1 year ago
Ah, so sorry! I'll resolve this ASAP.
@chrisalexiuk
1 year ago
I've updated the link - please let me know if it doesn't resolve your issue! Sorry about that!
@user-hf3fu2xt2j
1 year ago
Tried this and it's interesting that 3b/7b1 bloom models perform WORSE on my test questions after this training, than bloom 1b1
@chrisalexiuk
1 year ago
Hmmmm. That's very interesting!
@chrisalexiuk
1 year ago
I wonder specifically why, it would be interesting to know!
@user-hf3fu2xt2j
1 year ago
@@chrisalexiuk I didn't change other parameters though. maybe rank and batch size should be higher for higher param count models
@user-hf3fu2xt2j
1 year ago
@@chrisalexiuk man, it gets more weird now. I tried doing more steps with smaller learning rate, smaller batch size, on a bigger model. It started adding explanation sections and generating, well, explanations. bloom 3b
@gagangayari5981
1 year ago
@@user-hf3fu2xt2j What learning rate were you using? Is it the same as the one mentioned in the BLOOM paper? Also, what is the current learning rate?
@ilya6889
10 months ago
Please don't scream 😬
@ArunKumar-bp5lo
8 months ago
facing KeyError: 'h.0.input_layernorm.bias' when downloading from the hub
@chrisalexiuk
8 months ago
Hmmm. Are you using the base notebook?
@ArunKumar-bp5lo
8 months ago
@@chrisalexiuk yeah just changed the model to 1b7
@chrisalexiuk
8 months ago
Could you try adding `device_map="auto"` to your `.from_pretrained()` method? Also, are you using a GPU enabled instance for the Notebook?