Thanks for this video, very helpful for getting a local training run going. One question: for me, setting `lr_scheduler_type` to "constant_with_warmup" makes `warmup_ratio` get ignored. The same happens if I switch to `warmup_steps` or `lr_scheduler_kwargs={"num_warmup_steps": XXX}`. Have you had this problem before?
@TrelisResearch
2 months ago
Yeah, it seems that constant LR schedulers can block warmups sometimes. That's probably a bug that needs to be reported on the HF GitHub repo under SFTTrainer. You could hack around this by using a linear LR schedule with the starting and ending learning rates very close together. (Just a suggestion; I haven't tried this.)
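One way to check whether warmup is really being ignored is to compare the learning rate logged during training against what a constant schedule with warmup would be expected to produce. Below is a minimal plain-Python sketch, assuming linear warmup from zero to the base LR followed by a constant LR. The function name `expected_lr` and the example values are illustrative assumptions, not part of any library.

```python
def expected_lr(step, base_lr, num_warmup_steps):
    """Expected LR at `step` for a constant schedule with linear warmup.

    Assumption: warmup ramps linearly from 0 to base_lr over
    num_warmup_steps, then the LR stays at base_lr.
    """
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    return base_lr


# Illustrative values only (not taken from the video or the notebook).
base_lr = 2e-4
num_warmup_steps = 10

for step in range(0, 21, 5):
    print(step, expected_lr(step, base_lr, num_warmup_steps))
```

If the logged learning rate already equals the base LR at step 0 or 1, the warmup setting is most likely not being applied.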
@pauloneto8147
4 months ago
Your content is the best. I learn a lot from it. Thanks.
@kamranhaddadian1707
4 months ago
You can use `auto_find_batch_size=True` in the training arguments instead of manually determining the batch size. This option automatically finds a batch size that fits in memory on the GPU you are using.
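The general idea behind automatic batch-size search can be sketched as a retry loop: try a batch size, and shrink it whenever the step runs out of memory. This is only an illustration of the concept in plain Python, not the library's implementation. `find_batch_size` and `fake_train_step` are hypothetical names, and halving on failure is an assumption.

```python
def find_batch_size(train_step, starting_batch_size):
    """Return the first batch size for which train_step does not run out of memory.

    Assumption for illustration: the batch size is halved after each failure.
    """
    batch_size = starting_batch_size
    while batch_size >= 1:
        try:
            train_step(batch_size)
            return batch_size
        except MemoryError:
            batch_size //= 2  # shrink and retry
    raise RuntimeError("No batch size fits in memory.")


def fake_train_step(batch_size, limit=6):
    """Hypothetical stand-in for a training step that fails above `limit`."""
    if batch_size > limit:
        raise MemoryError(f"batch size {batch_size} does not fit")


chosen = find_batch_size(fake_train_step, starting_batch_size=32)
print(chosen)
```

Note that a search like this returns a batch size that fits, which is not necessarily the one that gives the best training results.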
@TrelisResearch
4 months ago
ooh that's a nice tip, I'll have to try that.
@Damon_Sieputovsky
4 months ago
Ronan, maybe you should start collecting statistics on the effect of the number of tokens vs. the number of trained parameters. Maybe scaling laws apply here, e.g. Chinchilla, where it is ideal to train on about 20 tokens per parameter?
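The 20-tokens-per-parameter rule of thumb mentioned here is easy to turn into quick arithmetic. The sketch below assumes the ratio is exactly 20 and applies it to illustrative parameter counts. The heuristic comes from pretraining, so whether it carries over to fine-tuning (where only a subset of parameters may be trained) is exactly the open question in this comment. The function names are hypothetical.

```python
TOKENS_PER_PARAM = 20  # rule of thumb quoted in the comment; an approximation


def suggested_tokens(num_trained_params, tokens_per_param=TOKENS_PER_PARAM):
    """Token budget implied by a fixed tokens-per-parameter ratio."""
    return num_trained_params * tokens_per_param


def tokens_per_trained_param(num_tokens, num_trained_params):
    """Ratio to track across runs: training tokens per trained parameter."""
    return num_tokens / num_trained_params


# Illustrative parameter counts only (not measurements from the video).
for num_params in (10_000_000, 1_000_000_000, 7_000_000_000):
    print(num_params, suggested_tokens(num_params))
```

Logging `tokens_per_trained_param` alongside the evaluation loss for each run would be one simple way to gather the statistics suggested above.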
@TrelisResearch
4 months ago
Lemme see if I can make some comments on that next week!
@KopikoArepo
4 months ago
I would love to know if you are interested in a start/craft up…
@TrelisResearch
4 months ago
yeah, Trelis is itself a startup!
@chebkhaled1985
4 months ago
At the end of the day, if everything we really need is behind a paywall ...
@TrelisResearch
4 months ago
What do you mean? The notebook is linked in the description for this video. Cheers
@chebkhaled1985
4 months ago
@TrelisResearch Not talking about this vid specifically. I have felt baited many times, where everything looks good but then the bits that are really of value have to be paid for. I get that this is your business model, but it is frustrating on my end because I think the channel is great (both the topics and the explanations).
@TrelisResearch
4 months ago
@chebkhaled1985, yeah, there are trade-offs alright, but my main focus is to keep Trelis growing because people seem to like it, and that means making it a great business. Free, educational content is definitely a key piece, and that's part of my thinking with the livestreams here. Cheers
Comments: 13