I wanted to express my sincere appreciation for your videos on KZitem. They have been immensely helpful to me in my Ph.D. thesis, particularly in understanding how to pre-train using MLM and fine-tune the BERT model. I thoroughly enjoy watching your videos, and they have provided valuable insights and guidance for my research. Thank you for creating such informative and engaging content.
@aditya_01
2 years ago
The best video regarding how to use BERT in TensorFlow, thank you!
@tildo64
7 months ago
I don't comment on videos, but your video is so clear and easy to understand I had to just say thank you! I have been trying to solve a multi-class problem with an LLM for months without significant progress. Using your video, I was able to make more progress by training a BERT model in a few days than I had in months! Please keep posting. It's immensely helpful for the rest of us.
@achrafoukouhou1016
3 years ago
This video is excellent, sir. I had been looking for a video like this for 2 straight days.
@jamesbriggs
3 years ago
That's awesome to hear, happy you found it, thanks!
@kennethnavarro3496
2 years ago
Thank you so much for this tutorial. Most tutorials really piss me off because they always refer back to other videos they made to explain why things work, but you explained each step as you did it, and this is super good for someone with a temperament like mine. Appreciate it, you're a beast!
@jamesbriggs
2 years ago
haha thanks Kenneth, I try to assume we're starting at the start for every video :)
@meredithhurston
2 years ago
Thanks so much, James. On my 1st attempt I was able to get to ~51% accuracy. I will need to make some tweaks, but I'm so excited about this! Woohoo!
@krishnanvs5946
3 years ago
Very crisp and nicely structured, with the objective of the exercise stated right at the start
@jamesbriggs
3 years ago
thanks, useful to know stating the objective helps!
@anityagangurde5329
2 years ago
Thank you so much!! I was really stuck with the prediction part for a very long time. This will help me a lot.
@chrisp.784
2 years ago
Thank you so much, sir! Best video I've ever seen on KZitem, it clearly explains each step.
@adityanjsg99
3 years ago
This video helped thanks. Usage of BERT does need a GPU subscription though.
@luiscao7241
3 years ago
Great tutorial! Thanks
@faressayah9897
3 years ago
Amazing tutorial 👏👏👏. If you are going to use your model on another machine, it's better to save it in h5 format.

# Saving the model
model.save("your_model.h5")

# Loading the model on another machine
import tensorflow as tf
import transformers
model = tf.keras.models.load_model('your_model.h5', custom_objects={'TFBertMainLayer': transformers.TFBertMainLayer})
@jamesbriggs
3 years ago
hey Fares, thanks and appreciate the info - I assume you recommend this because we then only have a single file to transfer, rather than several?
@faressayah9897
3 years ago
@@jamesbriggs I am working on a hate speech detection project. I trained the model on Kaggle, and after saving it, it worked in the same notebook but not on my local machine. Saving directly requires saving the configuration as well; I couldn't find how to do that, so I saved the model in h5 format.
@plashless3406
A year ago
This is awesome.
@meylyssa3666
3 years ago
Great tutorial, like always, thanks!
@jamesbriggs
3 years ago
Thanks I appreciate these comments a lot! :)
@simonlindgren
2 years ago
This is a fantastic tutorial! Excellent stuff, even for non-experts. I wonder how one would go about adding (domain-specific) tokens to the BERT tokenizer before training. Where in the workflow can that be done?
@jamesbriggs
2 years ago
Hi Simon, there are two approaches: you can train a tokenizer from scratch (obviously this takes some time) OR you can add tokens to the existing one. I want to cover this soon, but here's an example: github.com/huggingface/transformers/issues/1413#issuecomment-538083512
@simonlindgren
2 years ago
@@jamesbriggs Great! So add tokens to the tokenizer before training on the labeled data, right?
@dhivyasubburaman8828
3 years ago
Really good tutorial! Thank you so much, an awesome teacher... you made the model easy to understand. Is there any similar tutorial for BertForMultiLabelSequenceClassification, or can the same code be used for multi-label classification?
@jamesbriggs
3 years ago
Thanks! You should be able to use the same code, just change the output layer dimensions to align with your new number of output labels :)
@serhatkalkan2339
2 years ago
Great tutorial! I wonder if the seq_length has to be that long if we work with short phrases?
@maxhuttmann4760
2 years ago
James, thank you! I had been stuck on extracting BERT embeddings for a TF layer, as almost everyone now shows this part using other libraries like TensorFlow Hub, Text, etc., and I cannot use them in my project due to limitations. Will try your approach. Thanks a lot!
@jamesbriggs
2 years ago
Glad it helps!
@agahyucel4502
2 years ago
Hi, first of all thank you for this nice video. How can we make a confusion matrix and classification report here?
@MdSaeemHossainShanto
A year ago
At 42:00, in cell 9, it returns an array of what? What do those numbers mean?
@marwamiimi1935
2 years ago
Hello, thank you for this great video. I followed the steps but I have an error. Can you help me please?
@datascientist7802
2 years ago
Hi sir, great explanation, and I followed along to implement the same, but I got this error when training the model: InvalidArgumentError: Data type mismatch at component 0: expected double but got int32. [[node IteratorGetNext (defined at :1) ]] [Op:__inference_train_function_20701]
@jamesbriggs
2 years ago
seems like one of the datatypes for (probably) your inputs is wrong, you will need to add something like dtype=float32 to your input layer definitions OR it may be that your data must be converted to float first before being processed by the model
@abhishekchack8065
2 years ago
Before creating the pipeline, just convert Xids and Xmask to float64:

Xids = np.float64(Xids)
Xmask = np.float64(Xmask)
dataset = tf.data.Dataset.from_tensor_slices((Xids, Xmask, labels))
@lasimanazrin6212
2 years ago
Getting this error: Unknown layer: Custom>TFBertMainLayer. Please ensure this object is passed to the `custom_objects` argument. Anybody have any idea?
@manuadd192
A year ago
Hey, great video! Just got a question: in my dataset some texts have multiple labels. Can I just set multiple labels to 1 in the labels[] array at 13:47?
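On the multi-label question: a multi-hot row (several 1s per sample) is the usual encoding, though the classifier should then use a sigmoid output with binary cross-entropy rather than softmax with categorical cross-entropy. A minimal NumPy sketch, with hypothetical label index lists standing in for a real dataset:

```python
import numpy as np

n_classes = 5
# hypothetical: each sample's list of applicable label indices
sample_labels = [[0, 3], [2], [1, 2, 4]]

labels = np.zeros((len(sample_labels), n_classes))
for i, idxs in enumerate(sample_labels):
    labels[i, idxs] = 1  # set every applicable class to 1 (multi-hot)
```

Each row can then carry any number of 1s, unlike the strictly one-hot rows built in the video.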
@gloriaabuka5644
2 years ago
Thank you for this very explanatory video. I tried following along with another dataset, but each time I try to one-hot-encode my labels with these 3 lines of code:

arr = df['rating'].values
labels = np.zeros((num_samples, arr.max()))  # my label values are from 1-10
labels[np.arange(num_samples), arr] = 1

I get "numpy.float64 object cannot be interpreted as an integer".
@minhajulislamchowdhury1101
2 years ago
How can I produce a confusion matrix for this kind of dataset?
@panophobia8527
2 years ago
After training I get around 60% accuracy. When I try to predict, I never get the model to predict sentiment 0 or 4. Do you have any idea why the model has problems with these?
@salmanshaikh4866
2 years ago
Hi there, I am trying to generate a confusion matrix, but because the dataset is shuffled I'm not able to, and it's giving me random values. Any ideas what to do? (The accuracy and loss are pretty good while training the model.)
@luiscao7241
3 years ago
Hi James Briggs, I found that with this way of dividing the train/validation data, the validation and train sets vary every time. When I save the trained model and load it to evaluate on the validation data again, I get different results on each run. Should I split the train/validation data from the beginning rather than using SPLIT = 0.9? Does it compromise the accuracy of the trained model? Thanks
@harveenchadha
3 years ago
Excellent! Where can I find the code used in the video?
@jamesbriggs
3 years ago
Code is split between a few different notebooks on Github - they're all in this repo folder: github.com/jamescalam/transformers/tree/main/course/project_build_tf_sentiment_model - hope it helps :)
@harveenchadha
3 years ago
@@jamesbriggs Thanks. That surely helps! Keep up the good work James, I see you are working on a Transformers course. Will be looking forward to it!
@asimsultan8191
3 years ago
Thank you for such an amazing collection :) Just 1 question: while loading the model, I get this error: ValueError: Cannot assign to variable bert/embeddings/token_type_embeddings/embeddings:0 due to variable shape (2, 768) and value shape (512, 768) are incompatible. Can you let me know why that is? Thank you so much in advance.
@jamesbriggs
3 years ago
Hey Asim, I would double check that you are tokenizing everything correctly, the 512 that you see is the standard number of tokens consumed by BERT, which we set when encoding our text with the tokenizer :)
@asimsultan8191
3 years ago
@@jamesbriggs I got it and solved the problem. Thank you so much :)
@henkhbit5748
3 years ago
Nice example! Could u also use the same technique if you want to classify text into more than 5 categories, for example 10 or 20? And if each class is not perfectly balanced and it is NOT an English text? 😉
@jamesbriggs
3 years ago
haha yes you could. There are different-language BERT models that are pretrained; if your language wasn't available, we'd want to train from scratch on the new language (mentioned in the last comment). As for training with more categories, yes, we can do that using the same code we use here: we just switch our training data to the new 10-20 class data and update the classifier layer output size to match :)
@Moxgusa
3 years ago
Hi James, first of all, great tutorial! I tried implementing the same architecture with a different dataset, but the model training time is insane, it's 50+ hours. Do you have any clue why it takes so much time? Thank you!
@jamesbriggs
3 years ago
it can be a long time, and it will depend on your hardware setup. I'm using a 3090 GPU so it is reasonably fast; I would double-check that you are using the GPU (if you have a compatible one). If you search something like 'tensorflow GPU setup' you should find some good explanations - hope that helps!
@gokulgupta1021
3 years ago
Nice informative video. It would be nice if you could help me understand how to change this to PyTorch:

# create the dataset object
dataset = tf.data.Dataset.from_tensor_slices((Xids, Xmask, labels))

def map_func(input_ids, masks, labels):
    # convert our three-item tuple into a two-item tuple where the input item is a dictionary
    return {'input_ids': input_ids, 'attention_mask': masks}, labels

# then we use the dataset map method to apply this transformation
dataset = dataset.map(map_func)
@jamesbriggs
3 years ago
I'm not using PyTorch for sentiment analysis in this example (I use it for masked language modeling instead), but the dataset build logic is very similar; see this video at ~14:57: kzitem.info/news/bejne/s2yeyayDhoGjg3o
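For reference, a rough PyTorch counterpart to that tf.data pipeline: a Dataset whose items are ({'input_ids': ..., 'attention_mask': ...}, label) pairs, mirroring map_func. This is only a sketch; it assumes Xids, Xmask, and labels are the NumPy arrays built earlier, and the toy arrays at the bottom are stand-ins:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class SentimentDataset(Dataset):
    def __init__(self, Xids, Xmask, labels):
        self.Xids = torch.tensor(Xids, dtype=torch.long)
        self.Xmask = torch.tensor(Xmask, dtype=torch.long)
        self.labels = torch.tensor(labels, dtype=torch.float)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        # mirrors map_func: inputs packed into a dict, labels kept separate
        return ({'input_ids': self.Xids[i], 'attention_mask': self.Xmask[i]},
                self.labels[i])

# toy stand-ins for the real Xids/Xmask/labels arrays
Xids = np.zeros((4, 512)); Xmask = np.ones((4, 512)); labels = np.eye(4)
loader = DataLoader(SentimentDataset(Xids, Xmask, labels), batch_size=2)
```

The DataLoader then plays the role of the batched, shuffled tf.data.Dataset during training.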
@alexaskills3447
2 years ago
This was great! One question: what if you wanted to use additional features besides the BERT embeddings in the training data set? What would be the best approach? Do some type of model stacking, where you take the output of the sentiment model and use that combined with other features as input to another model? Or is there a better way to merge/concatenate the additional features onto the BERT word vector training data?
@gloriaabuka9129
2 years ago
Thank you for this great video. I tried following along with another dataset, but each time I try to one-hot-encode my labels I keep getting an error that says "numpy.float64 object cannot be interpreted as an integer". Any idea how to fix this? Thank you.
@abAbhi105
2 years ago
Same here, did you find any solution?
@gloriaabuka9129
2 years ago
@@abAbhi105 Yes, I did. I cast my array elements to integer:

arr = arr.astype(int)
labels[np.arange(num_samples), arr - 1] = 1
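That cast works because np.zeros needs an integer column count and fancy-index positions must be integers too; with ratings from 1-10, the arr - 1 shift maps them onto columns 0-9. A minimal sketch with made-up ratings:

```python
import numpy as np

# hypothetical ratings column loaded as floats, as df['rating'].values can be
arr = np.array([1.0, 5.0, 10.0, 3.0])
arr = arr.astype(int)  # the fix: float values cannot be used as indices
num_samples = len(arr)

labels = np.zeros((num_samples, arr.max()))   # 10 columns for ratings 1-10
labels[np.arange(num_samples), arr - 1] = 1   # shift ratings 1-10 to columns 0-9
```

Without the astype(int) line, both arr.max() and the arr - 1 index would be numpy.float64 values, which is exactly what the error complains about.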
@soysasu
3 years ago
Hi sir, I'm following along step by step in Google Colab, but it runs out of RAM. They give me 12.69 GB; in most cases that happens due to code problems. Any idea? Thank you!
@jamesbriggs
3 years ago
Google Colab can be difficult with the amount of memory you're given, and transformers use *a lot*. One thing that can help is loading your data in batches (so you're not storing it all in memory); one of my recent videos covers this, it might help: kzitem.info/news/bejne/02Owt4Vnb6mFdnY
@soysasu
3 years ago
@@jamesbriggs Okay, I'll see it. Thank you!
@amitjaiswar8593
3 years ago
Is this an implementation or fine-tuning? # model.layers[2].trainable = False
@jamesbriggs
2 years ago
hey Amit, this sets the internal BERT layers to not train, but still allows us to train the classifier layers (which are layers 3, 4, etc.). We can actually train the BERT layer too by removing that line, but training time will be much longer.
@digvijayyadav4168
2 years ago
Hi there, please can you share the notebook?
@jamesbriggs
2 years ago
Hey it's not necessarily exactly the same, but you will find very similar code here github.com/jamescalam/transformers/tree/main/course/project_build_tf_sentiment_model
@faisalq4092
A year ago
I want something from scratch
@vidopulos
3 years ago
Hi, excellent tutorial! I have a problem: when I try to replicate your code, in the part where I use tokenizer.encode_plus() I get ValueError: could not broadcast input array from shape (15) into shape (512). It says the error is here: Xids[i, :] = tokens['input_ids']. Thanks.
@jamesbriggs
3 years ago
Does it work if you write Xids[:, i] = tokens['input_ids']? Otherwise, double-check the Xids dimensionality with Xids.shape and make sure it lines up with what we would expect (e.g. num_samples and 512).
@francesniu1201
3 years ago
I had the same issue, and I solved it by using pad_to_max_length=True instead of padding='max_length'.
@Mrwheelsful
3 years ago
Hi James, at the very end, when you predicted your new sentiment data with your model, you assigned it to:

probs = model.predict(test)

I would like to know how to export that predicted data to CSV format so that one can submit it on Kaggle:

test['sentiment'] = model.predict(test['phrase'])
submission = test[['tweetid', 'sentiment']]
submission.to_csv('bertmodel.csv', index=False)

Is this the correct way of going about it? :) Because I want sentiment values when exported.
@jamesbriggs
3 years ago
I think you might need to perform an np.argmax() operation on the model.predict output, to convert from output logits to predicted labels, but otherwise it looks good :)
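That argmax step, sketched with hypothetical logits and IDs (only the two-column tweetid/sentiment file layout from the comment above is assumed; here the CSV is written with the stdlib instead of pandas):

```python
import csv
import numpy as np

# hypothetical model.predict output: one row of class scores per sample
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])

# np.argmax along axis=1 turns per-class scores into predicted label indices
sentiment = np.argmax(probs, axis=1)

# hypothetical ids; write the two-column Kaggle-style submission file
tweetids = [101, 102]
with open('bertmodel.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['tweetid', 'sentiment'])
    writer.writerows(zip(tweetids, sentiment))
```

With a pandas DataFrame, assigning np.argmax(probs, axis=1) to the sentiment column before to_csv achieves the same result.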
@madhavimourya1157
3 years ago
Hi James, great explanation, and I followed along to implement the same, but I got this error: InvalidArgumentError: indices[2,2] = 29200 is not in [0, 28996) [[node model/bert/embeddings/Gather (defined at /usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_tf_bert.py:188) ]] [Op:__inference_train_function_488497] I know it's related to the embedding token IDs. Can you help me resolve this?
@madhavimourya1157
3 years ago
Luckily, I got the solution :)
@jamesbriggs
3 years ago
@@madhavimourya1157 Oh good to hear, was it in your dataset definition?
Comments: 75