It works like a charm on my data. The topic labels are now really meaningful and much more useful. You made my day, Maarten. Now I need to include those in my embeddings for semantic search and I think I am good to go. Thanks a lot. Eager to check out your book that is coming (cf. the link in your description).
@imyhull4923
10 months ago
Been following BERTopic from the beginning and used it many times along with KeyBERT for work projects and personal projects. Always struggled with the interpretation of topics at the end of the process, but this looks like a great solution. Looking forward to getting your book now. Thanks so much for the tutorial!
@AlokKumar-fi8qh
11 months ago
I have seen all three videos. Loved them all. Absolute gold.
@raymondkusch214
11 months ago
This is great! Thank you for providing this to the community.
@oleksiipanasenko4987
11 months ago
Thank you, Maarten! Your video and explanation are perfect
@ernestosantiesteban6333
7 months ago
Great video. You saved my work.
@bermus0101
10 months ago
Really appreciate your work!! Thank you!!
@grrdmeester1
7 months ago
Very well explained, Maarten. A very inspiring video. And it's great that I can get hands-on with your example myself in Google Colab. I've already tried a few things on emails (so far only using BERTopic, without Llama) and the results are promising. Keep up the good work! I'm already an ambassador for BERTopic at work.
@alane1988
4 months ago
This video is fantastic.
@erick-alfaro
11 months ago
Hi Maarten! I've been following your work for some time and am so happy to see you start a YouTube channel. I am curious how you suggest I apply this (or something similar) to the task of identifying topic timestamps for YouTube videos?
@MaartenGrootendorst
11 months ago
You could use Whisper to convert audio into text and feed it to BERTopic: towardsdatascience.com/using-whisper-and-bertopic-to-model-kurzgesagts-videos-7d8a63139bdf
@pegahghadak5867
7 months ago
Great. Thanks for sharing
@rahulkulkarni7224
A month ago
Can you please do a video on Llama 3.1 for topic modeling and data summarization (like agent-customer chats, reviews, etc.)?
@mauritsvanwijland2872
A year ago
Maarten, great video on how to use the next iteration of BERTopic and the Llama 2 model. Your examples are all focused on the English language. I have tried BERTopic with Dutch documents, but it fails to generate good-quality topics. Could you make a video on using Dutch or any other language?
@MaartenGrootendorst
A year ago
That's a great idea! To give you a few quick tips already: using a multilingual embedding model is quite important for properly representing another language, especially if you use KeyBERTInspired. Another trick is to remove Dutch stopwords using the CountVectorizer. If you combine those tips with the Best Practices, that should already give you a head start: maartengr.github.io/BERTopic/getting_started/best_practices/best_practices.html
@Break_down1
6 months ago
Great work on this ~topic~. I'd be curious: have you tried using fuzzy clustering algorithms for separating topics? It's likely that documents sometimes contain multiple topics.
@AbhishekPandey-tk1it
5 months ago
Great video.
@aaroldaaroldson708
4 months ago
Hi! Unrelated to this video directly, but is there a way to render the visualisation of the clusters in HTML and not in a Jupyter notebook?
@harmpwns
5 months ago
Hi Maarten, does Llama also do a good job labeling Dutch keywords?
@GB-ot2iv
A month ago
Hi Maarten, great content as always. Would it be possible to make a video on topic distributions? If I've understood correctly, what BERTopic does is assign a document to a cluster of documents, hence a single topic per document. What if we want to assign multiple topics? For example, an abstract can talk about sentiment analysis in medical reviews using LLMs, so we want to extract at least three main topics: sentiment analysis, medical reviews, and LLMs. How do we do that? Your answer would be super appreciated!
@researchKIL
8 months ago
Considering that your innovative approach was a great source of inspiration for me, I'm curious about using my own data. Is it sufficient to focus on the 'abstract' column, or would it be beneficial to include a 'title' column as well? I noticed you extracted 'titles' in your example but didn't use them in the training process (I may have overlooked it). Additionally, the model returned over 100 topics; how can I effectively control the number of topics in the analysis? Thank you again for your contribution.
@jackbauer322
A year ago
When using Agglomerative Clustering in this workflow, I get a HUGE topic 0 with 99% of the documents, as if it regrouped most of the documents around stopwords. That only happens with Agglomerative Clustering; Mini-Batch KMeans is fine.
@MaartenGrootendorst
A year ago
Good that you are experimenting with clustering models. As you have noticed, they matter greatly in the construction of the topics, and one can greatly outperform another. I generally hear good stories about using HDBSCAN, the default clustering algorithm. If you do not want the outliers it produces, there are options for reducing or even removing them: maartengr.github.io/BERTopic/getting_started/outlier_reduction/outlier_reduction.html
@shameekm2146
11 months ago
My question regarding this topic modelling: can we use it anywhere in a Retrieval-Augmented Generation use case, for better fetching of relevant documents and also better generation of answers?
@MaartenGrootendorst
11 months ago
You could use the constructed topics to categorize the documents that you have. By supplying these documents with additional categories, you can create additional constraints/filters for a RAG-based pipeline. Instead of having to search through all documents, it will first identify the category of the question, after which it selects a relevant subset of documents based on that category. There are many more ways you can use BERTopic in RAG, but this can work well if you do not have additional metadata.
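The routing idea can be sketched in plain Python; the documents and topic labels below are hypothetical placeholders for labels a fitted BERTopic model would produce:

```python
# Hypothetical corpus where each document already carries a BERTopic-derived label
documents = [
    {"text": "Fine-tuning transformers on sentiment data", "topic": "nlp"},
    {"text": "Quarterly revenue grew by twelve percent", "topic": "finance"},
    {"text": "Prompt templates for instruction-tuned models", "topic": "nlp"},
]

def route_question(question_topic, docs):
    """Narrow the search space to the question's category first, so the
    retriever only has to rank documents within that subset."""
    return [d for d in docs if d["topic"] == question_topic]

# A question classified as "nlp" is only matched against the NLP documents
subset = route_question("nlp", documents)
```

In a full pipeline, semantic retrieval would then run over `subset` instead of the whole corpus.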
@shameekm2146
11 months ago
@MaartenGrootendorst Thank you so much. I will look into these implementation methods and possibilities.
@gijajoy8524
11 months ago
What if I want the important topics from a single custom document, will it detect them?
@MaartenGrootendorst
11 months ago
Sure, use approximate_distribution: maartengr.github.io/BERTopic/getting_started/distribution/distribution.html
@dimitripetrenko438
8 months ago
Hi Maarten! I have been using BERTopic since last year; it's such a useful tool! When I tried this new LLM technique I ran into a problem where KeyBERT and MMR work fine, but the LLM-generated topics just give me repeated nonsense words. Would you have any idea why? It looks like this: [INST] I have a topic that contains the following documents: - How does bekanområområområområområområområområområområområområområområområområområområområområ
@dimitripetrenko438
8 months ago
My bad, I was being an idiot; it was a problem with the prompting template.
@wenqianzhou9174
A year ago
Please produce more content ! Love BERTopic 💯
@redfield126
10 months ago
I am a big fan of BERTopic, and what you are proposing with Llama 2 looks like it solves part of my challenges. I'll dedicate my next night to testing it! Thanks for all the great work so far and for sharing this with the community. You are the man.
@redfield126
10 months ago
As promised, I'm on it. First things first, no surprise: the combo of this video, the Colab, and the dedicated tutorial page is just perfect and educational as usual, Maarten. I like the integration of Llama 2 as a new representation model and the possibility to leverage quantization. I was afraid of not being able to run your experiment on my desktop; you made my day by allowing 4-bit quantization! Now the results are really, really promising. This is exactly the type of challenge I was facing with previous topic representations like KeyBERT: they are interesting but prone to interpretation and question loops with end users. This time, with Llama 2, I have the feeling we have the flexibility and versatility we need to guide topic generation as needed. Really elegant implementation. Thank you, sir! Next step for me is to test it on my use case. Exciting!
@tlerksuthirat
10 months ago
Thank you very much for uploading this video. It is very useful for our research work. Really appreciate your work and dedication :)
@aguntuk10
7 months ago
Can we do that with gpt-3.5-turbo?
@umit_00
10 months ago
Thanks for the update - really insightful! Is it possible to use the GPT-3.5 API instead of a local Llama 2?
@BatBallBites
A year ago
Perfect, thanks for this video. I tried hard to access your Medium article but couldn't read it because the content was for premium users with paid subscriptions only. I was looking for something like this for my solution; I will surely try it.
@phoebeyu2566
11 months ago
Hi Maarten! Thank you so much for the great content! One quick question: would you be able to have Llama 2 label the merged topics when doing hierarchical topic modeling?
@Shubhi021
8 months ago
Thank you for sharing this! Detailed, super informative and very helpful.
@natalietran6382
10 months ago
Thank you for this informative tutorial! It is really easy to understand and I am ready to implement it.
@asifsiddiqui1058
A year ago
Thank you Maarten, this tutorial has made it a lot easier to finish my project successfully!
@asmaaziz2436
9 months ago
Can we use Llama 2 for German topics?
@harish2985
7 months ago
This is exceptionally useful. Thanks a lot!
@yizhouqian5899
7 months ago
Great video. Thank you so much sir!
@Fritz0id
11 months ago
Fascinating! Can’t wait to try this
@wangeesadesilva6321
11 months ago
This is incredible. Fantastic explanation. Thank you so much for the great content! A quick question: if our data consists only of object labels or information about objects detected in images/video (e.g., "dog," "car," "tree," etc.), can we still use this object-label information as input for BERTopic?
@MaartenGrootendorst
11 months ago
With enough documents, I think this should be no problem. Definitely worth trying out!
@FatemehDehghani-k6l
11 months ago
Thanks for this great video. Do you think this can be done with game reviews to detect the most important components of a game? I planned to do that with LDA, but then I came across your video and thought it would be great to do it with an LLM.
@asmaaziz2436
9 months ago
Definitely
@streamocu2929
10 months ago
thx ❤
@MrSuperGerald
A year ago
Thank you, Maarten! Looking forward to your next videos. Some on federated learning would be great too.
@MaartenGrootendorst
A year ago
That's a good one! I work a lot with federated LLMs nowadays, so I'll keep it in mind 😀
Comments: 53