There is an error in the code at 15:24: 'signature.append(idx)' should be replaced with 'signature.append(i)'. Full code example here: gist.github.com/jamescalam/a9d5708ab84aaf92055f8a08e906efba
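For readers without access to the gist, the corrected loop can be sketched roughly as follows. This is a minimal reconstruction, not the code from the video: the function name `create_signature`, its parameters, and the example data are assumptions; only `signature`, `idx`, and `i` come from the comment above.

```python
def create_signature(vector: list, permutations: list) -> list:
    """Build a MinHash signature for a one-hot vector (hypothetical helper).

    Each permutation is a list holding the values 1..len(vector) in shuffled order.
    """
    signature = []
    for perm in permutations:
        # Walk the permutation values in ascending order: 1, 2, 3, ...
        for i in range(1, len(vector) + 1):
            idx = perm.index(i)       # position of value i in the permutation
            if vector[idx] == 1:      # first position where the vector is set
                signature.append(i)   # the fix: store i, not idx
                break
    return signature


# Illustrative data only (not from the video).
vector = [0, 1, 0, 1]
permutations = [[3, 4, 1, 2], [2, 4, 1, 3]]
signature = create_signature(vector, permutations)
print(signature)  # [2, 3]
```

With `signature.append(idx)` the same input would give `[3, 3]`, since that variant stores the vocabulary position rather than the permutation value.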
@MrLazini
10 months ago
This video is excellent in so many ways. Thanks James!
@LamNguyen-hw9lq
3 months ago
You explained much better than my professor!
@skobanemusic5752
1 year ago
Thank you James for your thorough explanations in all your videos.
@Munk-tt6tz
4 months ago
Best explanations as always, thank you!
@Han-ve8uh
1 year ago
1. At 15:10, def create_hash_func takes size as input but never uses it? 2. What is the hash function used at 21:47? It looks like nothing is ever hashed and the segmented signatures are compared directly? That doesn't match the visuals at 18:33, which show 3 hash functions shaded blue, red and green.
@anujlahoty8022
9 months ago
Thanks a lot for this amazing stuff!
@MarsXion
1 year ago
Very helpful, Thank you!
@imlazy007
2 years ago
Hello @James Briggs: I was curious whether it makes sense to use MinHash LSH instead of more proven solutions such as Solr / Elasticsearch for searching for identical or similar text records. Do you happen to know of any pros and cons of using the LSH approach instead of Solr? Love your channel, really appreciate all the hard work.
@vaibhavkirtankar5336
1 year ago
Amazing explanations. Thanks a lot
@HungTran-fp3ij
4 months ago
Hello @jamesbriggs. Thank you for your tutorial.
@AymenSekhri-gw8wh
1 year ago
It was really helpful, thank you so much
@charlesc2064
1 year ago
Just curious, what's the reason for using a list object instead of a set in the "shingle" function at 6:52? Thanks!
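For context, the set-based variant the question has in mind could look something like the sketch below. It is an assumption about the general shape of a character k-shingling function, not the code shown in the video; the signature `shingle(text, k)` is hypothetical.

```python
def shingle(text: str, k: int) -> set:
    """Return the set of all length-k character shingles in text."""
    # A set comprehension drops duplicate shingles as they are produced.
    return {text[i:i + k] for i in range(len(text) - k + 1)}


shingles = shingle("banana", 2)
print(sorted(shingles))  # ['an', 'ba', 'na']
```

Because "an" and "na" each occur twice in "banana", a list would hold five entries here while the set holds three.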
@mihaelacostea5783
6 months ago
Does this work for semantic similarity? Meaning texts that say the same thing but with different words?
@tiago.engenheiro
1 year ago
Why don't you just loop over the values within func in the second loop? What are you gaining by looping over (1, len(vocab)+1) to find the index?
@kejdilleshi134
2 years ago
Hello James, I am implementing LSH but there is a problem in the "signature info" part. On my computer the signature similarities between a,b and b,c are completely random, so Jaccard(a_sig, b_sig) has no connection with Jaccard(a, b), and the same goes for b,c. In my opinion this means that the signature is not representing the sentence correctly. I tried increasing the number of MinHash functions, but nothing changed. Best, Kejdi.
@jamesbriggs
2 years ago
What happens if you try with a and b being the same sentence? Have you also tried increasing/decreasing the shingle size?
@maryamaziz3841
3 years ago
Great work 💯
@jamesbriggs
3 years ago
thanks Maryam!
@heetaelee7873
2 years ago
May I ask a question? At 17:53, why is the Jaccard between a_sig and b_sig (or c_sig and b_sig) lower than the Jaccard between the original a and b? And what does that mean?
@EmadGohari
2 years ago
I think this is actually not correct, see this kzitem.info/news/bejne/w4d3v41ugoBzq5w
@loganfoster8681
5 months ago
Appreciate you making this, but I really wish you had explained how it works in more depth and focused less on building a script. The reliance on references to functions to call makes this useful for people who want to build this exact script or are already very familiar with those functions in Python, but makes it essentially useless for building an understanding of how LSH works or for learning how to write a custom program using LSH.
@EmadGohari
2 years ago
Hey James, thanks for the great explanation. I think your point at 17:42 (cell 19 in the code) is not actually correct. Please check kzitem.info/news/bejne/w4d3v41ugoBzq5w The expected fraction of matching elements in the signatures of A and B (expected number of matching elements in the signatures / length of the signatures) = Jaccard of A and B
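The property stated in this comment can be checked empirically with a small experiment. The sketch below is only an illustration under assumptions: the sets, the number of permutations, and all function names are made up for the example and are not taken from the video's notebook.

```python
import random


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


def minhash_signature(items: set, permutations: list) -> list:
    """For each permutation (element -> rank), keep the smallest rank in the set."""
    return [min(perm[x] for x in items) for perm in permutations]


def matching_fraction(sig_a: list, sig_b: list) -> float:
    """Fraction of positions where the two signatures agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


rng = random.Random(0)          # seeded so the run is repeatable
vocab = list(range(200))
a = set(range(0, 100))
b = set(range(50, 150))         # overlap 50, union 150, so Jaccard = 1/3

# Build 2000 random permutations of the vocabulary (an arbitrary choice).
permutations = []
for _ in range(2000):
    ranks = vocab[:]
    rng.shuffle(ranks)
    permutations.append(dict(zip(vocab, ranks)))

sig_a = minhash_signature(a, permutations)
sig_b = minhash_signature(b, permutations)

true_j = jaccard(a, b)
est_j = matching_fraction(sig_a, sig_b)
print(f"exact Jaccard: {true_j:.3f}, position-wise match rate: {est_j:.3f}")
```

Note that the comparison is position by position. Taking the Jaccard of the two signatures treated as sets is a different quantity and is not expected to equal the Jaccard of the original sets.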
@PriyanshuSingh-hm4tn
1 year ago
Great.
@RezaJafari-hs7dq
10 months ago
I think you are doing the one-hot encoding the wrong way. Could you explain more? Thanks
Comments: 28