The Algorithm Behind Spell Checkers

353,788 views

🖥️ GitHub: github.com/b001io/wagner-fischer
⭐ Join my Patreon: / b001io
💬 Discord: / discord
🐦 Follow me on Twitter: / b001io
🔗 More links: linktr.ee/b001io

Comments: 267

@e2myearly288
3 months ago
This video would've been super helpful 3 years ago in college. A professor had us make a spellchecker. It did not go well.
@NoGentle
3 months ago
Serious question: when you have to write a program or a piece of code, how "original" does it need to be before it is acceptable to you? I mean, how many lines of code can you copy without a guilty conscience? (Not literally, but you get it, I think.) Also keep in mind that you don't know that specific algorithm but you have to implement it anyway.
@GetUrFunnyUp
3 months ago
@@NoGentle It really does depend on what you are trying to achieve. If it's for learning purposes, why would you use someone else's solution to a problem? Why not make it yourself? That assumes that by copying you mean literally copying the code line by line. But if by copying you mean that someone just gives you the idea of the solution (you solve x by doing z and y), you still have to code z and y yourself, even though you know which way to go. I think these are what you call patents.
@raymondarrington5339
3 months ago
@@NoGentle I would say that if you are programming something and you copy code because you know it but don't want to type it all out, then it's fine. Alternatively, you could also copy code to try to pick it apart and learn it better. There are no rules though, so do what you think is best for your situation.
@krolmuch
3 months ago
it wouldn't help you at all... you can't do basic research
@e2myearly288
3 months ago
@@krolmuch bro what's with the attack? I'm a visual learner. I struggle to read. Video education is just easier for me to understand. It was mostly a joke anyway.
@bhushanlaware
3 months ago
It took 20 years to solve the Edit Distance problem for the first time, but they want us to solve it in 1 hour of interview.
@seanpe8474
3 months ago
honestly this does open some interesting philosophical ideas about how genius solutions and algorithms come to be. the best ideas are those that, even though they took a while to come up with, are comparatively easy to teach after they've been discovered.
@andrewjknott
3 months ago
Excellent explanation. Modern spell checkers also use other techniques. One is transposition because that is one of teh most common spelling mistakes. Another is nearness of letters on the keyboard because people can mistype letters that are clise to each other.
@haidaralhassan4621
3 months ago
excellent showcasing of tranpsosition and nearnesd
@awesomedude3247
3 months ago
it seems like the modern ones bridged the difference between actual spelling errors and what we might call typos
@arandomguy9669
3 months ago
I see whay yuo did there
@haidaralhassan4621
3 months ago
@@arandomguy9669 hwat a mitzure
@ScienceSuds
3 months ago
This algorithm actually paved the way for a lot of modern bioinformatics algorithms used to align two DNA sequences together, some of the most famous being Smith-Waterman and Needleman-Wunsch! It’s so cool to see the overlap!
@aswinsnair1702
3 months ago
do you know where i could find more about bioinformatics algorithms?
@geekzombie8795
3 months ago
@@aswinsnair1702^^
@ThePituLegend
3 months ago
@@aswinsnair1702 try looking up the terms that @ScienceSuds mentioned in Google Scholar, as well as terms such as "Sequence Alignment". There's really a ton of work in this field!
@jenithmehta9603
3 months ago
@@aswinsnair1702 search for the FASTA and BLAST methods
@mantacid1221
3 months ago
Oh yeah, don't biologists check for mutations and differences in a genome by pasting it into Word and spellchecking it when the original is in the spellchecker's dictionary?
@portalwalker_
3 months ago
I always thought spellcheckers would incorporate the keyboard layout into their suggestions, as in correcting "worls" to "world", because s is one key away from d
@natescode
3 months ago
I'm sure some do.
@jacksondeane1629
3 months ago
Same! I’m always like “why can’t you tell that I just missed one letter!!!”
@stt.9433
3 months ago
Keep in mind many different keyboard layouts exist. You could also have a case where a written file is OCR'd in which case that wouldn't be relevant.
@seanewing204
3 months ago
This and parts of speech. Maybe track the most common errors based on vocab and document length, like the YouTube algorithm recommending videos based on age, gender, etc.
@ArchiWorldRuS
3 months ago
Of course it does now. There is a video from Enrico Tartarotti released recently "The LIES That Make Your Tech ACTUALLY Work" where you can learn more about your idea and how it is implemented!
@thekwoka4707
3 months ago
What is an interesting addition to the algorithm is actually providing a list of the changes between the two, like for a typewriter.
@maker0824
3 months ago
I read this like 7 times and I can’t tell what you are trying to say
@phiscz
3 months ago
@@maker0824 not sure what they meant w the typewriter mention but i read this to mean implementing something that provides a diff-like output (character-by-character instead of line-by-line tho)
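One way to read the suggestion in this thread is as a backtrace through the edit-distance matrix. The sketch below is only an illustration under that reading: `edit_script` is a hypothetical name, not something from the video or its repository. It fills the full matrix, then walks back from the bottom-right corner and records which operation produced each cell.

```python
def edit_script(source: str, target: str) -> list[str]:
    """Return one minimal list of edits turning source into target."""
    m, n = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete from source
                dist[i][j - 1] + 1,         # insert into source
                dist[i - 1][j - 1] + cost,  # keep or substitute
            )

    # Walk back from the corner, recording the step that explains each cell.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and source[i - 1] == target[j - 1]
        if same and dist[i][j] == dist[i - 1][j - 1]:
            ops.append(f"keep {source[i - 1]!r}")
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and not same and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(f"substitute {source[i - 1]!r} -> {target[j - 1]!r}")
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(f"delete {source[i - 1]!r}")
            i -= 1
        else:
            ops.append(f"insert {target[j - 1]!r}")
            j -= 1
    ops.reverse()
    return ops


for op in edit_script("boat", "float"):
    print(op)
```

Several minimal scripts can exist for the same pair of words; this walk simply prefers diagonal moves, then deletions, then insertions.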
@ToddVanyo
3 months ago
Would have loved YouTube 30 yrs ago. In my day (yeah, I'm old) I had a class in which the last assignment was an assembly program for the Intel 8086 that implemented a spellchecker. Prof said it would take 40 hrs if we knew what we were doing. No mention of Levenshtein, Gorin, or any known algos. I took a 0, as I was behind in other things.
@samueljehanno
3 months ago
That's insane
@tomchapman128
3 months ago
You should absolutely make more videos like this! You're extremely good at explaining things and this video was genuinely so interesting. Well done :)
@anirudhbakare3547
3 months ago
Love your work ❤. Make a series on Programming Algorithms 🙌
@stacklysm
3 months ago
It's very nice to discover dev channels with quality content and interesting topics, keep up the good work!
@birch_tacos
3 months ago
that was the best explanation of dynamic programming I've ever heard
@Simplified-Script-Development
3 months ago
i like your videos because they dive deep into tiny but important stuff, which really helps a lot
@justinmayhew6848
3 months ago
This was a great video, this explanation was made so intuitive and I have wondered in the past how spell checkers work
@felixstuber8046
3 months ago
Levenshtein algorithm can also be extended to calculate the Damerau-Levenshtein distance. Simply put, this means that you get another operation that switches two neighbouring letters. E.g. the words "world" and "wordl" have Levenshtein distance 2, but Damerau-Levenshtein distance 1 since it is enough to switch the last 2 letters. Especially in keyboard typing, such errors are common. It is also possible to fine tune even more by giving weights different from 1 to the operations.
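A minimal sketch of the extension described above, assuming the restricted variant usually called optimal string alignment (OSA), in which each substring is edited at most once. The function name `osa_distance` is made up for this example. It adds one extra case to the usual recurrence: if the last two letters of both prefixes are swapped, a transposition costs 1.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance with adjacent transpositions (restricted variant)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Extra case: the last two letters are the same pair, swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


print(osa_distance("world", "wordl"))  # 1
print(osa_distance("wrold", "world"))  # 1
```

The unrestricted Damerau-Levenshtein distance can be smaller in some cases (for "ca" and "abc" it is 2, while this variant gives 3), at the price of a somewhat more involved algorithm.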
@stt.9433
3 months ago
Essentially what they're proposing is that 2 substitutions of this kind actually have the weight of 1 substitution. And this goes into a much deeper topic, which is that realistically the weights of the different mistakes (insertion, deletion or substitution) are not equal. There are things like phonetic mistakes where two phonetically similar letters are interchanged, which happens a lot in French for example. Common spelling errors have roots, and generally it's because of phonetics: double consonants sound the same as single consonants, etc. In the deep learning approach, you could build a model which would in fact be able to extract these features, including but not limited to insertion, deletion, substitution, phonetic difference and common spelling mistakes, that would determine the true distance between spelling errors and better determine your intent when writing.
@marcusaurelius6607
3 months ago
@@stt.9433 It's not a proposal, it's an algo from 1965, a base for all search engines. Levenshtein distance in its pure form is very insufficient for real applications, and this has been used since forever without any need for ML (doh)
@daveys
3 months ago
I used a Levenshtein program to match software names and categories to a list that I scraped from somewhere or other. Worked nicely, but took a while to run, even running on 12 cores because it was running 140,000 unsorted items against 40,000 items with a category and type. Still, 5mins isn’t bad compared to how long it’d have taken to do it manually.
@OPTechpure
3 months ago
im using it in my application to actively read text boxes and compare them to a script im using.
@andrejvujic
3 months ago
Great video! You should make more where you explain interesting algorithms. Maybe you can do Bresenham's line drawing algorithm next. Keep it up. 😃
@kellybmackenzie
29 days ago
This is awesome, I learned a lot, thank you! I heard that some spell checkers use tries (prefix trees) for better auto-completion. I'd love to see a video on those as well, I adore your way of explaining!!
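For readers who have not met tries before, here is a small sketch of the idea mentioned in this comment. It is a generic prefix tree, not taken from any particular spell checker; the class and method names are chosen for this example only. Each node maps a letter to a child, so all words sharing a prefix share a path from the root.

```python
class Trie:
    """A minimal prefix tree storing a set of words."""

    def __init__(self):
        self.children = {}    # letter -> Trie
        self.is_word = False  # True if a stored word ends at this node

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def complete(self, prefix: str) -> list[str]:
        """Return every stored word that starts with prefix, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


trie = Trie()
for w in ["float", "flow", "flower", "boat"]:
    trie.insert(w)
print(trie.complete("flo"))  # ['float', 'flow', 'flower']
```

Looking up a prefix costs time proportional to the prefix length, regardless of how many words the dictionary holds.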
@pemessh
3 months ago
Oh my god! This was so simple to understand. Thank you so much. Please keep these coming :)
@TeamDman
2 months ago
Thank you for this! The visuals are great!
@mikeygduv
3 months ago
Great video! Love the username. I will definitely be watching more and subscribed. I'm a python novice and don't code but love to see how the sausage is made. Maybe one day I'll get into the sausage biz.
@jemandev
3 months ago
I would love to see more of these history of algorithms videos.
@FriendlyCodeBuddy
2 months ago
Great video! Thanks for the excellent explanation. I found it really friendly and easy to understand.
@egelbets
3 months ago
Wow this is also what i learned in uni but then in the context of DNA sequences because they can also have deletions, insertions, and substitutions (i studied bioinformatics)
@VFPn96kQT
3 months ago
Although I was familiar with the algorithms presented in the video, the visualizations were great and helped me understand them much better. Thank you.
@robharwood3538
2 months ago
Great video! Thanks for making it! I would love to see you expand this video/topic to include the use of different types of edits having different probabilities and/or 'costs', which is a useful and interesting application for things like calculating the 'distance' between two things which have different physical/theoretical processes for causing different kinds of edits. For example, in DNA sequences, nucleotide substitutions might be much more common than deletions or insertions. And perhaps deletions are more common than insertions -- or vice versa. One way to model this is to have less-common types of edits 'cost' more than more-common ones. Another way to model this is to go by actual probabilities (aka likelihoods). There are algorithms which incorporate such ideas, and can be solved in a similar way to the Wagner & Fischer method, but unfortunately I can't recall the name(s) off the top of my head. But still it is both a really interesting question with really interesting and instructive solutions, so IMHO I think it would make a great topic for a follow-up video. What do you think? Cheers!
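The per-operation costs described in this comment drop into the same matrix recurrence with very little change. The following is a sketch only: the function name and the three cost parameters are assumptions for illustration, and the example costs are arbitrary rather than biologically calibrated.

```python
def weighted_edit_distance(
    source: str,
    target: str,
    insert_cost: float = 1.0,
    delete_cost: float = 1.0,
    substitute_cost: float = 1.0,
) -> float:
    """Wagner-Fischer style DP where each operation has its own cost."""
    # Row 0: building target[:j] from an empty string takes j insertions.
    previous = [j * insert_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, 1):
        # Column 0: reducing source[:i] to an empty string takes i deletions.
        current = [i * delete_cost]
        for j, t in enumerate(target, 1):
            keep_or_sub = previous[j - 1] + (0.0 if s == t else substitute_cost)
            current.append(
                min(previous[j] + delete_cost, current[j - 1] + insert_cost, keep_or_sub)
            )
        previous = current
    return previous[-1]


# Cheap substitutions, expensive deletions:
print(weighted_edit_distance("ACGT", "AGGT", substitute_cost=0.25))  # 0.25
print(weighted_edit_distance("ACGT", "AGT", delete_cost=3.0))        # 3.0
```

If the costs are interpreted as negative log-probabilities, minimizing the total cost corresponds to picking the most likely sequence of edits, which is one way to connect the two modelling styles the comment mentions.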
@Tom-lz9pu
3 months ago
As others also pointed out, this is so similar to Needleman-Wunsch and/or Smith-Waterman, and it's insane to me. I learned about bioinformatics algorithms and now end up in a situation in which I can think of sequence alignment as being responsible for my spellchecking.
@thekwoka4707
3 months ago
Levenshtein distance is basically a pathfinding algorithm.
@Zaary
3 months ago
what??? its not even remotely close to that
@chesstipsandtricks420
3 months ago
@@Zaary i agree
@NachitenRemix
3 months ago
Yes, its working out the unknown path (there could be more than one) from one word to another, thats true.
@TheHDreality
3 months ago
For anyone confused, he's saying that because the Levenshtein distance is a metric, so the set of all strings forms a "metric space". That basically means that if you imagine all strings as points in space, the Levenshtein distance works much the same as distance in real space. It sounds kind of meaningless at first, but if you use it that way it actually unlocks certain properties of strings that enable some other clever algorithms for searching text.
@Zaary
3 months ago
pathfinding makes it possible to backtrack, this does not. it has only 1 thing in common with pathfinding: finding the shortest path. this algorithm however works completely differently from pathfinding and has nothing else in common with it.
@garancegourdel5681
3 months ago
Nice video, very pedagogical. If you ever get tempted to make a follow-up, there is an optimization where instead of computing the entire matrix you only compute distances below a threshold d; this corresponds to computing a wide diagonal in the middle of the dynamic programming matrix.
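The thresholded variant mentioned in this comment could look roughly like the sketch below. The name `bounded_edit_distance` and the convention of returning `None` when the threshold is exceeded are choices made for this example. Only cells within `max_dist` of the main diagonal are computed; everything outside the band is treated as "too far".

```python
def bounded_edit_distance(a: str, b: str, max_dist: int):
    """Return the edit distance if it is <= max_dist, otherwise None."""
    # The length difference is a lower bound on the edit distance.
    if abs(len(a) - len(b)) > max_dist:
        return None
    too_far = max_dist + 1  # stands in for "any value above the threshold"
    previous = [j if j <= max_dist else too_far for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        lo = max(1, i - max_dist)       # first column inside the band
        hi = min(len(b), i + max_dist)  # last column inside the band
        # For brevity a full-length row is allocated here; a tighter version
        # would store only the band itself.
        current = [too_far] * (len(b) + 1)
        if i <= max_dist:
            current[0] = i
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            current[j] = min(
                previous[j] + 1,
                current[j - 1] + 1,
                previous[j - 1] + cost,
                too_far,
            )
        previous = current
    result = previous[len(b)]
    return result if result <= max_dist else None


print(bounded_edit_distance("boat", "float", 2))  # 2
print(bounded_edit_distance("boat", "float", 1))  # None
```

With a fixed threshold, the number of cells actually computed per row is at most `2 * max_dist + 1`, which is where the savings over the full matrix come from.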
@nexcode_ir
18 days ago
It was extremely wonderful. Thanks for your great explanations 😍
@CartoType
3 months ago
Two points: it’s actually the Damerau-Levenshtein algorithm; and the implementation given is O of n^2, which is unnecessary. You can use a moving window into the grid that is a diagonal stripe wide enough to hold the maximum acceptable edit distance. That makes the algorithm O of n.
@CartoType
3 months ago
I meant that the commonly used algorithm is Damerau-Levenshtein.
@54peace
3 months ago
I saw "spell checker alogrithm" I subbed. thanks for the video and hoping to see moreeee!❤
@TonyTrippier
2 months ago
6:36 The Wagner-Fischer algorithm looks a lot like the Needleman-Wunsch algorithm (also a dynamic programming algorithm, used for aligning nucleotide, protein and other genetic sequences). It's possibly the same algorithm but repurposed for alignment of genetic sequences.
@Carberra
3 months ago
I've done some work with Levenshtein distances in the past, but it's cool to see what's actually going on under the hood. Thanks for this!
@kunalsoni7681
3 months ago
Wow this video is amazing and now I have learnt the core concept behind the spell check :)
@ekardonsenior2894
29 days ago
i was hoping to learn about the modern algorithms, but well, now i know the history behind it. hope to see a part 2
@dacixn
3 months ago
Great explanation, and your voice is pretty soothing
@Leet.Time.
3 months ago
What an exceptionally good and well researched video
@juxtopposed
3 months ago
This is very interesting! Great video.
@bensadik
3 months ago
Thank you so much for this video!
@ralphvirtucio4328
3 months ago
This was a great video, MAKE ANOTHER ONEEEE !❤❤❤
@mico3454
3 months ago
Can you upload DSA contents with visualizations? It would really help. Enjoyed this video, will try implementing it myself.
@dr7049
3 months ago
Amazing video! Well done!
@frankkevy
2 months ago
Just amazing explanation
@Vortex-qb2se
1 month ago
Omg could this be a channel about algorithms? 🤩
@ebol08
3 months ago
Wonderful video!
@zCodeCAE
3 months ago
The matrix is similar to an action table used to determine the symmetry of a group with respect to an operation. Basically, the math of DP is, if you think about it, fractal.
@LukasSmith827
3 months ago
really helpful ngl, didn't know much about spell checkers! but now i understand we really need NN's in this area because of how bad the functions are
@smallant.
3 months ago
I thought it'd be obvious to incorporate the distance between two letters on a keyboard into the calculation, but I was surprised that after so many iterations it's still not there!!!
@linusschonrath956
3 months ago
i really thought it was just ai, didnt realize it was already this old, good video quality!
@justblue6922
2 months ago
Amazing video extremely interesting, simple and high quality
@pmccarthy001
3 months ago
That's very interesting. I'm wondering if perhaps this algorithm could be implemented more efficiently in an array programming language like APL or J?
@dasten123
3 months ago
This was awesome!! I think I just found an awesome new channel :D
@tylerbakeman
3 months ago
One thing I've always wondered is how we find the distance between strings from a dictionary if the strings contain a close substring starting at an ambiguous index. It's not very intuitive, but thanks for the video.
@BestPlacesTo_
3 months ago
Very good video, but I guess using a binary search tree on a pre-sorted list for words is more efficient which would make a worst case of roughly O(log n) in the above example. It will also perform both checking if the word is correct or not and finding the suggested words by traversing through the tree only one time. correct me if i'm wrong please
@bigjamar
3 months ago
Excellent!!! Thank you very much!!!
@npip99
3 months ago
Best way to teach Dynamic Programming is just simple hashmap memoization of the recursive function, and only teaching the 2D matrix after solving multiple DP problems with memoization.
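As a concrete version of the approach this comment recommends, here is a sketch of the recursive definition with memoization. It uses `functools.lru_cache` as the cache, which is one convenient choice among several; the function name is made up for the example.

```python
from functools import lru_cache


def levenshtein_memo(a: str, b: str) -> int:
    """Top-down Levenshtein distance: plain recursion plus a cache."""

    @lru_cache(maxsize=None)
    def lev(i: int, j: int) -> int:
        # lev(i, j) is the distance between the suffixes a[i:] and b[j:].
        if i == len(a):
            return len(b) - j  # insert the rest of b
        if j == len(b):
            return len(a) - i  # delete the rest of a
        if a[i] == b[j]:
            return lev(i + 1, j + 1)
        return 1 + min(
            lev(i + 1, j),      # deletion
            lev(i, j + 1),      # insertion
            lev(i + 1, j + 1),  # substitution
        )

    return lev(0, 0)


print(levenshtein_memo("boat", "float"))      # 2
print(levenshtein_memo("kitten", "sitting"))  # 3
```

Each `(i, j)` pair is solved once, so the exponential blow-up of the naive recursion disappears. The cached pairs are exactly the cells of the 2D matrix, which is what makes the later jump to the bottom-up table feel natural. For very long strings the recursion depth can become a practical limit in Python.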
@praneethsaitunuguntla7751
3 months ago
super informative .. thanksss
@samgoel4283
3 months ago
This algorithm is very similar to one you use in finding longest common subsequence between 2 strings a very popular LeetCode question
@Alexander-zt9kz
3 months ago
As well as Edit Distance ( same as LCS ) - This is the first thing I noticed when he started explaining the video, as it felt as if I had solved this before.
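To make the similarity noted in this thread concrete, here is a sketch of the longest common subsequence (LCS) table, written in the same two-row style as edit distance. The helper `indel_distance` is a hypothetical name for the edit distance when only insertions and deletions are allowed; that restricted distance follows directly from the LCS length.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    previous = [0] * (len(b) + 1)
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, 1):
            if ch_a == ch_b:
                current.append(previous[j - 1] + 1)  # extend a common subsequence
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def indel_distance(a: str, b: str) -> int:
    """Edit distance when substitutions are not allowed."""
    # Every letter outside the LCS must be deleted from a or inserted from b.
    return len(a) + len(b) - 2 * lcs_length(a, b)


print(lcs_length("boat", "float"))      # 3  ("oat")
print(indel_distance("boat", "float"))  # 3
```

The two problems share the table shape and the three-neighbor dependency; they differ in whether each cell takes a maximum of matches or a minimum of costs.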
@mohdmajid4309
3 months ago
00:01 Spellcheckers rely on a sophisticated algorithm for accuracy
01:33 The Lenin distance algorithm was crucial for enhancing spell checkers.
03:10 The algorithm follows guard clauses and recursive comparisons.
04:54 Lenin distance algorithm is not practical due to its recursive nature
06:37 Wagner-Fischer algorithm uses dynamic programming for efficient spell checking.
08:23 Explanation of operations involved in transforming strings.
10:02 Wagner-Fischer approach calculates edit distance efficiently
11:42 Spell checkers use edit distance to suggest correct words.
Crafted by Merlin AI.
@michaeldula462
3 months ago
not only is he a communist, he's also a computer scientist! crafted by a meatbag
@maaxxaam
3 months ago
Oh no, communists are back to destroy computer science with the Lenin algorithm 😂
@michaeldula462
3 months ago
interesting, my quip about the Lenin distance is deleted? Did I offend a communist?
@Will_of_Iron
3 months ago
@@michaeldula462 I guess YouTube took it personally lol
@lucasalvesdossantos3993
16 days ago
Your explanation helped me a lot! But I think I found a small mismatch in it: when m[0][j] == m[i][0], I think we should copy the value in m[i-1][j-1] instead of selecting the minimum value of the three neighboring positions. In some tests your method works, but sometimes it fails. Sorry for my English...
@Chris-cx6wl
3 months ago
Algorithms with historical context videos are the best.
@Pedritox0953
3 months ago
Great video!
@noahprentice751
3 months ago
great video!
@BenjaminBobkin
3 months ago
Great video. How did you create these animations? Did you use Manim or something else, or did you do it with After Effects? I'm just curious.
@nikhilweee
3 months ago
I'm curious too!
@TesIaNikola
3 months ago
I tried thinking of a way to check against a dictionary faster. while Levenshtein distance is computable in O(nm), using it repeatedly would lead to O(nmk) if the dictionary has k words. The string space sort of behaves like a metric space, with stuff like the triangle inequality. I believe in computational geometry we know how to efficiently find the “k nearest neighbors” in Euclidean space, but Idk how to do that for the space of strings . I was curious if there’s a way to use Levenshtein distance smartly to only perform something like log k queries. If that were possible, the running time would effectively be O(log k) since the lengths of individual words are much smaller than the length of an entire dictionary.
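One known data structure that exploits the triangle inequality in the way this comment is reaching for is the BK-tree (Burkhard-Keller tree). The sketch below is a simplified illustration with names chosen for this example. It prunes subtrees that cannot contain a match, but it does not guarantee a logarithmic number of distance computations; how much it saves depends on the dictionary and on the search radius.

```python
def levenshtein(a: str, b: str) -> int:
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + (ca != cb))
            )
        previous = current
    return previous[-1]


class BKTree:
    """Each node is (word, {distance_to_word: child_node})."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word: str, max_dist: int) -> list[tuple[int, str]]:
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            candidate, children = stack.pop()
            d = self.distance(word, candidate)
            if d <= max_dist:
                results.append((d, candidate))
            # Triangle inequality: a match can only sit in a child whose edge
            # label k satisfies |d - k| <= max_dist.
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree(levenshtein)
for w in ["world", "word", "would", "boat", "float", "cold", "wordy"]:
    tree.add(w)
print(tree.search("worls", 1))  # [(1, 'world')]
```

Small search radii prune aggressively; as the radius grows, the search degrades toward scanning the whole dictionary.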
@juxuanu
3 months ago
Whenever I see matrices, I think GPU. GPU accelerated spell checker?
@MorRobots
2 months ago
I wrote this version of the Levenshtein formula in C just now. It's recursive; however, I optimized the two length checks so they only happen once, and then we just decrement the length value as we increment the string pointer.

/* Levenshtein distance formula */
#include <string.h>

#define min(a, b) (((a) < (b)) ? (a) : (b))

int _lev(char *s1, int sl1, char *s2, int sl2);

int lev(char *s1, char *s2) {
    /* Measure each string once; the recursion tracks the lengths from here on. */
    int sl1 = (int)strlen(s1);
    int sl2 = (int)strlen(s2);
    return _lev(s1, sl1, s2, sl2);
}

int _lev(char *s1, int sl1, char *s2, int sl2) {
    if (sl2 == 0) return sl1;
    if (sl1 == 0) return sl2;
    if (s1[0] == s2[0]) return _lev(s1 + 1, sl1 - 1, s2 + 1, sl2 - 1);
    int a = _lev(s1 + 1, sl1 - 1, s2, sl2);         /* skip a character of s1 */
    int b = _lev(s1, sl1, s2 + 1, sl2 - 1);         /* skip a character of s2 */
    int c = _lev(s1 + 1, sl1 - 1, s2 + 1, sl2 - 1); /* substitute */
    return 1 + min(min(a, b), c);
}
@prestonhall5171
3 months ago
Dynamic programming is one of the coolest design techniques in computer science. The first time I learned it I was amazed. Kudos to Richard Bellman, who first developed the idea for it.
@adimascahyaning9202
3 months ago
This algorithm is also used in the field of bioinformatics, to solve sequence alignment problem.
@_surreal99
3 months ago
I'd like to know why the suggestion section will have the word I meant to type but the auto correct picks the wrong one to use. So frustrating!
@arenmee540
3 months ago
now this is awesome
@GothGuy885
3 months ago
something that I have always found interesting and Amazing , and have wondered about, is How windows defrag works. how it goes through everything, sorts, puts things aside that are in the wrong place, deletes data that is no longer needed, and then reassembles everything in the correct order. I have a hard time getting my head around how it does that! 😵‍💫
@a_nerd_on_the_internet
3 months ago
I love the kingsman reference
@derstreber2
3 months ago
11:29 In your wagner_fischer implementation, why are you incrementing change? (line 17) If "previous_row[j-1]" was guaranteed to always be the smallest value, and none others shared that value, maybe it would work. Why not choose the minimum first and then add 1 to it after checking if the two letters are not the same? Or am I misunderstanding something?
@vectasus
10 days ago
I presume this Wagner-Fischer algorithm is also what is behind the edit distance (file diffing) in git
@Grassmpl
3 months ago
In the matrix algorithm, why do we take min of the 3 entries when the two letters match? Shouldnt it be min(diag, left+1,up+1)? Insertion, deletion still count as one operation, only substitution is not needed.
@invinciblemode
3 months ago
There’s no +1, because there’s no operation needed. No insertion or deletion to get from one letter to the same letter.
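For reference while reading this thread, here is a sketch of the textbook recurrence in two-row form. It is not a copy of the code shown in the video; the variable names only loosely follow the ones quoted in the comments. Insertion and deletion always add 1, while the diagonal step adds 1 only when the two letters differ.

```python
def wagner_fischer(s1: str, s2: str) -> int:
    """Levenshtein distance using two rows of the DP matrix."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the rows as short as possible
    previous_row = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        current_row = [i]
        for j, c2 in enumerate(s2, 1):
            insert = current_row[j - 1] + 1
            delete = previous_row[j] + 1
            # The diagonal is free on a match and costs 1 on a mismatch.
            change = previous_row[j - 1] + (c1 != c2)
            current_row.append(min(insert, delete, change))
        previous_row = current_row
    return previous_row[-1]


print(wagner_fischer("boat", "float"))  # 2
```

When the letters match, the diagonal value is never larger than either neighbor plus 1, so copying the diagonal and taking `min(diag, left + 1, up + 1)` give the same result. Taking the plain minimum of the three neighbors without adding 1 to the left and upper cells would not be equivalent in general.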
@nytrocide007
3 months ago
9:30 why so? is this equivalent to the square bracket in the levenshtein formula? if yes, which box stands for which formula in the square bracket? or perhaps this is left as an exercise for the reader lmao. im a bit lazy ill look over it one more time😅
@idocoding2003
3 months ago
Nice video 👍👍❤
@djangoworldwide7925
3 months ago
Wow I really want to write the Levenshtein algo!
@Yadakiii
3 months ago
Banger vid
@chloedelaware2922
3 months ago
this video is causing flashbacks to the time I wrote autocorrect for bash
@mishadanilenko955
3 months ago
How do you do these animations? Are you using some kind of library like Manim?
@TheTrainWatch
3 months ago
Do modern spell checkers take into account likely errors due to typing? I.e. "onky" is probably "only": it's not only one edit away, but that edit is only one key away too.
@mannyc6649
3 months ago
Could you weigh the edit distance to favor letter substitutions that are physically close in the keyboard?
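A rough sketch of what such a weighting might look like, under several assumptions that are not from the video: a US QWERTY layout, key positions approximated by shifting each lower row half a key to the right, and an arbitrary cost of 0.5 for substituting a neighboring key. All names here are invented for the example.

```python
import math

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Approximate key centres as (row, x); each lower row is shifted half a key.
KEY_POS = {
    ch: (row, col + 0.5 * row)
    for row, keys in enumerate(QWERTY_ROWS)
    for col, ch in enumerate(keys)
}


def substitution_cost(a: str, b: str, near_cost: float = 0.5) -> float:
    """0 for equal letters, near_cost for neighboring keys, 1 otherwise."""
    if a == b:
        return 0.0
    if a in KEY_POS and b in KEY_POS:
        (r1, x1), (r2, x2) = KEY_POS[a], KEY_POS[b]
        if math.hypot(r1 - r2, x1 - x2) < 1.2:  # touching keys
            return near_cost
    return 1.0


def keyboard_edit_distance(typed: str, word: str) -> float:
    previous = [float(j) for j in range(len(word) + 1)]
    for i, t in enumerate(typed, 1):
        current = [float(i)]
        for j, w in enumerate(word, 1):
            current.append(
                min(
                    previous[j] + 1.0,                          # deletion
                    current[j - 1] + 1.0,                       # insertion
                    previous[j - 1] + substitution_cost(t, w),  # substitution
                )
            )
        previous = current
    return previous[-1]


print(keyboard_edit_distance("worls", "world"))  # 0.5
print(keyboard_edit_distance("worlp", "world"))  # 1.0
```

With these weights "worls" ranks closer to "world" than "worlp" does, even though both are a single substitution away. As other commenters point out, the layout assumption breaks down for other keyboards and for text that was scanned rather than typed.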
@mews75
3 months ago
Awesome video
@honestarsenalfan7038
3 months ago
my head hurts, crazy sudoku
@IStMl
3 months ago
Reminds me of my 1st semester in uni, but ZIEGE and TIGER instead of FLOAT and BOAT
@shinobi5189
3 months ago
this is some good content
@kyngcytro
3 months ago
Could add a cache layer so we never have to check a misspelled word more than once. That counts as an easy improvement.
@rocketmanhowie6623
3 months ago
can we have a keyboard/setup tour
@sandiguha
3 months ago
what a great video. food for curiosity
@crazyguy3337
20 days ago
whats the font? looks good overall nice video
@thinzin101
3 months ago
really cool video
@pinoykun3325
3 months ago
If you have enough data, may create a map of all wrongly type words as key then the values would be an array?
@complainer406
3 months ago
I'm surprised a swap of letters within the word wouldn't count as 1 edit, especially since that's such a common typo, but I guess the complexity greatly increases when you need to consider adjacent letters. Seems wild that "wrold" is equally far from "world" as "woud" or "corold" is.
@ondrejkarbas7287
24 days ago
That is precisely the motivation behind Damerau-Levenshtein distance, which considers 4 different operations: insertions, deletions, substitutions and transpositions (swapping two adjacent letters). There is also an O(mn) algorithm for finding that distance, idk how it works tho
@ondrejkarbas7287
24 days ago
And Damerau-Levenshtein can further be improved by also considering a "generalised transposition" as an edit distance of 1. Generalised transposition means swapping two adjacent characters *and* substituting one of them. Afaik there is also an O(mn) algorithm for determining that, but i haven't read too much into it
@wadecodez
3 months ago
For optimization you should store the dictionary in a more efficient manner, like grouping similar words. That way you only search the most similar group of words instead of looping through all the words. This is called indexing.
@timurrte5694
3 months ago
I got an A in my recent dynamic programming course, but still can't comprehend it 😢
@RealUniquee
3 months ago
Quite an interesting history.
@PavanKumar-if7zi
3 months ago
Awesome 😍
@rembautimes8808
3 months ago
Great video joined as a sub .