Recently I came across a question on Stack Overflow about how to recover phrases from abbreviations, e.g. turning "wtrbtl" into "water bottle" and "bsktball" into "basketball". The question had an additional complication: there was no comprehensive word list, so the algorithm had to be able to invent new, plausible words.



I was intrigued and started researching which algorithms and math lie behind modern spell checkers. It turned out that a good spell checker can be built from an n-gram language model, a model of word distortions, and a greedy beam search. The whole construction is called a noisy channel model.
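The core idea of the noisy channel can be sketched in a few lines: score each candidate word by the sum of a language-model log-prior and a distortion log-likelihood, and pick the best. The toy corpus, the candidate list, and the per-letter drop penalty below are illustrative assumptions, not the article's actual model:

```python
from math import log

# Toy noisy-channel scorer: rank candidates by
# log P(word) + log P(abbrev | word).
corpus = "the water bottle stood near the basket and the ball".split()
counts = {}
for w in corpus:
    counts[w] = counts.get(w, 0) + 1
total = sum(counts.values())

def log_prior(word):
    # Unigram language model with add-one smoothing.
    return log((counts.get(word, 0) + 1) / (total + len(counts)))

def log_distortion(abbrev, word, drop_penalty=-1.0):
    # P(abbrev | word): the abbreviation must be a subsequence of the
    # word; each dropped letter costs `drop_penalty` in log space.
    i = 0
    dropped = 0
    for ch in word:
        if i < len(abbrev) and ch == abbrev[i]:
            i += 1
        else:
            dropped += 1
    if i < len(abbrev):  # abbrev is not a subsequence of word
        return float("-inf")
    return dropped * drop_penalty

def decode(abbrev, candidates):
    # Noisy channel decoding: maximize prior + distortion likelihood.
    return max(candidates, key=lambda w: log_prior(w) + log_distortion(abbrev, w))

print(decode("wtr", ["water", "bottle", "basket"]))  # → water
```

A real system replaces the unigram prior with an n-gram model over characters or words, learns the distortion probabilities from data, and searches over all possible words with beam search instead of scoring a fixed candidate list.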



With this knowledge and Python, I wrote such a model from scratch. After training it on the text of "The Fellowship of the Ring", it was able to recognize abbreviations of modern sports terms.



Spell checkers are widely used, from your phone's keyboard to search engines and voice assistants. Making a good spell checker is not easy, because it has to be both really fast and universal (able to correct words it has never seen). That's why there is so much science behind spell checkers. This article aims to give an idea of that science, and to have some fun along the way.




This article is related to

machine learning, fancy abbreviations