Building a Rasa NLU custom component for spell checking incoming messages

Nikhil Cheke
Oct 8, 2020

Overview

Rasa's pre-built pipeline components cover many ways of customising Rasa models, but sometimes we need a feature that Rasa NLU does not ship with. Spell checking is one such feature. When a Rasa NLU model receives a user message containing spelling mistakes, the result can be incorrect intent classification and entity extraction. There are several possible solutions to the spell checking problem:

1. Add all possible spelling mistakes to a lookup table
2. Add all possible spelling mistakes to the training data
3. Build a custom component for spell checking

The first two solutions take a lot of time, because the lookup table or the training data has to be prepared to cover every misspelling. So we will build a custom component for spell checking.

Adding a custom spell checker component to Rasa NLU

This custom component uses the pyspellchecker library. pyspellchecker uses a Levenshtein distance algorithm to generate all variants of the input word within an edit distance of 2. It compares these variants (insertions, deletions, replacements, and transpositions) against the known words in a word frequency list, and returns the candidate that occurs most often in that list as the most likely correction.
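
The lookup described above can be sketched in plain Python. This is a simplified illustration rather than pyspellchecker's actual code: the function names and the tiny WORD_FREQUENCY table are hypothetical, and the sketch tries candidates at edit distance 1 before falling back to edit distance 2.

```python
import string

# Hypothetical toy frequency list; a real one holds many thousands of words.
WORD_FREQUENCY = {"python": 50, "pythons": 5, "path": 30, "the": 1000}


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away from word."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def correct(word, frequency=WORD_FREQUENCY):
    """Return the most frequent known word within edit distance 2, else word itself."""
    if word in frequency:
        return word
    candidates = {w for w in edits1(word) if w in frequency}
    if not candidates:
        candidates = {w for w in edits2(word) if w in frequency}
    if not candidates:
        return word  # nothing close enough is known; leave the word unchanged
    return max(candidates, key=frequency.get)


print(correct("pythen"))
```

With the toy table above, the only known word within one edit of "pythen" is "python", so that is what gets returned.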

Installation

pip install pyspellchecker

Example
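
A minimal sketch of such a check, assuming pyspellchecker's SpellChecker class with its unknown() and correction() methods. The helper names report_corrections and main, and the one-word input list, are hypothetical and chosen only for illustration.

```python
def report_corrections(words, checker):
    """Return (misspelled, corrected) pairs for the words the checker does not know."""
    pairs = []
    for word in sorted(checker.unknown(words)):
        pairs.append((word, checker.correction(word)))
    return pairs


def main():
    # Imported here so the helper above can be reused without pyspellchecker installed.
    from spellchecker import SpellChecker  # provided by `pip install pyspellchecker`

    checker = SpellChecker()
    for misspelled, corrected in report_corrections(["pythen"], checker):
        print("Misspelled word:", misspelled)
        print("Corrected word:", corrected)
```

Calling main() runs the check against the dictionary bundled with pyspellchecker.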

The output of the above code will look like this:

Misspelled word: pythen
Corrected word: python

Now we are ready to write the custom component. Let's create a spellchecking.py file for the component, with CorrectSpelling as the component class:
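
A minimal sketch of what spellchecking.py could look like. It assumes the Rasa 1.x component API (a Component base class in rasa.nlu.components with train, process, and persist methods, and the raw text available as message.text); newer Rasa versions expose these differently. The helper correct_sentence is a hypothetical name, and the import guards exist only so the sketch loads without Rasa or pyspellchecker installed.

```python
# spellchecking.py: a sketch, assuming the Rasa 1.x component API.
try:
    from rasa.nlu.components import Component
except ImportError:  # lets the sketch load without Rasa; use a plain import in a real project
    Component = object

try:
    from spellchecker import SpellChecker
except ImportError:  # same idea for pyspellchecker
    SpellChecker = None


def correct_sentence(text, correct_word):
    """Correct text word by word and rebuild the sentence.

    correct_word is any callable mapping a word to its correction (or None).
    """
    corrected = []
    for token in text.split():  # naive whitespace tokenisation
        suggestion = correct_word(token)
        corrected.append(suggestion if suggestion else token)  # keep the token if no suggestion
    return " ".join(corrected)


class CorrectSpelling(Component):
    """Rewrites the incoming message text with spelling corrections applied."""

    _checker = None  # created on first use and shared, since loading the dictionary takes time

    def train(self, training_data, cfg=None, **kwargs):
        pass  # nothing to learn

    def process(self, message, **kwargs):
        if CorrectSpelling._checker is None:
            CorrectSpelling._checker = SpellChecker()
        # Assumption: the raw user text lives in message.text, as in Rasa 1.x.
        message.text = correct_sentence(message.text, CorrectSpelling._checker.correction)

    def persist(self, file_name, model_dir):
        return None  # no state to save
```

Because the text is split on whitespace only, punctuation attached to a word is handed to the spell checker as part of that word, which is one reason to consider correcting tokenised input instead.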

In the code above, the process function tokenises the user's input text and spell-checks it word by word, then rebuilds a sentence from the corrected tokens. That's it! The corrected sentence is now ready to be passed on to the next components in the pipeline.

Where to put this component in the pipeline?

Placement depends on your problem. Because this component rewrites the raw message text, it has to come before the tokeniser in the pipeline. If you instead modify the component to correct the tokens produced by the tokeniser, place it after the tokenisation step. You can register the custom component in the nlu_config.yml file as follows:

- name: "spellchecking.CorrectSpelling"
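
For context, a pipeline with the spell checker placed ahead of the tokeniser might look like the sketch below. The surrounding components (WhitespaceTokenizer, CountVectorsFeaturizer, DIETClassifier) and the language setting are assumptions chosen for illustration, not the configuration used in this post; substitute whatever your pipeline already contains.

```yaml
# nlu_config.yml: illustrative sketch; only the first pipeline entry comes from this post
language: en
pipeline:
  - name: "spellchecking.CorrectSpelling"   # runs first, so later components see corrected text
  - name: "WhitespaceTokenizer"
  - name: "CountVectorsFeaturizer"
  - name: "DIETClassifier"
```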

Now you are ready to train the model and test the result against spelling mistakes.

Training and Testing

Before training the model and testing input text, we have to add the project directory to PYTHONPATH so that Rasa can find our component:

export PYTHONPATH=/path_to_your_project_dir/:$PYTHONPATH
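
A quick way to confirm that the path is set correctly is to ask Python whether it can locate the module. This is a small sketch using the standard library's importlib; the helper name is_importable is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


if is_importable("spellchecking"):
    print("spellchecking is on the import path")
else:
    print("spellchecking not found; check PYTHONPATH")
```

Run it from the same shell session in which PYTHONPATH was exported, since the variable only applies to that session and its child processes.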

Now you can train and test the Rasa model:

1. rasa train nlu --config nlu_config.yml
2. rasa shell nlu

Summary

In this post, we learned how to create a custom component for spell checking and how to add it to the Rasa NLU pipeline. The same approach leaves plenty of room for writing your own components to meet other requirements.

Thanks, and happy coding!
