Enhancing Intent Classification with the Universal Sentence Encoder

August 02, 2018 • MachineLearning

TL;DR: We built a custom Rasa featurizer that appends a Universal Sentence Encoding to the message features. This improves intent classification by overcoming an inherent limitation of bag-of-words models. See this Gist for the code.

When building chatbots, the first step is usually to build an intent classifier that maps arbitrary user messages (“Can you reserve me a restaurant table?”) to a pre-defined intent (request_restaurant_reservation). Such classifiers are usually bag-of-words (BoW) models, because they require little training data and are a surprisingly effective baseline.

Bag-of-words models only get us so far

Bag-of-words models essentially sum up the representations of the words in a message to compute a message representation. An inherent limitation of this approach is that the model has no information about word order.

For example, consider the following messages:

  • I want Pizza, not Sushi.
  • I want Sushi, not Pizza.

Because both messages contain exactly the same words, their bag-of-words representations are identical, and the classifier has no way to distinguish the two.
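A few lines of Python make the collision concrete. This is a minimal sketch that assumes a simple lowercase regex tokenizer (Rasa’s actual tokenizers differ); it counts the words of each message and compares the counts.

```python
import re
from collections import Counter


def bag_of_words(message: str) -> Counter:
    """Lowercase, extract alphabetic tokens, and count them. Word order is discarded."""
    tokens = re.findall(r"[a-z]+", message.lower())
    return Counter(tokens)


a = bag_of_words("I want Pizza, not Sushi.")
b = bag_of_words("I want Sushi, not Pizza.")
print(a == b)  # True: identical word counts, hence identical BoW features
```

Any model that only sees these counts necessarily assigns both messages the same intent.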

So is our only option to switch to a more complex classifier and feed it thousands of training examples?

Universal Sentence Encoders to the Rescue

No. A far simpler solution is to use a pre-trained Universal Sentence Encoder (USE) to map the input sentence to a vector that captures its meaning. We can then feed this vector to the intent classifier as an additional input, alongside the bag-of-words representation.

Google, for example, has released a pre-trained Universal Sentence Encoder as a TensorFlow Hub module, ready for us to use. Check out this graphic from the Universal Sentence Encoder paper, which illustrates how the similarity of the resulting sentence vectors correlates with the semantic similarity of the sentence pair:

Correlation between semantic similarity and the similarity of the Universal Sentence Encoding
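One common way to measure how similar two sentence vectors are is cosine similarity. The sketch below assumes that choice and uses made-up 3-dimensional vectors purely for illustration (the released encoder produces 512-dimensional vectors, and these numbers are not real USE outputs).

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical sentence vectors, chosen so the two reservation requests point in a similar direction.
book_table = [0.9, 0.1, 0.0]    # "Can you book me a table?"
reserve_seat = [0.8, 0.2, 0.1]  # "I'd like to reserve a seat for dinner."
weather = [0.0, 0.1, 0.95]      # "What's the weather like tomorrow?"

print(cosine_similarity(book_table, reserve_seat) > cosine_similarity(book_table, weather))  # True
```

With a good sentence encoder, paraphrases land close together in this sense even when they share few words, which is exactly the information a bag-of-words model lacks.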

Adding a Sentence Encoding Featurizer

Using Rasa NLU for intent classification, we start out with an NLU pipeline like this:

- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"

The first pipeline step builds the BoW count vector (the number of occurrences of each vocabulary word in the message). In Rasa terminology, such a component is called a featurizer. The second step invokes Rasa’s embedding classifier, which multiplies the count vector by a learned matrix. Mathematically, this is equivalent to learning word vectors from scratch and adding them up. See this blog post for a description of the embedding classifier.
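The equivalence between “count vector times a learned matrix” and “sum of word vectors” can be checked with a tiny, framework-free sketch. The vocabulary, tokens, and matrix values are made up for illustration and are not anything Rasa actually learns; row i of the matrix plays the role of the word vector for vocabulary entry i.

```python
import math
from typing import Dict, List


def count_vector(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Number of occurrences of each vocabulary word (what a count-vector featurizer yields)."""
    counts = [0] * len(vocab)
    for tok in tokens:
        if tok in vocab:  # out-of-vocabulary tokens are simply ignored
            counts[vocab[tok]] += 1
    return counts


def project(counts: List[int], W: List[List[float]]) -> List[float]:
    """Multiply the count vector by W; row i of W acts as the vector of word i."""
    dim = len(W[0])
    return [sum(c * W[i][j] for i, c in enumerate(counts)) for j in range(dim)]


# Made-up vocabulary and "learned" matrix holding one 2-d word vector per word.
vocab = {"i": 0, "want": 1, "pizza": 2, "not": 3, "sushi": 4}
W = [[0.5, 0.0], [0.25, 0.5], [1.0, 0.0], [0.0, -0.5], [0.0, 1.0]]

tokens = ["i", "want", "pizza", "pizza"]
projected = project(count_vector(tokens, vocab), W)
# Look up each token's word vector and add them up; repeated words count once per occurrence.
summed = [sum(W[vocab[t]][j] for t in tokens) for j in range(len(W[0]))]
print(all(math.isclose(p, s) for p, s in zip(projected, summed)))  # True
```

Since both sides only depend on the word counts, shuffling the tokens leaves the result unchanged, which is the word-order blindness discussed earlier.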

Now we would like to add a second featurizer that appends the sentence embedding to the count vector. Internally, the embedding classifier then multiplies the resulting feature vector by a learned matrix:

Embedding Classifier Computation

As a result, the vector representing the message consists of two parts:

  1. The bag-of-words representation, using word vectors learned from scratch.
  2. The Universal Sentence Encoding, projected into the same vector space using a learned projection matrix.

Both parts are effectively computed independently and then summed to yield the combined message representation. Compared with using only the count vector featurizer, we only need to learn one additional projection matrix with d × d_USE parameters, where d is the dimension of the shared vector space and d_USE is the dimension of the sentence encoding.
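Writing the learned matrix as two column blocks makes this split explicit. This follows the simplified view above, in which the classifier’s first step is a single linear map; the symbols c (count vector), u (sentence encoding), V (vocabulary size), W_BoW and W_USE are our own notation, introduced for illustration. For Google’s released encoder, d_USE is 512.

```latex
W \begin{bmatrix} c \\ u \end{bmatrix}
= \begin{bmatrix} W_{\mathrm{BoW}} & W_{\mathrm{USE}} \end{bmatrix}
  \begin{bmatrix} c \\ u \end{bmatrix}
= \underbrace{W_{\mathrm{BoW}}\, c}_{\text{bag of words}}
+ \underbrace{W_{\mathrm{USE}}\, u}_{\text{projected USE}},
\qquad
W_{\mathrm{BoW}} \in \mathbb{R}^{d \times V},\quad
W_{\mathrm{USE}} \in \mathbb{R}^{d \times d_{\mathrm{USE}}}
```

The W_BoW block is exactly what the count-vector-only pipeline already learns, so the only new parameters are the d × d_USE entries of W_USE.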

Implementing a Custom Featurizer

We can add custom components to the Rasa NLU configuration by referring to them by their module path and class name:

- name: "intent_featurizer_count_vectors"
- name: "my_package.UniversalSentenceEncoderFeaturizer"
- name: "intent_classifier_tensorflow_embedding"

All that’s left is to implement a featurizer that invokes the Universal Sentence Encoder and appends its output to the message’s text_features. Using the existing TensorFlow Hub module, this takes only about 50 lines of code. Check out this Gist for a complete implementation.
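To make the append step concrete, here is a framework-free sketch of just that part. The Message class, the SentenceEncoderFeaturizer name, and fake_encoder are hypothetical stand-ins, not Rasa’s or TensorFlow Hub’s API: in a real component the encoder would call the Universal Sentence Encoder module, and the class would implement Rasa’s component interface, as in the Gist.

```python
from typing import Any, Callable, Dict, List, Sequence


class Message:
    """Hypothetical stand-in for a Rasa NLU message: a text plus a dict of attributes."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.data: Dict[str, Any] = {}

    def get(self, key: str, default: Any = None) -> Any:
        return self.data.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value


class SentenceEncoderFeaturizer:
    """Appends a sentence encoding to the text_features produced by earlier featurizers."""

    def __init__(self, encode: Callable[[str], Sequence[float]]) -> None:
        # `encode` would wrap the Universal Sentence Encoder; here it is any text -> vector function.
        self.encode = encode

    def process(self, message: Message) -> None:
        existing = list(message.get("text_features", []))
        encoding = list(self.encode(message.text))
        # Concatenate: [count vector ..., sentence encoding ...]
        message.set("text_features", existing + encoding)


def fake_encoder(text: str) -> List[float]:
    """Hypothetical 3-d 'encoder' that only makes the sketch runnable; not the real USE."""
    return [float(len(text)), float(text.count(" ")), 1.0]


msg = Message("I want Pizza, not Sushi.")
msg.set("text_features", [1, 1, 1, 1, 1])  # e.g. the output of the count vectors featurizer
SentenceEncoderFeaturizer(fake_encoder).process(msg)
print(msg.get("text_features"))  # [1, 1, 1, 1, 1, 24.0, 4.0, 1.0]
```

Because the encoding is concatenated after the count vector, the classifier simply sees one longer feature vector. This is also why the custom featurizer has to come after the count vectors featurizer in the pipeline.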

Results and considerations

So far, we have tried this approach in one project. Without any hyperparameter tuning, we saw a solid improvement of 3 percentage points in F1 score. Even using the sentence encoding as the only input feature matched the performance of the count vector featurizer alone (while dramatically reducing the number of model parameters).

Because this project was a German-language chatbot and the Universal Sentence Encoder is only available in English, we added another custom Rasa NLU component that automatically translates the input message into English.

We believe that adding pre-trained sentence encoders as featurizers to the NLU pipeline is a great way to enhance the intent classifier while keeping data requirements low. If you try this approach in your own project, let us know how it went!

by Georg Wiese
