Building a Task Oriented Chatbot

Elham Amini
6 min read · Apr 27, 2021


Have you ever called your bank and talked to a machine before reaching a human agent? I have! And I have to admit that sometimes I was surprised by their human-like answers. How do they even understand us? How do they know what to answer? And how can we build one ourselves? In this post, I would like to share my experience in building these types of chatbots, which are called task-oriented bots. There are also other types of chatbots, such as open-ended conversational ones, that are not the subject of this article.

Data

As with all machine learning and deep learning projects, we need to start with a suitable dataset. Since chatbots are based on conversation, we need a dataset that contains conversations between two people about requesting a service, such as reserving a hotel or a taxi, or simply asking for an address. There are many datasets available for this purpose. I found a GitHub repository that gathers these datasets into one zip file, so you can download them all at once. Here, I will introduce the one we will use in this project.

MultiWOZ

MultiWOZ is a fully-labeled collection of 10,000 human-human written conversations spanning multiple domains and topics. The dataset is available in this GitHub repository. It has been updated multiple times, and all versions are available there. In this project, we will use version 2.1, which contains 10,000 annotated dialogues in JSON format. The following shows a sample exchange from the dataset:

>> I am looking for a place to stay.
>> Okay, do you have a specific area you want to stay in?

If you are interested in this dataset, you can also read their paper here.

Methods and Libraries

I have found two libraries for building task-oriented chatbots. One is the ToD-BERT library, which is built on the HuggingFace Transformers library and PyTorch. The other is a Reformer model built with Google's Trax library, which is the approach taught in Andrew Ng's NLP specialization on Coursera. I will explain these two approaches in the next sections.

ToD-BERT

ToD-BERT is a pre-trained language model based on BERT that is provided for building task-oriented chatbots. As with other pre-trained models on HuggingFace, you can load it with the following code:

import torch
from transformers import AutoModel, AutoTokenizer

# Load the ToD-BERT tokenizer and model from the HuggingFace hub.
tokenizer = AutoTokenizer.from_pretrained("TODBERT/TOD-BERT-JNT-V1")
tod_bert = AutoModel.from_pretrained("TODBERT/TOD-BERT-JNT-V1")

Then you need to build a pipeline on top of these encodings to predict the next piece of the conversation.
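Based on the usage example in the ToD-BERT GitHub repository, encoding a dialogue context looks roughly like the sketch below. Note that this only produces a context representation; a full bot would still need a response-selection or generation head on top.

# Sketch based on the ToD-BERT repository's usage example.
# ToD-BERT marks speaker turns with special [SYS] and [USR] tokens.
input_text = "[CLS] [SYS] Hello, what can I help with today? [USR] I need a hotel."
input_tokens = tokenizer.tokenize(input_text)
story = torch.tensor(tokenizer.convert_tokens_to_ids(input_tokens)).unsqueeze(0)

with torch.no_grad():
    hiddens = tod_bert(input_ids=story, attention_mask=(story > 0).long())[0]

context_vector = hiddens[:, 0, :]  # [CLS] hidden state as the context embedding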

😞 As of this writing, I have not been able to use this model to build the bot, but I will keep working on it and will update this section soon.

Trax

Trax is a deep learning library recently introduced by the Google Brain team. It is fast and can run on Tensor Processing Units (TPUs) in addition to GPUs and CPUs, and it is easy to use and efficient. If you want to access a TPU on Google Colab, this tutorial will help you.

Installing Trax

You can use the following to install Trax:

import os
import numpy as np

# Install the latest Trax release (quietly) and import it.
!pip install -q -U trax
import trax

We take the following steps to create the chatbot:

Preprocessing: We need to convert the data into the form of a conversation, as follows (a short preprocessing sketch appears after this example):

Person 1: Hello, I have been robbed.  Can you please help me get in touch with the police? 
Person 2: Parkside Police Station is in Parkside, Cambridge. Their number is 01223358966. Anything else I can do for you?
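For reference, a minimal preprocessing sketch might look like the following. It assumes the data.json layout of MultiWOZ 2.1, where each dialogue contains a log of alternating user and system turns; the file path is a placeholder.

import json

# Load the raw MultiWOZ 2.1 dialogues (path is a placeholder).
with open('data.json') as f:
    data = json.load(f)

conversations = []
for dialogue in data.values():
    turns = []
    for i, turn in enumerate(dialogue['log']):
        # Turns alternate: even indices are the user, odd ones the system.
        speaker = 'Person 1' if i % 2 == 0 else 'Person 2'
        turns.append(f"{speaker}: {turn['text']}")
    conversations.append('\n'.join(turns))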

Create the model: We use the ReformerLM class to create the model with the parameters we want, such as the vocabulary size, the number of layers, and the mode, which is 'train' for this step.
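For example, a minimal sketch of this step (the parameter values are illustrative, not the project's exact configuration):

# Create the Reformer language model; vocab_size must match the tokenizer.
model = trax.models.reformer.ReformerLM(
    vocab_size=33000,
    n_layers=6,
    mode='train',
)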

Initialize pre-trained weights: We initialize the weights from a file of pre-trained weights with model.init_from_file. Trax publishes pre-trained models for various tasks. For example, the following loads a pre-trained model for machine translation:

# Initialize using pre-trained weights.
model.init_from_file('gs://trax-ml/models/translation/ende_wmt32k.pkl.gz',
                     weights_only=True)

Tokenize the input: We need to tokenize the input sentence with trax.data.tokenize before feeding it to the model.
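A sketch of this step, where the vocabulary file and directory are placeholders for your own subword vocabulary:

# Tokenize a single sentence into an array of subword ids.
inputs = next(trax.data.tokenize(
    iter([sentence]),
    vocab_file='en_32k.subword',
    vocab_dir='vocab_dir/'))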

Train and predict: We use trax.supervised.decoding.autoregressive_sample to predict the next words of the conversation.
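A sketch of the decoding call (for decoding, the model should be created in 'predict' or 'eval' mode rather than 'train'):

# Greedy decoding: temperature=0.0 always picks the most likely token.
output = trax.supervised.decoding.autoregressive_sample(
    model, inputs[None, :], temperature=0.0)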

De-tokenize the output: We decode the resulting token ids into the reply with trax.data.detokenize.
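A sketch of the final step, using the same placeholder vocabulary as above:

# Convert the sampled ids back into a text reply.
reply = trax.data.detokenize(
    output[0], vocab_file='en_32k.subword', vocab_dir='vocab_dir/')
print(reply)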

Evaluation

As a very simple baseline, I created a retrieval-based chatbot. To do that, I extracted all messages and stored them with unique IDs in a CSV file, then used BM25 scores to find the stored message most similar to the input. My retrieval chatbot can answer a question, but it is clearly not intelligent. For example, the following is a sample conversation between me and the chatbot (a minimal sketch of this baseline appears after the conversation):

me: I need to book a hotel, please?

bot: Hi, can you help me find a hotel, please?

me: I am talking to you

bot: It was nice taking to you! Thanks you for calling! Good Bye!

me: You are not smart at all! Consider using AI!?

bot: Absolutely positively no problem at all! Thanks for using our service. Enjoy your time!
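For reference, here is a minimal sketch of such a baseline. The rank_bm25 package and the messages.csv layout with id and text columns are assumptions for illustration, not necessarily the exact setup I used.

import csv
from rank_bm25 import BM25Okapi

# Load all stored messages from the CSV file.
with open('messages.csv') as f:
    messages = [row['text'] for row in csv.DictReader(f)]

# BM25 scores tokenized documents; simple whitespace tokens suffice here.
bm25 = BM25Okapi([m.lower().split() for m in messages])

def reply(query):
    # Score every stored message against the query and return the best match.
    scores = bm25.get_scores(query.lower().split())
    best = max(range(len(scores)), key=scores.__getitem__)
    return messages[best]

print(reply('I need to book a hotel, please?'))

Because this crude version answers with the best-matching stored message itself, it tends to echo a paraphrase of the question back, as in the conversation above.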

Using the aforementioned Reformer model, we could get the following:

Person 1: Hi, I need a hotel, please.

Person 2: I can help you with that. What area of town are you looking for?

Person 1: I'm not concerned with the area but I would like it to be in the north.

Person 2: I have 2 hotels in the north. One is the Ashley Hotel and the Lovell Lodge. Which would you prefer?

Person 1: I would prefer the Ashley hotel.

Task success can be measured by evaluating the correctness of the overall solution. We use the slot error rate to measure the success of the bot in comparison to the baseline. We calculate this metric by measuring the error in identifying the various slots in the conversation; a sketch of this computation appears at the end of this section. For this task, I compute the average slot error rate over 3 sample conversations:

Conversation 1:

Person 1: Are there theatres in town?

Person 2: There are 4 theatres in town. Two in the centre of town and one in the south. Do you have a preference?

Conversation 2:

Person 1: Could I get the postcode and phone number?

Person 2: The postcode is cb17dy and the phone number is 01223902168.

Conversation 3:

Person 1: No, I just need to know the contact number and car type.

Person 2: I have booked you a taxi, a white Ford. The contact number is 071756.

We can see that the baseline gets a 0.75 error rate on average over the three conversations. This is expected, because the retrieval model is just repeating the question. The Reformer model, however, gets a very low error rate.
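Since the exact slot-matching procedure can vary, here is a hedged sketch of how a per-conversation slot error rate can be computed, with slots represented as (slot, value) pairs extracted from the reference and the bot's reply:

def slot_error_rate(predicted, reference):
    # Fraction of reference slots the reply missed or got wrong.
    if not reference:
        return 0.0
    return len(reference - predicted) / len(reference)

# Hypothetical slots for conversation 2 above: the reply gets the
# postcode right but, say, drops the phone number.
reference = {('postcode', 'cb17dy'), ('phone', '01223902168')}
predicted = {('postcode', 'cb17dy')}
print(slot_error_rate(predicted, reference))  # 0.5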

Conclusion

In this project, we used the MultiWOZ dataset, a fully-labeled collection of 10,000 human-human written conversations spanning multiple domains and topics, to build a task-oriented chatbot. For the baseline, I created a retrieval-based chatbot that used BM25 scores to find the most similar answer to the input. As expected, this bot responded to each prompt, but its answers were not related to the requested task. For the main model, we used a Reformer model built with Google's Trax library, which outperformed the baseline.

What is Next

As I mentioned in the methods section, I would like to learn more about the HuggingFace library and pre-trained models such as ToD-BERT, and I plan to come back and update this article by completing the ToD-BERT section.

I would also like to design a very simple UI for the chatbot with Flask.

Learning Resources

The Reformer approach described in this article is based on the final project of the "Natural Language Processing with Attention Models" course on Coursera.

Written by Elham Amini

I am a Data Science student at the University of Michigan.