Conversational AI solutions are often judged on how “smart” or intelligent they are, but any such evaluation must consider these 5 key questions.
The adoption of chatbots and conversational AI agents has seen a stark uptick in recent years. A 2019 study conducted by MarketsandMarkets projected the global chatbot market size to grow 29.7 percent annually to reach USD 9,427.9 million by 2024. The Asia-Pacific region was specifically seen to be the most attractive region for investments, suggesting that we could see more organisations adopting chatbots and related technologies here.
Yet one question is often raised by organisations in the early stages of adoption: “How smart is the chatbot really?” While popular media and science fiction have shown us talking robots and sentient computers that can answer every query just like a human, the reality is a little more complicated, at least as of today. To answer it, we need to consider the following five questions.
1. Are the Goals of the Chatbot Aligned with Business KPIs?
First, there needs to be a shared understanding about what is being asked of the chatbot and what is important to the business. Is the chatbot considered smart if the responses involve “small talk” as if it were a human engaging in conversation?
Often during testing, we see clients expecting the bot to answer general out-of-scope questions like “Who is on the board of directors of our company XYZ?” The bot may provide helpful responses most other times, but it cannot answer this question because such examples were not part of its intended knowledge base, and hence not part of its training data. They were not included because, in our experience, customers tend to ask questions that help them solve problems or get something done, rather than general “Who is” or “What is” type questions.
Some departments, on the other hand, are content when the proportion of correct responses is above a certain percentage, or when a certain number of issues are resolved or leads generated. To gauge the smartness of the conversational agent, the entire organisation must align on the KPIs and what they expect the bot to do.
Consider the use case of a conversational AI agent deployed for a hospital or healthcare institution to disseminate health and wellness content to customers and patients. It may be considered smart if it provides useful information via its responses 80% of the time. But if the hospital is more interested in reducing the workload of its operations and administrative team by automating appointment scheduling and actualisation, then the benchmark for smartness may be different.
Perhaps the term “smartness” should be replaced with “effectiveness in fulfilling the required goals”.
2. Does the Bot Understand Intent and Context?
Conversational AI in its earliest form consisted of simple bots that pushed out notifications. In their next stage of evolution, they were able to answer frequently asked questions based on pre-populated rules. Here is a standard scenario in a healthcare context:
User question: “I need to book an appointment for a health screening”
Chatbot response: “You may proceed to www.xyz.com/appointments to schedule an appointment.”
Today, it is about understanding context and intents in a conversation. Below is a conversation that is feasible today, in which the bot is designed to remember attributes from earlier turns.
User question: “I need to book an appointment for a health screening next week”
Chatbot response: “Sure, would you prefer a basic or executive package?”
User response: “executive”
Chatbot response: “Thanks! Please hold on while we look for available slots for you.”
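The exchange above can be sketched in code as a minimal illustration of how a bot might remember attributes across turns. This is not how any particular product works: the slot names (`package`, `date`), the extra prompt for a date, and the keyword matching are all assumptions made for illustration, and a real solution would use a trained language-understanding model instead.

```python
from dataclasses import dataclass, field

# Hypothetical slot names for the health-screening example (assumptions).
REQUIRED_SLOTS = ("package", "date")

PROMPTS = {
    "package": "Sure, would you prefer a basic or executive package?",
    "date": "When would you like to come in?",  # assumed prompt, not from the example
}


@dataclass
class BookingState:
    """Attributes remembered across the turns of one conversation."""
    slots: dict = field(default_factory=dict)

    def update(self, message: str) -> None:
        text = message.lower()
        # Toy keyword matching stands in for a real language-understanding model.
        for package in ("basic", "executive"):
            if package in text:
                self.slots["package"] = package
        if "next week" in text:
            self.slots["date"] = "next week"

    def next_reply(self) -> str:
        # Ask for the first attribute that is still missing.
        for slot in REQUIRED_SLOTS:
            if slot not in self.slots:
                return PROMPTS[slot]
        return "Thanks! Please hold on while we look for available slots for you."


state = BookingState()
state.update("I need to book an appointment for a health screening next week")
first_reply = state.next_reply()   # date is known, so the bot asks for the package
state.update("executive")
second_reply = state.next_reply()  # both attributes are known, so the bot proceeds
```

Because the date from the first message is kept in `state.slots`, the user only has to answer “executive” in the second turn rather than repeat the whole request.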
An intent refers to the meaning or intention behind a user’s question, irrespective of how the question is phrased or spelled, or the language in which it is posed. Today’s conversational AI solutions are capable of recognising that differently worded queries all mean the same thing:
“I want to book an appointment for a health screening”
3. How Accurate are the Responses (for Questions in Scope)?
This may be the most important criterion for many organisations: how accurate are the bot’s responses to the questions posed to it? The more questions it answers correctly, the higher its accuracy.
Accuracy, however, needs to be looked at in the context of the bot’s scope coverage, or the breadth of topics it has been trained for. If the scope decided at the start is not wide enough, the bot may not understand some of the queries it receives and will not be able to respond accurately. This is a frequent problem, and it leads users to question the smartness of the bot.
Accuracy is measured in the testing phase of building an AI chatbot, during which the bot is challenged with queries taken from real-world examples but outside its training sample. Alternatively, a human evaluator could go through the chat logs and mark the accuracy of a random sample of the bot’s responses.
If the questions are out of scope, they are generally put aside during the evaluation process, as long as they constitute a reasonably low proportion of the total questions. For example, if only one out of 10 questions is out of scope, it means that the builders of the bot have a good understanding of the range of topics that are helpful to users. But if, say, 50% of questions are out of scope, then perhaps there is a need to widen the scope of the bot’s training to include more knowledge areas.
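The evaluation described above can be expressed as a short calculation. The sketch below assumes each test question has already been labelled by a human reviewer as in scope or not, and as answered correctly or not. The function name and the sample data are made up for illustration.

```python
def evaluate(results):
    """Compute accuracy on in-scope questions and the out-of-scope rate.

    results: list of (in_scope, correct) boolean pairs, one per test question.
    """
    total = len(results)
    # Out-of-scope questions are put aside before accuracy is calculated.
    in_scope = [correct for scoped, correct in results if scoped]
    out_of_scope_rate = (total - len(in_scope)) / total if total else 0.0
    accuracy = sum(in_scope) / len(in_scope) if in_scope else 0.0
    return accuracy, out_of_scope_rate


# Made-up sample: 10 questions, 1 out of scope, 8 of the 9 in-scope ones correct.
sample = [(True, True)] * 8 + [(True, False)] + [(False, False)]
accuracy, oos_rate = evaluate(sample)
print(f"In-scope accuracy: {accuracy:.0%}, out of scope: {oos_rate:.0%}")
```

Reporting the two numbers side by side keeps a narrow scope from hiding behind a high accuracy figure: a rising out-of-scope rate is the signal to widen the training scope.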
4. How Useful are the Responses?
It is also important to assess whether the bots are supplying answers that are helpful or useful to the customer. Responses can be broadly categorised into two types – definitive and deflective.
Definitive answers are responses on key topics that rarely change, like office opening hours and contact details. Deflective responses can be used to guide the user to more information on dynamic content such as promotions, discounts and campaigns.
As a rule, it is wise to avoid simple “Yes” or “No” responses. Instead, the chatbot should repeat the question in the answer to give the user context for the answer. This also avoids cases where an overly simplistic response could be misinterpreted.
If the user asks whether they can apply for a credit card, the bot should not just say “Yes” or “No”. It can direct the user to the application steps, but whether the application will be approved depends on more factors.
A less helpful response:
User question: Can I reschedule my appointment online?
Chatbot response: Yes you can!
A more helpful response:
User question: Can I reschedule my appointment online?
Chatbot response: Yes, you may reschedule your appointment by going to our website and looking for “Appointments” → “Reschedule” to enter your preferred new date.
We will send you a confirmation email with the updated appointment details in 10-15 minutes.
5. How Robust is the Data Used to Train the Bot?
Last but not least, the “smartness” of a conversational AI depends heavily on the data set used for its training. Any AI system is only as good as the data we put into it. To get the best out of the bot, training data must be a good enough representation of how real users ask questions in everyday conversations.
We often see that the best examples of user queries we can use for training come from the customer-facing functions within an organisation. These are people who directly interact with customers and have a good idea of how they ask questions.
Contrast this with some of the more business-facing teams, who tend to provide us with plenty of “What is?”-type definition examples. They think this is how customers may ask, but such examples may not represent how the queries sound in real life. In reality, especially with transactional queries in customer support, people do not care about definitions – they want to get things done.
Moreover, questions with the same intention can be expressed by different people in different ways. They could be in different languages, be worded differently, use different sentence structures and short forms, and even contain grammatical and spelling errors.
This is why it is of utmost importance to collect good quality examples of intents and variations at the start of a chatbot installation project. Compiling all these examples and variations helps the bot learn to answer them all in the same way.
Aim for at least 10 to 20 good examples for each intent per language. These examples should be real queries that users have asked before, so that they are realistic and natural rather than manufactured or restructured to sound formal.
Of the two examples below, which one do you think is more useful in training the bot?
- A: I wish to request to change my appointment for health screening from next Wednesday, 10 am to Thursday, 4 pm. Can you let me know if this is doable?
- B: Want to reschedule my health screening to this Thursday 4pm
The first example is too formal and not reflective of how a real user would ask while the second one is more natural.
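The guideline of at least 10 examples for each intent per language is easy to check before training starts. Below is a minimal sketch, assuming the training examples are available as (intent, language, text) records. The intent name, language codes and placeholder texts are hypothetical.

```python
from collections import Counter

MIN_EXAMPLES = 10  # lower bound of the 10-to-20 guideline


def under_covered(examples, minimum=MIN_EXAMPLES):
    """Return {(intent, language): count} for pairs with fewer than `minimum` examples.

    examples: iterable of (intent, language, text) records.
    Pairs with no examples at all do not appear in the data, so they are
    not reported here; compare against a list of expected intents for that.
    """
    counts = Counter((intent, language) for intent, language, _ in examples)
    return {key: n for key, n in counts.items() if n < minimum}


# Hypothetical data set: 12 English examples and only 3 Chinese ones.
examples = (
    [("reschedule_appointment", "en", f"placeholder query {i}") for i in range(12)]
    + [("reschedule_appointment", "zh", f"placeholder query {i}") for i in range(3)]
)
gaps = under_covered(examples)
print(gaps)
```

A count only shows quantity. Whether each example sounds like a real user, as example B does, still needs a human review.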
Keeping all these questions in mind will help you focus on what you are specifically looking for when exploring a conversational AI solution. Moreover, having a clear idea of what to expect from a “smart” chatbot will help you define clear KPIs to measure the success of the solution.
For more insights on what makes great conversational AI, contact us for a demo today.