Some of you might have seen the Ministry of Health’s Ask Jamie bot go viral in Singapore for dispensing safe sex advice where safe distancing advice was expected. Since then, friends, colleagues and partners alike have bombarded me with questions about why this happened, and whether we built it.
No, KeyReply did not build this bot.
This is managed by GovTech and another company. Note: I do not know how roles and responsibilities are split between them, and cannot comment on who should have done what.
While this incident seems embarrassing, this issue is not unique to Ask Jamie.
I’ve seen many nasty and illogical posts criticizing the bot, and I definitely do not see any value in jumping on the bandwagon. In fact, based on what I know about the team in GovTech, they are competent and care about solving the right problems.
And if you are wondering, no – I do not think that they operate based on just keywords, like the word “positive”. That method is likely too rigid and outdated.
What I’d like to do is share some thoughts on the possible causes, so that we can all have a better understanding of the situation.
Key issue: People found the answer inappropriate
Was the content wrong or did people not like the way it was phrased?
In this case, the answer was wrong.
First, let us understand the fundamentals.
Many conversational AI systems are intent-based, and it is likely that Ask Jamie is the same. An intent is basically what the person wants to do or know.
The data is managed as question-and-answer pairs, and the training data consists of the different ways people phrase a question while meaning the same thing.
E.g. “Can I cancel my subscription” and “I don’t want to continue with this plan” use different words but have the same meaning.
In the case of MOH’s Ask Jamie, the two phrases that gave rise to different answers are:
“My daughter tested positive for covid 19. What can I do?” -> Correct answer
“My daughter is tested covid 19 positive what should I do?” -> Wrong answer
Possibility 1 from the question perspective: The training examples most likely conflicted across these intents, which means that the Covid intent likely had training examples that are very similar to those in the safe sex intent.
This results in the model predicting the two intents with similar confidence levels, so any slight change in the phrasing may surface either of the answers.
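To make this concrete, here is a minimal sketch of how overlapping training examples can shrink the gap between two intents. Everything in it is my own assumption for illustration: the training phrases, the intent names `covid_positive` and `safe_sex`, and the simple bag-of-words cosine similarity. I do not know what model or data Ask Jamie actually uses, and a production system would use a trained classifier rather than this toy scoring.

```python
import math
import re
from collections import Counter

# Hypothetical training examples. Note how the two intents share many words.
TRAINING = {
    "covid_positive": [
        "my daughter tested positive for covid 19 what can i do",
        "i tested positive for covid 19",
    ],
    "safe_sex": [
        "my partner tested positive what should i do",
        "what should i do to stay protected",
    ],
}

def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def score_intents(query):
    """Score each intent by its most similar training example."""
    q = Counter(tokens(query))
    return {
        intent: max(cosine(q, Counter(tokens(e))) for e in examples)
        for intent, examples in TRAINING.items()
    }

def margin(query):
    """Gap between the best and second-best intent scores."""
    ranked = sorted(score_intents(query).values(), reverse=True)
    return ranked[0] - ranked[1]

q1 = "My daughter tested positive for covid 19. What can I do?"
q2 = "My daughter is tested covid 19 positive what should I do?"
print(score_intents(q1), margin(q1))
print(score_intents(q2), margin(q2))
```

With this toy data the second phrasing leaves a much smaller gap between the two intents than the first. In a real model, a gap that small is where a slight rewording can tip the prediction to the wrong intent.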
Possibility 2 from the answer perspective (less likely in my opinion): When we examine the answers, the second response is fluent but does not make as much sense.
There are generative language models that produce a passage one word at a time when given a certain set of background text. This is what an auto-completion model does. When it works, it feels like magic. When it doesn’t, it just seems like perfectly crafted answers that are irrelevant.
The expectation: “You should practice safe distancing…”
However, while generating the word after “safe”, the language model thinks the next word has a higher probability of being “sex”, and produces: “You should practice safe sex…”
And once you have a single word going in the wrong direction, it will just keep going, and produce a natural-sounding sentence, maximizing the probability of each subsequent word.
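The word-by-word behaviour described above can be sketched with a toy model. The probability table below is entirely made up for illustration, and `greedy_generate` is a hypothetical helper that always picks the most probable next word. Real language models condition on far more context than the previous word, so treat this only as a picture of the failure mode.

```python
# Hypothetical next-word probabilities, keyed by the previous word.
NEXT_WORD = {
    "you": {"should": 1.0},
    "should": {"practice": 1.0},
    "practice": {"safe": 1.0},
    # The model slightly prefers "sex" over "distancing" after "safe".
    "safe": {"sex": 0.55, "distancing": 0.45},
    "sex": {"<end>": 1.0},
    "distancing": {"<end>": 1.0},
}

def greedy_generate(start, table=NEXT_WORD, max_words=10):
    """Extend the text one word at a time, always taking the likeliest word."""
    words = [start]
    while len(words) < max_words:
        options = table.get(words[-1])
        if not options:
            break
        next_word = max(options, key=options.get)
        if next_word == "<end>":
            break
        words.append(next_word)
    return " ".join(words)

print(greedy_generate("you"))
```

A ten-point difference in probability at a single step is enough to change the whole answer, and nothing in the generation loop checks whether the result fits the question.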
While this approach is obviously convenient, in high-stakes environments where the organization cannot afford to be wrong, such as government, it may be risky to use this model to serve the public.
To reduce the risk of incidents like this, have a rigorous review process and/or tool for detecting and handling conflicts. This is not always the easiest process, as there are many nuances in languages and it takes effort to do that well, especially as the knowledge base grows in size.
Team members who maintain the bot, especially non-technical users, should ideally either have access to a low/no-code platform to help them do the work, or be trained.
Build a robust test set to test the model with many scenarios.
For answers with sensitive content, be careful to manage how these answers will be shown. A direct question-and-answer response is riskier than an answer shown at the end of a workflow or after a qualification step.
For example, we could put in an explicit confirmation before showing the answers:
“Are you asking about Covid 19 or family planning queries?”
Then even if they click on the family planning option, the family planning answer they receive is no longer funny or out of place, because it follows from their own selection.
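A qualification step like this could be sketched as below. The threshold value, the intent names and the `decide` function are my own assumptions for illustration, not how any particular platform implements it. The idea is simply to ask for confirmation whenever the top two intents are too close, or whenever the top intent is marked as sensitive.

```python
# Hypothetical settings; real values would be tuned on a test set.
SENSITIVE_INTENTS = {"safe_sex"}
MARGIN_THRESHOLD = 0.15

def decide(scores):
    """Return ("answer", [intent]) or ("confirm", [top, runner_up]).

    `scores` maps intent names to confidence scores and must contain
    at least two intents.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_score), (second, second_score) = ranked[0], ranked[1]
    too_close = (top_score - second_score) < MARGIN_THRESHOLD
    if too_close or top in SENSITIVE_INTENTS:
        # Ask the user to choose before showing any answer.
        return ("confirm", [top, second])
    return ("answer", [top])

print(decide({"covid_positive": 0.82, "safe_sex": 0.75}))
print(decide({"covid_positive": 0.95, "safe_sex": 0.40}))
```

The trade-off is one extra click for the user in exchange for never showing a sensitive answer on a near-tie.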
In summary, natural language AI (#nlp #AI) is still progressing, with more potential left to be realized than in other branches of AI, as there are many nuances and negations, among other factors, that can cause issues.
When done right, having a head start in implementing such a system brings lots of tangible benefits, like improving customer service levels, consolidating knowledge bases, achieving scalable efficiency gains, and building a nimble workforce that can ensure that the organization is future-ready.