Building a custom Alexa skill for multi-turn dialog management
Voice assistants have developed rapidly over the past few years. With the number of mobile users growing every day, it is fair to assume that voice searches will rise alongside them. Fiction has turned into reality: you can pose a question to a device and get a human-like response. Stunning, isn't it? Millions of users do this every day with Alexa, Apple HomePod, Google Assistant, and others. User interfaces have changed over time, and each new interface has brought a new set of challenges.
Conventional user interfaces are presented as controls in an application (text boxes, buttons) or as web pages. They are heavily used and have proven effective for human-machine interaction.
| So the question remains: why build voice assistants? What advantages do they offer?
- The magic of conversational interfaces is that users don’t have to learn how to use them. An Alexa skill (the Alexa counterpart of an Android app) should leverage natural language understanding to adapt to the user’s word choices instead of forcing them to memorize a set of commands.
- As the saying goes, “Don’t play the odds, play the man”. A voice assistant can do that because voice search queries are normally longer and more conversational than typed searches, so they say more about what the user actually wants.
- One significant benefit of voice assistants is their machine learning capability. The more we interact with these devices, the more they learn, and over time they can return highly customized results.
- With voice assistants, you can cater to customers based on who they are and not simply on their behavior. Personalizing the customer experience through voice assistants is still in its early days, but the potential for businesses is tremendous.
Conversations fall into two types: single-turn and multi-turn dialogs.
| Single-turn Vs Multi-turn Dialog with Drupal
Single-turn: A dialog in which the conversation ends after one question and one response. Asking Alexa to set an alarm or a reminder, play a song, or adjust the volume needs no back-and-forth; each of these is a single-turn conversation.
Let’s consider an example using Drupal and the Alexa module. Here we have created an Alexa skill that provides information about Drupal. The user asks Alexa, “Who is the founder of Drupal?” and she responds, “Dries”. But when the user then asks, “Which year was it open-sourced?”, Alexa fails to determine that the question is about Drupal and treats it as a brand-new query.
A few questions cannot be answered in a single turn. A user may pose a question that has to be filtered or refined before the correct answer can be determined. That is where multi-turn conversations come into the picture.
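To make the difference concrete, here is a minimal Python sketch of the Drupal example above. It is a toy, not the Alexa module's implementation: the `FACTS` table, the `answer` function and the keyword matching are all made up for illustration. The only idea it borrows from a real skill is that some state (here a plain `session` dict, standing in for Alexa's session attributes) has to survive from one turn to the next.

```python
# Toy knowledge base; the structure and keys are illustrative assumptions.
FACTS = {
    "drupal": {
        "founder": "Dries",
        "open_sourced": "2001",
    },
}


def answer(question, session):
    """Answer one turn, remembering the topic in `session` between turns."""
    text = question.lower()

    # Remember the topic whenever the user names it explicitly.
    for topic in FACTS:
        if topic in text:
            session["topic"] = topic

    topic = session.get("topic")
    if topic is None:
        # No context available: the follow-up cannot be resolved.
        return "Which project are you asking about?"

    if "founder" in text:
        return FACTS[topic]["founder"]
    if "open-sourced" in text or "open sourced" in text:
        return FACTS[topic]["open_sourced"]
    return "Sorry, I don't know that yet."


# Single-turn behaviour: the follow-up arrives with a fresh, empty session.
single_turn = answer("Which year was it open-sourced?", {})

# Multi-turn behaviour: the same session dict is reused across turns.
session = {}
first = answer("Who is the founder of Drupal?", session)
second = answer("Which year was it open-sourced?", session)
```

With an empty session the follow-up question has no topic to attach to, while the shared session lets the second turn resolve "it" to Drupal.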
| Dialog Management
Real conversations are dynamic, moving between topics and ideas fluidly. To create conversational Alexa skills, design for flexibility and responsiveness. Skills should be able to handle variations in conversation, conditional collection of information, and switching context mid-conversation. Dialog management makes these natural interactions possible. (Definition adapted from the Alexa docs.)
| How do you make this work?
Create an Alexa skill:
- Amazon Alexa skills are the equivalent of Android apps. You create a custom Alexa skill using the Alexa Skills Kit (ASK) in the Amazon developer console.
- You define an interaction model and point it to your Drupal site.
- With the Alexa Skills Kit, you can create skills with a custom interaction model.
- You implement the logic for the skill, and also define the voice interface through which users interact with the skill.
- To define the voice interface, you map users' spoken input to the intents your cloud-based service can handle.
Components of a custom Alexa skill:
- Invocation name: Users say the invocation name to start a conversation with a particular custom skill. For example, if the invocation name is "Drucom", users can say “Alexa, open Drucom”.
- The invocation name can be used on its own to get things going, or combined with an intent, such as “Alexa, open Drucom, order wine”.
- Intents: Each intent corresponds to something the Alexa skill can do, capturing what your users want to achieve. Each intent in the Alexa skill contains the following:
- Intent name
- Slot (optional)
- Utterances: These are the phrases users say to invoke an intent. Users can express the same intent in different ways. For example, if we were building a help intent, a user might ask for help in any of these ways:
- I need help
- Help me
- Alexa, can you help me?
- Alexa parses what the user says. Rather than passing the raw sentence on to your service, it sends the intent that was triggered.
- The utterances you provide for an intent do not have to cover every case and scenario; they are training data that helps Alexa work out which intent the user means.
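As a rough illustration of what the skill's backend receives, the sketch below parses a hand-written, heavily trimmed request body and pulls out the intent name and any slot values. The `request.type`, `request.intent.name` and `request.intent.slots` fields follow the Alexa request format; the intent name `GetHelpIntent` and the helper `parse_intent` are hypothetical names chosen for this example.

```python
import json

# Hand-written sample of the JSON body Alexa posts to the skill endpoint,
# trimmed to the fields used here. "GetHelpIntent" is a hypothetical name.
SAMPLE_REQUEST = json.dumps({
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "intent": {"name": "GetHelpIntent", "slots": {}},
    },
})

# Opening the skill without an intent produces a LaunchRequest instead.
SAMPLE_LAUNCH = json.dumps({
    "version": "1.0",
    "request": {"type": "LaunchRequest"},
})


def parse_intent(body):
    """Return (intent_name, {slot_name: value}); (None, {}) if no intent."""
    request = json.loads(body).get("request", {})
    if request.get("type") != "IntentRequest":
        return None, {}
    intent = request.get("intent", {})
    # Each slot is an object; only its spoken value is kept here.
    slots = {
        name: slot.get("value")
        for name, slot in intent.get("slots", {}).items()
    }
    return intent.get("name"), slots


intent_name, slots = parse_intent(SAMPLE_REQUEST)
launch_result = parse_intent(SAMPLE_LAUNCH)
```

Whichever of the help phrasings the user chose, the backend sees the same intent name, which is what lets it branch on intents instead of on sentences.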
Let's start with implementing the interaction model for the Add to cart functionality.
Step 1: Create a new skill with Drucom as the skill name
Step 2: Set an invocation name
Step 3: Create an intent
For our interaction model, we will create an intent called AddToCartIntent, which is responsible for handling the utterances for adding items to the cart. Adding utterances: When users interact with our skill, they may include additional details that indicate what they want to order.
Looking at the above utterances, AddToCartIntent will only be invoked when the user tries to add Red Wine to the cart; it will not be invoked when a user tries to add some other product. That is where custom slot types come to the rescue.
Step 4: Create slot types and use them in AddToCartIntent
- Glancing through the utterances above, we can identify the two slots that we have to capture, i.e. productName and quantity.
- We have to create one custom slot type for productName, and we will use the built-in slot type AMAZON.NUMBER for quantity.
- Custom slot types are a list of possible values for a slot. They are used for lists of things that are not covered by the built-in slot types provided by Amazon.
Once we have set up the slot types, it’s time to apply them in our intent. Once you are done with the changes our intent will look something like this:
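For readers who prefer the JSON editor to the console forms, here is a sketch of what the language model might contain at this point, built as a Python dict and printed as JSON. The layout (`interactionModel.languageModel` with `invocationName`, `intents` and `types`) follows the Alexa interaction model schema, but the custom type name `PRODUCT_NAME`, the second product value and the sample utterances are my assumptions, not the exact ones from the skill. The built-in intents that every skill needs are left out for brevity.

```python
import json
import re

interaction_model = {
    "interactionModel": {
        "languageModel": {
            # Invocation names are written in lowercase in the model.
            "invocationName": "drucom",
            "intents": [
                {
                    "name": "AddToCartIntent",
                    "slots": [
                        # PRODUCT_NAME is a hypothetical custom slot type.
                        {"name": "productName", "type": "PRODUCT_NAME"},
                        {"name": "quantity", "type": "AMAZON.NUMBER"},
                    ],
                    # Illustrative utterances; slots appear in braces.
                    "samples": [
                        "add {productName} to cart",
                        "add {quantity} {productName} to my cart",
                        "add to cart",
                    ],
                }
            ],
            "types": [
                {
                    "name": "PRODUCT_NAME",
                    "values": [
                        {"name": {"value": "red wine"}},
                        {"name": {"value": "white wine"}},
                    ],
                }
            ],
        }
    }
}


def undeclared_slots(intent):
    """Return slot names used in sample utterances but not declared."""
    declared = {slot["name"] for slot in intent["slots"]}
    used = set()
    for sample in intent["samples"]:
        used.update(re.findall(r"\{(\w+)\}", sample))
    return used - declared


language_model = interaction_model["interactionModel"]["languageModel"]
add_to_cart = language_model["intents"][0]
problems = undeclared_slots(add_to_cart)
model_json = json.dumps(interaction_model, indent=2)
print(model_json)
```

The small `undeclared_slots` check is a convenience of this sketch: it catches a sample utterance that mentions a slot the intent never declares before the model is pasted into the console.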
Step 5: Activate dialog management
To activate the dialog, you have to mark at least one slot as ‘required’.
In the slot form, you provide the sample prompts that Alexa will use when asking the user for the slot value, along with sample utterances the user might say in reply, which can include a slot value. Our interaction model for AddToCartIntent is now ready.
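In the JSON editor, marking a slot as required shows up as a `dialog` section plus a list of `prompts`, both of which sit next to `languageModel` inside `interactionModel`. The sketch below shows roughly what that could look like for AddToCartIntent. The field names follow the Alexa dialog model schema as I understand it; the prompt IDs, the prompt wording, the `PRODUCT_NAME` type name and the choice to require both slots are assumptions made for illustration.

```python
import json

dialog_model = {
    "dialog": {
        "intents": [
            {
                "name": "AddToCartIntent",
                "confirmationRequired": False,
                "prompts": {},
                "slots": [
                    {
                        "name": "productName",
                        "type": "PRODUCT_NAME",  # hypothetical custom type
                        "elicitationRequired": True,  # slot is 'required'
                        "confirmationRequired": False,
                        "prompts": {"elicitation": "Elicit.Slot.productName"},
                    },
                    {
                        "name": "quantity",
                        "type": "AMAZON.NUMBER",
                        "elicitationRequired": True,
                        "confirmationRequired": False,
                        "prompts": {"elicitation": "Elicit.Slot.quantity"},
                    },
                ],
            }
        ],
        # Let Alexa ask for missing required slots on the skill's behalf.
        "delegationStrategy": "ALWAYS",
    },
    "prompts": [
        {
            "id": "Elicit.Slot.productName",
            "variations": [
                {"type": "PlainText",
                 "value": "Which product would you like to add?"},
            ],
        },
        {
            "id": "Elicit.Slot.quantity",
            "variations": [
                {"type": "PlainText", "value": "How many would you like?"},
            ],
        },
    ],
}


def missing_prompt_ids(model):
    """Return elicitation prompt IDs that slots reference but nobody defines."""
    defined = {prompt["id"] for prompt in model["prompts"]}
    referenced = {
        slot["prompts"]["elicitation"]
        for intent in model["dialog"]["intents"]
        for slot in intent["slots"]
        if slot.get("elicitationRequired")
    }
    return referenced - defined


missing = missing_prompt_ids(dialog_model)
dialog_json = json.dumps(dialog_model, indent=2)
print(dialog_json)
```

With a model along these lines, a user who only says "add to cart" would be asked the product and quantity questions in turn, which is the multi-turn behaviour this post is after.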
I have covered what single-turn and multi-turn conversations are, and why multi-turn conversations matter when connecting Alexa and Drupal. I have also described the steps to create a custom Alexa skill. In my next post, we will learn more about configuring a Drupal site.