At the end of each project we like to debrief and identify what we learned so that we can share and reuse that knowledge in future projects. Fortunately, the nascent state of development for the Amazon Alexa ecosystem has given us plenty of opportunity to learn! This post covers how we used custom slots in some powerful new ways to deliver our most recent project for Sony Crackle.
For those unfamiliar with developing for Alexa, Amazon has created a very clever interaction model to aid in the development of custom skills. (If you are already familiar with developing for Alexa, skip to the "Getting more out of custom slots" section.) The interaction model consists of three major ingredients:
Intent Schema
The intent schema is where a developer exposes the skill's intents (think of them as the public functions of your skill) along with the slots each intent may use. Below is an example:
{
  "intents": [
    {
      "slots": [
        {
          "name": "artist",
          "type": "ARTIST"
        }
      ],
      "intent": "artistAnswer"
    },
    ...
  ]
}
Custom Slots
The second ingredient is custom slots. We will dive into these shortly, but in short they give a developer access to the part of the sentence they care about. For example, suppose your skill asked "What artist sang the song For Once in My Life?" and the user replied, "Oh, I know, the artist is Stevie Wonder." The only part of that answer a developer cares about is "Stevie Wonder". Custom slots allow us to extract just that information and pass it to our skill. More on this soon.
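To make the idea concrete, here is a minimal sketch of pulling the slot value out of an incoming intent request. It assumes the skill receives the request as an already-parsed dictionary; the `get_slot_value` helper and the trimmed-down `sample_event` are ours for illustration, not part of any SDK.

```python
from typing import Optional


def get_slot_value(event: dict, slot_name: str) -> Optional[str]:
    """Return the spoken value for slot_name, or None if the slot was not filled."""
    intent = event.get("request", {}).get("intent", {})
    slot = intent.get("slots", {}).get(slot_name, {})
    return slot.get("value")


# Trimmed-down, illustrative intent request; a real request carries more fields.
sample_event = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "artistAnswer",
            "slots": {"artist": {"name": "artist", "value": "Stevie Wonder"}},
        },
    }
}

print(get_slot_value(sample_event, "artist"))
```

The rest of the sentence ("oh, I know, the artist is") never reaches the skill code; only the slot value does.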
Utterances
The last main ingredient is utterances. This is where a developer lists the various sentences a user can say and maps each one to the correct intent. Using our previous example, the utterance would be artistAnswer the artist is {artist}, where {artist} is our custom slot. As you can imagine, the list of utterances can get very large. We typically generate the various permutations through code.
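As a rough illustration of generating permutations through code, the sketch below combines a few optional openers with a few sentence bodies. The `generate_utterances` helper and the phrase lists are made up for this example; a real skill would use phrasing drawn from its own user testing.

```python
from itertools import product


def generate_utterances(intent, openers, bodies):
    """Build one 'intentName sample phrase' line per opener/body combination."""
    lines = []
    for opener, body in product(openers, bodies):
        # An empty opener leaves a leading space, so strip it off.
        phrase = f"{opener} {body}".strip()
        lines.append(f"{intent} {phrase}")
    return lines


# Hypothetical phrase fragments; {artist} is the custom slot placeholder.
openers = ["", "oh i know", "i think"]
bodies = ["the artist is {artist}", "it is {artist}", "{artist}"]

utterances = generate_utterances("artistAnswer", openers, bodies)
for line in utterances:
    print(line)
```

Three openers and three bodies already yield nine utterances, which is why maintaining these lists by hand quickly becomes impractical.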
Getting more out of custom slots
The first thing to know about custom slots is that they are not enums. The values you list simply help train Alexa to look for those answers first; other values can still come through. This has a few profound implications:
- As a developer, you should always validate these inputs. For example, if you are looking for the custom slot value "Third Eye Blind", Alexa will actually pass you "3rd eye blind". Comparing the strings with something like the Levenshtein distance, and finding the right tolerance, is a technique we have had great success with in our skills.
- It is possible to create a slot that acts as a catch-all. We did this by creating a slot populated with the top 10,000 questions asked on Google (Amazon has a limit of 50k). Then we added the utterance questionIntent {catchAllSlot}.
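The validation step in the first point can be sketched as follows. This is a minimal illustration, not the code from our skills: the `closest_match` helper, the normalized-distance ratio, and the 0.3 tolerance are all assumptions chosen for the example, and the right tolerance has to be tuned per skill.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def closest_match(heard: str, expected: Iterable[str],
                  tolerance: float = 0.3) -> Optional[str]:
    """Return the expected value closest to what Alexa heard, or None.

    tolerance is the maximum allowed distance divided by the longer
    string's length (an assumed, tunable threshold).
    """
    heard = heard.lower().strip()
    best, best_ratio = None, tolerance
    for candidate in expected:
        target = candidate.lower()
        ratio = levenshtein(heard, target) / max(len(heard), len(target), 1)
        if ratio <= best_ratio:
            best, best_ratio = candidate, ratio
    return best


print(closest_match("3rd eye blind", ["Stevie Wonder", "Third Eye Blind"]))
```

Normalizing by length keeps the tolerance meaningful for both short and long slot values; a fixed edit-count threshold would be too loose for short names and too strict for long ones.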
For the Titanium Rex skill, we wanted to allow the user to ask "Rex" anything and get a yes or no response. The challenge we hit was that we also had other intents we needed to surface, and we needed to handle specific yes/no questions as well. For example, if a user says, "Are you Heisenberg?" we wanted to have a specific answer. We sprinkled these easter eggs throughout the skill to help the Sony team market it. Normally, we would simply add an utterance such as heisenbergIntent are you heisenberg (learn about Heisenberg). However, if you have too many of these specific intents, your catch-all becomes less and less powerful and you start seeing unexpected behavior.
In the end, what we found worked best was to have only a few intents map to specific flows and to handle our easter eggs in code. In our code we could then do things like check whether the value in the catchAllSlot matched one of our easter eggs and, if so, return that easter egg.
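A simplified sketch of that check might look like the following. The `EASTER_EGGS` table, the placeholder reply text, the `normalize` helper, and the random yes/no fallback are all assumptions for illustration; the actual skill's responses and matching rules are not shown in this post.

```python
import random
import re

# Hypothetical easter-egg table keyed by normalized question text.
EASTER_EGGS = {
    "are you heisenberg": "Placeholder: the scripted Heisenberg reply goes here.",
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(cleaned.split())


def answer_question(catch_all_value, rng=random) -> str:
    """Return an easter egg if the catch-all slot matches one, else yes/no."""
    key = normalize(catch_all_value or "")
    if key in EASTER_EGGS:
        return EASTER_EGGS[key]
    # Assumed fallback: a generic yes or no answer.
    return rng.choice(["Yes.", "No."])


print(answer_question("Are you Heisenberg?"))
```

Keeping the easter eggs in a lookup table means new ones can be added without touching the intent schema or the utterance list, so the catch-all slot keeps doing most of the matching work.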
This was definitely a fun skill and helped us to realize that while you can have your skill essentially respond to wildcard questions, you are really fighting the model in doing so. We now always consider this in our VUX phase.