If you own a Fire or Echo device, you’ve likely already met Alexa. She’s the voice assistant that lets you turn on Vanilla Ice without lifting a finger, order an Uber without pulling out your phone, and plenty more besides. From a development perspective, Alexa sits in front of one of the most pleasant voice development platforms I’ve ever worked with.
It’s worth mentioning that Alexa is not the only AI in this space. Apple has Siri, Microsoft has Cortana, Google has the…Google Assistant, and even Samsung has one called Bixby. Some of the options out there are completely locked down, used only by the manufacturer as a value-add for their own offering, while others accept developer-contributed apps but are restricted in the hardware they can run on. Alexa falls into the latter category. While Alexa the voice assistant can be used on other platforms in a limited fashion (such as being relegated to the Amazon Store app on an iPhone), an organization cannot currently develop custom applications, or Skills, for Alexa for use on those platforms. There may be hope for broader platform support if recent discussions are to be interpreted positively.
And Now For The Skills
In Alexa, an app is called a “skill”, and the trend in the industry is pushing heavily towards adding Alexa as an interface to an existing system. Nike, for example, has its Nike+ service, which collects metrics about the various exercise activities you perform. Nike already has a web API and website that provide this information. Alexa can offer another interface to that data, allowing users to say something like, “Alexa, ask Nike how much I’ve run this week”. The Alexa Skill would then reach out to the existing Nike+ web API, collect the data it needs, and return the same data the website would, but as a spoken phrase from Alexa instead of text on a screen. From a UX perspective, you’ve just eliminated several steps to get to the same data (log in, navigate, filter the data, and finally consume it).
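To make the Nike example concrete, here is a rough sketch of what such a Skill’s interaction model could look like in the Alexa Skills Kit’s JSON format. The invocation name, intent name, and sample utterances are all hypothetical, not taken from any real Nike Skill:

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "nike",
      "intents": [
        {
          "name": "WeeklyRunDistanceIntent",
          "slots": [],
          "samples": [
            "how much I have run this week",
            "how far have I run this week"
          ]
        }
      ]
    }
  }
}
```

When a user’s phrase matches one of the samples, Alexa sends the Skill an IntentRequest naming WeeklyRunDistanceIntent; the Skill’s code then calls the backing web API and returns the answer as speech.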
But How Do Skills Work?
A Skill is an application that mediates between the Alexa Voice platform and your service. To make a Skill tick, you can use either AWS Lambda or your own hosted web service. Lambda’s unique appeal is that much of the complexity of running an Alexa-compliant web service, such as HTTPS, authentication if required, and infrastructure hosting, is managed for you. It even has a couple of starter templates that help you get going with Alexa very quickly. The downside is that Lambda limits the languages you can develop in. If you are a JVM, Python, or NodeJS shop, this shouldn’t be an issue, but if you’re like me and really itching to use Go, then a self-hosted web service is an option.
Speaking of authentication, Alexa provides account linking, as they call it, through OAuth 2.0. This is the approach used to identify an Alexa user in your system, so if your backend can speak OAuth 2, authentication becomes a breeze for Alexa Skills. The OAuth token is then automatically included in all requests coming from Alexa for the current session. Note that the user has to initiate this link when they enable your Alexa Skill, and again whenever they need to re-authenticate, much as you do when logging in to a website. The user must use the Alexa mobile app to authorize the link, as granting authorization by voice is not currently supported. It’s not an automatic process, but to the user it’s simply logging in (in a way), so it’s not that bad in my opinion.
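Continuing the Go sketch, a linked account shows up in Alexa’s request JSON as session.user.accessToken, which the Skill can forward to its backend as a bearer token. The helper below is an illustrative sketch of pulling that token out, not an official SDK call:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// sessionEnvelope models just enough of Alexa's request JSON to read
// the linked account's OAuth token.
type sessionEnvelope struct {
	Session struct {
		User struct {
			AccessToken string `json:"accessToken"`
		} `json:"user"`
	} `json:"session"`
}

// tokenFromRequest returns the OAuth 2.0 token Alexa attaches once the
// user has linked their account, or an error if the Skill should prompt
// the user to link their account instead.
func tokenFromRequest(body []byte) (string, error) {
	var env sessionEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return "", err
	}
	if env.Session.User.AccessToken == "" {
		return "", errors.New("account not linked")
	}
	return env.Session.User.AccessToken, nil
}

func main() {
	body := []byte(`{"session":{"user":{"accessToken":"example-token"}}}`)
	tok, err := tokenFromRequest(body)
	fmt.Println(tok, err) // prints "example-token <nil>"
}
```

With the token in hand, the Skill would set it as an Authorization header on its calls to the backing web API.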
So, what’s the overall experience of developing for Alexa? In my opinion, it’s fantastic. The flexibility of hosting your own Alexa Skill is enticing for those who don’t want to be tied to AWS infrastructure, but if you’re like me and already neck-deep in AWS, then Lambda plus the various monitoring tools provided makes it simple to support these Skills and get Alexa added to your application faster.