'Amazon's Alexa Has 80k Apps and No Runaway Hit'

I read the headline of this article, and immediately had some thoughts:

I saw this coming a mile away, and I don’t know that there will ever be a “killer app” for these devices.

  1. Discovery: Installing the apps is relatively inconvenient / out of the way
  2. Usability: Using them is awkward (“Alexa, tell <app> <thing>”)
  3. Recall: An audio-based interface gives no opportunity to prompt you that an app you’ve installed even exists
  4. Crapflooding: A gold rush can drown out the gems.

The discovery issue is interesting, but I don’t really have a good sense of whether Amazon can train their users in such a way that people get used to launching the app on their phone and looking for Alexa apps the way they look for apps for their phone. Until this article, it hadn’t occurred to me to even think of doing so since I first got the device. That’s a problem for turning these things into a proper platform. Maybe it’s solvable, maybe not – I’m not qualified to make a prediction. It’s certainly a drag coefficient that needs to be addressed.

The usability issue is probably the largest reason I haven’t engaged with the ecosystem more. The cumbersome need to namespace commands in a very rote and specific way makes it seem more worthwhile to just do whatever it is I need on a computer or phone. That may be addressable by making the namespacing less prescriptive, or customizable, or some such – but again, I don’t see a clear path forward on that front.

The real issue, in my mind, is recall. The linearity of audio means you can’t “skim” an audio interface, so there’s a strong pressure to not introduce any on-device / in-context affordance that enumerates what options are available to you. You’re limited to what you can remember. That means you’re limited to what you can habituate.

Again, before I’ve even looked at the article:

I would bet that the vast majority of Alexa users aren’t aware of even 10% of what Alexa can do out of the box, and use maybe 3-4 features of it regularly.

There will, of course, be some percentage of people who RTFM and/or read those weekly announcement emails describing all the new features, and maybe pick up a few things here and there that they can habituate.

Certain novelty items may see semi-widespread awareness despite a lack of routine use (“Alexa, tell me a joke”), precisely because their novelty stands out enough to make them memorable.

But I would guess that the median Alexa/Google Home/etc user is like me: They know a handful of things it can do. Of those, a subset hit a sweet spot for convenience, utility, and reliability resulting in routine use.

For me, that subset is: Timers, alarms, current time, current weather.

And then there’s the last item: Crapflooding. As Atari can tell you, a sea of garbage can drown an established platform, even if it’s already had established hits propelling it to consumer success. What, then, does a flood of garbage bode when a platform hasn’t even been established as viable yet?

Ok. So. Now, I’ve read the article. How does it match up to what I expected?

“Surveys show most people use their smart speakers to listen to tunes or make relatively simple requests—“Alexa, set a timer for 30 minutes”—while more complicated tasks prompt them to give up and reach for their smartphone.”


“But it poses problems for developers, who encounter a steep learning curve in building voice apps.”

Ok, that’s interesting. I hadn’t expected that. But it makes sense: Developers of IVR systems have already acclimated to this, of course. Tech entrepreneurs, however, probably have a pretty small overlap with developers of stodgy IVR systems.

“Even after creating an app, there’s no guarantee people will find it.”


“While smartphone users can quickly eyeball a list of available apps on a screen, multiple options get lost easily on a voice-based service.”

Recall, again.

“Fully half of smart speaker users say they don’t seek out applications, according to a survey by voice software news site Voicebot and Voicify, which makes developer tools.”


“‘There are kind of a cluster of features people are coming to expect for voice: a daily news summary, weather, timers and a random fact,’ said James Moar, an analyst at Juniper Research who tracks voice software. Beyond that? ‘People aren’t really experimenting that much.’”

Consumers intuitively understand that usability and recall are fundamental problems here.

“There’s also the essential question of what tasks people would rather complete with their voice than another device using their eyes and fingers.”


“‘Look at the internet in 1995, the first websites were not moneymakers,’ says Brandon Kaplan, chief executive of Skilled Creative, a marketing startup that has worked on voice software projects with PepsiCo Inc. and CBS Corp.’s Simon & Schuster.”

True. Of course, the EO and Newton weren’t viable platforms, either. Ultimately the problems of early tablet systems were solved (form factor, battery life, performance, convenience of adding apps) or bypassed (touch keyboard instead of handwriting recognition), and they became viable as appliances and as platforms. Awesome for all involved – except the first few generations of players in the space.

Not every potential platform takes off, however. Friction eats away at the value proposition of a thing. Imagine, for a moment, if cellular Internet access weren’t a thing. Would every teenager consider their PDA indispensable?

There are fundamental differences in how we mentally work with and relate to audio stimuli vs. visual stimuli. I don’t think it’s unreasonable to ask whether the linearity of how we process audio renders it unsuitable as a platform. Star Trek: The Next Generation had us excited for the idea of a computer we could order around, but it never stopped to ask some fairly obvious questions. Questions like: “How do people learn what the computer can, and can’t, do for them?”

So where is this all headed?

For the first time, I ask myself: “How much value have I gotten from my Alexa?” And the short answer? Not enough to justify the cost. Not enough to persuade me to put one in every room. Not enough to persuade me to buy the next generation. The ecosystem might hold wonders I can’t imagine – but the idea of a voice-based UI for anything non-trivial leaves me not caring enough to look. And at the end of the day, that leaves me feeling like these voice assistants will make lovely little appliances – but nothing more than that.

Copyright © 2022 - Jon Frisby - Powered by Octopress