The Practical Futurist Podcast — Episode 3: The Future of Voice with James Poulter
In episode 3 we speak with James Poulter about the Future of … Voice.
The Practical Futurist Podcast answers the question “What’s the Future of … ?”
In each episode we will provide a look at what’s new, what’s next and what you need to do next week to survive in this digital world, as told through the eyes of global experts.
You’ll find every episode full of practical ideas and answers to the question, “What’s the future of …?” with voices and opinions that need to be heard.
But beware, I’m no ordinary futurist, and along with my guests we’ll give you things you can use in your business next week, not next year.
You can listen to the podcast above, or via your favourite podcast app. Links to the most popular ones are below or simply search for “Practical Futurist Podcast”. If you can’t find the podcast on your favourite platform, please contact me, and we can get it listed there.
In this show we covered a range of topics including:
- Why James started podcasting back in 2007
- Key learnings from being an early podcaster
- How the next generation will interact with computers using voice
- How smart speakers will drive the growth of all voice services
- Is there a voice “killer app” yet?
- What will Google Duplex do for voice?
- Building empathy into voice services
- The science of voice phrasing for success
- The notion of a “brand tone of voice”
- Predictions for voice in the next 18 months?
- The possibility for dedicated voice departments
- Ambient listening — what is it and what are the implications?
- How voice is growing thanks to “screen fatigue”
- The rise of voice + AR
- Voice in the enterprise
- Privacy vs utility — crossing the “privacy chasm”
- Three things listeners can do next week to experiment with voice
- Offer of a free Amazon Echo to the first listener to contact James @JamesPoulter with the hashtag #practicalfuturistpodcast
00:24 Podcasting since 2007
01:31 Pre-iPhone podcasting — biggest learnings
02:17 Work with an influencer in 25% of all homes
03:12 My first experience with voice assistants
04:07 How children interact with smart speakers
04:20 The next generation will talk to their computers as the primary interface
04:42 Smart speakers are driving the growth of voice services
06:00 Don’t just focus on the smart speaker
07:20 A Voice-only experience is limited — “Voice+” is the future
08:00 A great example of voice is What 3 words
08:50 What is the “killer app” in Voice?
09:52 What will Google Duplex do for voice?
13:00 Should we be saying please and thank you to voice assistants?
14:11 Will we have an emotionally intelligent bot in our lifetime?
14:45 AI at the edge
15:17 When will Voice AI be mature?
17:10 The science of voice phrasing
18:00 The notion of voice brand guidelines
18:41 Designing voice responses based on place and person
20:00 Brand tone of voice — will we hear the voice of the brand?
21:16 Using the voice of the Master Distiller for a whisky brand
22:33 The notion of Voice user experience
24:00 Will we have dedicated voice departments?
25:00 The speed of adoption of voice in under 18 months
27:00 Why we’re listening to more audio content than ever before — we have reached peak screen
24:40 Audio can be better for our health than swiping screens
28:50 Audio — the screen rebellion
28:07 Voice beyond the home
29:00 Bose AR is a good view of the ambient voice experience future
29:35 The rise of Voice + AR
30:29 Voice in the enterprise — eg meeting room bookings
31:32 Surfacing data using voice “Hey Salesforce ….”
32:08 Ambient listening — the always on assistant
32:26 Is our phone listening to us? The privacy squeeze
34:08 40% of people with a smart speaker worry that it’s always listening to them
34:30 Ambient listening use cases — mental health and ethics
35:20 Privacy vs utility — crossing the privacy chasm
36:42 The Voice 2 community — key learnings voice2.io
39:47 — one listener will get a free Echo Dot!
40:06 Where can you find out more about James? @vixen_labs, Vixenlabs.co
More about James
Andrew: 00:00 Welcome to the Practical Futurist podcast, a show all about the near-term future with practical tips and tricks from a range of global experts. I’m your host Andrew Grill. You’ll find every episode full of practical ideas and answers to the question, “What’s the future of …?”, with voices and opinions that need to be heard.
James: 00:24 Smart speakers are driving adoption of voice technology across all other platforms.
Andrew: 00:29 That’s the voice of today’s guest, James Poulter, co-founder of the voice agency Vixen Labs. We had a great conversation about the future of voice, not just smart speakers. We also spoke about “voice first”.
James: 00:43 When you start adding voice first as an interface: voice plus vision, voice plus UI, voice plus kinetic responses, like telling your Roomba to start going around the living room vacuuming, or turning your coffee machine on. Those are the use cases that really get people excited about something.
Andrew: 00:58 Another topic we discussed: what’s the killer app in voice?
James: 01:02 What’s the killer app in voice? What I keep saying to people is that the killer app for Alexa isn’t some skill. The killer app for Alexa is Alexa.
Andrew: 01:09 We also examined whether we should be saying please and thank you to our voice assistants to set a good example for our children.
James: 01:16 … so we feel able to somehow treat them badly because we know that they’re not real, or they’re somehow different from us. But when you can’t tell if it’s a real person at the end of the phone or not, that presents really interesting questions, because how will we teach our kids to talk to these things? Should we be saying please and thank you to set a really good example to them? Because the future they will live in will be one where they don’t know, and that presents some really interesting challenges for us.
Andrew: 01:37 With voice applications on the rise thanks to the rapid adoption of smart speakers in the home, brands are getting deeper into voice and we looked at the notion of voice personas for brands.
James: 01:47 Very, very few brands in the world have a) the brand recognition and b) really the need to create a true full ambient voice experience, but creating experiences for these platforms that use a brand tone of voice, I think, is applicable to nearly everybody. … Tasting Scotch whisky at home with this tasting experience, we use the voice of the master distiller from Scotland, because the timbre of the voice is different, the tone, the richness. If you care about whisky, you care about the voice of where it comes from.
Andrew: 02:15 … and finally, as this is the Practical Futurist podcast, James left us with three things to do next week, but you’ll have to listen to the end to find out what he suggested. I know you’re going to enjoy this episode, and please don’t forget to like and subscribe on Apple Podcasts. It really does help a brand-new podcast like ours become better known and put in front of audiences just like you.
Andrew: 02:38 Welcome to episode three of the Practical Futurist podcast. Today I’m joined by James Poulter, co-founder and CEO of Vixen Labs. Welcome James.
James: 02:46 Thanks Andrew. Lovely to be with you again.
Andrew: 02:48 It’s very fitting we’re talking about the future of voice on a podcast, and I know this is by no means your first podcast. I understand you were podcasting back in 2007.
James: 02:59 Yeah, that’s right. I was a bit early to it, I think, if you ask most people. I started in podcasting and radio back in the mid-noughties. I actually studied radio broadcasting at university and then tried to found a podcast production company serving corporate clients from about 2006/2007. I think it was me, The Guardian and Ricky Gervais who were about the only people trying to do it properly in the UK. One of the main shows I was doing back then was for the club some people will know as KOKO in Camden, or what used to be called the Camden Palace for those of us who might have frequented it before that time. I was basically the only sober person in the house, I think, between about 10 o’clock at night and two o’clock in the morning every Friday night, interviewing bands backstage for NME magazine’s Club NME nights. I was turning that into shows and started doing all sorts of other programmes off the back of that, which then led into a career for the past 10 years predominantly in social media and digital marketing, and now I’m back in voice and podcasting again.
Andrew: 03:58 So in 2007 you must have been massively early to the podcast party.
James: 04:01 Yeah, and bear in mind that was pre-iPhone. Most people could only really access a podcast through an RSS feed on a desktop or an iPod, where you had to download episodes before you went anywhere. So we put together Vixen Labs. My co-founder, Jen Heap, and I came together in the late part of last year and really started in earnest in January, serving corporate clients as well as building our own experiments and experiences for voice platforms.
Andrew: 04:29 So do you remember your first voice assistant interaction?
James: 04:32 Yeah, so my first one was an Echo I bought off a friend of mine who lived in the US, ’cause you couldn’t buy them in the UK. I think he actually brought it with him the first time he came over, or at least that’s the first experience of a smart speaker I remember. I remember booting it up and trying to make it work, and none of it worked because it thought I was in the US and not the UK … a bit of a failed start. But if you go back further than that, it was talking to Siri in the car, pressing the button on the steering wheel. The first time you could ask Siri a question and actually get a real answer, that was when I thought: this is the opportunity in front of us. If we could have a more natural way of talking to our computing, everyone will want to move towards that.
James: 05:15 And maybe it wasn’t my first experience, but the experience that stood out over the past few years was when my eldest daughter, who’s just turned four this week, was about one and a half. We had one of these things in the house and she began to talk to it, and that told me a lot. If you think of us as a generation, we grew up typing and using mice to control our computers. The next generation predominantly learned how to use computers by tapping on glass that they held in their hands, or on a table. The generation after that will talk to their computing as probably the primary interface, and it’s with that ambition that we wanted to start something now: we want to start building the right way of talking to those devices ahead of the generation that will do that every single day with every bit of technology they talk to.
Andrew: 05:59 Do you think voice assistants, or smart speakers specifically, have really started the notion of voice as an input mechanism, and that’ll help drive growth?
James: 06:07 Voice as an input mechanism has been around for as long as we’ve been talking to one another. And obviously many people have used voice input on a telephone call, with the standard “press one for this, press two for that”. You probably remember that from your days in the telco industry as well. So we’ve had enough of those; it’s not anything new in that sense. But the big thing we see is that smart speakers are driving adoption of voice technology across all other platforms. You only have to look at Bose this week, announcing a series of new noise-cancelling headphones that all have their Bose AR platform on them, so you can talk to apps that live in your headphones, with support for all of the major voice assistants.
James: 06:49 You only have to look at Apple updating the AirPods 2 so that you can just say “hey Siri” to them without even touching the headphone. My iPad is now going to try and do that. All of those opportunities are coming about now because the smart speaker is teaching us to talk to our technology. But we’ve got to be careful, when we’re out talking to clients at least, not to get hung up on the smart speaker. That is probably a category of product for a time, because ultimately we only have so many rooms in our house, right? You can’t keep on buying these things, and unless you’re a real audiophile, the audio quality of upgrades isn’t going to get substantially better. But what we do see is that the more people put these things in their homes, the more willing they are to talk to their headphones walking down the street, the more willing they are to talk to their car when they’re driving it, and, when they’re out in the environment, to talk ambiently to a screen in a McDonald’s rather than spreading all manner of bacteria over their fingers by touching it, which we’ve seen is a big problem. There will be much more willingness to talk to devices wherever they are, because ultimately we all want that. Pick a sci-fi example: The Jetsons, HAL or Jarvis. We want that experience of being able to just speak something out into the world and have the answer we want come back to us. We don’t want any other friction in between those interfaces.
Andrew: 08:05 I’ve got to admit, I’m actually a late adopter to the voice party. I had considered buying an Amazon Echo for a while, but it was only when I bought a Bose speaker for my TV with Alexa built in that I thought, “I get this”.
James: 08:20 Absolutely, and that’s the thing we’ve noticed: for most people, a voice-only experience is limited. But when you start adding voice first as an interface, voice plus vision, voice plus UI, voice plus kinetic responses, like telling your Roomba to start going around the living room vacuuming or turning your coffee machine on, those are the use cases that really get people excited. When you can sit in your car and say “drive here” and the car drives, there is something magical about that. It is the thing we all wanted, because it’s the least friction-inducing way of getting something done.
Andrew: 08:56 A good example: last night I was at an event where Giles Rhys-Jones from What Three Words was speaking, and the best use case is that they’ve now built it into both Ford and Mercedes, so you can tell the car where you want to go with a What Three Words address. You say you want to go to “paper.cut.rock” rather than having to scroll through and find the street and the address. I think that just removes the friction, and we’ll see more of that. I don’t think they’ve just done it voice first, but they’ve really designed the system to be voice friendly.
James: 09:20 Absolutely. What Giles and Chris and the team over there have done on Alexa, and now what they’ve integrated into the head units in cars as well, is a fantastic use case, because natural language understanding facilitates something very special there: the ability to get to something very specific using those three words. I think that is the thing everyone is maybe searching for. Everyone’s looking for the killer app in voice.
James: 09:44 I get asked this all the time, as you know: what’s the killer app in voice? What I keep saying to people is that the killer app for Alexa isn’t some skill. The killer app for Alexa is Alexa. The killer app is natural language understanding. In almost all cases, a voice experience is essentially “get me this information from this database”. That’s all it is, but it’s done in a way where you don’t have to think about drop-down menus and parameters and click-through funnels and all of those kinds of things. You simply collapse everything you might want to ask of a database into a question, and anybody can do that. That’s why we’ve seen these things adopted so rapidly across all generations and across different ethnic and geographic backgrounds. You can buy one of these things for your five-year-old or your grandmother, put it in the living room and say “talk to it”, and they can. That’s the beauty of it.
Andrew: 10:30 I’ve been using the Google Duplex experience in my talks for a while now, both the version they launched at [Google] I/O last year and the updated version. I play the video, as you probably do as well, where they ring up the restaurant, and then I say, this isn’t science fiction: you can have it on a Google Pixel 3 now, and they’ve upgraded it. What do you think Duplex is going to do for voice and digital agents generally?
James: 10:50 I think Duplex is a really interesting experiment, and I say it’s an experiment because, okay, technically you can do it for a very discrete set of use cases in a certain set of markets with the right device, but ultimately it’s an experiment, a way of showing the future of where we’re going to go. I think it presents some really interesting solutions to problems, particularly when you’re thinking about accessibility and about tasks that we all want to get done asynchronously,
Andrew: 11:13 … but the last mile where the restaurant doesn’t have a booking system, so you need someone or something to ring them up.
James: 11:18 Absolutely. But you still need someone at the other end to answer the phone. Duplex on the web, which was just announced at [Google] I/O, can take what would otherwise have been a web form and fill it in for you, like booking a car. They demoed booking a National car rental through Duplex on the web, which is essentially using the same intelligence without having to actually do the voice bit, and that’s really interesting. But there are also some things built into it, and whenever I showcase that video to people, what always comes up is the uncanny valley idea: this is so real that it’s almost too real, so I don’t want to use it. It’s the “Aha, awesome, great” and all that stuff that’s built in, which is fantastic in one sense because it’s a more human way of interacting with something, but it’s also the thing that makes everyone feel slightly queasy.
Andrew: 12:02 Do you think we’ll have a time, though, where we won’t know whether we’re talking to a human or an assistant, or will you have to overtly say, as Google does with their disclaimer, “By the way, I’m the Google Assistant, I’m not real. I’m calling to book a table at the restaurant”?
James: 12:14 I think we will probably come to a time at some point in the future where we don’t have to say that, but only because we’ll have built up a standard set of ways of talking to these things that accounts for it, like the Google Assistant telling you first. Whenever I ask anybody, “If you are talking to a bot, would you want to know it was a bot talking back to you?”, almost 100% of people say yes, I’d like to know it’s a bot, because we don’t like the idea of being duped by something that feels almost too real. I think that is one of the problems with the adoption of these things: particularly if it doesn’t have a visual interface, or you don’t know that you’re talking to a bit of technology, it presents some really interesting ethical and moral questions about how we talk to these things, particularly when it comes to teaching kids to use them. Most kids behave the way that you or I behave on the [motorway] M25 when we get cut up [cut off].
James: 12:59 When it goes wrong, you shout at it because you can, because the other thing can’t really hear you. Yes, it can hear you, but it can’t really hear you. Most British people, if someone cuts them up in the street, are the ones who say sorry. In the car, you’re the one screaming bloody murder at somebody else, because there’s that technology gap that allows you to feel separated from that person. The same thing happens with these technologies: we feel able to somehow treat them badly because we know that they’re not real, or they’re somehow different from us. But when you can’t tell if it’s a real person at the end of the phone or not, that presents really interesting questions, because how will we teach our kids to talk to these things? Should we be saying please and thank you to set a really good example to them? Because the future they will live in will be one where they don’t know, and that presents some really interesting challenges for us.
Andrew: 13:47 For those of you following the podcast series, last episode, episode two, we had Minter Dial on talking about his new book, “Heartificial Empathy”, and this brings us to the very interesting question of artificial empathy. Can you teach a machine to have empathy? Should machines say please and thank you through a voice interface?
James: 14:03 Today you can teach machines to have artificial empathy. You can’t teach them to have real empathy because we’re not near general intelligence by any real stretch, and I think anyone that kind of tries to say that, oh, it’s going to happen in the next five to 10 years is quite frankly either …
Andrew: 14:16 I think it’ll probably take the next 50 years to get it right.
James: 14:19 I think we may see some version of it in our lifetimes, but I’m using “lifetime” as a very long lens. I don’t think we will see broad consumer adoption of it in our lifetimes. I think you’ll see it, like the early days of the Internet, in institutions, in large corporations, in the IBMs of the world, in the Harvard Universities of the world; you might see a general intelligence having some operational capability there. But the idea that you or I will have our own emotionally intelligent AI bot that lives with us as our digital twin in the 2050s or the 2060s? I just don’t think it’s going to happen. In the 2100s, maybe, if we make these massive leaps in computational power, but we also have the bandwidth issues of 5G and 6G, or whatever other “G” we’ll be on by that point, to cope with all of that traffic. I just don’t think we’re there yet.
James: 15:09 It also presents some really interesting challenges around AI at the edge, and around actually building capabilities into these AI systems. If they have to live off the grid, discretely, privately and on the edge of the network, how do you make sure you build them in such a way that they don’t have built-in biases you can’t update over time? That presents some really interesting challenges. There are a number of people pioneering some great work on that, but it’s a massive ethical question that we also have to overcome.
Andrew: 15:38 So going back to something you said about Google Duplex: you said it’s an experiment, and I agree with you. In fact, in a TechCrunch article from a few weeks ago, they actually tested it and they think it’s a hybrid of humans and AI … How long does this experiment have to go on until it’s totally AI-automated?
James: 15:55 It all comes down to adoption: the more people use it, the more the model gets trained, the ML improves and things get better, but it’s still within a discrete use case. Take the bot that can book you a hair appointment. If you suddenly start talking to it about massages, maybe it can pivot in that direction and book a similar type of thing, but if you start asking it, “Oh, and could you also make sure my dry cleaning is picked up afterwards?”, it’s going to fall over immediately. So it’s about how hard people test these things and how many different places they try to take them, which is why I think the gap between discrete and general intelligence is still quite wide. I think we are going to see our use of specific bots in specific applications for specific use cases grow. It may not be in the way Duplex does it, because it’s actually not massively efficient to have a robot calling up a business to answer a question. It takes the load off you as the consumer, but it puts a lot of load on the business at the other end receiving the call, and you may see a two-tiering or multi-tiering of businesses, with some just not willing to accept calls from robots because they think they’re above it.
Andrew: 16:57 “robots not allowed here!”.
James: 16:58 Yeah, maybe you can use it to make your Domino’s order, but are you going to use it to book a table at the Ivy Cafe in Blackheath? … Probably not, because they might go, “I’m not taking a booking from a robot, thank you very much.” The same places that have refused to go on things like OpenTable will do the same again with these technologies: “No, we’re not going to do it. We’re above that.” You’ll get classes of restaurants that just refuse to take table bookings from Duplex, maybe.
Andrew: 17:22 So I understand from your work that there’s a real science to voice, and I know you did some A/B testing recently on Alexa skills with different phrases and found that user responses to a particular prompt increased by 14% when you changed one word. So is there a real science here, and should people come to agencies and companies like yours to really understand how we speak and how we can be understood?
James: 17:43 The thing about the natural language these things use to talk to you is that ultimately it’s about persona. It’s about brand and persona, embodied. And this comes up a lot. You’ve probably got brand guidelines somewhere, right? That’s probably a PDF document that a consultancy did for you, that cost you a lot of money and lives on a SharePoint that you’ve never looked at. It’s a joke, but it’s often true. What I bet you don’t have is brand guidelines for audio and sound. What does your brand sound like? Is it posh? Is it from the Northern Powerhouse? Does it drop its “Ts”? Does it know slang? Could it talk about grime music? These are questions you could ask about a person and have that conversation to find out about them. Can you do that with your brand? The difference between a brand that can answer those questions and one that cannot may be the differentiating factor in which of those two “people” (with big air quotes) you want to talk to.
James: 18:39 And that’s why we focus on things like language, tone of voice, persona and particularly situational design. Thinking about using an assistant not just at a specific time with a specific user, but in a specific place, is really important, because it makes a big difference to the way someone talks to you when you’re driving your car versus when you’re at home in the kitchen. That’s for all of the plainly obvious reasons: safety, brevity, whether or not there are other people in the car with you, what data it reveals when there might be other people around versus just you, and also whether it knows it’s you it’s talking to. One of the big things that comes up a lot is: how do I know when my smart speaker is talking to me versus my kid?
James: 19:20 Do I want it to reveal healthcare data from my Babylon Health app to me or to my wife? These are situations that actually dictate how a voice assistant needs to change the way it interacts with you, and we’ve never really had that before. Your iPhone app behaves pretty much the same every time you open it, regardless of where your phone is, because it travels with you. But these assistants travel between surfaces and devices, regardless of whether you’re the user or someone else becomes the user, and that presents these different situations. So that way of thinking is a different way entirely of building a brand persona, a different way of writing content, and a different way of thinking about the use cases you might want to provide for.
Andrew: 20:03 It’s interesting what you say about brand tone and brand voice. We know that Intel and McDonald’s are probably the two brands with a vocal signature we can all recite now. But I’m thinking, as you were speaking, that if we use Siri or Alexa, we have one of six different voices we can choose. In fact, as a sidebar, I’ve actually met Karen Jacobsen, the “Aussie Karen” voice for Siri; there’s a whole story behind that, which maybe I’ll cover in another podcast. But can we see a time where, if I’m talking to Peter Jones or John Lewis or Sainsbury’s, the voice I’m hearing is the voice of the brand, whether that’s female, male or a northerner as you say? At the moment I’m hearing Apple’s or Alexa’s brand voice.
James: 20:43 A lot of people get into this discussion with us: should I be thinking about creating my own voice persona? Very, very few brands in the world have a) the brand recognition and b) really the need to create a true full ambient voice experience, but creating experiences for these platforms that use a brand tone of voice is, I think, applicable to nearly everybody. We just finished a project a couple of weeks ago for Diageo, for Talisker, a Scotch whisky from the Isle of Skye. We use the Alexa voice for a lot of the navigation, such as how to start the experience and how to stop it, but when it comes to the content, the thing that really matters, tasting Scotch whisky at home with this tasting experience, we use the voice of the master distiller from Scotland, because the timbre of the voice is different, the tone, the richness …
Andrew: 21:37 … the authority. Otherwise it’s the normal voice you speak to every day, versus the winemaker.
James: 21:40 Exactly. If you care about whisky, you care about the voice of where it comes from. It’s about authenticity, it’s about quality, and it’s also about making that audio signature very clear in the mind of the user: this is who I’m talking to. On devices that are predominantly just voice, that matters massively. Obviously, when you can marry that with visuals or video animations on devices like the Echo Show, the Google Home Hub or a Fire TV, it’s amplified even more, but particularly in the voice-only world, voice and audio branding is a huge thing. And it’s not just audio signatures like Intel’s; it’s thinking about a rounded sense of what VUI …
James: 22:24 … so we talk about VUI (Voice User Interface) sounds. Are you going to use things like progression sounds, or confirmations? These are things we’re actually all really familiar with; we just don’t think about them. If you’ve ever played a video game, it’s absolutely jam-packed with this stuff, and it’s absolutely applicable to bring these things over. Again, if you’ve listened to radio, you know what station you’re listening to, because every half hour someone goes, “You’re listening to BBC Radio 2”, or whatever it might be. So we’ve learned an awful lot through working on radio projects and podcasting, in terms of all the audio cues you might use, and also through decades of producing digital experiences, and we bring those things together in voice, because you actually need more than just a voice to make sure a user knows where they are, given where they use these devices. If you’re in a kitchen half listening while you do something and a timer goes off, you need to know the timer has finished so you can stop brewing your coffee, or start the next step of a play experiment you’re doing at home with the kids on the dining room table, or any number of things you might do with a skill for Alexa or an action for Google.
Andrew: 23:27 So now that we've got these smart speakers in the home, basically directing things we're doing that we probably weren't doing a couple of years ago, brands have a whole other thing to think about in terms of their voice strategy. Are we going to have a voice department or a voice strategy soon?
James: 23:40 We've not seen many people actually invest yet in "heads of voice" or those types of titles, certainly in the UK and Europe, but I think they're coming. In terms of voice agencies and consultancies, Vixen Labs is obviously our response to the market: we do see dedicated specialism being required in this space, and that's where we've come about serving it. Over time I think we will begin to see departments, and we will begin to see owners inside brands. But much like social media and much like mobile, there's bound to be that cycle: the specialist team comes up, then it gets disseminated into "brand" or "digital" catch-all titles, and then it comes back again. I think that will happen over a number of years. The biggest thing that's caught many people on the hop is just the speed of adoption. Unlike social, which had a slightly longer on-ramp, we've seen voice, in Europe in particular, take off in the space of less than 18 months in many ways.
James: 24:35 That's an estimate; I was trying to think around it a little bit.
Andrew: 24:37 Why do you think that is? Is it because the devices are quite cheap and work reasonably well? What has been that spark?
James: 24:43 I think there’s probably three main factors. One is definitely the price of devices. If you think about the adoption of the smartphone compared to the adoption of the smart speaker, the cost differential is just huge.
Andrew: 24:52 I mean, £40 versus £400.
James: 24:53 … and come Black Friday you'll probably get one for £20. Heck, you'll probably get one bundled in for free at some point or other. Amazon in particular have been almost giving the things away. The Echo Show 5, the new five-inch version of the device, launched yesterday as of the date of recording [30 May 2019], and they're already doing a bundle, "buy two and get £25 off", because they see the need to compete in that space. So cost is definitely one.
James: 25:15 Second is usability. Like I said before, whether you're a grandmother or a grandchild, you can use one of these things and get the majority of what you want done really quickly. And the third is a more general pivot towards voice usership across all different platforms, and the smart speaker is the most logical embodiment of what that looks like. It's hard to say whether this is correlation or causation, but the adoption of things like podcasts and audiobooks, and a renaissance in audio and radio in general, has coincided with this massive rise. Personally, I feel that voice adoption in the US catapulted a whole new category of audio content there, which has had a weird boomerang effect: people here in Europe are now buying more smart speakers to listen to things like podcasts, because that audio revolution happened in the US and then slung back to us, along with the technology to get it.
James: 26:04 So there is a kind of weird cycle going on there, but whichever way round it is, there is more audio content out there than ever before. And the reason I think we are listening to more audio content than ever before, which is why smart speakers and voice are growing, is that we have reached "peak screen". For many of us, and particularly as we're both parents with kids, that is absolutely true in terms of the concerns people have about kids being in front of devices the whole time. But quite frankly, I'm just as concerned about my parents' use of their devices as about my own. You only have to look at things like Screen Time, and all the other applications that have entered the market to manage our digital detoxing. Audio is a great response to that, because if you can get the same content without all of the addiction cues that come from screen-based devices, that is ultimately probably a good thing for us from a mental health perspective.
James: 26:54 So if you look at those three factors (audio, because of the screen rebellion; usability, in that it can work for anybody; and a straight question of cost, supply and demand), that's what's catapulting this market.
Andrew: 27:09 So voice beyond the home. What other uses are emerging that you’re seeing?
James: 27:12 I think voice out in the world is going to be a big trend over the next 18 months. In particular, cars have to be the big one. Amazon presold a million Echo Autos in the US, the little plug-in device to put the Echo in the car. But really it's about the smart speaker on the phone: not because you're going to use your phone more, but because smart speakers can then live with you wherever you go, with constant position, data, connection, all those kinds of reasons. I think what Bose in particular are doing in the headphone market is really fascinating. They just announced the new noise-cancelling 700s, the 500 earbuds and more to come, yesterday as we record [30 May 2019], and all of those devices support not only Siri or the Google Assistant through the native button, but Alexa over the top as well. Giving you a choice of assistants is particularly interesting. For anyone who's seen the demos of the Bose AR platform at South by Southwest this year, with partners of ours at Vixen, Earplay, who are behind the technology of some of those Bose iOS apps …
James: 28:12 The idea that you can have ambient voice experiences, everything from running experiences through treasure-hunting games to straightforward navigation, is just huge. If you're out walking around, not having to use your eyes to look down at a thing, and you can just talk to it, that's what we're all after, so I think there are big opportunities there. The one thing I don't see a lot of people talking about when it comes to AR is the rise of voice and AR. I think that is going to be a huge coupling.
Andrew: 28:39 Telling it, rather than swiping a screen: you can say "turn left, turn right, jump!"
James: 28:42 Absolutely. It makes a lot of sense: if you've already got to hold up a piece of glass in front of your face and look through a camera lens at the world, the last thing you want to do is use the other hand to poke around on it. It's just not a very practical use case, and I think one of the biggest limiting factors to the growth of AR technologies is the actual clunkiness of using a thing with two hands when you're out in the world. That's particularly true for kids, for whom the tablet rather than the phone is predominantly the main device. I don't know if you've tried holding up a 10-inch device for half an hour to play with something when you're a child with an arm a third of the length of ours, but it's really hard work. If you can hold something up, talk to it and control it that way, then I think that is huge. So I think voice on foot, in the headphones, coupled with AR, is going to be really interesting.
James: 29:26 And then the other one has got to be infrastructure: infrastructure in the retail environment, but particularly also in the enterprise. If you look at partners of ours, the guys at Umar for example, we've now developed a whole system for things like meeting room bookings and HR requests, and we're beginning to see Alexa integrated into things like Polycom and various Cisco devices. So is the receptionist no longer needed? I don't think so. I think that for a lot of those tasks you'll get much better value out of your receptionist, because they won't be doing low-level administrative tasks. They won't be stuck on "What's the Wi-Fi code?" "I've told you ten times." Exactly, all of those things. You'll still need an IT person to come and plug in an HDMI cable, because no one seems to be able to do that.
James: 30:05 In this room we're sitting in, being able to say to Alexa, "Hey, can you rebook this meeting room for this time next Tuesday with the same guests?" is something we should just be able to do, because it's such a programmable thing. So we are particularly excited about voice in enterprise use cases, particularly for those of us who work in or with large corporates. We spend so much of our time surfacing data that we already have; it's something crazy like the average desk worker spending about three hours of their day just retrieving something they already know exists. The ability to do that with your voice, things like getting last week's sales numbers, how your ad units are performing, location data for your whole sales team, all of those kinds of questions …
James: 30:46 You should just be able to ask Salesforce or Zoho or any of those systems. Why should you have to go and poke around in seven different drop-down menus and a pivot table to get the answer to that question?
Andrew: 30:56 So our kids will walk into the office of the future, if that exists, and everyone will be talking to themselves.
James: 30:59 … or talking to one another, or more to the point, probably to a system that is listening ambiently in that space, because it's been given discrete authority to do so in that moment, picking up what's required and answering, surfacing that data before you even ask the question. That is the direction I think we'll probably head in.
Andrew: 31:15 So that brings me to an interesting question about security. When I'm on stage doing a Q&A, probably the question I get asked most often is, "Is my phone listening to me?" I know you've got a point of view on this, because obviously Alexa and Siri have wake words. There is value in having the device listen to what you're saying and pick intelligence out of it, but there's the privacy squeeze: what do you give away, and for what value? Where do you see that going?
James: 31:38 I think this relies on a couple of different things. One is, as I mentioned before, AI on the edge: the ability to do a lot of that computation discreetly on the device without ever actually calling up a server. If we get that truly happening (and I think Apple have made some of the biggest strides in that area, with their discreet approach to data security on device), that may begin to solve some of these problems. If you can say, "No, all that's happening is that it's listening, but all it's doing is working something out on your device; it's no more worrisome than a calculator listening to you to answer a maths problem," then we might get over that hurdle.
James: 32:11 The thing I often tell people, though, is that for it to be constantly listening to you, we've got a lot of work to do on infrastructure. We've got to scale up massively in terms of data centres, 5G connectivity and, not least, battery power. Crikey, if your phone was actually listening to you all day, every day, and uploading that to the cloud so that Facebook could retarget sneaker ads to you, it would explode in your pocket like a Samsung Galaxy Note 7 quicker than you can say "Alexa". It's just not happening. I think that is one of the big myths we have to get over, but everyone wants to go to the Black Mirror science-fiction dystopia, because ultimately it's more interesting than the utopian version.
James: 32:48 So people are always going to worry about that. Data from Adobe, I think from last year, showed that if you ask people who already own a smart speaker whether they're worried it's listening to them, 40% say, "Yeah, I'm worried it's listening all the time." Yet they're still buying them. We say all the time that we're worried about privacy, and you've only got to look at what's happening at the moment with Huawei in the US, or the recent backlash against WhatsApp. We worry about it all the time, but in reality it doesn't shift consumer adoption of these things one jot. People will still be buying Huawei phones in the US for the next 12 months, regardless of what Trump tries to do.
Andrew: 33:22 I think it brings up an interesting point, though, because it can be useful, and I can think of a couple of use cases. If my smart speaker at home is always listening and it hears a baby cry, a smoke alarm or a window smashing, and it responds, that's helpful. Then there's mental health; I know you've been a mental health advocate for some time, as I have. If it can detect a change in your voice and then tell someone who cares that "James just doesn't sound himself today", I would like a system that could help me like that.
James: 33:47 We do enter into some big ethical questions there, about predictive analytics, about sentiment analysis and all of those kinds of questions. There has to be a direction of travel where we each take greater ownership of our own data and of what we release to people. The history of the internet over the past couple of decades has been handing over privacy in return for utility, hand over fist, and many of us have seen that rubber band of privacy stretched to maximum capacity.
James: 34:12 So the only response to that, in my mind, is that we can't rely on companies or governments to deal with it; we have to rely on our own adoption. In the wake of GDPR we've seen a little bit of movement on that, with people waking up a little to the idea that "I should probably be a bit more cautious about who I let have access to this, that and the other." Certainly if you ask anyone under the age of 21 about fake Instagram profiles, deleting their posts when they're done with them, or disappearing content on Snapchat, there seems to be a waking up to that in the next generation. But for the majority of the wider public, I don't think we're there yet. And that goes for health data too. I would like my doctor to know in advance whether or not I'm going to have a cardiac event.
Andrew: 34:49 So would I, and he can, because I've got my Fitbit data, but he doesn't know what to do with it.
James: 34:54 Exactly. So I think we need better systems and fewer silos, but for that we also have to give up yet more privacy to get back that utility. Therein lies the jump we have to make, and that chasm is a hard one to cross. I think it will happen on an individual opt-in basis long before we ever get to blanket systems that do it for us.
Andrew: 35:10 I think it comes back to the informed consumer, who does the extra reading and understands, so they don't have to ask you and me, "Is my phone listening to me?" Many do understand that, and they understand what privacy they're giving away. I'm in one of your WhatsApp groups, the "Voice2" WhatsApp group. It's an amazing community of people sharing ideas around voice. What have you learned from the community so far?
James: 35:31 I've learned that we're still at the beginning of the beginning, because so few people have got best practices in this, and that's why the community exists. We started Voice2 (if people want to find or join it, just go to voice2.io and sign up) because we wanted to find other people working in this area. It's exploded, and it has been great fun doing it, but the thing we've learned most is that we're all so early in this and, like I said before, there's no killer use case yet. There will be; I have 100% faith that use cases will come about where we begin to realise …
James: 36:05 … in the same way that most of us probably couldn't live without things like mobile banking today, that we will find similar use cases for voice. I just don't think we're there yet. That's actually the biggest thing: the opportunity space is huge, but the people filling it are still way behind.
Andrew: 36:22 So as I promise my listeners every episode, because this is the Practical Futurist Podcast, I’m going to hit you up for some practical advice. What are the three things listeners can do next week to start on the voice journey?
James: 36:32 The first is probably: go buy a device if you've not got one in your home. You'd be amazed at the number of people we speak to at Vixen, innovation directors and digital marketing directors, who don't have one.
Andrew: 36:42 You should have some on the shelf, to say "Here, take it home …"
James: 36:44 We do! Go buy one. If you're not living with one of these things, it's hard to pontificate; it's hard to have an opinion on mobile phone app design if you don't actually own a mobile phone. So go get one and live with it, and if you don't like it listening to you all the time, you can flip the microphone button off whenever you want. But try it and use one; that's definitely point one.
James: 37:06 The second thing would be to get informed and join one of these communities. There is so much information out there, and I think that is a really important thing to do. Whether that's Voice2 or going to some of the events (we also have the VOICE Summit coming up in July in Newark, so come along), go to something like that and actually get informed, because it's an early-stage thing and everyone's learning together. So I think the practical thing to do is to get engaged and learn more about it.
James: 37:30 The third, I suppose, is to actually try some of these experiences, and try making your own. Anyone can make a skill with Alexa Blueprints, for example, which is something Amazon lets you do. You can basically make mini-skills for your own home: if you want to give the Wi-Fi code to the babysitter, play pranks on your kids, or set up your Airbnb so that your guests have the information they need, like how to turn off the stopcock if the water's running, those types of things. Anyone can build those. You don't need to learn how to code, and you don't really need to learn how to do anything in voice; you just edit a script and upload a bit of audio. It really is quite simple.
James: 38:05 So those are the three things: get educated, try it out yourself, and if you don't own one, go buy one! They cost £30. If anyone's listening who hasn't got one, wants to get started in voice and has a project they want to do, the first listener to come to me, I will buy an Echo Dot.
Andrew: 38:20 There you go, there's today's special. Who's going to collect? I'd love to know who actually contacts you and claims it. James, fascinating discussion. I thought I knew a little bit about voice; I know a lot more now. Where can people find you and find out more about your work?
James: 38:34 The best place to go is vixenlabs.co, and you can also find us @vixen_labs on most social media; that's where you'll find out more about me and what we do at Vixen. Like I said, if you want to join the community, go to voice2.io, where you can sign up for the newsletter, come to a meetup, come and join us in Newark for the VOICE Summit, the world's largest voice event, happening at the end of July 2019, and then join our very noisy WhatsApp group to talk about voice stuff.
Andrew: 39:04 This has been the Practical Futurist Podcast. You can find all of our previous shows at futurist.london, and if you like what you've heard on the show, please consider subscribing via your favourite podcast platform. You can also hire me to speak at your next management offsite or customer event. More details on what I speak about, with video replays, can be found at futurist.london. Until next time, I'm the Practical Futurist, Andrew Grill.