Thursday, June 7, 2018

The Apple Watch relationship and removal of Hey Siri

No one will fault you for not knowing that Apple is removing the “Hey Siri” trigger phrase in watchOS 5. Instead you’ll just be able to raise your Apple Watch and start issuing your command — a feature appropriately dubbed ‘Raise to speak’. This wasn’t announced on stage at the WWDC keynote, but is displayed as a main feature on the watchOS 5 preview page.

‘Raise to speak’ as pictured on the watchOS 5 preview page

This is a big deal

‘Raise to speak’ unfortunately isn’t in the first watchOS 5 beta, so nobody has tried it just yet. Nonetheless, this is a big deal on multiple fronts. Specifically:

  1. Having to say “Hey Siri” before every command is an annoying barrier between me and my goal. Without it, issuing commands will be more efficient.

  2. These trigger phrases essentially double as branding elements. People know what products you are using just by saying “Alexa”, “Hey Siri”, or “OK Google”. On Apple Watch, the only branding left is Siri’s “personality” and signature voice, the latter of which is sounding pretty damn good these days.

  3. Taking away the trigger phrase will make Siri feel both more human and more robotic at once. Initiating a command will be more like natural conversation, while the interaction itself becomes more mechanically direct.

  4. I think this could pave the way for Siri to support chained commands, similar to what Alexa already does, but without any initial phrase.

Why do this on Apple Watch?

The Watch is powerful. From a hardware perspective, it’s getting there, but I’m talking about a security/trust perspective. Think about it — it’s always unlocked when it’s on your wrist. We haven’t had that kind of device trust relationship since the caveman days of iPhone (before Touch ID). Nobody had a passcode on their phones back then. I know I didn’t.

Apple Watch doesn’t need to scan your face or fingerprint to confirm Apple Pay. Siri on Apple Watch doesn’t need any additional verification to unlock your HomeKit smart lock. Apple Watch is the only device that can do these kinds of things with less friction than other Apple devices. It’s branded as the most personal device Apple has ever made, and this relationship is a big reason why.

You also have to wake the Watch before issuing a command, since it’s not always listening like iPhone or iPad. That makes it the perfect existing workflow onto which to add ‘Raise to speak’.

There’s another reason, too. If you have a HomePod, you know it basically has superhuman hearing. It’s a little difficult to use “Hey Siri” on any other device while HomePod is in earshot. If Apple Watch doesn’t require said trigger phrase, that issue is eliminated.

I’m sure ‘Raise to speak’ won’t be absolutely perfect. There will be false positives, or times the Watch misses your command. Also, as evidenced by me saying “Hey Siri” in a bunch of different voices, Apple Watch doesn’t support Personalized Hey Siri — the ability to listen for your particular voice — like she does on iPhone and iPad. This is a feature that would undoubtedly help reduce false positives with ‘Raise to speak’. Perhaps Apple will surprise us and include it in a later watchOS 5 beta? We’ll have to wait and see.

OK Lance, calm down

Siriously, though! Any time technology gets out of our way or removes a barrier, it becomes more convenient. And on Apple Watch, where Siri is a major input mechanism, ‘Raise to speak’ is a huge step in the right direction. I mean, you’ve gotta think trigger phrases will eventually disappear for all virtual assistants.

Hopefully next year we’ll see this trickle down Apple’s product line, almost like when they introduce a new hardware component. They start by including it in the first product that makes sense. Once they can make it at scale, it makes its way to other devices. A perfect example of this is Touch ID, which followed the path of iPhone > iPad > Mac. In this case, scale isn’t the real limiting factor so much as getting the feature bug-free.

Exciting times are ahead.

Monday, April 16, 2018

Apple explains how Personalized Hey Siri works →

Apple’s latest entry in their Machine Learning Journal details how they personalize the “Hey Siri” trigger phrase for engaging the personal assistant. Here are a few interesting tidbits.

[…] Unintended activations occur in three scenarios – 1) when the primary user says a similar phrase, 2) when other users say “Hey Siri,” and 3) when other users say a similar phrase. The last one is the most annoying false activation of all. In an effort to reduce such False Accepts (FA), our work aims to personalize each device such that it (for the most part) only wakes up when the primary user says “Hey Siri.” […]

I love the candidness of the writers here. I can also relate to the first scenario. Let’s just say I’ve learned how often I say the phrase “Are you serious?”, because about 75% of the time I do, Siri thinks I’m trying to activate her. It’s fairly annoying on multiple levels.

On Siri enrollment and learning:

[…] During explicit enrollment, a user is asked to say the target trigger phrase a few times, and the on-device speaker recognition system trains a PHS speaker profile from these utterances. This ensures that every user has a faithfully-trained PHS profile before he or she begins using the “Hey Siri” feature; thus immediately reducing IA rates. However, the recordings typically obtained during the explicit enrollment often contain very little environmental variability. […]


This brings to bear the notion of implicit enrollment, in which a speaker profile is created over a period of time using the utterances spoken by the primary user. Because these recordings are made in real-world situations, they have the potential to improve the robustness of our speaker profile. The danger, however, lies in the handling of imposter accepts and false alarms; if enough of these get included early on, the resulting profile will be corrupted and not faithfully represent the primary users’ voice. The device might begin to falsely reject the primary user’s voice or falsely accept other imposters’ voices (or both!) and the feature will become useless.

Heh. Maybe this explains my “Are you serious?” problem.
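The explicit-plus-implicit enrollment loop Apple describes can be sketched in miniature. This is purely an illustrative assumption of how such a system might work, not Apple’s implementation: utterances are stand-in vectors here (a real system would produce speaker embeddings from an acoustic model), and the class name, thresholds, and update rule are all hypothetical.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SpeakerProfile:
    def __init__(self, enrollment_utterances):
        # Explicit enrollment: average a few clean "Hey Siri" recordings
        # so the user has a faithful profile before the feature turns on.
        n = len(enrollment_utterances)
        dims = len(enrollment_utterances[0])
        self.mean = [sum(u[i] for u in enrollment_utterances) / n
                     for i in range(dims)]
        self.count = n

    def accepts(self, utterance, threshold=0.8):
        # Wake only when the utterance scores close enough to the profile.
        return cosine(utterance, self.mean) >= threshold

    def implicit_update(self, utterance, strict_threshold=0.9):
        # Implicit enrollment: fold in real-world utterances for robustness,
        # but only ones scoring well above the accept threshold, so imposter
        # accepts don't gradually corrupt the profile.
        if cosine(utterance, self.mean) >= strict_threshold:
            self.count += 1
            self.mean = [(m * (self.count - 1) + x) / self.count
                         for m, x in zip(self.mean, utterance)]
            return True
        return False
```

The stricter update threshold is the interesting design point: it trades slower adaptation for protection against exactly the profile-corruption failure mode the journal entry warns about.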

They go on to explain improving speaker recognition, model training, and more. As with all of Apple’s Machine Learning Journal entries, this one is very technical in content, but these peeks behind the curtain are highly interesting to say the least.

One thing I didn’t see mentioned was how microphone quality and quantity improve recognition. For instance, Hey Siri works spookily well on HomePod, with its seven microphones. However, I assume they aren’t using Personalized Hey Siri on HomePod, since it’s a communal device with multiple users, so its baseline success rate may already be higher. Either way, I wish my iPhone would hear me just as well.