At Facebook Connect in 2020, Abrash explained that an always-available, ultra-low-friction AR interface will be built on two technological pillars:
The first is ultra-low-friction input, so when you need to act, the path from thought to action is as short and intuitive as possible.
You might gesture with your hand, give voice commands, or select items from a menu by looking at them; these actions are enabled by hand-tracking cameras, a microphone array, and eye-tracking technology. But ultimately, you’ll need a more natural, unobtrusive way of controlling your AR glasses. We’ve explored a range of neural input options, including electromyography (EMG). While several directions have potential, wrist-based EMG is the most promising. This approach decodes, at the wrist, the electrical signals that travel from the spinal cord to the hand, and uses them to control the device’s functions. The signals that reach the wrist are so clear that EMG can detect finger motion of just a millimeter. That means input can be effortless, as effortless as clicking a virtual, always-available button, and ultimately it may even be possible to sense just the intention to move a finger.
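To make the idea of signal decoding at the wrist concrete, here is a minimal sketch of how windowed, multi-channel EMG might be turned into a discrete input event such as a virtual click. The channel count, window size, feature choice, and threshold are illustrative assumptions, not a description of any actual decoder.

```python
import numpy as np

N_CHANNELS = 16        # assumed number of electrodes around the wrist
SAMPLE_RATE_HZ = 1000  # assumed sampling rate
WINDOW_MS = 50         # assumed analysis window


def rms_features(window: np.ndarray) -> np.ndarray:
    """Root-mean-square amplitude per channel, a simple and common EMG feature."""
    return np.sqrt(np.mean(window ** 2, axis=1))


def decode_event(window: np.ndarray, click_threshold: float = 0.3):
    """Map one window of EMG (channels x samples) to a discrete input event.

    A real decoder would use a trained model; here a thresholded RMS over the
    channels assumed to cover the index-finger flexors stands in for detecting
    the intention to click.
    """
    features = rms_features(window)
    index_activity = features[:4].mean()  # assume channels 0-3 sit over the index flexors
    return "click" if index_activity > click_threshold else None


# Simulated usage: one 50 ms window of noisy activity.
rng = np.random.default_rng(0)
samples_per_window = SAMPLE_RATE_HZ * WINDOW_MS // 1000
window = rng.normal(0.0, 0.5, size=(N_CHANNELS, samples_per_window))
print(decode_event(window))  # "click" or None, depending on the simulated signal
```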
The second pillar is the use of AI, context, and personalization to scope the effects of your input actions to your needs at any given moment. This means building an interface that can adapt to you, which will require powerful AI models that can make deep inferences about what information you might need or things you might want to do in various contexts, based on an understanding of you and your surroundings, and that can present you with the right set of choices. Ideally, you’ll only have to click once to do what you want; even better, one day the right thing may happen without you having to do anything at all. Our goal is to keep you in control of the experience, even when things happen automatically.
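To illustrate what "scoping the effects of your input" could look like in practice, here is a hypothetical sketch in which a single click is routed to the highest-scoring action for the current context. The context fields, candidate actions, and weights are invented for illustration; a real system would rely on learned inference models rather than a lookup table.

```python
from dataclasses import dataclass


@dataclass
class Context:
    location: str     # e.g. "kitchen", "office"
    activity: str     # e.g. "cooking", "meeting"
    time_of_day: str  # e.g. "morning", "evening"


# Hypothetical candidate actions with hand-set relevance weights per context signal.
CANDIDATE_ACTIONS = {
    "show_recipe_step":   {"kitchen": 2.0, "cooking": 3.0},
    "mute_notifications": {"office": 1.5, "meeting": 3.0},
    "start_timer":        {"kitchen": 1.0, "cooking": 2.0},
}


def rank_actions(ctx: Context) -> list[tuple[str, float]]:
    """Score each candidate action by how well it matches the current context."""
    signals = {ctx.location, ctx.activity, ctx.time_of_day}
    scored = []
    for action, weights in CANDIDATE_ACTIONS.items():
        score = sum(w for key, w in weights.items() if key in signals)
        scored.append((action, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def on_click(ctx: Context) -> str:
    """One low-friction click triggers the top-ranked action for the current context."""
    best_action, _ = rank_actions(ctx)[0]
    return best_action


print(on_click(Context(location="kitchen", activity="cooking", time_of_day="evening")))
# -> "show_recipe_step"
```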
While the fusion of contextually aware AI with ultra-low-friction input has tremendous potential, important challenges remain: how to pack the technology into a comfortable, all-day wearable form factor, and how to provide the rich haptic feedback needed to manipulate virtual objects. Haptics also let the system communicate back to the user (think of the vibration of a mobile phone).
To address these challenges, we need soft, all-day wearable systems. In addition to its deep work on ultra-low-friction input and contextualized AI, Keller’s team is leveraging soft, wearable electronics (devices worn on or close to the skin’s surface, where they detect and transmit data) to develop a wide range of technologies that can be comfortably worn all day on the hand and wrist and that will give us a much richer bi-directional path for communication. These technologies include EMG sensors and wristbands.
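As a rough illustration of that bi-directional path, the sketch below pairs a decoded input event with a haptic acknowledgment, the wristband equivalent of a phone vibration. The HapticDriver class and its pulse parameters are hypothetical stand-ins, not a real device API.

```python
class HapticDriver:
    """Stand-in for a wristband's vibrotactile actuator (hypothetical API)."""

    def pulse(self, duration_ms: int, intensity: float) -> None:
        # A real driver would command the actuator; here we just log the cue.
        print(f"haptic pulse: {duration_ms} ms at intensity {intensity:.1f}")


def confirm_input(event: str, haptics: HapticDriver) -> None:
    """Acknowledge a decoded input event with a short haptic cue,
    letting the system communicate back to the user."""
    if event == "click":
        haptics.pulse(duration_ms=30, intensity=0.6)   # brief tap for a click
    elif event == "long_press":
        haptics.pulse(duration_ms=120, intensity=0.8)  # longer buzz for a hold


confirm_input("click", HapticDriver())
```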
AR glasses interaction will ultimately benefit from a novel integration of multiple new and improved technologies, including neural input, hand tracking and gesture recognition, voice recognition, computer vision, and several new input technologies like IMU finger-click and self-touch detection. It will require a broad range of contextual AI capabilities, from scene understanding to visual search, all with the goal of making it easier and faster to act on the instructions you’re already sending to your device.
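One way to picture that integration is as a fusion layer that merges events from each modality into a single, ordered stream for the interface to act on. The event schema, modality names, and priorities below are illustrative assumptions rather than a description of how these systems are actually built.

```python
from dataclasses import dataclass, field
from typing import Optional
import heapq


@dataclass(order=True)
class InputEvent:
    priority: int                          # lower number = handled first
    timestamp_ms: int
    modality: str = field(compare=False)   # "emg", "hand", "voice", "imu"
    payload: str = field(compare=False)    # e.g. "click", "pinch", "open maps"


class InputFusion:
    """Collects events from each input modality and emits them in priority order."""

    def __init__(self) -> None:
        self._queue: list[InputEvent] = []

    def push(self, event: InputEvent) -> None:
        heapq.heappush(self._queue, event)

    def next_event(self) -> Optional[InputEvent]:
        return heapq.heappop(self._queue) if self._queue else None


# Usage: events arrive from different modalities and are handled by priority.
fusion = InputFusion()
fusion.push(InputEvent(priority=2, timestamp_ms=10, modality="hand", payload="pinch"))
fusion.push(InputEvent(priority=1, timestamp_ms=12, modality="emg", payload="click"))
fusion.push(InputEvent(priority=3, timestamp_ms=15, modality="voice", payload="open maps"))

while (event := fusion.next_event()) is not None:
    print(f"{event.modality}: {event.payload}")
# emg: click, then hand: pinch, then voice: open maps
```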
And to truly center human needs, these new interactions will need to be built responsibly from the ground up, with a focus on the user’s needs for privacy and security. These devices will change the way we interact with the world and with each other, and we will need to give users total control over those interactions.
Building the AR interface is a difficult, long-term undertaking, and there are years of research yet to do. But by planting the seeds now, we believe we can get to AR’s Engelbart moment and then get that interface into people’s hands over the next 10 years, even as it continues to evolve for decades to come.
More Context
The biggest difference between the future AR interface and everything that’s come before is that there will be much more contextual information available to our AR devices. The glasses will see and hear the world from your perspective, just as you do, so they will have vastly more personal context than any previous interface has ever had. Coupled with powerful AI inference models, this context will give them the ability to help you in an ever-increasing variety of personalized ways and free your mind up to do other things.
Imagine having a pair of glasses that could feed you key statistics in a business meeting, guide you to destinations, translate signs on the fly, tell you where you’ve left your car keys, or even help you with almost any sort of task. Asking what else this interface will enable is kind of like asking what the GUI would enable back in 1967 — the possibilities are vast and open-ended.