Both VR and AR have made some impressive advances over the past 12 months. New experiences are now possible — things that lived only in dreams just a short time ago.
Imagine watching extinct animals or dragons stroll by at your local park. Or opening a portal in your room, transforming it into a bustling cityscape with your walls suddenly covered in graffiti. Scenarios like these are no longer fantasy or science fiction. Instead, they're examples of what's possible — on your smartphone — through the power of augmented reality.
In its simplest terms, AR uses technology to virtually change how you see the world around you. Facebook is doing this now through a camera app on your phone.
To do this, we need to build a map of the surrounding environment as it's being explored in real time. While doing that, we also need to accurately estimate the position and orientation of your phone's camera with respect to that map. This ability to place and lock in digital objects relative to real-world objects is known as simultaneous localization and mapping (SLAM), and it's an ongoing challenge in computer vision and robotics research.
History of SLAM
Getting to the point where we can run SLAM on mobile devices took more than 30 years of research. The first SLAM techniques were published as research papers in the 1980s and were originally developed for robot navigation in unknown environments.
In those early days, SLAM relied on expensive or custom-made sensors such as LIDAR, SONAR, or stereo cameras. But with the advancement and widespread adoption of modern smartphones — almost all of which now contain at least one camera, as well as a gyroscope and an accelerometer — capabilities that used to be restricted to specialists are now available to everyone. Today, SLAM is used not only to place objects in a scene, but also for a range of other applications, including self-driving cars, robot vacuum cleaners, and minimally invasive surgery.
Mobile SLAM at Facebook
Our Applied Machine Learning (AML) team, which takes the latest advances in AI research and turns them into infrastructure for new products, leveraged initial work done at Oculus in their Computer Vision group to build and deploy SLAM at scale. Along the way, there were three key engineering challenges.
Our SLAM library integrates features from multiple systems (ORB-SLAM, SVO, and LSD-SLAM), but what really sets it apart is its performance optimization, down to the very last instruction. Running a SLAM system at 60 Hz on mobile devices is hard: every 16 milliseconds, your phone has to capture an image, find hundreds of interesting key points, match them with the same points in the previous frame, and then use trigonometry to determine where each of these points lies in 3D space. With so much happening in each frame, we needed extensive fine-grained optimization and had to rethink how these algorithms operate.
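The last step of that per-frame loop, recovering depth from matched key points, comes down to intersecting rays from two viewpoints. Here is an illustrative sketch in Python (not code from our library) for the simplest case: two cameras on a shared baseline, each reporting the bearing angle to the point.

```python
import math

def triangulate(baseline, theta_left, theta_right):
    """Intersect two bearing rays to recover a point's position.

    Cameras sit at x = 0 and x = baseline, both looking along +z;
    theta_* is the angle from each camera's optical axis to the point.
    Illustrative only: real SLAM triangulates from full projection
    matrices and refines the result with bundle adjustment.
    """
    # Left ray:  x = z * tan(theta_left)
    # Right ray: x - baseline = z * tan(theta_right)
    z = baseline / (math.tan(theta_left) - math.tan(theta_right))
    x = z * math.tan(theta_left)
    return x, z
```

Notice how small angular errors blow up as the baseline shrinks, which is one reason a fast-moving, low-noise pipeline matters so much on a phone.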
On top of that, deploying mobile SLAM in the Facebook ecosystem is challenging because our community uses a wide range of mobile devices. We want to support as many of these as possible, so part of our effort goes into ensuring that our SLAM implementation is backward compatible.
You can see an example of this in the requirements for device calibration. Both iOS and Android phone models have unique characteristics, but Android is especially diverse, with thousands of device models of varying hardware capabilities. Each model needs its own camera calibration (focal length, principal point, and distortion parameters) so that we can project 3D points into image space with sub-pixel accuracy.
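To make those parameters concrete, here is a hedged sketch of the standard pinhole projection with radial distortion that such a calibration describes; the function and coefficient names are illustrative, not our library's API.

```python
def project(point_3d, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project a 3D point (camera coordinates, Z forward) to pixels.

    fx, fy: focal lengths; cx, cy: principal point; k1, k2: radial
    distortion coefficients. A simplified model for illustration;
    production calibrations often add tangential and higher-order terms.
    """
    X, Y, Z = point_3d
    x, y = X / Z, Y / Z                        # normalized image plane
    r2 = x * x + y * y
    d = 1.0 + k1 * r2 + k2 * r2 * r2           # radial distortion factor
    return fx * x * d + cx, fy * y * d + cy    # pixel coordinates
```

Get any of these parameters slightly wrong and every projected point shifts by a few pixels, which is exactly the error budget that sub-pixel tracking cannot afford; hence the per-device refinement described below.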
Mobile devices also have rolling-shutter cameras with autofocus and autoexposure, which need to be taken into account as well. As the camera refocuses on nearer and farther objects, this calibration changes; the IMU (inertial measurement unit, which tracks device acceleration and rotation) also needs to be calibrated; and the camera and IMU clocks need to be synchronized. We start with a rough calibration for each model and fine-tune it for your specific device over time.
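One common way to synchronize the two clocks is to compare rotation as seen by both sensors: the gyroscope reports angular rate directly, and camera tracking implies one. The sketch below brute-forces the sample shift that best correlates the two signals; it is a simplified stand-in for what a production calibrator does, and all names are illustrative.

```python
def estimate_time_offset(imu_rates, cam_rates, max_shift):
    """Find the integer sample shift that best aligns the camera's
    rotation-rate signal with the IMU's, by maximizing correlation.
    Real systems refine this to sub-sample precision and estimate it
    jointly with the other calibration parameters."""
    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = 0.0
        for i, cam in enumerate(cam_rates):
            j = i + shift  # candidate alignment of camera sample i
            if 0 <= j < len(imu_rates):
                score += cam * imu_rates[j]
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift
```

Once the offset is known, camera frames and IMU samples can be placed on a common timeline, which the tracker needs before it can fuse the two.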
The Facebook app is already one of the more complex apps in the Android or iOS app stores, and we are constantly working to add exciting new features to the app while keeping its total size as small as possible. The original SLAM library was developed at Oculus, for a different use case, and was about 40MB in size, since it used multiple, large open-source libraries. We extracted the minimum SLAM functionality that would enable our work and refactored it to use common Facebook libraries, bringing the library size to under 1MB.
Creating compelling mobile AR requires more than just leveraging SLAM. Last November, we built our first prototypes that place 3D art over the surfaces reconstructed by SLAM. Since then, we have run UX research on the most intuitive gestures for placing, switching, and replacing art, and for rotating, panning, and zooming it after placement, so that people can accurately frame their compositions on their mobile devices. We also explored how to recognize specific locations for AR content and how to analyze scene geometry so virtual objects stick to real surfaces.
To create a better user experience, we also needed to account for the failure modes of our technologies and develop alternate solutions. To that end, we created the WorldTracker API, an umbrella interface that combines SLAM with other tracking algorithms to “place things in the world.” When SLAM is not confident about its location, the current version of WorldTracker falls back to a gyroscope-enhanced image-based tracker to keep content anchored.
Facebook's first AR-driven art project with Heather Day
Having built these basic tools, it was time to work with an artist to help us learn new techniques of making AR feel authentic and part of everyday life. We invited Heather Day to the Menlo Park campus, where her artwork would be virtually installed. Any time she poured paint, made a brush stroke, drew a pattern, or made any other type of mark, the AML team captured those movements on camera and added them to a digital library.
The AML team worked with Heather to determine which images should be handed to the animators, and what movements those elements should make in the living, breathing AR installation. Within two weeks, they built technology that recognized the specific location for the art and analyzed the scene geometry so that Heather's virtual installation stuck to real surfaces.
At our F8 developer conference this year, the audience saw Heather's art come to life with rhythm as it flowed off the walls and onto the ground like a waterfall. Through SLAM technology, and her creative expertise, we erased the boundaries between virtual and real—between science and art—and in the process gave a glimpse into how technology and art can be intertwined. This is our vision of enriching everyday life with the possibilities of the virtual, digital ecosphere.
AR gives us endless ways to engage with and experience the world. While we've come incredibly far in improving AR technology, there's more to do. Our next step is to create even more geolocated and persistent experiences, like what we built for Heather's AR installation in Menlo Park. Further ahead, we're exploring how to combine the power of deep neural networks and Caffe2 to create more complete SLAM maps, handle dynamic objects, add semantic information, and create persistent AR experiences deeply integrated with the Facebook ecosystem. We're excited to dive deeper into these concepts and will keep you updated on our progress.