What if a virtual assistant could help you remember the most important parts of your day? Tell you when you’ve forgotten to add vanilla to your cookie recipe? Or remind you where you left your keys, or that you left the oven on? The early embodied AI research released by the Facebook AI Research team brings us one step closer to that reality.
To help AI gain a better understanding of the world from a first-person perspective, we’ve been training it on data sets (like this one) that mimic the real world. We’re building on AI Habitat, a first-of-its-kind 3D simulation platform for building more intelligent virtual robots, which we first open-sourced last summer. All of this work is in service of our long-term vision: to build a virtual assistant that can help you every day.
We also announced three milestones that get us closer to that goal:
- Semantic MapNet: We taught AI to build top-down maps (showing where objects are located) and spatio-semantic memories (aka mental maps, which can answer questions like “What is behind the sofa?”) from a first-person view.
- SoundSpaces: Until now, embodied AI agents have been unable to hear. We’ve developed the first platform that provides both visuals and audio, with highly realistic acoustics. Agents can now hear, see, and navigate to a sound-emitting target: for example, an agent can find a ringing phone and let you know where you left it.
- Occupancy Anticipation: We’ve taught AI to rapidly explore and map an area even when regions are partly hidden, obstructed, or out of view (e.g., behind a table).
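The common geometric step behind these mapping milestones, turning a first-person view into a top-down map, can be sketched in a few lines. This is an illustrative toy, not the released Semantic MapNet or Occupancy Anticipation code: it assumes a simple pinhole camera and projects a single depth image into a 2D occupancy grid, whereas the actual systems learn semantic, memory-equipped maps from sequences of observations.

```python
# Toy sketch: project a first-person depth image into a top-down occupancy
# grid. Assumes a pinhole camera looking straight ahead along +z with a
# given horizontal field of view. All names here are illustrative.
import numpy as np

def depth_to_topdown(depth, fov_deg=90.0, cell_size=0.05, grid_size=128):
    """Project an (H, W) depth image (meters) onto a top-down grid.

    Returns a (grid_size, grid_size) boolean occupancy grid; the camera
    sits at the bottom-center cell, facing up the rows of the grid.
    """
    h, w = depth.shape
    # Focal length in pixels from the horizontal field of view.
    f = (w / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    u = np.arange(w) - w / 2.0          # pixel offsets from the image center
    xs = depth * (u[None, :] / f)       # lateral distance of each pixel (m)
    zs = depth                          # forward distance of each pixel (m)

    # Convert metric coordinates to grid indices (camera at bottom-center).
    col = np.round(xs / cell_size + grid_size / 2.0).astype(int)
    row = np.round(zs / cell_size).astype(int)

    grid = np.zeros((grid_size, grid_size), dtype=bool)
    valid = (
        (row >= 0) & (row < grid_size)
        & (col >= 0) & (col < grid_size)
        & (depth > 0)
    )
    grid[row[valid], col[valid]] = True
    return grid
```

In a real embodied agent, maps like this are accumulated across many frames using the agent’s pose, and the learned models additionally label cells with semantics ("sofa", "table") and predict occupancy for regions the camera has not yet seen.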
Technology is our best tool for advancing society, making things faster, cheaper, and more accessible for people around the world. Improving the memory of billions of people sounds daunting, but this technology may one day do exactly that.