Platform for Situated Intelligence: An open-source framework for multimodal, integrative AI

Over the years at Microsoft Research, we’ve studied how to build AI systems that perceive, understand, and act in a human-filled world in real time. Our motivation has been to create computing systems that support interactive experiences akin to what we expect when we talk to or collaborate with people. This line of research has involved developing several physically situated interactive applications, including embodied conversational agents that serve as personal assistants, robots that give directions in our building, and smart elevators that recognize people’s intentions to board versus walk by. Building such systems has required composing and coordinating different AI technologies to achieve multimodal capabilities, that is, the joint use of multiple channels such as speech and vision. These efforts have highlighted the challenges of creating integrative AI systems: systems that weave together multiple AI technologies such as machine learning, computer vision, speech recognition, natural language processing, and dialogue management.
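To give a feel for what this kind of composition looks like in practice, below is a minimal sketch of stream fusion using the framework's .NET API. It assumes the Pipeline.Create, Generators.Sequence, Join, and Reproducible.Nearest operators from the Microsoft.Psi namespace, and uses two synthetic numeric streams as stand-ins for real sensor streams such as audio and video.

```csharp
using System;
using Microsoft.Psi;

class Program
{
    static void Main()
    {
        // The pipeline owns all components and schedules message delivery.
        using (var pipeline = Pipeline.Create())
        {
            // Two synthetic streams standing in for sensor channels
            // (e.g., different modalities), each emitting 20 messages
            // at 100 ms intervals.
            var a = Generators.Sequence(pipeline, 0, x => x + 1, 20, TimeSpan.FromMilliseconds(100));
            var b = Generators.Sequence(pipeline, 0.0, x => x + 0.5, 20, TimeSpan.FromMilliseconds(100));

            // Fuse the two streams, pairing each message on `a` with the
            // nearest-in-time message on `b`.
            a.Join(b, Reproducible.Nearest<double>())
             .Do(pair => Console.WriteLine($"fused: {pair.Item1}, {pair.Item2}"));

            // Run to completion (the generator streams are finite).
            pipeline.Run();
        }
    }
}
```

The join pairs messages by their originating timestamps rather than their arrival order, which is what makes it possible to keep streams synchronized even when they flow through components with different latencies.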

[Figure: three images arranged side by side, showing a humanoid robot gesturing with its left arm to two people, and the view from the robot’s perspective with face-tracking rectangles]

