MIT researchers are bent on making robots see things the way humans do. A team of engineers has developed a model that allows machines to perceive their physical environments in real time, much as a human would. With this ability, robots should be able to carry out high-level commands.

The team calls the model 3D Dynamic Scene Graphs. It is the key to letting a robot produce a 3D map of its environment, complete with objects and their semantic labels. The robot draws on this 3D map to determine where an object or room is so it can navigate the way a human would.

There are two established ways for robots to navigate: 3D mapping and semantic segmentation. The first allows a robot to reconstruct its surroundings in 3D as it moves through them in real time. The second helps a robot identify and label the different features in its environment, and it is mostly done on 2D images.
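
As a rough illustration of the 2D side, the sketch below runs a pretrained semantic segmentation network over a single camera frame to get a class label for every pixel. This is not the team's code; the choice of network (torchvision's DeepLabv3) and the preprocessing are assumptions made purely for illustration.

import torch
import torchvision
from torchvision.transforms.functional import to_tensor, normalize

# Hypothetical example: per-pixel labels from an off-the-shelf
# segmentation network (not the model used by the MIT team).
model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
model.eval()

def segment(frame_rgb):
    """frame_rgb: HxWx3 uint8 numpy array from the robot's camera."""
    x = to_tensor(frame_rgb)                      # -> 3xHxW float in [0, 1]
    x = normalize(x, mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                     std=[0.229, 0.224, 0.225])
    with torch.no_grad():
        out = model(x.unsqueeze(0))["out"]        # 1 x classes x H x W
    return out.argmax(dim=1).squeeze(0)           # HxW map of class ids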

3D Dynamic Scene Graphs combines the two: the model handles spatial perception first, generating a 3D map of an environment in real time, and then labels the objects, people, and structures within that map.

The key component of the team's new model is Kimera, an open-source library that the team previously developed to simultaneously construct a 3D geometric model of an environment, while encoding the likelihood that an object is, say, a chair versus a desk.

Kimera produces a semantic 3D mesh with the help of existing neural networks, trained on a wide range of real-world images, that predict the label of each pixel. It then uses ray casting, a technique common in computer graphics, to project those 2D labels into 3D. The result is a map of the robot's environment that resembles a dense 3D mesh.
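
A much-simplified sketch of that projection step is shown below, assuming a pinhole camera with known intrinsics and a depth image for the same frame. Real systems such as Kimera cast rays into a volumetric model and fuse labels across many frames; here each labeled pixel is simply back-projected along its camera ray into a labeled 3D point.

import numpy as np

def backproject_labels(label_map, depth_map, K):
    """Project per-pixel class labels into 3D along camera rays.

    label_map: HxW integer class ids (from the 2D segmentation step)
    depth_map: HxW depth in meters for the same frame
    K:         3x3 camera intrinsics matrix
    Returns an (N, 4) array of [x, y, z, label] points in camera coordinates.
    """
    h, w = depth_map.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth_map > 0                          # skip pixels with no depth
    z = depth_map[valid]
    # Back-project along the ray through each pixel (pinhole model).
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    labels = label_map[valid].astype(float)
    return np.stack([x, y, z, labels], axis=1)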

But relying directly on that dense mesh is slow and computationally costly. To mitigate this problem, the researchers created algorithms that construct a 3D dynamic scene graph from the dense 3D semantic mesh.

With 3D Dynamic Scene Graphs, the 3D semantic mesh is broken down into semantic layers. The robot "sees" through these layers to identify objects in its surroundings, and the layers progress in a hierarchy from objects and people, to open spaces and structures, up to whole buildings. This spares the robot from having to make sense of the billions of points and faces in the original mesh.
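
One way to picture such a layered graph is sketched below. This is a toy data structure, not the team's actual 3D Dynamic Scene Graphs implementation; the layer names and example labels are assumptions chosen to mirror the hierarchy described above.

from dataclasses import dataclass, field

# Toy layered scene graph: layer names and contents are illustrative only.
LAYERS = ["objects", "agents", "places", "rooms", "building"]

@dataclass
class Node:
    node_id: str
    layer: str                                    # one of LAYERS
    label: str                                    # semantic label, e.g. "chair"
    children: list = field(default_factory=list)  # nodes one layer down

def add_child(parent: Node, child: Node) -> None:
    """Link a node to its parent one layer up (object -> room, room -> building)."""
    parent.children.append(child)

# Example: a chair and a person inside an office, inside one building.
building = Node("b0", "building", "office building")
office = Node("r1", "rooms", "office")
chair = Node("o7", "objects", "chair")
person = Node("a2", "agents", "person")
add_child(building, office)
add_child(office, chair)
add_child(office, person)

# Walking the hierarchy top-down lets a planner reason about rooms and objects
# without touching the billions of mesh faces underneath.
for room in building.children:
    print(room.label, "->", [n.label for n in room.children])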

Of course, the model wouldn't be complete without algorithms to track how humans move through an environment in real time. Using a photo-realistic simulator, the team was able to test the system on a simulated robot moving through office environments populated with people walking about.

Watch the supplemental video of the project below.