Poster C13 in Poster Session C - Friday, August 9, 2024, 11:15 am – 1:15 pm, Johnson Ice Rink

Unlike Brains, Pre-trained CNNs do not Encode Current or Predicted Object Contact

RT Pramod1, Josh Tenenbaum1, Nancy Kanwisher1; 1MIT

Interacting with the physical world requires predicting what will happen next, from catching a ball to stacking dishes to changing lanes in traffic. This ability in turn often hinges on representing contact relationships among objects, as the fate of two objects is intertwined when they are in contact. We found recently that the brain's hypothesized "physics network" both represents whether two objects are in contact and predicts future contact in simple scenarios. What computations underlie this ability? Might fast pattern recognition mechanisms like those found in convolutional neural networks (CNNs) suffice? To find out, we presented our same stimuli to CNNs pre-trained on object recognition (VGG-16) and action recognition (3D-ResNeXT-101). The scenario-invariant current and predicted object contact information we found in the brain could not be linearly extracted from these networks. Future work will test whether training on our scenarios and tasks may enable these networks to represent this information. Alternatively, the brain's ability to extract current and future contact information may depend on different computational mechanisms better captured by a structured generative model that runs approximate probabilistic simulations based on knowledge of physics and of the physical properties of the current scene, akin to those in video game engines.
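The "linearly extracted" analysis above refers to linear decoding: fitting a linear readout on network activations and testing whether it can classify a property (here, object contact) from held-out trials. A minimal sketch of that style of analysis is below, using synthetic feature vectors in place of actual VGG-16 or 3D-ResNeXT-101 activations; the data-generation step, feature dimensionality, and function names are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def linear_probe_accuracy(train_X, train_y, test_X, test_y):
    """Fit a least-squares linear readout on training features and
    report classification accuracy on held-out features."""
    # Append a bias column and regress features onto {-1, +1} labels.
    X = np.hstack([train_X, np.ones((train_X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(X, 2.0 * train_y - 1.0, rcond=None)
    # Classify held-out trials by the sign of the linear readout.
    Xt = np.hstack([test_X, np.ones((test_X.shape[0], 1))])
    pred = (Xt @ w > 0).astype(int)
    return float((pred == test_y).mean())

# Synthetic stand-in for network activations: "contact" trials are
# shifted along a random direction in feature space (hypothetical data,
# not stimuli or activations from the study).
rng = np.random.default_rng(0)
d = 128
direction = rng.normal(size=d)

def make_split(n):
    y = rng.integers(0, 2, size=n)               # 0 = no contact, 1 = contact
    X = rng.normal(size=(n, d)) + np.outer(2.0 * y - 1.0, direction) * 0.5
    return X, y

train_X, train_y = make_split(400)
test_X, test_y = make_split(200)
acc = linear_probe_accuracy(train_X, train_y, test_X, test_y)
print(f"probe accuracy: {acc:.2f}")
```

In the study's framing, chance-level accuracy from such a probe on CNN features, across scenarios, would indicate that the network does not linearly encode current or predicted contact, in contrast to the brain's hypothesized physics network.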

Keywords: Intuitive Physics, Object Contact, Convolutional Neural Networks