Poster B75 in Poster Session B - Thursday, August 8, 2024, 1:30 – 3:30 pm, Johnson Ice Rink
A biologically plausible route to learn 3D perception
Wanhee Lee1, Jared Watrous1, Honglin Chen1, Klemen Kotar1, Tyler Bonnen2, Daniel Yamins1; 1Stanford University, 2UC Berkeley
Humans find structure in visual data; we perceive three-dimensional objects and scenes, even when viewing a static image. Here we evaluate the possibility that a simple learning objective gives rise to this ability: predicting the upcoming visual stimulus, given the current visual input and self-motion. We instantiate this hypothesis in silico by optimizing a transformer to predict future images, conditioned on camera movement and the current image. This requires learning in a continuous setting (i.e., visual sequences, not standalone images), unlike standard computer vision datasets (e.g., ImageNet). To this end, we train a computational model on video datasets collected in a naturalistic 3D environment. As a proof of principle, we demonstrate how this biologically plausible optimization approach generates a visual model that can be used to infer depth, construct 3D shapes, and support cognitive processes like mental rotation---all without direct supervision on these tasks. Together, our findings demonstrate how spatial perception might emerge through a biologically plausible learning objective.
Keywords: spatial perception; computational cognitive science
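The learning objective described in the abstract---predict the next visual input given the current input and self-motion---can be sketched in a few lines. This is a minimal illustrative toy, not the authors' model: a single linear map stands in for the transformer, image "features" and camera-motion vectors are random stand-ins for real video data, and all dimensions and names (`W_true`, `make_batch`, `D_IMG`, `D_ACT`) are hypothetical.

```python
import numpy as np

# Toy sketch of the next-frame prediction objective (assumption: the paper's
# model is a transformer over video frames; here a linear map stands in).
# Inputs: current image features x_t and self-motion (camera movement) a_t.
# Target: next image features x_next. Loss: mean squared prediction error.

rng = np.random.default_rng(0)

D_IMG, D_ACT = 8, 3  # illustrative feature / motion dimensions
W_true = rng.normal(size=(D_IMG + D_ACT, D_IMG))  # hypothetical "world dynamics"

def make_batch(n):
    """Synthetic stand-in for (current frame, camera motion, next frame) triples."""
    x_t = rng.normal(size=(n, D_IMG))
    a_t = rng.normal(size=(n, D_ACT))
    x_next = np.concatenate([x_t, a_t], axis=1) @ W_true
    return x_t, a_t, x_next

def mse(W, x_t, a_t, x_next):
    """Prediction error of the motion-conditioned model."""
    pred = np.concatenate([x_t, a_t], axis=1) @ W
    return float(np.mean((pred - x_next) ** 2))

# Train the stand-in predictor by gradient descent on the prediction loss.
W = np.zeros((D_IMG + D_ACT, D_IMG))
x_t, a_t, x_next = make_batch(256)
loss_before = mse(W, x_t, a_t, x_next)
lr = 0.05
for _ in range(500):
    inp = np.concatenate([x_t, a_t], axis=1)
    grad = 2.0 * inp.T @ (inp @ W - x_next) / len(x_t)
    W -= lr * grad
loss_after = mse(W, x_t, a_t, x_next)
```

After training, `loss_after` is far below `loss_before`: minimizing prediction error forces the model to internalize how camera motion transforms the visual input, which is the intuition behind the claim that 3D structure can emerge without direct supervision.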