
Poster A117 in Poster Session A - Tuesday, August 6, 2024, 4:15 – 6:15 pm, Johnson Ice Rink

Ecological Data and Objectives for Human Alignment

Akash Nagaraj1, Alekh Karkada Ashok1, Drew Linsley1, Francis Lewis1, Peisen Zhou1, Thomas Serre1; 1Carney Institute for Brain Science, Brown University

As deep neural networks (DNNs) improve on object recognition benchmarks, their representations diverge from those used by human vision. We hypothesized that this misalignment arises from the contrasting data and objectives used to train DNNs versus those that shape human visual development. To test this, we developed a framework for training DNNs on rich spatiotemporal image sequences of 3D objects, using data and objective functions that more closely resemble those relied on by brains. We evaluated three training objectives: masked autoencoding (MAE), masked vision modeling (MVM), and "causal vision modeling" (CVM), in which models predict future frames. Remarkably, CVM yielded DNN representations well aligned with human 3D object recognition psychophysics: CVM-trained DNNs exhibited the same accuracy patterns and reaction-time effects as humans for rotated objects. Representational analysis revealed that CVM causes DNNs to learn equivariance to out-of-plane transformations, explaining their human-like behavior. These results provide a path toward reverse-engineering biological vision and developing artificial systems that better mimic the brain. Future work could further enrich the data and objectives to capture additional developmental principles shaping human vision.
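To make the "causal vision modeling" objective concrete, here is a minimal toy sketch of a future-frame prediction loss. This is an assumption-laden illustration, not the poster's actual method: frames are flattened vectors, and a single linear map `W` stands in for the trained DNN predictor.

```python
import numpy as np

def cvm_loss(frames, W):
    """Toy CVM objective: mean squared error between predicted and
    actual next frames in a sequence.

    frames: (T, D) array of T flattened video frames.
    W: (D, D) linear predictor (hypothetical stand-in for a DNN).
    """
    preds = frames[:-1] @ W   # predict frame t+1 from frame t
    targets = frames[1:]      # ground-truth future frames
    return float(np.mean((preds - targets) ** 2))

rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 4))
baseline = np.eye(4)          # "copy the last frame" baseline predictor
loss = cvm_loss(frames, baseline)
```

In a real training setup, the predictor would be a deep network and the loss would be minimized over many object sequences; the key idea the sketch captures is that the supervisory signal comes from the temporal structure of the data itself.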

Keywords: human vision; spatiotemporal representation learning; representation alignment
