
Poster A128 in Poster Session A - Tuesday, August 6, 2024, 4:15 – 6:15 pm, Johnson Ice Rink

Are ViTs as global as we think? - Assessing model locality for brain-model mapping

Fangrui Huang1, Klemen Kotar1, Wanhee Lee1, Rosa Cao1, Daniel Yamins1; 1Stanford University

Recent explorations into the neural predictivity of Vision Transformer (ViT) models have shown remarkable similarities to traditional Convolutional Neural Networks (CNNs) in predicting neural responses within the visual cortex. This convergence raises intriguing questions about the underlying architectural similarities and differences between these two model types, particularly in the context of spatial locality. Our study investigates the locality of receptive fields within ViTs compared to CNNs, employing a novel methodological approach that adjusts for differences in layer resolutions and total number of layers across models. Our findings suggest that despite ViTs' potential for global connectivity through attention mechanisms, they exhibit a strong bias towards local processing akin to that of CNNs, particularly after training. This convergence in locality patterns may explain their similar effectiveness in neural predictivity, providing new insights into how transformer architectures process visual information and their neurophysiological parallels.
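As an illustration only (the abstract does not specify the authors' metric), a minimal sketch of one common way to quantify how local a ViT layer's processing is: the attention-weighted mean spatial distance between patch tokens, averaged per head. The function name, grid size, and random attention weights below are hypothetical stand-ins, not the study's actual analysis.

```python
# Minimal sketch (not the authors' code): summarize a ViT layer's locality via the
# attention-weighted mean distance between query and attended patches.
import numpy as np

def mean_attention_distance(attn, grid_size):
    """attn: (num_heads, N, N) attention weights over N = grid_size**2 patch tokens
    (CLS token assumed already dropped). Returns the mean attended distance per head,
    in units of patch widths; smaller values indicate more local processing."""
    num_heads, n, _ = attn.shape
    assert n == grid_size ** 2
    # (row, col) coordinates of each patch on the spatial grid
    coords = np.stack(np.meshgrid(np.arange(grid_size), np.arange(grid_size),
                                  indexing="ij"), axis=-1).reshape(n, 2)
    # pairwise Euclidean distances between patch centers, shape (N, N)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # weight each query-to-key distance by its attention probability, then average
    return (attn * dists[None]).sum(axis=-1).mean(axis=-1)  # shape (num_heads,)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heads, grid = 12, 14                      # e.g. a ViT-B/16 on 224x224 inputs
    logits = rng.normal(size=(heads, grid**2, grid**2))
    attn = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)  # row-wise softmax
    print(mean_attention_distance(attn, grid))
```

Comparing such per-layer locality scores across architectures would require normalizing for each model's layer resolution and depth, which is the kind of adjustment the abstract describes.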

Keywords: artificial intelligence; neuroscience; brain-model mapping; interpretability
