Poster B74 in Poster Session B - Thursday, August 8, 2024, 1:30 – 3:30 pm, Johnson Ice Rink
Evaluating the perceptual alignment between generative visual models and human observers on 3D shape inferences
Tyler Bonnen1, Riley Peterlinz1, Angjoo Kanazawa1, Alexei Efros1; 1University of California, Berkeley
Humans can infer the 3D shape of objects from a single image. Computational methods in the neurosciences fail to adequately model this ability. Recently, "generative" machine learning methods have emerged as a promising approach to modeling the geometric properties of objects. Here we develop a framework to evaluate the perceptual alignment between these generative models and humans on 3D visual tasks. Given a set of experimental images (e.g., four images within an "oddity" trial), we use an image-conditioned generative model to infer object properties, including 3D shape. We then estimate relative viewpoints (i.e., camera positions relative to objects) across images. With these inferred object and viewpoint latents, we determine the similarity between objects within a trial, using an image generation procedure analogous to mental rotation. We evaluate how well a single instance of this generative model class, the Large Reconstruction Model (LRM), predicts human behavior. We find that LRM does not achieve human-level performance on 3D visual inferences. Nonetheless, our approach provides an extensible framework to evaluate the perceptual alignment between humans and generative visual models.
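To make the trial-level procedure concrete, below is a minimal sketch of one oddity trial in Python. Every name here (infer_shape, estimate_relative_pose, render_feature, oddity_choice) is a hypothetical stand-in stubbed with random features for illustration; an actual pipeline would substitute the image-conditioned generative model (e.g., LRM), a real pose estimator, and its renderer.

```python
import numpy as np

# Hypothetical stand-ins for the components described in the abstract.
# `infer_shape` would wrap an image-conditioned generative model (e.g., LRM),
# and `render_feature` would re-render the inferred 3D object under a given
# camera pose; both are stubbed with random latents here.
rng = np.random.default_rng(0)

def infer_shape(image: np.ndarray) -> np.ndarray:
    """Infer a latent 3D-shape representation from a single image (stub)."""
    return rng.standard_normal(128)

def estimate_relative_pose(latent_a: np.ndarray, latent_b: np.ndarray) -> np.ndarray:
    """Estimate the relative camera pose between two views (stub: identity)."""
    return np.eye(4)

def render_feature(latent: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Render the inferred object under `pose` and embed it (stub)."""
    return latent  # a real pipeline would re-render and re-encode the image

def oddity_choice(images: list[np.ndarray]) -> int:
    """Pick the image least similar to the others, as in an oddity trial."""
    latents = [infer_shape(im) for im in images]
    n = len(latents)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Align view i to view j before comparing: the "mental rotation"
            # step described in the abstract.
            pose = estimate_relative_pose(latents[i], latents[j])
            a, b = render_feature(latents[i], pose), latents[j]
            sim[i, j] = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    # The oddity is the item with the lowest total similarity to the rest.
    return int(np.argmin(sim.sum(axis=1)))

# Example: four dummy 256x256 RGB images standing in for one trial.
trial = [rng.random((256, 256, 3)) for _ in range(4)]
print("predicted oddity index:", oddity_choice(trial))
```

The model's choice on each trial can then be scored against the ground-truth oddity and against human responses on the same images.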
Keywords: 3D perception; generative models; human-model comparison