
Poster A107 in Poster Session A - Tuesday, August 6, 2024, 4:15 – 6:15 pm, Johnson Ice Rink

Learning 3D object-centric representation through prediction with inductive bias of infants

John Day1, Tushar Arora2, Jirui Liu3, Li Erran Li4, Ming Bo Cai5,1; 1University of Tokyo, 2Boston University, 3Tsinghua University, 4Amazon, 5University of Miami

As part of human core knowledge, objects are the building blocks of mental representation that support high-level concepts and symbolic reasoning. Infants develop the notion of objects situated in 3D environments without supervision. Towards understanding the minimal set of assumptions needed to learn object perception, we investigate a predictive-learning approach that acquires three key abilities without supervision: a) segmenting objects from images, b) inferring objects' locations in 3D, and c) perceiving depth. Critically, we restrict the input signals to those available to infants, namely streams of visual input and self-motion information, mimicking the efference copy in the brain. In our framework, objects are latent causes of scenes, constructed by the brain to facilitate efficient prediction of future sensory input. All three abilities arise as by-products of learning to predict. The model comprises three networks that jointly learn to predict the next-moment visual input based on the two preceding scenes. This work demonstrates a new approach to learning symbolic representations grounded in sensation.
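The core training signal described above can be sketched in a few lines: a predictor receives two past frames plus a self-motion (efference-copy) vector and is trained only to minimize next-frame prediction error, with no object labels. This is a minimal illustrative sketch, not the authors' model: the single linear predictor `W`, the frame size, and all variable names are assumptions standing in for the three jointly trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper).
H = 8                              # side of a tiny "image"
D = H * H                          # flattened frame dimension
M = 3                              # self-motion (efference copy) dimension

# One linear map standing in for the three jointly trained networks:
# (frame t-1, frame t, self-motion) -> predicted frame t+1.
W = rng.normal(scale=0.01, size=(D, 2 * D + M))

def mse(pred, target):
    """Purely predictive loss: no object or depth labels anywhere."""
    return float(np.mean((pred - target) ** 2))

# Synthetic three-frame "video" and a self-motion vector.
frames = [rng.random(D) for _ in range(3)]
motion = rng.random(M)
x = np.concatenate([frames[0], frames[1], motion])

lr = 0.1
for _ in range(200):               # plain gradient descent on the MSE
    err = W @ x - frames[2]
    W -= lr * np.outer(err, x) / len(err)

final_loss = mse(W @ x, frames[2])  # small after training
```

In the paper's framework the interesting structure (segmentation, 3D location, depth) lives inside the networks that replace `W` here; the point of the sketch is only that the supervision signal is nothing more than the next-frame prediction error.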

Keywords: object perception; unsupervised learning; infant development; predictive learning
