Poster A89 in Poster Session A - Tuesday, August 6, 2024, 4:15 – 6:15 pm, Johnson Ice Rink

Framework for a Generative Multi-modal model of Embodied Thought

Gregory Zelinsky1, Ritik Raina1, Abraham Leite1, Seoyoung Ahn2; 1Stony Brook University, NY, 2University of California, Berkeley, CA

Despite recent advances, a persistent weakness of current AI models is that they remain far from achieving the flexibility of human thought. Here we suggest a psychologically-inspired framework for approximating thought that is embodied, multi-modal, and, at its core, generative. Core processes of object generation, world generation, and query generation are each served by sub-processes that improve the efficiency of their core process and exchange information with the other sub-processes. The model's goal-driven interaction with the world proceeds through a sequence of generations, culminating in a query that tests a hypothesis the model has formed about the world from the objects it has generated in it. We propose that this iterative cycle of generative questioning will lead the model to a milestone of human thought: learning that there is a self distinguishable from an other, and that this other is an entity in the world that can be understood by asking it questions.
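
A minimal sketch of this generate-and-query cycle, written only to make the described architecture concrete: the abstract specifies the three core processes conceptually, so every name below (EmbodiedThoughtLoop, generate_objects, generate_world, generate_query, the environment's ask interface) is a hypothetical placeholder rather than part of the authors' implementation.

```python
# Hypothetical sketch of the proposed generate-and-query cycle.
# All names are illustrative assumptions; the abstract describes the core
# processes only at a conceptual level and gives no implementation details.
from typing import Any, Callable, List


class EmbodiedThoughtLoop:
    """One pass of the object-generation -> world-generation -> query-generation cycle."""

    def __init__(
        self,
        generate_objects: Callable[[Any, Any], List[Any]],  # object generation
        generate_world: Callable[[List[Any], Any], Any],     # world (hypothesis) generation
        generate_query: Callable[[Any, Any], str],           # query generation
        environment: Any,                                     # the "other" that can answer queries
    ) -> None:
        self.generate_objects = generate_objects
        self.generate_world = generate_world
        self.generate_query = generate_query
        self.environment = environment

    def step(self, observation: Any, goal: Any):
        """Run one goal-driven cycle: generate objects, a world hypothesis, and a test query."""
        objects = self.generate_objects(observation, goal)   # objects inferred from multimodal input
        hypothesis = self.generate_world(objects, goal)      # world hypothesis built from those objects
        query = self.generate_query(hypothesis, goal)        # question that would test the hypothesis
        answer = self.environment.ask(query)                 # assumed interface: the other answers the query
        return hypothesis, query, answer
```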

Keywords: psychologically-inspired AI, generative AI, vision-language models
