Keynote Lecture: Michael C Frank
Bridging the data gap between children and AI models
Michael C Frank, Stanford University
Benjamin Scott Crocker Professor of Human Biology
Director, Symbolic Systems Program
Large language and language-vision models show intriguing emergent behaviors, yet they receive at least three to four, and sometimes as many as six, orders of magnitude more language data than human children. What accounts for this vast difference in sample efficiency? I will describe steps towards an ecosystem in which we can address this question. In particular, I'll discuss the use of child language and egocentric video data for model training, and the use of developmental data for model evaluation. This ecosystem has the potential to shed light on both the question of model efficiency and the nature of human learning.