
Poster A78 in Poster Session A - Tuesday, August 6, 2024, 4:15 – 6:15 pm, Johnson Ice Rink

Emergence of in-context structure detection through self-supervised learning

Pierre Orhan1, Fosca AlRoumi2, Yves Boubenec1, Jean-Rémi King1; 1Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, 2Cognitive Neuroimaging Unit, Université Paris Saclay, INSERM, CEA, CNRS, Neurospin center

Humans' ability to spontaneously detect symbolic structures is often considered essential to the acquisition of language and music. Prominent theories postulate that core, innate, internal mechanisms, such as "merge" (Chomsky) or "neural recursion" (Dehaene), are foundational to this feat. Here we tested the alternative hypothesis that the ability to detect symbolic structures emerges from generic statistical learning operating on external naturalistic inputs that are themselves structured. We focused on auditory stimuli, for which a wealth of experimental protocols probe structure detection. First, we exposed self-supervised auditory models to a dataset merging music, speech and environmental sounds. Second, we exposed the models to classical neuroscience experimental protocols and evaluated their ability to perform zero-shot detection of regularities, including algebraic structures. After training, the models, like humans, detected (1) repeated sequences, (2) probabilistic chunks and (3) algebraic structures, (4) with diminished performance for structures of increasing complexity. Furthermore, we show that this ability was a direct consequence of self-supervised learning: the more the models were exposed to natural sounds, the more they spontaneously detected increasingly complex structures. Overall, we demonstrate that the emergence of structure detection need not require a dedicated internal mechanism: rather, self-supervised learning operating on external sensory inputs is sufficient for the emergence of internal computations capable of detecting regular patterns such as algebraic structures.

Keywords: self-supervised learning, statistical learning, language, music
