Poster A148 in Poster Session A - Tuesday, August 6, 2024, 4:15 – 6:15 pm, Johnson Ice Rink

Revealing the time-course of mid-level feature representations in scenes using rendered stimuli and ground-truth annotations

Agnessa Karapetian1, Alexander Lenders1, Vanshika Bawa2, Martin Pflaum3, Raphael Leuner1, Gemma Roig4, Kshitij Dwivedi4, Radoslaw M. Cichy1; 1Freie Universität Berlin, 2Albert-Ludwigs-Universität Freiburg, 3RWTH Aachen University, 4Goethe University Frankfurt

Scene perception is a key function of the human visual brain that follows a hierarchical processing stream from low- to mid- to high-level features. While the processing of low- and high-level features is well researched, mid-level features and their temporal dynamics remain under-investigated, partly due to a lack of appropriate stimuli to probe them. To address this gap, we used rendering software to create a rich stimulus set of images and short videos of scenes in which persons perform different actions. We also obtained the corresponding ground-truth annotations for five postulated mid-level features (reflectance, lighting, world normals, scene depth, and skeleton position), as well as one low-level feature (edges) and one high-level feature (action). We collected electroencephalography (EEG) data during the presentation of these stimuli and applied encoding models to predict the EEG data from the ground-truth feature annotations. We observed that the encoding accuracy of our mid-level feature annotations peaked between ~100 ms and ~250 ms after stimulus onset, temporally flanked by the low- and high-level feature representations. This suggests that the postulated mid-level features play an intermediary role in the transformation of low-level inputs into high-level semantic information, providing insight into their place in the scene processing hierarchy.
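The abstract does not specify the encoding pipeline, so the following is a minimal sketch of one common approach to time-resolved EEG encoding: at each timepoint, a ridge regression predicts the EEG channel responses across stimuli from a vectorized feature annotation, and encoding accuracy is the cross-validated Pearson correlation between predicted and observed responses. The function name, data shapes, and the choice of ridge regression are illustrative assumptions, not the authors' exact method.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def encoding_accuracy(features, eeg, alpha=1.0, n_splits=5):
    """Time-resolved encoding sketch (hypothetical pipeline, not the
    authors' exact method).

    features : (n_stimuli, n_dims) vectorized ground-truth annotation,
               e.g. a flattened depth map per stimulus.
    eeg      : (n_stimuli, n_channels, n_times) preprocessed epochs.
    Returns  : (n_times,) mean cross-validated Pearson correlation
               between predicted and observed EEG, per timepoint.
    """
    n_stimuli, n_channels, n_times = eeg.shape
    accuracy = np.zeros(n_times)
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for t in range(n_times):
        y = eeg[:, :, t]  # responses at this timepoint: (n_stimuli, n_channels)
        fold_corrs = []
        for train, test in cv.split(features):
            # Multi-output ridge: one regression weight map per channel.
            model = Ridge(alpha=alpha).fit(features[train], y[train])
            y_pred = model.predict(features[test])
            # Correlate predicted and observed responses per channel.
            for ch in range(n_channels):
                r = np.corrcoef(y_pred[:, ch], y[test][:, ch])[0, 1]
                fold_corrs.append(r)
        accuracy[t] = np.nanmean(fold_corrs)
    return accuracy
```

Run once per feature (edges, the five mid-level annotations, action) and compare the timepoints at which each accuracy curve peaks; under this reading, the reported result is that the mid-level curves peak between the low-level and high-level ones.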

Keywords: scene perception, mid-level features, EEG, encoding