Poster B43 in Poster Session B - Thursday, August 8, 2024, 1:30 – 3:30 pm, Johnson Ice Rink
Modeling human learning in a combinatorial bandit task
Guangyu Deng1, Haoyang Lu1, Yi-Long Lu1, Hao Yan2, Hang Zhang1; 1Peking University, Beijing, China, 2Peking University Sixth Hospital, Beijing, China
We propose a combinatorial variant of the multi-armed bandit task that allows an agent to select combinations of multiple arms rather than a single arm. The resulting action space is thus multi-dimensional and larger than usual, highlighting the exploration–exploitation dilemma and the credit assignment problem. To model human learning in this task, we develop a learning model based on the policy-gradient (PG) algorithm of reinforcement learning, an algorithm that often outperforms value-learning algorithms in complex action spaces, and examine the mathematical properties of its updating rule. In an experiment using this new task (N = 42), we find that nearly half of the participants are better fit by the PG model and exhibit behavioral patterns that value learning may have difficulty accounting for.
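To make the setup concrete, the sketch below shows one simple way a policy-gradient (REINFORCE-style) learner can handle a combinatorial bandit: each arm gets its own logit, the agent samples a subset of arms via independent Bernoulli gates, and all logits are updated with the same reward signal. This is an illustrative assumption of ours, not the authors' model; the arm values, learning rate, and baseline scheme here are hypothetical.

```python
import random
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def run_pg_bandit(arm_values, n_trials=5000, lr=0.1, seed=0):
    """Illustrative REINFORCE-style learner for a combinatorial bandit.

    Each trial the agent selects a subset of arms via independent
    Bernoulli gates, observes the summed (noisy) reward, and ascends
    the policy gradient of the per-arm logits. `arm_values` are the
    hidden mean rewards of the arms (a hypothetical reward structure).
    """
    rng = random.Random(seed)
    logits = [0.0] * len(arm_values)
    baseline = 0.0  # running reward baseline to reduce gradient variance
    for _ in range(n_trials):
        probs = [sigmoid(t) for t in logits]
        action = [1 if rng.random() < p else 0 for p in probs]
        reward = sum(v for v, a in zip(arm_values, action) if a)
        reward += rng.gauss(0.0, 0.1)  # noisy feedback
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # gradient of log Bernoulli(a | p) w.r.t. the logit is (a - p);
        # every selected arm shares the same advantage signal, which is
        # exactly where the credit assignment problem arises
        for i in range(len(logits)):
            logits[i] += lr * advantage * (action[i] - probs[i])
    return [sigmoid(t) for t in logits]

# After learning, arms with positive hidden value should be included
# with high probability and negative-value arms avoided.
probs = run_pg_bandit([1.0, -1.0, 0.5, -0.5])
```

Note that because every arm's logit is updated with the same scalar advantage, the learner must disentangle each arm's contribution purely through the correlation between its inclusion and the total reward, which is one way the credit assignment problem mentioned above manifests in this action space.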
Keywords: reinforcement learning; computational modeling; multi-armed bandit; policy gradient method